Fighting Fire with Fire: Scalable Oral Exams with Voice AI

technology education artificial-intelligence strategy

From: https://www.behind-the-enemy-lines.com/2025/12/fighting-fire-with-fire-scalable-oral.html

The traditional take-home exam is dead. When student submissions start looking like professional memos but the authors cannot explain their own logic during cold calls, the assessment model has failed. Panos Ipeirotis and Konstantinos Rizakos addressed this by deploying a Voice AI agent to conduct scalable, personalized oral exams.

The Core Experiment

Using an ElevenLabs voice agent to conduct the interviews and a “Council of LLMs” to grade them, the instructors turned a roughly 30-hour manual grading task into a $15 automated process.

“If you cannot defend your own work live, then the written artifact is not measuring what you think it is measuring.”
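
The post describes the exams as personalized to each student's own submission. One plausible shape of that step is sketched below; this is an illustration, not the authors' code, and the model name, prompt, and use of the OpenAI client are all assumptions.

```python
# Hypothetical sketch: generate oral-exam questions tailored to a student's
# own submission. Model choice and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def personalized_questions(submission: str, n: int = 5) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model, not confirmed by the source
        messages=[{
            "role": "user",
            "content": (
                f"Read this student submission and write {n} oral-exam "
                f"questions that probe whether the author can defend "
                f"their own analysis:\n\n{submission}"
            ),
        }],
    )
    return response.choices[0].message.content
```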

Key Takeaways

  • Scalability: The system cost $0.42 per student and handled 36 exams over 9 days without instructor fatigue.
  • The Council Method: Grading was handled by three models (Claude, Gemini, GPT). They initially disagreed, but a “consultation” round in which each model saw the others' evidence produced high consistency (sketched in code after this list).
  • Instructional Mirror: The data revealed that students failed “Experimentation” questions consistently, highlighting a specific gap in how the professors had taught the material.
  • Human Factors: Students found the AI voice “intimidating” and “condescending,” a reminder that the agent's persona matters as much as its logic.

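The consultation mechanic behind that convergence is simple to express in code. Here is a minimal sketch of the two-round protocol, assuming a placeholder `grade_with` call standing in for the real Claude, Gemini, and GPT APIs; the actual prompts and rubric are not published.

```python
# Minimal sketch of the two-round "Council of LLMs" grading protocol.
# grade_with() is a stub; prompts and rubric here are illustrative only.
from dataclasses import dataclass

COUNCIL = ["claude", "gemini", "gpt"]

@dataclass
class Verdict:
    model: str
    score: float    # rubric score in points
    evidence: str   # transcript excerpts the model cites for its score

def grade_with(model: str, prompt: str) -> Verdict:
    """Stub for a real API call to the named model."""
    raise NotImplementedError(f"wire up the {model} API here")

def council_grade(transcript: str) -> list[Verdict]:
    # Round 1: each model grades the exam transcript independently.
    round1 = [grade_with(m, f"Grade this oral exam:\n{transcript}")
              for m in COUNCIL]

    # Round 2 ("consultation"): each model re-grades after seeing the
    # other models' scores and cited evidence.
    dossier = "\n".join(f"{v.model}: {v.score} pts; evidence: {v.evidence}"
                        for v in round1)
    return [grade_with(m, f"Peer verdicts:\n{dossier}\n\nRe-grade:\n{transcript}")
            for m in COUNCIL]
```
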
Grading Convergence

| Metric | Round 1 (Independent) | Round 2 (Consultation) |
| --- | --- | --- |
| Within 1 point | 0% | 62% |
| Within 2 points | 23% | 85% |
| Mean max difference | 3.93 pts | 1.41 pts |
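
The post does not define these metrics exactly; a natural reading is that each student's three council scores are compared pairwise, with “within k points” meaning the largest disagreement is at most k. A sketch under that assumption:

```python
# Convergence metrics under the assumption that each row of `scores`
# holds one student's three council grades for a given round.
from itertools import combinations

def convergence(scores: list[tuple[float, float, float]]) -> dict[str, float]:
    # Largest pairwise disagreement among the three graders, per student.
    max_diffs = [max(abs(a - b) for a, b in combinations(s, 2)) for s in scores]
    n = len(max_diffs)
    return {
        "within_1_pt": sum(d <= 1 for d in max_diffs) / n,   # e.g. 0.62 in round 2
        "within_2_pts": sum(d <= 2 for d in max_diffs) / n,  # e.g. 0.85 in round 2
        "mean_max_diff": sum(max_diffs) / n,                 # e.g. 1.41 pts in round 2
    }

# Example: one student whose graders gave 7, 8, and 8.5 points.
print(convergence([(7.0, 8.0, 8.5)]))
# {'within_1_pt': 0.0, 'within_2_pts': 1.0, 'mean_max_diff': 1.5}
```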

The experiment suggests that while AI created the cheating problem, it also provides the only scalable way to return to high-integrity oral examination.[1]


  1. Panos Ipeirotis is a Professor at NYU Stern.
