The Human Element in Chatbot Exams
From: https://ploum.net/2026-01-19-exam-with-chatbots.html
Lionel Dricot (Ploum), teaching “Open Source Strategies” at École Polytechnique de Louvain, recently experimented with a radical exam format. He allowed students full internet access and the choice to use chatbots, provided they followed strict accountability rules.
“You can use chatbots, but you will be held accountable for it. Mistakes made by chatbots will be considered more important than honest human mistakes, resulting in the loss of more points.”
The Experiment and Results
Out of 60 students, 57 chose not to use chatbots. Lionel identified four distinct clusters among those who declined:
| Cluster | Performance Range (out of 20) | Motivation |
|---|---|---|
| Personal Preference | 15 – 19 | Pride in their own work; saw verifying AI output as a time sink. |
| Never Use | ~13 | Active dislike for the interaction style of LLMs. |
| Pragmatic | 12 – 16 | Judged the specific exam as unsuitable for AI assistance. |
| Heavy Users | 8 – 11 | Feared the accountability penalties and worried about missing AI errors. |
Key Observations
- The Accountability Paradox: When students are forced to justify AI output and are penalised more heavily for AI-generated errors, most abandon the tool.
- The Stream of Consciousness: Lionel introduced a “stream of consciousness” text file in which students recorded their thoughts in real time, without editing. This offered a “glimpse inside the minds” of students and helped those who understood the material but, under stress, struggled to articulate it orally.
- Generational Fear: Students expressed a pronounced fear of “cheating,” even regarding ordinary collaborative practices. Many panicked over Google’s forced AI overviews, fearing the mere sight of them would lead to accusations of academic dishonesty.
“Can chatbots help? Yes, if you know how to use them. But if you do, chances are you don’t need chatbots.”
Comparison to Automated Oral Exams
This method contrasts sharply with the Voice AI approach used by Panos Ipeirotis.[1] While the Voice AI model focuses on scalability and high-integrity automated testing, Lionel’s method centres on human interaction and the “stream of consciousness” to mitigate stress. Lionel’s approach is deeply personal, averaging 26 minutes per student, which makes it difficult to scale to large cohorts; the Voice AI model, by contrast, handles dozens of students for pennies.
See the previous discussion on Scalable Oral Exams with Voice AI. ↩︎