Watch a good middle-school math teacher work with a student stuck on a problem. The first thing they do looks like the wrong move, in the sense that it doesn't seem useful at all: they ask the student a question. What have you tried? Where did you get stuck? What does the problem actually want?
This is not stalling. The teacher is doing diagnostic work. The student doesn’t yet know which kind of help they need, and neither does the teacher. The question is the only way to find out.
Then, only after the diagnosis, comes the help. And the help is rarely the answer. It's the smallest possible nudge that lets the student take the next step on their own. If the nudge is too small, they stay stuck. If the nudge is too big, they finish the problem without learning. The art of teaching, in this moment, is finding the exact size of the next hint.
There’s a name for this in the cognitive science literature: the zone of proximal development, originally from Vygotsky. The space between what a student can do alone and what they could do with help. Good teachers spend their entire careers developing intuition for where this zone is for each student in front of them.
There’s also a name for the technique that traverses the zone: the hint ladder. Start with the smallest possible scaffold. Watch what happens. Climb one rung at a time until the student can take over. Then step back.
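To see how small the mechanism really is, here is a minimal sketch of a hint ladder in code. The rung wording is invented for the ratio problem that appears below; nothing here comes from a real product.

```python
# A hint ladder for one problem, ordered from smallest scaffold to largest.
# Every hint is illustrative wording, not taken from any real product.
HINT_LADDER = [
    "What is the problem asking for, in your own words?",  # pure diagnosis
    "What do the two numbers in the ratio describe?",      # orient
    "Try drawing one group of 3 boys and 5 girls.",        # representation
    "How many students are in one such group?",            # key subgoal
    "How many of those groups fit into 32 students?",      # near-complete
]

def next_hint(rung: int) -> str | None:
    """Return the hint at the current rung, or None if the ladder is exhausted.

    The caller climbs one rung at a time: escalate only after the student
    has tried the current hint and is still stuck, and step back (stop
    hinting entirely) as soon as the student can continue alone.
    """
    return HINT_LADDER[rung] if rung < len(HINT_LADDER) else None
```

The data structure is trivial. Knowing when to climb, and when to stop, is the whole job.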
Most AI tutors cannot do this. Some are getting closer, but none of the consumer products do it well, and when they fail, they fail in a specific way: by default, they give the answer.
What ChatGPT does when a student asks for help
A 7th grader sits down with a math problem. In a class of 32 students, the ratio of boys to girls is 3 to 5. How many boys are in the class?
They open ChatGPT and paste the problem.
ChatGPT, if not specifically prompted otherwise, will respond with a complete solution. It will explain the concept of ratios, set up the proportion, walk through the algebra, and arrive at the answer: 12 boys.
This is genuinely impressive. The math is correct. The explanation is clear. A student who reads it carefully might even learn something.
But almost no student will read it carefully. They will copy the answer, write 12 on their homework, and move on. The next day, when a similar problem appears on the quiz, they will have learned nothing. The cognitive load that produces actual learning — the mental work of constructing the solution — has been outsourced.
This isn’t ChatGPT being bad at tutoring. This is ChatGPT being good at what it was designed to do: answer questions efficiently. The product was built for adults asking adult questions. The “give the user what they asked for” optimization that makes it useful in most contexts is the exact behavior that destroys learning in this one.
What a good tutor does instead
The same problem, the same student, with a real tutor — human or AI built to behave like one.
The first response is a question, not an answer. Before I help, what do you think the problem is asking? In your own words.
If the student says I don’t know what a ratio is, the tutor knows the gap. They can address that gap directly with a small example, then return to the original problem.
If the student says I think it means 3 boys for every 5 girls but I don’t know what to do with the 32, the tutor knows the conceptual understanding is solid; the gap is procedural. Different intervention.
If the student says I tried setting up 3/5 = x/32 but it's not working, the tutor knows the student is approaching it as a proportion problem, which is reasonable but mismatched: 3/5 compares boys to girls, while x/32 compares boys to the whole class. They can validate the instinct, point out the specific issue (the 32 is the total, so the matching ratio is boys to total, 3 out of every 3 + 5 = 8 students, giving 3/8 = x/32), and offer that reframing.
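The arithmetic behind that last correction, spelled out so you can check it yourself:

```python
# The student's proportion mixes two different wholes.
mismatched = 3 / 5 * 32     # 19.2: boys-to-girls ratio applied to the total
# The matched proportion: boys per group of 3 + 5 = 8 students, scaled to 32.
matched = 3 / (3 + 5) * 32  # 12.0: four groups of 8 fit into the class
print(mismatched, matched)  # 19.2 12.0
```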
In each case, the next step is the smallest possible hint that addresses the actual gap. Can you draw a picture of one group of 3 boys and 5 girls? How many people is that? How many of those groups would fit into 32?
The student does the work. The student learns.
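That triage step, mapping the student's reply to the actual gap, is the part an AI tutor has to get right before any hint is worth giving. A toy sketch of its shape, with the keyword matching standing in for real understanding:

```python
# Toy triage: classify the student's reply to "what do you think the
# problem is asking?" and pick an intervention. The categories are real
# enough; the string matching is purely illustrative.

def triage(student_reply: str) -> str:
    reply = student_reply.lower()
    if "don't know what a ratio" in reply:
        return "conceptual"    # the idea of a ratio itself is missing
    if "what to do with the 32" in reply:
        return "procedural"    # concept is fine; the method is missing
    if "3/5 = x/32" in reply:
        return "setup"         # method chosen, but applied to the wrong whole
    return "diagnose_more"     # not enough signal yet; keep asking

INTERVENTIONS = {
    "conceptual": "Work a tiny ratio example, then return to the problem.",
    "procedural": "Ask: how many students are in one group of 3 boys and 5 girls?",
    "setup": "Validate the proportion idea; note 3/5 compares boys to girls, not boys to the class.",
    "diagnose_more": "Ask another question before offering any hint.",
}

print(INTERVENTIONS[triage("I tried 3/5 = x/32 but it's not working")])
```

No real system can triage with keyword matching, of course. The point of the sketch is only the shape: diagnose first, then choose the intervention.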
This is hard to do well. It’s hard to do consistently. It requires the tutor to model what the student is thinking based on limited information and adjust in real time. Most human tutors aren’t great at it either — it’s a skill that takes years to develop and a constant willingness to suppress the urge to just explain.
Why this is a pedagogy problem, not a model problem
When AI products fail at tutoring, the conversation usually turns to the model. We need a better model. The model isn’t smart enough. The next generation will fix this.
This is the wrong frame. The current generation of frontier models is more than capable of the cognitive work involved in tutoring. The problem is that they’re not asked to do it.
The default behavior of a chat-tuned language model is to be maximally helpful in the sense of giving the user what they explicitly asked for. The default behavior of a tutor is to be maximally helpful in the sense of giving the user what they actually need, which is often the opposite of what they asked for.
These two definitions of “helpful” are in direct conflict. You cannot tutor someone who is asking for the answer by giving them the answer. You can only tutor them by refusing to give the answer in a way that doesn’t feel like refusal.
This is a pedagogy encoding problem, not a capability problem. It’s about teaching the model to behave like a tutor, which is a different thing than teaching it to be smart. The smart part is already there.
The encoding can happen in several places. The system prompt can establish tutoring behavior. The training data can include tutoring dialogues. The product UI can prevent the model from giving direct answers and require it to ask diagnostic questions first. The right answer probably involves all three.
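The system-prompt layer is the cheapest of the three to prototype. Here is a minimal sketch of what encoding the behavior there might look like; the prompt wording is mine, and call_model is a stub for whatever chat-completion API your stack actually uses:

```python
# A minimal sketch of pedagogy encoded at the system-prompt layer.
# The prompt text is illustrative; call_model is a placeholder, not a real API.

TUTOR_SYSTEM_PROMPT = """You are a math tutor, not an answer engine.
1. Never state the final answer to the student's problem.
2. Your first reply to any new problem must be a diagnostic question:
   ask what the student has tried or what they think is being asked.
3. Offer the smallest hint that lets the student take the next step.
4. Escalate one hint at a time, and only after the student responds.
"""

def call_model(messages: list[dict]) -> str:
    """Stub: wire up your model provider's chat-completion call here."""
    raise NotImplementedError

def tutor_turn(history: list[dict], student_message: str) -> str:
    messages = [{"role": "system", "content": TUTOR_SYSTEM_PROMPT}]
    messages += history
    messages.append({"role": "user", "content": student_message})
    return call_model(messages)
```

A prompt alone will not survive a determined student asking for the answer outright, which is why the training-data and UI layers matter too. But notice that none of the three layers requires a smarter model.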
What it doesn’t involve is waiting for a smarter model. The current models are already smarter than most middle-school math teachers in the narrow sense of being able to solve the problems. The bottleneck is pedagogical design, which is a human skill applied to AI products, not an AI capability that emerges from scale.
What teachers tell us about this
Spend an hour with a middle school math teacher and the topic of AI cheating will come up. They know. They’ve seen the homework that’s too perfect. They’ve watched students hand in essays in fonts they don’t use. They’ve adjusted their grading rubrics to weight in-class work more heavily.
Ask those same teachers what would actually help, and the answer is rarely “more powerful AI.” It’s almost always something more specific: I want a tool that helps the student learn, not a tool that helps the student finish. The distinction is everything. A tool that helps a student finish accelerates their progress through assignments and decelerates their actual learning. A tool that helps a student learn does the opposite — they may finish less, but what they finish, they understand.
Teachers know which side of this line their students need. They can usually tell you which students need more encouragement to struggle and which need more support. They have favorite scaffolds and questioning techniques they’ve refined over years. What they don’t have is a way to give every student the right kind of help in real time, because there is one of them and 28 students and 45 minutes.
That gap is what AI in education should solve. Not by replacing the teacher’s work but by extending it — making available, to every student, the kind of careful pedagogical attention that the teacher can only give to a few at a time.
This is what well-designed AI tutoring looks like. It’s not a chatbot. It’s not even particularly impressive in a demo. It looks, from the outside, like a tool that talks slowly, asks more questions than it answers, and occasionally seems to refuse to help. From the inside — from the perspective of the student using it — it looks like having a patient older sibling who actually understood the math and wouldn’t just do it for you.
The takeaway
When you evaluate an AI tutoring product — for your school, for your child, for your investment thesis — try this test. Ask it a homework question. See if it gives you the answer.
If it does, it’s not a tutor. It’s a homework helper, which is a different product category and not necessarily a bad one. But don’t let it call itself a tutor.
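If you want to run the test mechanically rather than by eyeball, a deliberately crude heuristic is enough. The checks below are a rough first pass, not a benchmark:

```python
# A crude, automatable version of the test: paste a homework problem
# whose answer you already know, then inspect only the FIRST reply.
# Heuristic by design; adapt the checks to your own problems.

def first_reply_is_tutoring(first_reply: str, known_answer: str) -> bool:
    gives_answer = known_answer in first_reply
    asks_question = "?" in first_reply
    # A tutor's first reply asks something and withholds the answer.
    return asks_question and not gives_answer

# The ratio problem from earlier (answer: 12 boys):
print(first_reply_is_tutoring(
    "Sure! 3 + 5 = 8, and 32 / 8 = 4, so there are 12 boys.", "12"))  # False
print(first_reply_is_tutoring(
    "Before I help: what do you think the problem is asking?", "12"))  # True
```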
With a real tutor, the first response is a question.
Next week: a step back from the technical and into the political. We’ll look at why the AI literacy debate in schools is being framed entirely wrong, and what that costs us.