The phrase “standards-aligned” appears in the marketing copy of nearly every AI product sold to K–12 schools. TEKS aligned. Common Core aligned. NGSS aligned. State standards aligned. The phrase has the quiet authority of a regulatory term — it sounds like something audited, something certified, something specific.
It almost never is. In practice, “standards-aligned” most often means: at some point during product development, someone read the standards document and made sure the product’s content vaguely covered the topics. There is no audit. There is no certification. There is no agreed-upon definition of what alignment actually requires.
For a school administrator buying these products, this gap is consequential. A product that’s genuinely aligned to your state’s standards is going to fit into your curriculum. A product that’s “aligned” in the loose marketing sense will produce content that might be relevant to what you’re teaching, might be at the right grade level, and might meet the depth of knowledge requirements your standards specify. Three mights, in a context where alignment is supposed to be a yes-or-no question.
This piece is an attempt to make the question yes-or-no. Here’s what real alignment looks like, in five concrete tests, and why most AI products fail at least three of them.
Test one: standard granularity
A real state standard is not “ratios in 6th grade.” It’s something more like Texas Essential Knowledge and Skills (TEKS) §111.26(b)(4)(B): “apply qualitative and quantitative reasoning to solve prediction and comparison of real-world problems involving ratios and rates.”
The level of granularity matters because a product can claim to teach “ratios” while completely missing the specific cognitive operations the standard requires — qualitative reasoning, quantitative reasoning, prediction, comparison, real-world contextualization. A product that drills students on equivalent-ratio computation is not aligned to this standard. It’s adjacent to it.
The test: ask the vendor to show you, for any specific standard in your state’s framework, exactly which content in their product addresses it. Not the topic. The specific cognitive operations the standard names. If they can’t, the alignment claim is rhetoric.
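To make that request concrete, here is a minimal sketch of what a granular alignment map could look like: each cognitive operation the standard names, mapped to the specific content that addresses it. The standard code and operations come from the example above; the schema, lesson IDs, and field names are invented for illustration, not any vendor's actual format.

```python
# A hypothetical alignment map. The standard code is real (quoted above);
# the lesson IDs and structure are invented for illustration.
alignment_map = {
    "TEKS 111.26(b)(4)(B)": {
        # The specific cognitive operations the standard names,
        # each pointing at concrete content rather than a topic tag.
        "operations": {
            "qualitative reasoning": ["lesson-0412", "item-set-ratios-qual"],
            "quantitative reasoning": ["lesson-0413"],
            "prediction": [],  # an empty list is a visible, checkable gap
            "comparison": ["lesson-0415"],
            "real-world contextualization": ["scenario-set-07"],
        },
    }
}

# The vendor conversation becomes concrete: which operations lack content?
for standard, entry in alignment_map.items():
    gaps = [op for op, content in entry["operations"].items() if not content]
    if gaps:
        print(f"{standard}: no content for: {', '.join(gaps)}")
```

A vendor who has done the alignment work can produce something shaped like this on request. A vendor who hasn't will produce a topic list.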
Test two: depth of knowledge
State standards specify not just what students should learn but at what cognitive depth. Webb's Depth of Knowledge (DOK) framework, widely used in state assessment design, distinguishes four levels: recall, basic application, strategic thinking, and extended thinking. A standard might specify that students should be able to apply a concept (DOK 2) or analyze multiple approaches to a problem (DOK 3).
Most AI tutoring products operate primarily at DOK 1 and 2. They can ask recall questions and walk students through procedural application. They struggle to engage students in the strategic thinking and extended reasoning that higher DOK levels require, because doing so requires the AI to evaluate the quality of student reasoning rather than just the correctness of student answers.
The test: ask the vendor to demonstrate a student interaction at DOK 3 or 4 for one of your priority standards. Watch what happens. Does the AI engage the student in genuine reasoning, or does it default to walking through a procedure?
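One way to preview the answer before the demo: ask for an export of the product's assessment item bank and tally it by DOK level. A rough sketch of that tally, with invented item data standing in for a real export:

```python
# A rough audit sketch: tally assessment items by DOK level. The items
# here are invented; a real audit would use the vendor's item-bank
# export for your priority standards.
from collections import Counter

items = [
    {"id": "q1", "dok": 1}, {"id": "q2", "dok": 2}, {"id": "q3", "dok": 2},
    {"id": "q4", "dok": 1}, {"id": "q5", "dok": 2}, {"id": "q6", "dok": 3},
]

counts = Counter(item["dok"] for item in items)
total = sum(counts.values())
for level in (1, 2, 3, 4):
    print(f"DOK {level}: {counts.get(level, 0) / total:.0%}")
# If DOK 3 and 4 round to zero, the product cannot be aligned to a
# standard that specifies strategic or extended thinking.
```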
Test three: prerequisite chains
Standards do not exist in isolation. A 6th grade ratio standard depends on prior mastery of fractions (5th grade), multiplication and division fluency (4th grade), and basic operations (earlier). Every standards document includes implicit or explicit prerequisite relationships that determine when a concept can be taught and what students need to know first.
Real alignment means the AI product respects these prerequisite chains. When a student struggles with a 6th grade ratio problem, the system should be able to identify whether the difficulty is conceptual (they don’t understand what a ratio is) or prerequisite (their multiplication fluency is shaky and the arithmetic is the obstacle). These require different interventions.
Most AI products don’t model prerequisite chains explicitly. They treat the standard as a bounded topic and address only that topic. The student who fails because of an upstream gap gets the same intervention as the student who fails because of the current concept — which is to say, they get an intervention that doesn’t actually help.
The test: present the vendor with a 6th grade student who’s struggling with a ratio problem because of a 4th grade multiplication gap. Does the product diagnose this correctly? Does it route the intervention to the actual gap, or does it stay at the surface level?
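The machinery this test probes for is small but specific: a prerequisite graph over standards, plus a mastery estimate per node. Here is a minimal sketch of the diagnosis logic that routes to the actual gap; the standard names, mastery numbers, and threshold are all hypothetical.

```python
# A minimal prerequisite chain as a directed graph. Standard names and
# mastery estimates are illustrative, not any real product's model.
prerequisites = {
    "grade6.ratios": ["grade5.fractions", "grade4.mult_div_fluency"],
    "grade5.fractions": ["grade4.mult_div_fluency"],
    "grade4.mult_div_fluency": ["grade3.basic_operations"],
    "grade3.basic_operations": [],
}

# Hypothetical mastery estimates for one student, 0.0 to 1.0.
mastery = {
    "grade6.ratios": 0.35,
    "grade5.fractions": 0.80,
    "grade4.mult_div_fluency": 0.40,  # the upstream gap
    "grade3.basic_operations": 0.90,
}

def diagnose(standard: str, threshold: float = 0.6) -> str:
    """Walk the prerequisite chain; return the deepest weak node."""
    for prereq in prerequisites.get(standard, []):
        if mastery.get(prereq, 0.0) < threshold:
            return diagnose(prereq, threshold)  # recurse toward the root cause
    return standard  # prerequisites are solid: the gap is the concept itself

print(diagnose("grade6.ratios"))  # -> grade4.mult_div_fluency
```

Whether the mastery model and threshold are this simple doesn't matter. What matters is whether any prerequisite structure exists at all, because without it the system has nowhere to route the intervention.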
Test four: cultural and contextual alignment
The standards your state has adopted often include implicit assumptions about cultural context, language, and example domains. A standard about “real-world problems” presumes a definition of what counts as relevant to the students in question. Products designed for one context can fail in another not because the math is wrong but because the framing is.
A simple example: a ratio problem about hockey scores will land differently in Minnesota than in Texas. A word problem set in a “subway station” assumes urban context. A reading passage about farming assumes rural context. None of these are wrong, but a product that doesn’t adapt to the population it’s serving misses an alignment dimension that good teachers attend to constantly.
This test is harder to formalize, but the question to ask is: can the vendor show you content that’s been adapted to the cultural and economic context of your specific student population? If every example looks the same regardless of where the product is deployed, alignment is happening at the topic level but not at the engagement level.
Test five: assessment equivalence
The hardest alignment test, and the one most often skipped, is whether the product’s internal assessment of student understanding matches what the state’s actual assessment will measure.
If a state’s end-of-year math test measures conceptual understanding through multi-step word problems, an AI product that primarily assesses students through procedural drill will produce data that doesn’t predict test performance. Students will look proficient in the product and underperform on the test. This isn’t the AI’s fault — the AI is doing what it was designed to do. But the alignment claim is doing dishonest work, suggesting that the product prepares students for the assessment when it doesn’t.
The test: ask the vendor to compare their assessment items to released items from your state’s assessment. Are the cognitive demands the same? The format? The complexity?
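This comparison can be made mechanical. A sketch of the side-by-side profile, with invented items standing in for the vendor's bank and for your state's released items:

```python
# A sketch of the side-by-side profile this test asks for. All items are
# invented; real state items would come from your state's released-item
# archive, and product items from the vendor.
from collections import Counter

def profile(items):
    """Summarize a set of items by cognitive demand and format."""
    return {
        "dok": dict(Counter(i["dok"] for i in items)),
        "format": dict(Counter(i["format"] for i in items)),
    }

product_items = [
    {"dok": 1, "format": "multiple_choice"},
    {"dok": 2, "format": "multiple_choice"},
    {"dok": 2, "format": "numeric_entry"},
]
state_items = [
    {"dok": 2, "format": "multi_step_word_problem"},
    {"dok": 3, "format": "multi_step_word_problem"},
    {"dok": 3, "format": "constructed_response"},
]

print("product:", profile(product_items))
print("state:  ", profile(state_items))
# Mismatched profiles mean the product's proficiency data will not
# predict performance on the state test, whatever the alignment claim says.
```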
What good looks like
A genuinely standards-aligned AI product, evaluated against these five tests, would be able to:
Show you, for any specific standard, the exact content that addresses it.
Demonstrate engagement at the depth of knowledge the standard requires, not just at procedural levels.
Diagnose and route around prerequisite gaps when students struggle.
Adapt to the cultural and contextual norms of the population it’s serving.
Assess students in ways that predict performance on the actual state assessment.
Not many products clear this bar. The ones that do are typically built by teams with deep curriculum expertise, often including former teachers and educational researchers. They tend to be expensive. They tend to be deployed slowly because the alignment work is genuinely hard.
The ones that don’t clear the bar are still being sold, often successfully. They claim alignment because the market has accepted weak alignment as the norm. As long as administrators don’t push back, vendors don’t have to do the harder work.
The buyer’s leverage
The most useful thing administrators can do for the field is push back. Not by becoming hostile to AI products, but by raising the standard for what alignment claims should mean. Insist on the five tests. Ask for evidence. Compare answers across vendors. Reward the ones doing real alignment work and pass on the ones doing rhetoric.
The vendors who can clear the bar will welcome discerning buyers. The ones who can't will either improve or exit. Both outcomes are good for students.
This is one of the under-discussed reasons education technology so consistently underperforms its promises: buyers accept loose claims because evaluating tight claims is hard work, and loose-claim products win contracts because they're easier to buy. The fix won't come from the supply side; the supply side is responding to demand. It has to come from the demand side, from buyers asking better questions.
Schools that get good at this become better customers. Better customers attract better vendors. Over time, the market gets healthier. This is not happening fast enough, but it’s the only mechanism by which it can happen.
A short note: the next post will be a different format. The video that prompted this whole question came from a YouTube series I’ve been following on cognitive science applied to learning. I’m going to summarize the most useful piece in the series, in the format I’ve been using for my own notes, and publish it as the first of what’ll be an occasional series of YouTube summaries. The goal is the same as everything else here — make the field smarter, including me.