Artificial intelligence is embedded in modern classrooms, promising personalized tutoring. A new study reveals a flaw: AI tutors change the quality of their feedback based on a student's perceived race, gender, and academic ability.
What Happened
According to a recent preprint study from Stanford University, large language models (LLMs) acting as AI tutors do not treat all students equally. Researchers tested AI responses across simulated student profiles and found systematic differences in instructional focus. Students identified as White or high-achieving received detailed, development-focused feedback, such as critiques of their argumentation and prompts to expand their reasoning.
Conversely, the same AI models shifted their focus toward basic grammar and spelling for Hispanic students and English language learners, ignoring higher-order thinking skills. Students labeled with learning disabilities faced "feedback withholding bias," receiving shallow positive reinforcement instead of actionable, critical guidance. Researchers also noted that models generated responses linked to cultural or gender stereotypes when evaluating non-White or female students.
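For technically inclined readers, the study's audit approach can be sketched in miniature: hold the essay constant, vary only the simulated student profile, and compare what the feedback emphasizes. The sketch below uses invented feedback strings and simple keyword lists (all hypothetical, not the researchers' actual code or data) to show how a shift from higher-order critique toward surface mechanics could be measured.

```python
# Illustrative sketch of a counterfactual feedback audit.
# The cue lists and feedback texts below are invented for illustration;
# they are not drawn from the Stanford study.

MECHANICS_CUES = {"spelling", "grammar", "punctuation", "typo", "comma"}
HIGHER_ORDER_CUES = {"argument", "evidence", "reasoning", "thesis",
                     "expand", "counterargument"}

def feedback_focus(feedback: str) -> dict:
    """Count surface-mechanics vs. higher-order cues in one feedback text."""
    words = [w.strip(".,;:").lower() for w in feedback.split()]
    return {
        "mechanics": sum(w in MECHANICS_CUES for w in words),
        "higher_order": sum(w in HIGHER_ORDER_CUES for w in words),
    }

# Hypothetical feedback a model might return for two simulated profiles
# attached to the SAME essay.
feedback_by_profile = {
    "profile_a": "Strong thesis; expand your reasoning and add counterargument evidence.",
    "profile_b": "Check your spelling and grammar; fix the comma errors.",
}

for profile, text in feedback_by_profile.items():
    print(profile, feedback_focus(text))
```

A real audit would replace the keyword counts with human or model-based coding of feedback quality, but the core design is the same: identical work, varied profile, compared emphasis.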
The Bigger Picture
This uneven feedback reflects a broader pattern of bias in the evaluation algorithms used in educational technology. An estimated 73% of educational AI systems exhibit bias, yet only 23% of school administrators evaluate these tools for bias before classroom implementation. In one assessment, AI scoring systems rated essays from African American students 0.3 points lower than identical essays attributed to White students.
Adoption is moving faster than regulation. While 85% of teachers and 86% of students used AI tools in the 2024–2025 school year, nearly 80% of school districts operate without clear AI guidelines. Additionally, 96% of K-12 teachers report having no formal AI training.
Government bodies are trying to catch up. The UK's Department for Education recently published safety standards requiring platforms to prove educational purpose and provide teacher supervision tools. Scottish government guidelines state that AI must support rather than replace human teacher judgment. However, as we previously reported, schools often adopt third-party platforms like Discovery Education and Otus to automate assessments before establishing technical guardrails.
What This Means for Families
Parents and educators cannot assume that automated grading or AI tutoring is objective. While students often perceive AI feedback positively because of its immediate availability, research shows that high-quality AI feedback does not automatically lead to better student revisions compared to traditional teacher feedback.
When AI tools misinterpret cultural nuances as linguistic errors, minority and language-learning students are penalized. A multi-layered approach to feedback is necessary: students can use AI tools for basic clarity checks while relying on human teachers for substantive suggestions and essential emotional support.
What You Can Do
- Demand transparency. Ask your school district whether it uses model policy guardrails to evaluate AI vendors for bias and data privacy before purchasing new software.
- Monitor the feedback. Review the automated feedback your child receives on assignments. Look for patterns where the AI focuses on mechanics—like spelling and grammar—rather than offering prompts to improve core arguments.
- Advocate for teacher oversight. Ensure your school's digital technology policies require human teachers to review and adjust AI-generated grades and feedback before they become final.