How to Prove Your Essay Is Human-Written: A Guide for ESL Students

AI detectors flag non-native English writing at dramatically higher rates than native writing. If your university uses Turnitin or GPTZero, you need an authorship defense strategy before you submit — not after you get accused. Here is the step-by-step approach, grounded in peer-reviewed research and recent court rulings.

Alex Zhovnir

May 2026

You wrote your essay. You spent days on it — reading sources, outlining, drafting, revising. You submit it. Then your professor tells you Turnitin flagged it as AI-generated.

If English is not your first language, this is not a rare edge case. It is a structural problem with how AI detectors work, and it is now being challenged in federal courts.

This guide explains why ESL writing gets flagged, what the research shows, what courts have ruled so far, and — most importantly — exactly how to build your authorship defense before you ever hit "submit."

Why AI detectors flag ESL writing

AI detectors do not actually detect AI. They measure two statistical properties of text:

Perplexity: how surprising each word choice is to a language model. Low perplexity means the words are predictable and the text follows common patterns.
Burstiness: how much sentence length and structure vary. Low burstiness means the sentences are uniform.

Large language models produce low-perplexity, low-burstiness text because they are designed to pick the most probable next word and maintain consistent output. When a detector sees text with those properties, it flags it.

Here is the problem: non-native English speakers also produce low-perplexity text. If you are still building vocabulary, you use safer word choices. If you studied grammar from textbooks, you write in standard patterns. If you translate from your native language, the result tends to be structurally uniform. To a detector, your careful, correct English looks the same as machine output.
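To make the two signals concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how Turnitin or GPTZero work internally: commercial detectors estimate perplexity with a large language model, whereas this sketch substitutes a simple word-frequency (unigram) model, and it treats burstiness as the coefficient of variation of sentence lengths. All function names are illustrative.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.

    0.0 means every sentence has the same length (very uniform text).
    """
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model.

    ASSUMPTION: a stand-in for the LLM-based perplexity real detectors use.
    Lower values mean the words in `text` are more common in `reference`.
    """
    ref_counts = Counter(tokenize(reference))
    tokens = tokenize(text)
    if not tokens:
        return float("inf")
    vocab = set(ref_counts) | set(tokens)
    total = sum(ref_counts.values()) + len(vocab)
    log_prob = sum(math.log((ref_counts[t] + 1) / total) for t in tokens)
    return math.exp(-log_prob / len(tokens))
```

Three short sentences of four words each score a burstiness of exactly zero, while a paragraph that mixes a one-word sentence with a long one scores much higher. Careful textbook-style English tends toward the first pattern, which is the point of the paragraph above.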

This is not speculation. A peer-reviewed study at Stanford tested it directly.

The numbers: 61% false positive rate

In 2023, researchers at Stanford (Liang, Zou, et al.) published a study in Patterns (Cell Press) titled "GPT detectors are biased against non-native English writers". They ran 91 human-written ESL essays (from TOEFL preparation materials) and 88 essays written by US 8th graders through seven commercial AI detectors.

The results:

On average across the seven detectors, 61.22% of the human-written ESL essays were incorrectly flagged as AI-generated
97.8% of the ESL essays were flagged by at least one of the seven detectors
19.8% of the ESL essays were flagged unanimously by all seven detectors
The same detectors classified the native English essays as human-written with near-perfect accuracy
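The three percentages above measure different things, which is why they can differ so much. The sketch below shows how each would be computed from a table of detector decisions. The data is a made-up toy example (four essays, three detectors), not the study's data, and the function name is hypothetical.

```python
def detector_summary(flags: list[list[bool]]) -> dict[str, float]:
    """Summarize detector decisions on human-written essays.

    flags[i][j] is True when detector j flagged essay i as AI-generated.
    Every flag is a false positive because all essays are human-written.
    """
    n_essays = len(flags)
    n_detectors = len(flags[0])
    # False-positive rate of each detector taken separately.
    per_detector = [
        sum(row[j] for row in flags) / n_essays for j in range(n_detectors)
    ]
    return {
        # Average of the per-detector rates (the "61.22%" style of number).
        "mean_false_positive_rate": sum(per_detector) / n_detectors,
        # Share of essays flagged by one or more detectors ("97.8%" style).
        "flagged_by_any": sum(any(row) for row in flags) / n_essays,
        # Share of essays flagged by every detector ("19.8%" style).
        "flagged_by_all": sum(all(row) for row in flags) / n_essays,
    }


# HYPOTHETICAL toy data: 4 human-written essays x 3 detectors.
toy_flags = [
    [True, True, True],
    [True, False, True],
    [False, True, False],
    [False, False, False],
]
summary = detector_summary(toy_flags)
print(summary)
```

The practical takeaway is that the more detectors an essay passes through, the more likely at least one of them flags it, even when each detector on its own is wrong only part of the time.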

To prove causation, the researchers ran a follow-up test. They took verified-human native English essays and asked ChatGPT to "simplify" the language to sound non-native. The false-positive rate spiked. They took the ESL essays and asked ChatGPT to "enhance" them to sound native. The false-positive rate dropped. The detectors were not finding AI. They were punishing linguistic simplicity.

A 2026 study by Hadra, Cambridge, and Mesbah in the International Journal for Educational Integrity (Springer) found that Turnitin achieved only 61% overall accuracy, calling the trade-off between detection power and false accusations a "structural mathematical limit, not an engineering flaw that can be patched."
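A short calculation shows why a high false-positive rate makes a flag weak evidence. The sketch below applies Bayes' rule. Only the 61.22% false-positive rate comes from the Stanford study; the base rate (share of submissions that really are AI-written) and the true-positive rate are invented for illustration and should be read as assumptions.

```python
def prob_ai_given_flag(
    base_rate: float, true_positive_rate: float, false_positive_rate: float
) -> float:
    """Probability that a flagged essay is actually AI-written (Bayes' rule)."""
    flagged_ai = base_rate * true_positive_rate
    flagged_human = (1 - base_rate) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)


# ASSUMPTION: 10% of ESL submissions are AI-written (illustrative only).
# ASSUMPTION: the detector catches 95% of AI-written essays (illustrative only).
# From the Stanford study: 61.22% average false-positive rate on ESL essays.
posterior = prob_ai_given_flag(
    base_rate=0.10, true_positive_rate=0.95, false_positive_rate=0.6122
)
print(f"P(AI-written | flagged) = {posterior:.3f}")
```

Under these assumed inputs, a flagged ESL essay has only about a 15% chance of being AI-written. Different assumptions change the number, but with a false-positive rate this high the flag alone stays far from proof.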

What courts are saying

The legal landscape is shifting. Multiple lawsuits in 2025 and 2026 are testing whether schools can discipline students based on AI detector scores alone. The pattern across rulings is clear: where the school's case rests on a detector score by itself, judges are increasingly skeptical. Where independent evidence of cheating also exists, courts defer to educators.

The Palo Alto case — pending federal litigation

A family in California filed a federal lawsuit (Doe et al v. Palo Alto Unified School District et al., docket 5:25-cv-04202) after their child's essay was flagged at 76% by Turnitin. The student's grade dropped from a high B/low A to a C. The family submitted a 1,162-page evidentiary packet (revision history, drafts, Google Docs timestamps) showing that the student wrote the work personally over several days. The district declined to restore the grade. The case is now pending in the U.S. District Court for the Northern District of California, with claims filed under Title VI (national origin discrimination), Title IX, and Due Process.

Newby v. Adelphi — the student won

In January 2026, the New York Supreme Court ruled in Matter of Newby v. Adelphi University. Orion Newby, a freshman with Level 2 Autism Spectrum Disorder enrolled in Adelphi's "Bridges Program," was flagged at 100% by Turnitin on a history paper. The university upheld a failing grade. His family spent over $100,000 in legal fees.

The judge ruled the university's decision was "without valid basis and devoid of reason" and ordered the penalty rescinded and the record fully expunged. This ruling establishes — at least at the state-court level in New York — that an AI detector score alone is not a valid basis for academic punishment.

Doe v. Yale — ESL and Title VI, pending

An Executive MBA student at Yale was suspended for a year after a final exam was flagged by GPTZero. The student, a non-native English speaker, filed suit in U.S. District Court for the District of Connecticut arguing that the detector's algorithm has implicit bias against ESL writers — the exact mechanism the Stanford study documented — and that using it to discipline a non-native speaker amounts to national-origin discrimination under Title VI. The case is pending.

The counter-example: Hingham

Not every case goes the student's way. In Massachusetts, a federal court denied a student's injunction after the school found AI-hallucinated citations in the student's work — including a non-existent book. The court reasoned that the school's case rested on independent evidence of cheating, not the detector alone.

The pattern across cases is clear: when the accusation rests solely on a detector score, courts are increasingly skeptical. When the detector is corroborated by other evidence, courts defer to educators. For ESL students doing their own work, the implication is straightforward: your strongest defense is preserved evidence of your writing process.

Universities dropping AI detection

While courts adjudicate, institutions are making their own calls. Some have publicly disabled or restricted Turnitin's AI detector, citing unreliability and bias against non-native English speakers; others have turned the feature off without any announcement.

The University of Waterloo formally discontinued Turnitin's AI detection in September 2025, stating that the tools are "unreliable" and "biased toward students whose first language is not English." Vanderbilt disabled the detector citing false positive risks and ESL bias. MIT published internal teaching guidance titled "AI Detectors Don't Work. Here's What to Do Instead." Curtin University in Australia announced it will disable Turnitin AI detection starting in 2026.

Several other institutions have quietly disabled or restricted AI detection — among them American University, Boston University, UC Berkeley, Colorado State, DePaul, Georgetown, Michigan State, NYU, and the University of Cape Town. Most did this without press releases.

Even Turnitin itself has shifted. In February 2026, CEO Chris Caren announced a pivot "from detection to transparency." The company's own documentation acknowledges an uncertainty band of roughly plus or minus 15 percentage points and states the AI indicator "should not be the sole basis for punitive action."

The 8-step authorship defense

The legal trends are encouraging but will not help you in next Tuesday's meeting. What matters in practice is the evidence you have before anyone questions your writing. The eight steps below cover what to set up before you write, what to track while writing, and how to respond calmly with evidence if a detector flag arrives after submission.

Before you write

1. Choose a writing tool that keeps history. Google Docs preserves version history by default, including edits, pastes, deletions, and timestamps. So do Microsoft Word (with AutoSave on OneDrive), Notion, and most modern editors. If your tool does not keep revision history, switch to one that does. This is the foundation of any authorship defense.

2. Keep your native-language drafts. If you outline or draft in your first language before translating into English, keep that original. A paper trail showing "here is my outline in Mandarin, here is the English translation, here are four rounds of revision" is evidence that no detector can contradict.

3. Save research notes separately. Keep a running document (or folder) of your sources, quotes, notes, and reasoning. If someone asks how you arrived at a particular argument, you can show the path from source to draft.
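One way to keep the material from steps 2 and 3 organized is a simple manifest of your evidence folder. The sketch below is a hypothetical helper, not a required tool: it records each file's size, last-modified time, and SHA-256 fingerprint using only the Python standard library. A self-made manifest is an index, not tamper-proof evidence, so treat it as a supplement to your editor's built-in version history.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(folder: str) -> list[dict]:
    """List every file under `folder` with size, mtime (UTC) and SHA-256."""
    entries = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        entries.append(
            {
                "file": str(path.relative_to(folder)),
                "bytes": stat.st_size,
                "modified_utc": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
        )
    return entries


def save_manifest(folder: str, out_file: str) -> None:
    """Write the manifest as JSON; store it outside `folder` to avoid self-listing."""
    manifest = build_manifest(folder)
    Path(out_file).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Running `save_manifest("essay-evidence", "manifest.json")` after each writing session (both names are placeholders) gives you a dated list of outlines, native-language drafts, and notes that you can hand over alongside the revision history.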

While you write

4. Write in sessions, not in one sitting. Revision history that shows writing spread over multiple days — with breaks, revisions, deleted paragraphs, restructured sections — is the strongest evidence of human authorship. A document that appears fully formed in one session (even if you genuinely wrote it that way) is harder to defend.

5. Do not paste large blocks from external sources. If you draft sections in a separate document and paste them into your final paper, the version history shows a sudden appearance of fully-formed text. Draft directly in the submission document when possible, or at minimum keep both documents with their own version histories.
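Steps 4 and 5 can be checked before you submit. The sketch below assumes you have copied revision timestamps and document lengths out of your editor's version history by hand; the sample data, the 30-minute session gap, and the 1,500-character paste threshold are all illustrative assumptions, not values any detector or university uses.

```python
from datetime import datetime, timedelta


def group_sessions(
    timestamps: list[datetime], max_gap: timedelta = timedelta(minutes=30)
) -> list[list[datetime]]:
    """Group revision timestamps into sessions separated by more than `max_gap`."""
    sessions: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def large_jumps(
    revisions: list[tuple[datetime, int]], threshold_chars: int = 1500
) -> list[tuple[datetime, int]]:
    """Return (timestamp, chars_added) for revisions that grew by more than the threshold."""
    ordered = sorted(revisions)
    jumps = []
    for (_, prev_len), (ts, cur_len) in zip(ordered, ordered[1:]):
        added = cur_len - prev_len
        if added > threshold_chars:
            jumps.append((ts, added))
    return jumps


# HYPOTHETICAL revision log: (timestamp, document length in characters).
revisions = [
    (datetime(2026, 5, 1, 10, 0), 0),
    (datetime(2026, 5, 1, 10, 10), 400),
    (datetime(2026, 5, 1, 10, 25), 900),
    (datetime(2026, 5, 2, 19, 0), 1100),
    (datetime(2026, 5, 2, 19, 5), 4100),  # 3,000 characters appear at once
    (datetime(2026, 5, 4, 9, 0), 4300),
]
sessions = group_sessions([ts for ts, _ in revisions])
writing_days = {ts.date() for ts, _ in revisions}
jumps = large_jumps(revisions)
print(len(sessions), "sessions on", len(writing_days), "days; large jumps:", jumps)
```

A jump like the one on the second day is not proof of anything, but it is the kind of entry a reviewer will ask about, so it is worth keeping the separate document that the pasted text came from.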

After submission — if you are flagged

6. Do not apologize or admit fault. The instinct when a professor asks "did you use AI?" is to over-explain. Lead with evidence instead. Calmly present your revision history, native-language drafts, research notes, and timestamps. Reference the Stanford study if you are an ESL writer — it is peer-reviewed and directly on point.

7. Ask specific procedural questions. Request in writing: which detector was used, what score triggered the flag, what your institution's formal procedure is for handling AI accusations, and what your appeal options are. The strongest cases against schools involve procedural due process violations — institutions that did not give students fair notice, a chance to present counter-evidence, or a clear appeal path.

8. Know your institution's policy. Read your school's academic integrity policy before any dispute arises. If it does not address AI specifically — or does not specify what evidence counts as a defense — that is an argument worth raising. Many institutions adopted detector tools faster than they updated their integrity codes.

What ESL writers are doing right now

The NEA (National Education Association) — representing 3 million educators — published Five Principles for AI in Education stating that "biased AI cheating detection applications have incorrectly flagged students for misconduct" and specifically naming "emergent multilingual learners" who "have been falsely accused." The American Federation of Teachers passed a resolution urging safeguards for individual rights in AI deployment.

At the state level, Idaho enacted SB 1227 — a statewide AI-in-education framework requiring human-centered oversight and prohibiting high-stakes automated decisions. California's AB 1159, which passed its first chamber, would prohibit ed-tech vendors from training models on student submissions. Maryland and Illinois have similar bills pending.

The direction is clear: the era of treating a detector score as a verdict is ending. But the transition is slow, and individual students bear the risk in the meantime.

How Diglot helps ESL students

Diglot is a bilingual writing workspace built for the exact workflow that gets ESL students flagged: thinking in one language, writing in another, and refining until the English is clean.

The Authorship Certificate tracks every edit, paste, and AI-assist on your document and signs the resulting event chain with a cryptographic key. The result is a public verification URL that anyone — your professor, your department, an appeals committee — can open in any browser to see a tamper-proof record of how the document was actually written. No Diglot account required to verify.
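For readers curious how a signed event chain can be tamper-evident in general, here is a minimal sketch. It is not Diglot's implementation, which this article does not describe in detail. It chains events with SHA-256 so that changing any earlier event breaks every later hash, then signs the final hash. As a simplifying assumption it uses HMAC from the Python standard library, which needs a shared secret; a publicly verifiable certificate would instead use an asymmetric signature so that anyone can check it without the secret.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def append_event(chain: list[dict], event: dict) -> None:
    """Append an event whose hash covers the previous link's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    chain.append({"event": event, "prev": prev_hash, "hash": digest})


def sign_chain(chain: list[dict], key: bytes) -> str:
    """Sign the newest hash; ASSUMPTION: HMAC stands in for a real signature."""
    head = chain[-1]["hash"] if chain else GENESIS
    return hmac.new(key, head.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_chain(chain: list[dict], signature: str, key: bytes) -> bool:
    """Recompute every hash, then check the signature over the final one."""
    prev_hash = GENESIS
    for link in chain:
        payload = json.dumps(link["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if link["prev"] != prev_hash or link["hash"] != expected:
            return False
        prev_hash = expected
    return hmac.compare_digest(sign_chain(chain, key), signature)


# HYPOTHETICAL events and key, for illustration only.
key = b"demo-secret-key"
chain: list[dict] = []
append_event(chain, {"type": "edit", "chars": 120})
append_event(chain, {"type": "paste", "chars": 40})
append_event(chain, {"type": "ai_assist", "chars": 15})
signature = sign_chain(chain, key)
print(verify_chain(chain, signature, key))
```

Editing any recorded event after the fact, for example changing the size of the paste, makes `verify_chain` return `False`, because the stored hash no longer matches the recomputed one.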

If you are an ESL student who drafts in your native language and publishes in English, the Certificate captures that entire journey: the first draft, the translation, every round of revision. When someone questions whether you wrote your own work, you do not need to argue. You send a link.

Read the companion articles for more context: AI detection lawsuits in 2026, why AI detectors misread non-native English, how to make your English writing sound natural, and what to do if a client flags your writing.

Frequently asked questions

The questions below cover what ESL students ask most often about false AI flags: whether expulsion is realistic, how accurate detectors actually are on non-native English writing, whether grammar and paraphrasing tools increase risk, what evidence holds up in an appeal, and what to do specifically if your university still relies on Turnitin's AI detector to flag essays.

Can I get expelled for a false AI detection?

Penalties vary by institution — from a warning to academic misconduct charges. However, courts are increasingly skeptical of punishments based solely on detector scores. In Newby v. Adelphi, a judge overturned the penalty entirely. Your best protection is preserved evidence of your writing process.

Are AI detectors accurate for ESL writing?

No. A peer-reviewed Stanford study found that seven commercial AI detectors incorrectly flagged, on average, 61.22% of human-written ESL essays as AI-generated. The bias is structural: detectors measure text predictability, and non-native English writing is naturally more predictable due to constrained vocabulary and standard grammar patterns.

Does using Grammarly or a paraphrasing tool make my essay look like AI?

Grammar and paraphrasing tools can lower perplexity (by making word choices more predictable) and burstiness (by making sentences more uniform), which are the same signals detectors flag. Using them does not mean you cheated, but it can increase your false-positive risk. Keep records of your drafts before and after tool use.

What evidence do I need to prove I wrote my essay?

The strongest evidence includes: revision history with timestamps (Google Docs, Word, Notion), native-language drafts or outlines, research notes and source links, and intermediate draft versions showing the work evolving over multiple sessions.

My university still uses Turnitin. What should I do?

Write in a tool that preserves revision history. Keep your research notes. If you draft in your native language, save those drafts. Build the proof trail before you submit — not after a flag. Read the full legal landscape article for context on how universities are responding.

Sources

This guide draws on peer-reviewed research from Stanford and Springer, federal and state court rulings, public disabling statements from major universities, position papers from the NEA and AFT, and Turnitin's own disclosures. Each citation is linked below. Where rulings are still active or institutions have not yet finalized policy, we will revise this page as material changes land.

Liang, W., Zou, J., et al. (2023). "GPT detectors are biased against non-native English writers." Patterns, Cell Press. arxiv.org/abs/2304.02819
Hadra, Cambridge, Mesbah (2026). "AI Detectors Fail Diverse Student Populations." International Journal for Educational Integrity, Springer.
Doe et al v. Palo Alto Unified School District et al., docket 5:25-cv-04202, N.D. Cal. Hoodline coverage
Matter of Newby v. Adelphi University (2026). law.justia.com
University of Waterloo. "Discontinuing AI detection in Turnitin"
Vanderbilt University. "Guidance on AI Detection"
NEA. "Five Principles for AI in Education" (2025)
AFT. "Resolution on Artificial Intelligence"