AI Copyright and Fair Use: New Study Shows Readers Prefer AI-Generated Literary Works
Original Paper: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers
Authors: Tuhin Chakrabarty (1), Jane C. Ginsburg (2), Paramveer Dhillon (3, 4). Affiliations: (1) Department of Computer Science and AI Innovation Institute, Stony Brook University; (2) Columbia Law School; (3) School of Information Science, University of Michigan; (4) MIT Initiative on the Digital Economy.
Executive Summary
A study from researchers at Stony Brook University, Columbia Law School, and the University of Michigan provides empirical evidence that AI models, when fine-tuned on an author’s complete body of work, can generate new text that expert readers prefer over emulations written by expert human writers. This research directly challenges a central defense argument in copyright litigation, namely that AI outputs are not market substitutes for original works, by demonstrating a clear and quantifiable potential for market harm.
What the Research Shows
The preregistered study conducted a series of blind, pairwise evaluations comparing the writing of three frontier AI models (ChatGPT, Claude, and Gemini) against that of expert human writers with Master of Fine Arts (MFA) degrees. The task was to emulate the distinct literary styles of 50 acclaimed authors. The evaluation pool included both expert readers (MFA candidates) and lay readers.
Initially, when using standard in-context prompting, AI-generated text was overwhelmingly disfavored by expert readers on measures of both stylistic fidelity and overall writing quality. This finding, however, was completely reversed when the researchers fine-tuned a model (ChatGPT) on the complete works of individual authors. In this scenario, expert readers showed a dramatic and statistically significant preference for the AI-generated text, favoring it for both its faithfulness to the author’s style (Odds Ratio = 8.16, p<10⁻¹³) and its general quality (Odds Ratio = 1.87, p=0.010).
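To make the reported odds ratios easier to explain to a judge or jury, they can be translated into approximate preference rates. The sketch below is illustrative arithmetic only: the function name `preference_probability` and the `baseline_odds` parameter are hypothetical, and the conversion assumes even baseline odds (an indifferent reader who would pick either text half the time). The paper’s own statistical model may define its odds ratios against a different baseline, so these figures should be read as rough intuition, not as results from the study.

```python
def preference_probability(odds_ratio: float, baseline_odds: float = 1.0) -> float:
    """Convert an odds ratio into a probability.

    ASSUMPTION: baseline_odds = 1.0 means the reference reader is
    indifferent (50/50). This baseline is not taken from the paper.
    """
    if odds_ratio <= 0 or baseline_odds <= 0:
        raise ValueError("odds must be positive")
    odds = odds_ratio * baseline_odds
    return odds / (1.0 + odds)


# Odds ratios as reported in the summary above.
style = preference_probability(8.16)    # stylistic fidelity
quality = preference_probability(1.87)  # overall writing quality

print(f"style fidelity:  {style:.3f}")   # about 0.891
print(f"writing quality: {quality:.3f}")  # about 0.652
```

Under that even-odds assumption, an odds ratio of 8.16 corresponds to the fine-tuned AI text being chosen in roughly nine of ten comparisons on style, and an odds ratio of 1.87 to roughly two of three comparisons on quality.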
Critically, these fine-tuned AI outputs were rarely identified as machine-written: state-of-the-art AI detectors flagged them only 3% of the time. The study’s mediation analysis indicates that fine-tuning eliminates the telltale stylistic quirks (e.g., cliché density) that make standard AI outputs detectable, allowing the model to produce a more authentic emulation. The median cost for this process was $81 per author, highlighting the economic viability of creating market-supplanting content at scale.
Why This Matters for Your Case
This study provides plaintiff attorneys with a powerful evidentiary tool to dismantle the fair use defense, particularly its first and fourth factors. The defense’s central claim often rests on the idea that AI-generated content is either transformative or does not harm the potential market for the original copyrighted work. This research directly refutes both assertions. It establishes that fine-tuned AI does not merely create a derivative work; it can generate a market substitute that consumers may actually prefer to new works from human authors writing in a similar style.
The findings provide concrete, empirical data to support claims of market harm (the fourth fair use factor). The argument is no longer theoretical. You can now demonstrate that an AI trained on your client’s work can produce new content that directly competes for the same audience and economic opportunities, thereby diminishing the value of the original work and the author’s brand. The low cost of production further amplifies the threat of market saturation and displacement. This evidence is crucial for establishing direct financial injury and the likelihood of future harm.
Litigation Strategy
In discovery, counsel should specifically seek information on whether an AI model was not just trained on, but fine-tuned on, a client’s corpus of work. This distinction is now paramount. The study’s methodology provides a clear road map for your own experts to replicate these findings with your client’s work, creating case-specific evidence of the model’s substitutive capabilities. This data can be presented to a judge or jury to make the abstract concept of market harm tangible and undeniable.
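As a rough illustration of what a case-specific replication might tabulate, the sketch below summarizes blind pairwise judgments (AI text versus human-written text) and runs an exact two-sided binomial test against a 50/50 null. This is a deliberately simplified stand-in, not the study’s preregistered analysis: it ignores repeated ratings by the same reader or of the same excerpt, which an expert’s actual model would need to account for. The function names and the counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k under the null preference rate p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def summarize_pairwise(ai_wins: int, human_wins: int) -> dict:
    """Summarize blind pairwise judgments between AI and human text."""
    n = ai_wins + human_wins
    if n == 0:
        raise ValueError("need at least one judgment")
    odds = ai_wins / human_wins if human_wins else float("inf")
    return {
        "n": n,
        "ai_share": ai_wins / n,  # fraction of comparisons won by the AI text
        "odds": odds,             # AI wins per human win
        "p_value": binomial_two_sided_p(ai_wins, n),
    }


# HYPOTHETICAL counts for illustration only; not data from the study.
result = summarize_pairwise(ai_wins=14, human_wins=6)
print(result)
```

With these made-up counts the AI text wins 70% of comparisons, yet the p-value is about 0.115, which would not clear a conventional 0.05 threshold. The practical point for counsel is that a replication needs enough judgments to support a statistically defensible conclusion.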
This research should be a cornerstone of motions opposing summary judgment on fair use grounds. It creates a genuine issue of material fact regarding market effect that cannot be dismissed as speculative. In pleadings and expert reports, frame the AI’s output not as a “remix” or “tool,” but as a direct economic competitor. This study gives you the scientific backing to argue that the AI’s function, in this context, is not transformative but supplanting, serving the exact same intrinsic purpose as the original author’s work.
Key Takeaway
The debate over AI’s impact on creative markets has advanced from the theoretical to the empirical. This study provides the first significant data showing that a fine-tuned AI model can produce literary content that expert readers judge to be not only more faithful to an author’s style but also higher in overall quality than emulations by expert human writers. For plaintiffs’ attorneys, this research offers a clear, data-driven argument to prove market substitution and defeat the fair use defense that underpins much of the AI industry’s legal strategy.