Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers
Original Paper: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers
Authors: Tuhin Chakrabarty (Department of Computer Science and AI Innovation Institute, Stony Brook University), Jane C. Ginsburg (Columbia Law School), Paramveer Dhillon (School of Information Science, University of Michigan; MIT Initiative on the Digital Economy). Corresponding authors: [email protected], [email protected], [email protected]
Machine Writing Preferred Over Human Experts After Deep Training
A new study, “Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers,” by Tuhin Chakrabarty, Jane C. Ginsburg, and Paramveer Dhillon, provides powerful empirical evidence for a central argument in the ongoing copyright litigation concerning writing software.
The core legal question in these lawsuits is whether using copyrighted works to train machine learning models results in a commercially competitive product that poses a real economic threat to human authors. This research suggests the answer is strongly affirmative, quantifying how competitive the machine-generated text can be when trained deeply on proprietary data.
Key Findings
- Simple instructions yield poor results: When the writing software was simply asked to imitate famous authors using general prompts, writing experts (MFA candidates) strongly disliked the output, finding it lacked the required style and quality.
- Deep training reverses the outcome: When the software was given specialized, intensive training focused specifically on a single author’s complete copyrighted works, the results flipped dramatically.
- Readers prefer the machine: After this specialized training, both expert readers and everyday readers significantly preferred the text generated by the machine over comparable samples written by expert human writers trained in creative writing.
- Mimicry works: Experts found that the deep-trained software was superior at mimicking the target author’s unique voice and style compared to human professionals attempting the same task.
Legal and Practical Impact
A central defense in copyright infringement cases is ‘fair use,’ which requires assessing, among other factors, whether the new use harms the market for the original. This research provides empirical evidence that when proprietary books are used for deep training, the resulting technology can generate content that is commercially competitive with, and often preferred over, the work of human experts.
This finding directly supports the argument that authors face a tangible economic threat. In court, the study could be offered as evidence that, even if the training is found to be transformative (a key consideration under the first fair use factor), the resulting output also supersedes the market for human-created derivative works (such as stylistic imitations or commissioned works in the style of a deceased author).
The practical impact is that publishers and studios may see clear financial incentives to rely on deeply trained writing software to produce high-quality, stylistically accurate content, potentially displacing human writers who specialize in genre or stylistic mimicry.
Risks and Caveats
The study focuses only on the quality of raw output generated by the specialized training process. It does not account for the additional time, cost, and human effort needed to transform that raw output into a cohesive, publishable novel or essay.
Furthermore, the technology used in the most successful part of the study was trained author by author, which is a specialized process. The cost of this deep training was calculated to be extremely low (less than $100 per author), but it remains unclear whether the same level of reader preference would hold for material generated using general, less-expensive training methods that are not tailored to a single author.
When it comes to writing, giving the machine specialized training creates a powerful new competitor to human professionals.