New Method Quantifies AI Stylistic Similarity for Copyright Infringement Claims
Original Paper: LLM one-shot style transfer for Authorship Attribution and Verification
Authors: Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho
Affiliation: Department of Computer Systems, Technical University of Madrid, Madrid
Contact: {pablo.miralles, javier.huertas.tato, alejandro.martin, david.camacho}@upm.es
Executive Summary
A new study from the Technical University of Madrid details an unsupervised method for measuring stylistic similarity between texts using the log-probabilities of a large language model (LLM). This research provides plaintiff attorneys with a quantitative, data-driven tool to argue that an AI-generated work is substantially similar to an author’s copyrighted material, directly challenging the “transformative use” defense central to many AI copyright cases.
What the Research Shows
Computational stylometry—the quantitative analysis of writing style—has long been used in forensic and literary attribution. Traditional methods, however, often require large, curated datasets and struggle to distinguish an author’s unique style from the topic they are writing about. This can lead to spurious correlations that are easily challenged in a legal setting.
This paper introduces a novel approach intended to bypass these limitations. The researchers developed an unsupervised method that leverages the log-probabilities of an LLM to measure “style transferability.” In simple terms, it quantifies how well the stylistic signature of one text can be imposed upon another. Because the method is unsupervised and scores texts with the model’s own log-probabilities rather than with a classifier trained on curated, labeled datasets, it is less exposed to the topic-driven spurious correlations described above, yielding a more precise and defensible metric for stylistic similarity. The study found this method significantly outperforms previous approaches, providing a reliable way to link an output text to a specific author’s stylistic fingerprint.
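For readers who want to see what a log-probability score of this kind might look like in practice, the following is a minimal sketch, not the authors’ implementation. It assumes an expert has already obtained per-token log-probabilities for the same target text twice: once when the model is first shown a sample of the candidate author’s writing, and once without that sample. The function name style_transfer_gain, the scoring rule (difference of mean log-probabilities), and all numbers are hypothetical illustrations.

```python
from statistics import fmean


def style_transfer_gain(logprobs_with_example, logprobs_baseline):
    """Mean per-token log-probability gain from showing the model a style sample.

    Both inputs are lists of log-probabilities for the SAME target tokens.
    A larger positive value means the style sample made the target text
    more predictable to the model. This scoring rule is an assumption made
    for illustration, not the formula from the paper.
    """
    if len(logprobs_with_example) != len(logprobs_baseline):
        raise ValueError("both runs must score the same target tokens")
    if not logprobs_baseline:
        raise ValueError("no tokens to score")
    return fmean(logprobs_with_example) - fmean(logprobs_baseline)


# Hypothetical per-token log-probabilities for a four-token target text.
with_example = [-1.2, -0.8, -2.0, -0.5]  # model saw the author's sample first
baseline = [-1.9, -1.1, -2.6, -0.9]      # model saw no style sample

print(style_transfer_gain(with_example, baseline))
```

A positive result here would indicate that the author’s sample helped the model predict the target text; how large a gain is meaningful depends on the comparison against other authors, which a technical expert would need to establish.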
Why This Matters for Your Case
This research directly targets the central pillar of the AI industry’s fair use defense: transformative use. Defense counsel argues that LLMs learn abstract concepts and styles, transforming ingested works into something fundamentally new. This paper provides the technical evidence to refute that claim. It demonstrates that LLMs do not merely learn in the abstract; they create a quantifiable, internal model of an author’s specific, copyrightable expression that can be measured and reproduced. This allows you to argue that the AI’s output is not a new creation but a derivative work, directly copied from the expressive elements of your client’s copyrighted material.
Furthermore, this methodology moves the argument of “substantial similarity” from the realm of subjective comparison to objective, data-driven analysis. Instead of relying solely on side-by-side textual comparisons and qualitative expert testimony, you can now present the court with numerical evidence. This method generates a score that represents the degree of stylistic similarity, providing a concrete metric to demonstrate that an AI’s output is not just coincidentally similar but is a direct product of the model ingesting and replicating your client’s work.
Litigation Strategy
Counsel should immediately engage technical experts capable of implementing and validating this log-probability methodology. The goal is to create compelling, data-backed evidence for motions for summary judgment and trial. Your expert can run the infringing AI output against your client’s corpus of work, generating a “style transferability” score. This same test can then be run against a control group of other authors’ works to demonstrate that the stylistic match to your client is a statistically significant outlier, not a random coincidence.
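The control-group comparison described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not a validated forensic protocol: the function name outlier_summary, the choice of a z-score plus an empirical rank-based p-value, and every score shown are hypothetical, and a retained expert would choose the statistical test appropriate to the actual data.

```python
from statistics import fmean, stdev


def outlier_summary(client_score, control_scores):
    """Compare the client's style score with scores for control authors.

    Returns (z, p_emp):
      z     - how many control standard deviations the client's score
              lies above the control mean
      p_emp - empirical p-value: share of controls scoring at least as
              high as the client, with the usual +1 correction
    """
    if len(control_scores) < 2:
        raise ValueError("need at least two control authors")
    sigma = stdev(control_scores)
    if sigma == 0:
        raise ValueError("control scores have no spread")
    z = (client_score - fmean(control_scores)) / sigma
    at_least_as_high = sum(s >= client_score for s in control_scores)
    p_emp = (at_least_as_high + 1) / (len(control_scores) + 1)
    return z, p_emp


# Hypothetical scores: the client's corpus versus five unrelated authors.
client = 0.50
controls = [0.10, 0.05, 0.12, 0.08, 0.15]

z, p_emp = outlier_summary(client, controls)
print(f"z = {z:.2f}, empirical p = {p_emp:.3f}")
```

Note that with only five control authors the empirical p-value cannot fall below 1/6, so a persuasive showing would require a much larger control group.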
This research also opens a new front in discovery. You can now craft targeted discovery requests demanding access to the specific model’s log-probabilities related to the generation of the infringing text. While defendants will likely resist, arguing proprietary information, this paper provides a strong basis to compel production. You can argue that this data is the only way to prove the direct causal link between the model’s training on copyrighted works and the subsequent infringing output, making it essential evidence for your case.
Key Takeaway
The abstract legal debate over AI and fair use is now being grounded in forensic data. This paper provides plaintiffs with a powerful new weapon: a scientifically validated method to quantify stylistic copying. By translating the abstract concept of an author’s “style” into hard numbers, this research provides the evidentiary foundation needed to prove that an AI’s output is a derivative work, fundamentally undermining the transformative use defense and strengthening claims for copyright infringement.