One of the parallels between today’s generative AI boom and the Internet boom of the late 90s is the legal questions each raises. (As an aside, I ran into Larry Lessig in DC a couple of months ago, and he’s… traveled quite a distance from his “Code is Law” roots.)
Wired recently wrote up Kadrey v. Meta, which tests whether training AI on public material violates copyright or falls under fair use.
While the authors were heavily focused on the piracy element of the case, Chhabria spoke emphatically about his belief that the big question is whether Meta’s AI tools will hurt book sales and otherwise cause the authors to lose money. “If you are dramatically changing, you might even say obliterating, the market for that person's work, and you're saying that you don't even have to pay a license to that person to use their work to create the product that's destroying the market for their work—I just don't understand how that can be fair use.”
I don’t know if “hurting the market for the person’s work” is the right test in this case (copyright law was built on balancing the incentive to produce original work against the public good of having it widely available), but if it is, merely training a model on a particular artist’s work is unlikely to trip it. Machine-generated “Hazy UK Garage” is aural wallpaper; it may improve Spotify’s margins, but it will not impact artists. If someone copies a particular artist’s voice and style, however, that is a violation, though one that probably falls under trademark law rather than copyright.
The issue is one of impersonation, not IP theft. The reason the plaintiff won in Thomson Reuters v. Ross Intelligence was that the LLM produced verbatim non-public copy from Westlaw that it had trained on. That’s more like a search engine indexing private data and bypassing a paywall to do so, so it’s quite different from Kadrey. Tbh, Kadrey feels more like Blurred Lines (Gaye v. Thicke) and Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, since it’s testing the limits of fair use.