This is a post from Robin Sloan’s lab blog & notebook. You can visit the blog’s homepage, or learn more about me.

Availability of inputs

March 27, 2025

In this letter to the White House (PDF link), OpenAI makes the case for its use of copyrighted material for AI training.

In a section titled Promoting the Freedom to Learn, the company writes:

OpenAI’s models are trained to not replicate works for consumption by the public. Instead, they learn from the works and extract patterns, linguistic structures, and contextual insights. This means our AI model training aligns with the core objectives of copyright and the fair use doctrine, using existing works to create something wholly new and different without eroding the commercial value of those existing works.

That proviso (emphasis mine) is the crux of the issue, and its accuracy is presently unknown. It might be true! The value of language models trained on Everything might indeed be different from the value of all those books, articles, and databases. If language models turn out mostly to operate as capable abstract “reasoners” built into everything, then that will have been the case.

However, it might also: not be true! Lan­guage models might totally under­mine the global mar­kets for orig­inal books, articles, and databases — not by repli­cating them but, e.g., by pro­viding an infi­nite spigot of free alternatives. Lan­guage models now reach far beyond text, and this sce­nario already looms large in the realm of the image.

It’s just important to recognize that OpenAI’s argument is far from a slam dunk. This issue deserves to be widely debated and, ultimately, legislated.

And then there’s this:

The Euro­pean Union, for one, has cre­ated “text and data mining exceptions” with broadly applic­able “opt-outs” for any rights holder — meaning access to impor­tant AI inputs is less pre­dictable and likely to become more dif­fi­cult as the EU’s reg­u­la­tions take shape. Unpre­dictable avail­ability of inputs hin­ders AI innovation, par­tic­u­larly for smaller, newer entrants with lim­ited budgets.

What a euphemism: “unpredictable availability of inputs”. Yes, your “inputs” will not be perfectly docile. Yes, many people will opt out. Deal with it.

The pre­sump­tion is sort of obscene — that this incred­ible store­house of value, produced by so many people for so many reasons, is just there for the scraping.

You live in a society!
