Availability of inputs
In this letter to the White House (PDF link), OpenAI makes the case for its use of copyrighted material for AI training.
In a section titled Promoting the Freedom to Learn, the company writes:
OpenAI’s models are trained to not replicate works for consumption by the public. Instead, they learn from the works and extract patterns, linguistic structures, and contextual insights. This means our AI model training aligns with the core objectives of copyright and the fair use doctrine, using existing works to create something wholly new and different without eroding the commercial value of those existing works.
That closing proviso, “without eroding the commercial value of those existing works” (emphasis mine), is the crux of the issue, and its accuracy is presently unknown. It might be true! The value of language models trained on Everything might indeed be different from the value of all those books, articles, and databases. If language models turn out mostly to operate as capable abstract “reasoners” built into everything, then that will have been the case.
However, it might also: not be true! Language models might totally undermine the global markets for original books, articles, and databases —
It’s just important to recognize that OpenAI’s argument is far from a slam dunk. This issue deserves to be widely debated and, ultimately, legislated.
And then there’s this:
The European Union, for one, has created “text and data mining exceptions” with broadly applicable “opt-outs” for any rights holder — meaning access to important AI inputs is less predictable and likely to become more difficult as the EU’s regulations take shape. Unpredictable availability of inputs hinders AI innovation, particularly for smaller, newer entrants with limited budgets.
What a euphemism: “unpredictable availability of inputs”. Yes, your “inputs” will not be perfectly docile. Yes, many people will opt out. Deal with it.
The presumption is sort of obscene —
You live in a society!