
Selective Temporal Training

August 23, 2025

Hayk Grigorian is training small language models with a corpus of Victorian text.

This kind of work was widespread in the 2016-2020 era, just before GPT-2. After that, the race was on to pile more text into the training corpora, basically regardless of provenance. That was the path to the emergent general capabilities of modern LLMs … at the expense of interesting, human-scale experiments like this one.

Hayk’s coinage of “Selective Temporal Training” is perhaps a bit puffed-up, and I love it. He writes:

[ … ] If I fine-tune something like GPT-2, it’s already pre-trained and that information won’t go away. If I train from scratch, the language model won’t pretend to be old, it just will be. The goal for this project right now is to create something that can reason exclusively using knowledge from London books published between 1800 and 1875.
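For the curious, here is a minimal sketch of what training from scratch looks like with the Hugging Face libraries. This is not Hayk’s actual code; the corpus filename, model size, and hyperparameters are placeholders of my own invention. The point is the shape of the approach: a tokenizer learned only from the period corpus, and a small GPT-2-style model whose weights begin random, so there is nothing modern for it to lean on.

```python
import os

from tokenizers import ByteLevelBPETokenizer
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

CORPUS = "london_1800_1875.txt"  # hypothetical corpus file, one passage per line

# 1. Learn a vocabulary from the Victorian corpus alone, so no modern
#    tokens ever enter the model's world.
os.makedirs("victorian-tokenizer", exist_ok=True)
bpe = ByteLevelBPETokenizer()
bpe.train(files=[CORPUS], vocab_size=16_000, min_frequency=2,
          special_tokens=["<|endoftext|>"])
bpe.save_model("victorian-tokenizer")
tokenizer = GPT2TokenizerFast.from_pretrained("victorian-tokenizer")
tokenizer.pad_token = tokenizer.eos_token

# 2. A small GPT-2-shaped config, randomly initialized: no pretrained weights.
config = GPT2Config(vocab_size=tokenizer.vocab_size, n_positions=512,
                    n_embd=384, n_layer=6, n_head=6)
model = GPT2LMHeadModel(config)

# 3. Tokenize the corpus (dropping empty lines).
dataset = load_dataset("text", data_files=CORPUS)["train"]
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# 4. Standard causal-language-model training.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="victorian-gpt",
                           per_device_train_batch_size=8,
                           num_train_epochs=3),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    train_dataset=tokenized,
)
trainer.train()
```

The from-scratch choice shows up in two places: the fresh tokenizer, which knows no modern vocabulary, and GPT2LMHeadModel(config), which builds the architecture without loading any pretrained weights.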

I fear Hayk won’t get to the “reason” he’s after — that seems to depend on a much larger corpus, some of it synthetic — but/and the project might still produce some interesting outputs. I wish more college students were designing their own personal corpora, rather than tumbling into ChatGPT’s generic embrace.

I spent a lot of my time (too much) circa 2016-2019 training custom/weird AI models, and I can report that it was a fun and interesting activity!
