Training 100 models in public

One of the things that I’ve admired about people like Andrej Karpathy or Dwarkesh is that the work they do is deeply public. Projects like minGPT are both a way of deepening one’s own craft with transformers and a way of teaching and transmitting that knowledge so that others may gain as well.

For me, the value in teaching extends beyond the obvious – there is clarifying one’s own knowledge and learning to strip an idea down to its essentials. But beyond that, I think there is a deeply satisfying and positive-sum worldview that lies beneath it. The world is better when knowledge is open – and the world is a better place when people who know how to do a thing share that with others.

In the spirit of learning and teaching – I’ll be spending some time on this blog training 100 models in public. I’ll take a deep dive starting from the basics underlying modern LLMs, such as transformers, through to other areas of curiosity I’ve had of late, such as diffusion models, state space models, and different agent instrumentation layers.

I won’t guarantee doing this in 100 days – I expect it to take a year or more – but in the end I hope to get as much or more out of this process as you, the reader. I hope this can inspire others in the way that I’ve been inspired by people like Karpathy, or originally by Visa with his reflection on “do 100 things” and being prolific.
