An LLM catch-up endeavour – The plan
I’ll be the first to admit that I’m more than a little behind the current state of the art in machine learning. It’s not for lack of interest – I squeezed in an ML master’s between 2020 and 2022 out of a desire to lift the proverbial hood & understand these growing behemoths.
Unfortunately – the tech steamroller being what it is – what was state of the art in LLMs in 2020 (BERT) is now, by comparison, a museum piece. Or, to quote Ian Barber: “LLMs are complicated now”.
Attention might be all you need, but modern models certainly use a lot of different variants of it: query grouping, compressed, sparse, linear, sliding-window and more. Mixture-of-Experts added selective routing to feed-forward layers, and we have since started routing just about everything else too, from attention blocks to the residual stream. Vision and audio encoders have gone from bolted-on to mixed-in, and models have scaled to run inference across multiple GPUs, which throws in communication ops that add extra boundaries in the middle of your model.
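To make the Mixture-of-Experts idea a little more concrete, here is a minimal sketch of top-k routing for a single token in plain Python. It assumes a simple linear router and treats each expert as an arbitrary function; the names (`moe_forward`, `router_weights`) are hypothetical, and real MoE layers work on batched tensors and add extras such as load-balancing losses, so treat this as an illustration of the concept rather than how any particular model implements it.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Hypothetical top-k MoE layer for one token (illustrative sketch only)."""
    # Router: one logit per expert, the dot product of the token with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the k highest-scoring experts for this token.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only the selected experts are ever evaluated
        out = [o + gate * yj for o, yj in zip(out, y)]
    return out


if __name__ == "__main__":
    # Three toy "experts" that just scale the input; the router favours experts 0 and 1.
    toy_experts = [
        lambda v: [2 * a for a in v],
        lambda v: [3 * a for a in v],
        lambda v: [100 * a for a in v],
    ]
    print(moe_forward([1.0, 0.0], [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]], toy_experts, k=2))
```

The point of the sketch is the "selective" part: the third expert never runs for this token, which is what lets MoE models grow their total parameter count without a matching growth in per-token compute.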
So, where to begin?
Let’s outline some broad-stroke goals:
We’re going to focus on LLM agents. It’d be a grave disservice to say that other ML fields haven’t achieved significant advances, but if we have to dust off our skill set in any area: LLMs are the dustiest.
So:
Outline & understand (at least conceptually) seminal LLM advances since 2022.
Build a local environment to test & reinforce understanding of these concepts.
Apply those concepts to personal use cases/projects.
From the above, shift my skill set from the first principles of building/training models to real-world applications.
Let’s set some limitations
Locally hosted only
This isn’t an exercise in creating state-of-the-art agents. The idea is to have a sandbox environment in which to explore, apply and test new concepts. Not a pyre on which to burn money on token allowances in the perpetual pursuit & funding of the next-best model.
Secondly (and more importantly): I’m willing to wager that knowing how to host models locally, fine-tune them, understand the tooling and build all of that out into locally hosted agents will be a far more valuable commodity over the next 3-7 years than reliance on third-party providers. I’m not alone in this [0][1]. I won’t delve into the rationale behind that wager, because at some point it turns into me climbing atop my digital soapbox and delivering a sermon against the perils of over-reliance on ‘Big-AI’.
No building models from first principles (for now):
There is a trade-off to be made here. When you want to understand the seminal breakthroughs in the field, the urge is to build a model from first principles: sourcing a data corpus, cleaning it, designing the model, training it and applying the concepts from scratch. Realistically (especially with data and compute limitations), even my best efforts in this space would produce a lame-duck model several leagues behind open models such as Qwen. I’d be spending time re-inventing a wonky wheel (and, realistically, the majority of that time would go on data sourcing/cleaning) at the cost of learning how to host and employ a local LLM agent.
The hardware
For the time being, development will be limited to local hardware.
GPU: NVIDIA GeForce RTX 4080 SUPER, 16GB GDDR6X
RAM: 64GB (2x32GB) Corsair DDR5 Vengeance
CPU: AMD Ryzen™ 9 7950X3D, 4.2GHz
As personal projects mature, I’ll cross the bridge of finding a less transient hardware setup to host them on.