Small Models, Big Impact

Why Size Isn’t Everything in AI

Small models matter, and they matter a lot. It’s easy to get dazzled by trillion-parameter giants that promise general intelligence, but I strongly believe the smaller 7-billion-parameter models, such as MPT-7B, the 7B members of the Llama family, Falcon 7B, and Mistral 7B, are the real unsung heroes, especially when you’re dealing with multi-agent workflows.

Why am I advocating for smaller models? Well, let’s talk practicality. On a single NVIDIA A100 40GB GPU, the workhorse we have at our fingertips, you can comfortably run inference for open-source models like Llama 2 7B, Mistral 7B, Phi-2, Falcon 7B, MPT 7B, and even some smaller instruction-tuned variants like FLAN-T5 (up to 3B parameters). Closed-source gems such as Claude 3 Haiku and the roughly 7B variants of early GPT models sit in the same small-model class, though you reach those through an API rather than hosting the weights yourself. With a bit of clever optimization (quantization at 4-bit or 8-bit), you can even squeeze certain 13B models onto the same card; minimal sketches of both loading paths follow below.

Now, let’s consider fine-tuning. Full fine-tuning...
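To make the inference claim concrete, here is a minimal sketch of loading a 7B model in fp16 on a single A100 40GB, assuming the Hugging Face transformers and accelerate libraries; the Mistral checkpoint name and generation settings are illustrative, and any of the 7B models named above would slot in the same way:

```python
# Minimal inference sketch for a 7B model on a single A100 40GB.
# Assumes: pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative 7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights: ~14 GB, comfortably within 40 GB
    device_map="auto",          # place the model on the available GPU
)

prompt = "Explain why 7B models suit multi-agent workflows."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At fp16, a 7B model occupies roughly 14 GB of weights, which leaves ample headroom on a 40 GB card for the KV cache and activations, even at generous context lengths.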
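And here is a sketch of the quantization trick for fitting a 13B model, assuming the bitsandbytes library and its BitsAndBytesConfig integration in transformers; the Llama 2 13B checkpoint is illustrative (it is gated on Hugging Face, so any open 13B checkpoint can stand in):

```python
# 4-bit quantization sketch: fitting a 13B model onto the same 40 GB card.
# Assumes: pip install torch transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative 13B checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits on load
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the usual QLoRA default
    bnb_4bit_compute_dtype=torch.float16,  # dequantized compute runs in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

The arithmetic explains why this works: 13B parameters at fp16 need about 26 GB for weights alone, which crowds a 40 GB card once the KV cache and activations pile on; at 4-bit the weights shrink to roughly 7 GB, leaving plenty of room.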