About
Hi, I'm Saurabh.
I'm an AI engineer focused on the parts of the LLM stack that decide whether a model is good and whether it ships: the systems that train and serve it, the data that shapes it, and the alignment work that makes it useful.
I got here through data engineering: years of building pipelines and infrastructure that had to work at scale, then a master's in Data Engineering from IIT Jodhpur to go deeper. From there I followed the stack downward into GPUs and upward into post-training. That path means I think about models end-to-end: a kernel-level optimization matters because it changes what experiments you can afford; a deduplication decision matters because it shows up in eval scores three weeks later.
This site is where I write it all down — deep dives with code, benchmarks, and the tradeoffs nobody mentions in the paper.
What I work on
Systems. Making training and inference fast and cheap: getting the most out of every GPU. [Kernels · Parallelism · Quantization · Activation checkpointing · CPU offloading · Inference]
Data. The quiet determinant of model quality: what goes in, what gets cut, and how it is measured. [Evaluation · Curation · Transformation · Filtering · Deduplication · Mixing]
Alignment. Turning base models into useful, reliable ones — and knowing when it worked. [Supervised fine-tuning · Reinforcement learning · Preference data · Synthetic data · Verifiers]
Work with me
I keep room for a small number of engagements at a time — consulting or embedded work on training efficiency, inference optimization, data pipelines, and post-training. The problems I find most interesting these days tend to live at frontier scale: making big training runs cheaper, evals more trustworthy, and alignment pipelines less brittle.
If you're working on something in that space, I'd like to hear about it: connect@saurabh.works.
Elsewhere
Code on GitHub, career history on LinkedIn, new articles via RSS or the newsletter below.