Blog Posts
Large Language Models (LLMs) don’t start out as friendly assistants. They begin as vast, raw systems trained on enormous datasets—powerful but unpolis...
Ray is a distributed execution engine. Its job is to take a messy cluster of machines and make it feel like one giant computer....
vLLM is a library for running LLMs on GPUs, designed for fast and efficient inference....
Agents don’t just think — they move data between systems....
Evaluating agents is messy. Traditional software is deterministic — same input, same output. Agents don’t work that way. They reason in loops, call to...
Multi-agent systems (MAS) are emerging as a serious pattern for tackling the limits of single agents....
We've covered a lot of topics in the past few posts, but one thing is still missing: the concept of agents....
When we talk about memory in LLM agents, we’re not talking about neurons or synapses — we’re talking about tokens, context windows, and clever hacks t...
In part one, I wrote about why structured outputs matter and why just asking an LLM to “return JSON” doesn’t cut it....
When you build with LLMs, you quickly run into a recurring issue:...
Listing some technical books that I highly recommend (and have actually read)....