
Roey Ben Chaim

Agents. Benchmarks. Systems. Repeat.

Benchmarking LLM Inference: The Metrics That Actually Matter

7 min read

People love to brag about “low latency” or “optimized inference,” but unless you’re clear about what you’re measuring, those numbers are basically mea...

inference, benchmarks

From RAG to Agentic Retrieval

6 min read

Large language models are great at reasoning, but terrible at remembering everything.

rag, retrieval

From DDP to ZeRO-2 and ZeRO-3

4 min read

If you’ve ever trained or fine-tuned a large language model in PyTorch, you’ve probably started with Distributed Data Parallel (DDP).

training, distributed

Distributed Data Parallel (DDP) for Training Models

4 min read

Training large models on a single GPU can be painfully slow. PyTorch's Distributed Data Parallel (DDP) is the standard way to scale training across mu...

training, distributed

Training LLMs 101

5 min read

Large Language Models (LLMs) don’t start out as friendly assistants. They begin as vast, raw systems trained on enormous datasets—powerful but unpolis...

training, fundamentals

Ray for LLM Inference

5 min read

Ray is a distributed execution engine. Its job is to take a messy cluster of machines and make it feel like one giant computer.

inference, distributed

vLLM: LLM Inference That Doesn't Waste Your GPU

5 min read

vLLM is a library for serving LLMs on GPUs, designed for high throughput and efficient GPU memory use.

inference, performance

Three Practical Ways to Detect Sensitive Data

4 min read

Agents don’t just think — they move data between systems.

security, privacy

Evals: How to Evaluate Agents

5 min read

Evaluating agents is messy. Traditional software is deterministic — same input, same output. Agents don’t work that way. They reason in loops, call to...

evals, agents

Why Multi-Agent Systems Matter

8 min read

Multi-agent systems (MAS) are emerging as a serious pattern for tackling the limits of single agents.

agents, architecture

From Zero to Agent: ReAct, Reflection, and Planning

7 min read

We've covered a lot of ground in the past few posts, but one thing still missing is the concept of agents.

agents, reasoning

How Agents Remember: On Memory and the Art of Context Engineering

6 min read

When we talk about memory in LLM agents, we’re not talking about neurons or synapses — we’re talking about tokens, context windows, and clever hacks t...

agents, memory

Structured Outputs in Practice: Instructor vs PydanticAI vs BAML

3 min read

In part one, I wrote about why structured outputs matter and why just asking an LLM to “return JSON” doesn’t cut it.

structured output, tools

Structured Output

4 min read

When you build with LLMs, you quickly run into a recurring issue:

structured output, fundamentals

Engineering Books

2 min read

A list of technical books that I highly recommend (and have actually read).

books, learning