/blog
RAG7 min

A practical RAG chunking playbook

Chunking is the single highest-leverage knob in a RAG system. Here is how we choose chunk size, overlap, and structure for real corpora.

May 30, 2026

Retrieval quality is mostly a chunking problem in disguise. Embeddings and rerankers can only work with the pieces you hand them.

The three questions

  1. What is one answer-sized unit in this corpus? A paragraph? A function? A row?
  2. What context does the reader need around that unit to make sense of it?
  3. What is the smallest chunk that still contains both?

Your chunk size is the answer to question 3.

Defaults that usually work

  • Prose docs: 500–800 tokens, 10–15% overlap, split on headings first.
  • Code: split by symbol (function, class), never by line count.
  • Tables: keep the header row with every chunk of the body.
  • Chat logs: chunk by conversation turn, not by token window.

Things that quietly break retrieval

  • Splitting mid-sentence. Always snap to a sentence or block boundary.
  • Throwing away the document title. Prepend it to every chunk.
  • Mixing languages in one index without a language tag in metadata.

Metadata is half the system

{
  "doc_id": "handbook-2026",
  "section": "Engineering > On-call",
  "heading_path": ["On-call", "Escalation"],
  "updated_at": "2026-04-12"
}

Filter on metadata first, embed second. It is faster and almost always more accurate.

Measure, don't vibe

Build a small eval set of 50 real questions with known correct passages. Re-run it every time you change chunking. If recall@5 doesn't move, neither should your config.