A practical RAG chunking playbook

Retrieval quality is mostly a chunking problem in disguise. Embeddings and rerankers can only work with the pieces you hand them.

The three questions

What is one answer-sized unit in this corpus? A paragraph? A function? A row?
What context does the reader need around that unit to make sense of it?
What is the smallest chunk that still contains both?

Your chunk size is the answer to question 3.

Defaults that usually work

Prose docs: 500–800 tokens, 10–15% overlap, split on headings first.
Code: split by symbol (function, class), never by line count.
Tables: keep the header row with every chunk of the body.
Chat logs: chunk by conversation turn, not by token window.

Things that quietly break retrieval

Splitting mid-sentence. Always snap to a sentence or block boundary.
Throwing away the document title. Prepend it to every chunk.
Mixing languages in one index without a language tag in metadata.

Metadata is half the system

{
  "doc_id": "handbook-2026",
  "section": "Engineering > On-call",
  "heading_path": ["On-call", "Escalation"],
  "updated_at": "2026-04-12"
}

Filter on metadata first, embed second. It is faster and almost always more accurate.

Measure, don't vibe

Build a small eval set of 50 real questions with known correct passages. Re-run it every time you change chunking. If recall@5 doesn't move, neither should your config.