RAG Is Not a Vector Database

Teams spend weeks benchmarking pgvector vs. Pinecone vs. Qdrant and ship the exact same mediocre answers, because the vector store was never the bottleneck.

The actual stack

Chunking — semantic, not fixed-size. Respect document structure.
Query rewriting — a small LLM turns a vague question into 2–3 search queries.
Hybrid retrieval — BM25 + dense, then fuse.
Reranker — a cross-encoder over the top 50.
Context packing — fit the budget, cite the source.

Swap any vector store out and your answers move by single-digit percent. Add a reranker and they move by 20+.