Teams spend weeks benchmarking pgvector vs. Pinecone vs. Qdrant and ship the exact same mediocre answers, because the vector store was never the bottleneck.
The actual stack:
- Chunking — semantic, not fixed-size. Respect document structure.
- Query rewriting — a small LLM turns a vague question into 2–3 search queries.
- Hybrid retrieval — BM25 + dense, then fuse.
- Reranker — a cross-encoder over the top 50.
- Context packing — fit the budget, cite the source.
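
Two of the steps above, fusing the BM25 and dense results and packing the context, are small enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the post only says "fuse", so reciprocal rank fusion (RRF) with the commonly used constant k=60 is a choice made here, and the names `rrf_fuse` and `pack_context`, the whitespace word count standing in for a real tokenizer, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def pack_context(doc_ids, texts, budget):
    """Greedily keep chunks in ranked order while they fit the budget."""
    packed, used = [], 0
    for doc_id in doc_ids:
        # Assumption: word count is a crude stand-in for a real token count.
        cost = len(texts[doc_id].split())
        if used + cost > budget:
            continue  # this chunk does not fit; a shorter one later still might
        # Prefix each chunk with its id so the answer can cite the source.
        packed.append(f"[{doc_id}] {texts[doc_id]}")
        used += cost
    return "\n".join(packed)


# Hypothetical result lists from a BM25 index and a dense index.
bm25_hits = ["a", "b", "c"]
dense_hits = ["c", "a", "d"]
texts = {
    "a": "one two three",
    "b": "eight",
    "c": "four five six seven",
    "d": "nine ten",
}

fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)  # ['a', 'c', 'b', 'd']
print(pack_context(fused, texts, budget=5))
```

In a full pipeline the reranker would sit between the two calls: fuse, rerank the top candidates with a cross-encoder, then pack. RRF is convenient here because it only needs ranks, so BM25 scores and cosine similarities never have to be put on a common scale.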
Swap out any vector store and your answers move by a single-digit percentage. Add a reranker and they move by 20% or more.