01 / Query Interface
02 / Retrieval Pipeline
How It Works
01 / Chunking: Documents are split into overlapping 512-token chunks via recursive character splitting.
02 / Dual Index: BM25Okapi keyword matching runs alongside ChromaDB vector search for semantic retrieval.
03 / Hybrid Fusion: Reciprocal Rank Fusion merges both retrieval signals into a unified ranking.
04 / Reranking: The ms-marco-MiniLM cross-encoder rescores the top candidates for precision.
05 / Generation: An Ollama-served LLM generates cited answers grounded in the retrieved context.
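Step 01 can be sketched with a simplified recursive character splitter. This is a toy version under stated assumptions: it measures length in characters rather than tokens (a real pipeline would count with the embedding model's tokenizer), the separator hierarchy and 64-character overlap are illustrative defaults, and the function name is hypothetical, not the system's API.

```python
def recursive_split(text, chunk_size=512, overlap=64,
                    separators=("\n\n", "\n", " ", "")):
    """Split `text` into chunks of at most `chunk_size` characters.

    Tries the coarsest separator first (paragraphs, then lines, then
    words); pieces that are still too large are split recursively on
    finer separators, and small pieces are merged back together.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep = next((s for s in separators if s and s in text), "")
    if sep == "":
        # No separator left: hard-cut into overlapping fixed windows.
        # (Overlap is applied only here; a production splitter would
        # also overlap the merged chunks below.)
        step = chunk_size - overlap
        return [text[i:i + chunk_size] for i in range(0, len(text) - overlap, step)]
    # Split on the coarsest separator present; recurse on oversized pieces.
    pieces = []
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            pieces.extend(recursive_split(piece, chunk_size, overlap, separators))
        else:
            pieces.append(piece)
    # Greedily merge adjacent pieces back up to chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        joined = current + sep + piece if current else piece
        if len(joined) <= chunk_size:
            current = joined
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

The recursion order matters: splitting on paragraph breaks before word breaks keeps semantically coherent units together, which is why recursive splitting is preferred over naive fixed windows.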
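Step 03 reduces to a few lines: each document earns 1 / (k + rank) from every ranked list it appears in, so documents retrieved by both BM25 and the vector index tend to rise to the top. The smoothing constant k = 60 is the value from the original RRF paper; whether this system uses that exact value is an assumption, and the toy document IDs below are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in; k dampens the advantage of the very top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Documents found by both retrievers ("d1", "d2") outrank those
# found by only one ("d3", "d4").
bm25_hits   = ["d1", "d2", "d3"]
vector_hits = ["d4", "d2", "d1"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# fused == ["d1", "d2", "d4", "d3"]
```

Note that RRF needs only ranks, not scores, which is what makes it a good fuser here: BM25 scores and cosine similarities live on incomparable scales, and RRF sidesteps any score normalization.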
03 / System Status
Documents Indexed: —
Total Chunks: —
Embedding Model: —
Reranker: —
LLM Engine: —
Health Status: —
Technology Stack

| Component | Implementation | Mode |
|---|---|---|
| Embeddings | all-MiniLM-L6-v2 | Local |
| Vector Store | ChromaDB | Persistent |
| Keyword Search | BM25Okapi | In-Memory |
| Reranker | ms-marco-MiniLM-L-6-v2 | Cross-Encoder |
| LLM | Ollama / phi3:mini | Local |
| Framework | FastAPI | Async |
04 / Retrieval Methods
Approach Comparison
| Method | Strength | Weakness | Status |
|---|---|---|---|
| BM25 (Keyword) | Exact term matching, zero-model overhead | Misses synonyms and semantic similarity | Active |
| Vector Search | Semantic understanding, handles paraphrasing | Can miss exact terms, embedding-quality dependent | Active |
| Hybrid RRF | Best of both: keyword precision + semantic recall | Marginally higher latency from dual retrieval | Active |
| Cross-Encoder Rerank | High-precision reordering of candidate set | Slower than bi-encoder, runs on top-N only | Active |
| Naive RAG | Simple to implement | Lower recall, no precision refinement | Unused |
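The BM25 row's strength and weakness can both be seen in a minimal Okapi BM25 scorer. This is a stdlib-only sketch, not the rank_bm25 package the system actually uses; tokenization is naive whitespace lowercasing, and k1/b carry common textbook defaults. The key line is the exact-match guard: a synonym query contributes nothing, which is precisely the gap vector search fills.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact-match only: synonyms contribute nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

corpus = ["the cat sat on the mat", "a feline rested on the rug"]
scores = bm25_scores("cat mat", corpus)
# The second document is a paraphrase of the first, yet scores 0.0:
# it shares no query terms, so BM25 alone would never retrieve it.
```

This zero score for a near-perfect paraphrase is the motivation for the Hybrid RRF row above: keyword and vector retrieval fail in complementary ways.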