CloudAura RAG — Ask My Docs

01 / Query Interface

02 / Retrieval Pipeline

How It Works

01. Documents are split into overlapping 512-token chunks via recursive character splitting.

02. BM25Okapi handles keyword matching, while ChromaDB vectors handle semantic search.

03. Reciprocal Rank Fusion (RRF) merges the two ranked lists into a single ranking.

04. A cross-encoder (ms-marco-MiniLM-L-6-v2) rescores the top candidates for precision.

05. An Ollama-served LLM (phi3:mini) generates cited answers grounded in the retrieved context.
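
The chunking and fusion steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the application's actual code: the function names, the overlap of 64 tokens, and the RRF constant k=60 are assumptions, and a fixed sliding window stands in for recursive character splitting to keep the example short.

```python
from collections import defaultdict


def split_with_overlap(tokens, chunk_size=512, overlap=64):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap.

    Simplification: a fixed window, not a recursive character splitter.
    The overlap value is an assumption; the page only says "overlapping".
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids with score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    chunks = split_with_overlap(list(range(1000)))
    print(len(chunks), [len(c) for c in chunks])

    bm25_ranking = ["c1", "c2", "c3"]    # hypothetical keyword results
    vector_ranking = ["c3", "c1", "c4"]  # hypothetical semantic results
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

Because RRF uses only rank positions, the BM25 and vector scores never need to be normalized onto a common scale, which is one reason it suits hybrid retrieval.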

03 / System Status

Documents Indexed: (not loaded)

Total Chunks: (not loaded)

Embedding Model: (not loaded)

Reranker: (not loaded)

LLM Engine: (not loaded)

Health Status: (not loaded)

Component	Technology	Type
Embeddings	all-MiniLM-L6-v2	Local
Vector Store	ChromaDB	Persistent
Keyword Search	BM25Okapi	In-Memory
Reranker	ms-marco-MiniLM-L-6-v2	Cross-Encoder
LLM	Ollama / phi3:mini	Local
Framework	FastAPI	Async
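
One way to keep this stack description and the System Status panel in sync is to drive both from a single typed configuration object. The sketch below is hypothetical: the class, field, and function names are invented for illustration, and only the values come from the listing above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Values are copied from the stack listing; field names are hypothetical.
    chunk_size_tokens: int = 512
    embedding_model: str = "all-MiniLM-L6-v2"
    vector_store: str = "ChromaDB"
    keyword_search: str = "BM25Okapi"
    reranker_model: str = "ms-marco-MiniLM-L-6-v2"
    llm_runtime: str = "Ollama"
    llm_model: str = "phi3:mini"


def status_payload(config, documents_indexed, total_chunks):
    """Build the dict a status endpoint might return for the status panel."""
    payload = asdict(config)
    payload["documents_indexed"] = documents_indexed
    payload["total_chunks"] = total_chunks
    return payload


if __name__ == "__main__":
    # The counts here are made-up inputs, not real index statistics.
    print(status_payload(PipelineConfig(), documents_indexed=3, total_chunks=42))
```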

04 / Retrieval Methods

Method	Strength	Weakness	Status
BM25 (Keyword)	Exact term matching, zero-model overhead	Misses synonyms and semantic similarity	Active
Vector Search	Semantic understanding, handles paraphrasing	Can miss exact terms, embedding-quality dependent	Active
Hybrid RRF	Best of both: keyword precision + semantic recall	Marginally higher latency from dual retrieval	Active
Cross-Encoder Rerank	High-precision reordering of candidate set	Slower than bi-encoder, runs on top-N only	Active
Naive RAG	Simple to implement	Lower recall, no precision refinement	Unused
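
The BM25 row's trade-off can be seen in a small from-scratch scorer. This is a sketch under stated assumptions, not the BM25Okapi class the app uses: it applies the standard Okapi formula with the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), and the defaults k1=1.5 and b=0.75 are common choices rather than values taken from this project.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per tokenized document."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # no exact match, so no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["reset", "password", "email"],
        ["recover", "credentials", "login"],  # same topic, different words
        ["password", "policy"],
    ]
    # The second document scores 0.0: it shares no exact term with the query.
    print(bm25_scores(["password"], docs))
```

The zero score for the paraphrased document is the gap that vector search fills, which is why the hybrid row combines both signals.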