Reranking
Reranking is the second retrieval stage in RAG (Retrieval-Augmented Generation) systems, where a cross-encoder model re-scores and re-sorts the results first retrieved with embeddings. Embeddings are fast but 'shallow': they produce tightly clustered similarity scores (e.g., 0.84–0.88), which makes it hard to tell a 'very good' match from an 'ideal' one.
A reranker spreads its scores much further apart (e.g., 0.36–0.71), so the best results stand out clearly. It is slow and expensive because it scores each query-document pair separately, so you run it only on the top 10–100 results from embeddings, never on the entire database. Popular options include Jina AI (multilingual, good for Polish), Cohere (a reranking pioneer), ColBERT (a late-interaction model rather than a provider), and FlashRank (local, no API). In SEO, reranking is essential for precise internal linking: embeddings surface 100 candidates, but the reranker selects the 5–10 that are really worth linking.
Think of it like job recruitment: embeddings are initial CV screening (from 1,000 candidates you pick 20), reranking is the interview stage (from 20 you choose the 3 best).
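
The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's API: the document vectors are made-up toy values, and `toy_score_pair` is a hypothetical stand-in (simple word overlap) for a real cross-encoder, which would score each query-document pair with a neural model instead.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             doc_vecs: Dict[str, List[float]],
             k: int) -> List[str]:
    """Stage 1: cheap embedding search over the whole corpus, keep the top k ids."""
    ranked = sorted(doc_vecs,
                    key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
                    reverse=True)
    return ranked[:k]


def rerank(query: str,
           candidate_ids: List[str],
           texts: Dict[str, str],
           score_pair: Callable[[str, str], float],
           top_n: int) -> List[Tuple[str, float]]:
    """Stage 2: expensive pairwise scoring, applied only to stage-1 candidates."""
    scored = [(doc_id, score_pair(query, texts[doc_id])) for doc_id in candidate_ids]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def toy_score_pair(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


# Toy corpus: texts plus made-up 3-dimensional "embeddings" (assumed values).
texts = {
    "a": "how to build internal links for seo",
    "b": "internal links and anchor text guide",
    "c": "best pizza recipes",
    "d": "seo basics for beginners",
}
doc_vecs = {
    "a": [0.8, 0.3, 0.1],
    "b": [0.85, 0.2, 0.0],
    "c": [0.0, 0.1, 0.9],
    "d": [1.0, 0.1, 0.0],
}
query = "internal links seo"
query_vec = [1.0, 0.1, 0.0]

# Stage 1 narrows four documents to three; stage 2 picks the best two of those.
candidates = retrieve(query_vec, doc_vecs, k=3)
top = rerank(query, candidates, texts, toy_score_pair, top_n=2)
print(candidates)
print(top)
```

In this toy data the three stage-1 candidates have cosine scores bunched between roughly 0.96 and 1.0, while the stand-in scorer spreads them across 0.33 to 1.0 and reorders them. In a real system, `toy_score_pair` would be replaced by a call to an actual reranker, and `k` would sit in the 10–100 range mentioned above.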