RETRIEVAL_DOCUMENT

Embeddings

RETRIEVAL_DOCUMENT is an embedding task type that optimizes vectors for representing documents. It is used on the document side of RAG systems.

RETRIEVAL_DOCUMENT is an embedding task type that optimizes vectors for representing documents in RAG systems, meaning long, informational content. It is used when indexing content: each article or chunk is vectorized with this task type so that the model better captures its informational content. It works in tandem with RETRIEVAL_QUERY, which is used on the query side.
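The pairing of the two task types can be sketched as follows. This is a minimal illustration, not a real client: the `EmbedFn` signature, `build_index`, `search`, and especially `toy_embed` are hypothetical names introduced here. `toy_embed` is an offline stand-in that counts keywords and ignores the task type; a real embedding API would accept the task type and return model-generated vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (texts, task_type) -> one vector per text.
EmbedFn = Callable[[Sequence[str], str], List[List[float]]]


def toy_embed(texts: Sequence[str], task_type: str) -> List[List[float]]:
    """Offline stand-in for a real embedding call (NOT a real model).

    It counts a few keywords and ignores task_type; it exists only so the
    sketch runs without network access.
    """
    vocab = ["embedding", "vector", "index", "query", "recipe"]
    return [[float(t.lower().count(w)) for w in vocab] for t in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: Sequence[str], embed: EmbedFn) -> List[Tuple[str, List[float]]]:
    """Document side: every chunk is embedded with RETRIEVAL_DOCUMENT."""
    vectors = embed(chunks, "RETRIEVAL_DOCUMENT")
    return list(zip(chunks, vectors))


def search(query: str, index: List[Tuple[str, List[float]]],
           embed: EmbedFn, top_k: int = 3) -> List[str]:
    """Query side: the query is embedded with RETRIEVAL_QUERY."""
    query_vec = embed([query], "RETRIEVAL_QUERY")[0]
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


index = build_index(
    ["How to build a vector index for embedding search", "A recipe for apple pie"],
    toy_embed,
)
best = search("vector embedding", index, toy_embed, top_k=1)
```

In a real setup the same `embed` callable, backed by one model, would be passed to both `build_index` and `search`; only the task type differs between the two sides.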

The key principle is that the same embedding model must be used for both indexing and searching. If you indexed content with Gemini using task type RETRIEVAL_DOCUMENT, you must vectorize queries with Gemini using RETRIEVAL_QUERY. Mixing models (e.g., indexing with Jina and searching with OpenAI) produces worthless results, because vectors from different models live in different vector spaces and cannot be compared with each other.

In practice, when indexing a large site (e.g., 1,000 pages), generate embeddings in batches of 50 to 100 and save the results to a CSV file or database after each batch, so that an API error does not cost you the progress already made.
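The batching advice could look roughly like the sketch below. It assumes a hypothetical `embed(texts, task_type)` callable wrapping whichever embedding API is in use, and a hypothetical CSV layout with `url`, `model`, and `vector` columns; none of these names come from a specific library. Storing the model name in each row is one possible way to detect accidental model mixing later.

```python
import csv
import json
import os
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical signature: (texts, task_type) -> one vector per text.
EmbedFn = Callable[[Sequence[str], str], List[List[float]]]

FIELDS = ["url", "model", "vector"]  # assumed CSV layout for this sketch


def load_done(path: str) -> Set[str]:
    """Return the URLs already saved in the CSV (empty set if no file yet)."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}


def index_in_batches(pages: Dict[str, str], embed: EmbedFn, model_name: str,
                     out_path: str, batch_size: int = 50) -> int:
    """Embed pages in batches, appending each finished batch to the CSV.

    If embed() raises mid-run, batches written so far stay on disk and a
    rerun skips them. Returns the number of pages written by this call.
    """
    done = load_done(out_path)
    todo = [(url, text) for url, text in pages.items() if url not in done]
    need_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    written = 0
    for start in range(0, len(todo), batch_size):
        batch = todo[start:start + batch_size]
        # Document side of retrieval, so the task type is RETRIEVAL_DOCUMENT.
        vectors = embed([text for _, text in batch], "RETRIEVAL_DOCUMENT")
        # Reopen in append mode per batch so each batch is saved on its own.
        with open(out_path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if need_header:
                writer.writeheader()
                need_header = False
            for (url, _), vec in zip(batch, vectors):
                writer.writerow({"url": url, "model": model_name,
                                 "vector": json.dumps(vec)})
        written += len(batch)
    return written
```

After a failed run, calling `index_in_batches` again with the same `out_path` picks up only the pages that are not yet in the file. At query time, the stored `model` column can be compared against the model used for RETRIEVAL_QUERY before any similarity is computed.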

Source: AI Semantic SEO Expert, Robert Niechciał (sensai.io)