UMAP (dimensionality reduction)

UMAP (dimensionality reduction) reduces embedding dimensions (e.g., from 768 to 5 dimensions) while preserving the most important relationships.

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method for embeddings that preserves the most important relationships between points in vector space. It is typically used to compress embeddings, for example from 768 to 5 dimensions, for visualization or as a preprocessing step.

UMAP is often discussed alongside Gemini embeddings with 768 dimensions, which, unlike 3072-dimensional embeddings, are not normalized automatically. Unnormalized vectors can distort similarity results, particularly when similarity is computed as a plain dot product, so scale them to unit length (L2 normalization) first; UMAP reduces dimensions but does not itself normalize vectors. UMAP typically preserves data structure better than PCA, especially for nonlinear data, because it models local neighborhoods instead of global variance directions.
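Scaling vectors to unit length (L2 normalization) is a short operation in its own right. The sketch below assumes NumPy and uses randomly generated vectors in place of real Gemini output; the helper name l2_normalize is hypothetical.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


rng = np.random.default_rng(0)

# Stand-in vectors that are deliberately not unit length (assumption).
raw = rng.normal(size=(4, 768)) * 3.0
unit = l2_normalize(raw)

# Cosine similarity computed explicitly on the raw vectors...
cosine = raw[0] @ raw[1] / (np.linalg.norm(raw[0]) * np.linalg.norm(raw[1]))
# ...matches the plain dot product once the vectors are unit length.
dot_of_unit = unit[0] @ unit[1]
```

After this step, a plain dot product between two rows of unit gives the same value as their cosine similarity, which is why normalization matters for stores or libraries that rank by dot product.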

For example, after reducing 768 dimensions to 2D with UMAP, thematic clusters (e.g., inheritance law vs. labor law) are clearly separated in the visualization, whereas PCA might merge them. In practice, use UMAP for visualization and data exploration, but perform clustering on the full-dimensional embeddings, because reduction can lose subtle semantic differences.

Source: AI Semantic SEO Expert, Robert Niechciał (sensai.io)