Nearest Neighbors
Nearest Neighbors (also called k-Nearest Neighbors or kNN) is an algorithm that finds the k closest points in vector space, where k is the number of similar items to retrieve. Applied to embeddings, it identifies semantically similar documents by measuring cosine similarity between their vectors.
For each vector representing content, the algorithm finds the k nearest neighbors, typically the 10 most similar pages. In SEO it has three main applications: internal linking, where the nearest pages become link candidates; recommendation systems for related articles; and duplicate detection, where neighbors with similarity above 0.99 indicate likely duplicates. The algorithm can run in a vector database such as Qdrant or Supabase with pgvector, or in Python memory using scikit-learn. Approximate Nearest Neighbors (ANN) is a faster variant for large datasets, trading a small amount of precision for speed.
In practice, start with k=10 neighbors and a similarity threshold above 0.8 — this gives you a reasonable number of link candidates without creating excessive or irrelevant connections.
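The workflow above can be sketched with scikit-learn. This is a minimal illustration using random vectors as stand-ins for real page embeddings; the 0.8 link threshold and 0.99 duplicate threshold are the rule-of-thumb values from the text, not fixed constants.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for real page embeddings (in practice, vectors from an embedding model).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(100, 64))  # 100 pages, 64-dim vectors

# Request k+1 neighbors because each vector's closest match is itself.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine")
nn.fit(embeddings)
distances, indices = nn.kneighbors(embeddings)

# Cosine similarity = 1 - cosine distance; drop the self-match in column 0.
similarities = 1 - distances[:, 1:]
neighbors = indices[:, 1:]

# Internal-link candidates: neighbors above the 0.8 similarity threshold.
link_candidates = {
    page: [int(j) for j, s in zip(neighbors[page], similarities[page]) if s > 0.8]
    for page in range(len(embeddings))
}

# Potential duplicates: neighbor pairs with similarity above 0.99.
duplicates = [
    (page, int(j))
    for page in range(len(embeddings))
    for j, s in zip(neighbors[page], similarities[page])
    if s > 0.99
]

print(neighbors.shape)  # 10 ranked neighbors per page
```

For catalogs beyond a few hundred thousand pages, the same logic is typically moved into an ANN index (e.g. the HNSW indexes offered by Qdrant or pgvector) rather than exact brute-force search.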