SEMANTIC_SIMILARITY

Embeddings

SEMANTIC_SIMILARITY is an embedding task type that optimizes vectors for measuring similarity between texts; it is used for duplicate detection and cannibalization detection.

SEMANTIC_SIMILARITY is an embedding task type that optimizes vectors for measuring semantic similarity between text pairs. In SEO, it has two key applications: duplicate detection (similarity near 1.0, e.g., pagination pages with identical content) and cannibalization detection (similarity 0.9-0.99, e.g., 'What is SEO' vs 'SEO Basics').
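
As a rough illustration of the two thresholds above, the sketch below scores a pair of embedding vectors and labels the result. It assumes the vectors were already produced with the SEMANTIC_SIMILARITY task type; the function names and the exact cut-off for "near 1.0" (taken here as above 0.99) are assumptions for this example, not part of the source.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_pair(similarity, duplicate_min=0.99, cannibal_min=0.9):
    """Label a page pair using the ranges described in the text.

    duplicate_min=0.99 is an assumed reading of "near 1.0";
    cannibal_min=0.9 is the lower bound of the 0.9-0.99 range.
    """
    if similarity > duplicate_min:
        return "duplicate"
    if similarity >= cannibal_min:
        return "cannibalization"
    return "distinct"
```

In a real audit the thresholds would likely need tuning per site and per embedding model, so treat the defaults as starting points.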

It's also useful for building internal links from nearest neighbors: for each page, you look for other pages with cosine similarity above 0.75-0.8. Unlike CLUSTERING, this task type is meant for comparing pairs of texts rather than grouping them into clusters. By choosing SEMANTIC_SIMILARITY for site audits, you can scan 10,000 URLs in minutes and surface likely duplicates and cannibalization candidates, work that would take weeks manually.
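
The nearest-neighbor idea for internal linking can be sketched as follows. This is a minimal brute-force version, assuming a hypothetical `embeddings` dict that maps each URL to a precomputed vector; the function name `link_candidates`, the `top_k` limit, and the default threshold of 0.75 are choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_candidates(embeddings, min_similarity=0.75, top_k=3):
    """For each URL, return up to top_k other URLs whose similarity
    is at least min_similarity, most similar first.

    embeddings: dict mapping URL -> vector (hypothetical input format).
    """
    result = {}
    for url, vec in embeddings.items():
        scored = [
            (cosine_similarity(vec, other_vec), other_url)
            for other_url, other_vec in embeddings.items()
            if other_url != url
        ]
        # Keep only neighbors above the linking threshold.
        scored = [(s, u) for s, u in scored if s >= min_similarity]
        scored.sort(reverse=True)
        result[url] = [u for _, u in scored[:top_k]]
    return result
```

The pairwise loop is quadratic in the number of pages, which is workable for a few thousand URLs; for larger sites a vectorized or approximate nearest-neighbor search would probably be a better fit.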

In practice, start by comparing page titles (title tags): they're short and capture the topic well.

Source: AI Semantic SEO Expert, Robert Niechciał (sensai.io)