K-means

Semantic Clustering

K-means algorithm

K-means is a clustering algorithm that divides data points into k groups based on their distance from centroids; the number of clusters must be specified upfront.

K-means splits a dataset of points (such as embeddings) into k groups, where k is chosen upfront. The algorithm alternates two steps: it assigns each point to the nearest centroid (cluster center), then moves each centroid to the mean of the points assigned to it. It repeats these steps until the assignments stop changing (convergence).
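The assign-and-update loop can be sketched in plain Python. This is a minimal, illustrative version of Lloyd's algorithm (the standard K-means procedure), not the pipeline's actual implementation. The function names, the Euclidean distance, and the deterministic farthest-point initialization (used here instead of the usual random or k-means++ seeding so the example is reproducible) are all assumptions.

```python
import math


def init_centroids(points, k):
    """Farthest-point seeding (assumption): a deterministic stand-in for random/k-means++ init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # Converged: no point changed cluster.
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D "embeddings": two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Real keyword embeddings have hundreds of dimensions rather than two, but the loop is the same: only the length of each point vector changes.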

In the semantic audit clustering pipeline, K-means is the primary algorithm for splitting keywords into thematic clusters; the keyword clustering skill runs it on Gemini embeddings. Its main advantage is that it tends to produce clean, roughly evenly sized clusters, which suit content planning.

However, its main limitation is that k must be specified upfront: a poorly chosen k produces clusters that are either too broad or too narrow. The Silhouette Score, which compares how close each point is to its own cluster with how close it is to the nearest other cluster, helps choose k: test k from 5 to 30 and pick the value with the highest score.
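A minimal, dependency-free sketch of the Silhouette Score comparison follows. For each point, a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the point's silhouette is (b - a) / max(a, b), and the score is the average over all points (singleton clusters count as 0, a common convention). The function names, the Euclidean distance (embedding pipelines often use cosine distance instead), and the deterministic farthest-point seeding are illustrative assumptions; a production pipeline would more likely call a library implementation.

```python
import math


def kmeans(points, k, max_iter=100):
    """Compact Lloyd's k-means with farthest-point seeding (assumption); returns labels."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:
            break  # Converged.
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # A singleton cluster contributes a silhouette of 0.
        # a: mean distance to the other members of p's own cluster (p itself adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Score every candidate k; return (k with the highest silhouette, all scores)."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy 2-D "embeddings": three well-separated groups of three points.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (10.0, 0.0), (10.2, 0.1), (9.9, 0.2),
    (0.0, 10.0), (0.1, 10.2), (0.2, 9.9),
]
k, scores = best_k(points, range(2, 6))
print(k)  # → 3
```

With too few clusters, unrelated groups are merged and a grows; with too many, tight groups are split and b shrinks. Both push the score down, which is why the peak is a reasonable signal for k.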

For example, clustering 500 keywords with k=15 produces 15 clusters averaging about 33 keywords each (500 / 15 ≈ 33), where each cluster represents a potential article.

In practice, start with a baseline of k = number_of_keywords / 30 (rounded to a whole number), test k values within plus or minus 5 of that baseline, and select the value with the highest Silhouette Score.
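The baseline rule can be written as a small helper that turns a keyword count into the list of k values to score. The function name and its parameters are hypothetical. Rounding the baseline to the nearest integer, keeping k at least 2, and capping it at n - 1 (the range in which a Silhouette Score is defined) are assumptions added so that every candidate is valid.

```python
def candidate_ks(n_keywords, keywords_per_cluster=30, spread=5):
    """Return (baseline k, candidate k values) for the k = n / 30, +/- 5 rule."""
    if n_keywords < 3:
        raise ValueError("need at least 3 keywords for a silhouette comparison")
    # Baseline: about one cluster per 30 keywords, rounded to a whole number.
    baseline = max(2, round(n_keywords / keywords_per_cluster))
    # Search window: baseline +/- spread, kept within 2..n-1.
    low = max(2, baseline - spread)
    high = min(n_keywords - 1, baseline + spread)
    return baseline, list(range(low, high + 1))


baseline, ks = candidate_ks(500)
print(baseline, ks[0], ks[-1])  # → 17 12 22
```

For 500 keywords this gives a baseline of 17 (500 / 30 ≈ 16.7) and candidates 12 through 22, so k=15 from the example above falls inside the window.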

Source: AI Semantic SEO Expert, Robert Niechciał (sensai.io)