DBSCAN
Semantic Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups data points by their proximity in vector space. Unlike K-means, it determines the number of clusters automatically and explicitly identifies outliers: points that don't belong to any cluster. It has two key parameters: eps (the maximum distance at which two points count as neighbors) and min_samples (the minimum number of points that must lie within eps of a point for it to seed or extend a cluster).
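To make the roles of eps and min_samples concrete, here is a minimal pure-Python sketch of the algorithm. It is an illustration under stated assumptions, not a production implementation: the function names (`region_query`, `dbscan`) and the 2D toy points, which stand in for keyword embeddings, are hypothetical, and the neighbor count includes the point itself.

```python
import math

NOISE = -1  # label used for outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is core too, so keep expanding
    return labels


# Toy data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters (two here) is never passed in; it falls out of the density of the data, and the isolated point is reported as an outlier rather than forced into a group.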
DBSCAN's primary advantage is that you don't need to specify the number of clusters in advance, and outliers are labeled explicitly, conventionally as -1. The drawback is that choosing eps is tricky and requires experimentation. If eps is too small, the data fragments into many tiny clusters and more points end up marked as outliers; if it is too large, distinct groups are merged. In SEO, DBSCAN is well suited to exploratory analysis when you're unsure how many topics a keyword set contains.
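One widely used way to narrow the search for eps, not described in this entry itself, is the sorted k-distance heuristic: compute each point's distance to its k-th nearest neighbor, sort those distances, and look for a sharp jump. The sketch below assumes small in-memory data; the function name `k_distances`, the choice of k, and the toy points are hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


# Toy data standing in for keyword embeddings: two dense groups, one stray point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
kd = k_distances(points, k=2)
for d in kd:
    print(round(d, 2))
```

On this data the first eight values are 1.0 and the last is far larger, so any eps a little above 1.0 keeps the dense groups together while leaving the stray point as an outlier. Real keyword embeddings rarely produce such a clean jump, so treat the result as a starting point for experimentation rather than an answer.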
For example, applying DBSCAN to 500 keywords might yield 12 clusters plus 45 outliers (phrases that belong to no topic). A common workflow uses DBSCAN for initial exploration to estimate the number of clusters, then applies K-means with that number for the final clustering. DBSCAN suggests a reasonable cluster count, and K-means then partitions every keyword, including the former outliers, into one of those clusters.
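The handoff from DBSCAN to K-means can be sketched as follows. This is one possible arrangement, not the only one: it assumes DBSCAN-style labels are already available (with -1 for outliers), takes k from the number of distinct cluster labels, and, as an additional assumption, seeds K-means with the centroids of the DBSCAN clusters so the result is deterministic. The helper names and toy data are hypothetical.

```python
import math


def summarize(labels):
    """Count clusters and outliers in DBSCAN-style labels (-1 = outlier)."""
    n_clusters = len(set(labels) - {-1})
    n_outliers = labels.count(-1)
    return n_clusters, n_outliers


def centroid(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(group) for c in zip(*group))


def kmeans_from_labels(points, labels, iterations=10):
    """Run Lloyd's K-means with k and starting centers taken from DBSCAN labels."""
    ids = sorted(set(labels) - {-1})
    if not ids:
        raise ValueError("DBSCAN found no clusters; nothing to seed K-means with")
    centers = [centroid([p for p, l in zip(points, labels) if l == c]) for c in ids]
    assign = []
    for _ in range(iterations):
        # Assignment step: every point, outliers included, joins its nearest center.
        assign = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    return assign, centers


# Toy data and the labels a DBSCAN run might produce for it.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, -1]

print(summarize(labels))  # (2, 1)
assign, centers = kmeans_from_labels(points, labels)
print(assign)             # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The last point shows the trade-off in this workflow: K-means has no notion of noise, so the phrase DBSCAN flagged as an outlier is absorbed into its nearest cluster and pulls that cluster's center toward it. If outliers should stay separate, filter them out before the K-means step.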