Semantic SEO Encyclopedia
265 terms in 15 categories
AI Search
50-Word Rule
50-Word Rule — Content optimization rule for AI Search: key answers must appear within the first 50 words of an article or H2 section.
Agent Decision Optimization
Agent Decision Optimization optimizes content for the moment AI agents decide whether to cite a source, not just for search rankings.
AI Mode / AI Overview
AI Mode and AI Overview are Google features that display synthetic AI answers directly in search results above organic listings.
AI SEO Alignment Score
AI SEO Alignment Score measures how well content matches AI Search citability criteria, factoring in information density, BLUF, and SRL.
API-able Brand
API-able Brand is a brand optimized for consistent retrieval by APIs and AI systems through unified naming, structured data, and presence.
BLUF (Bottom Line Up Front)
BLUF (Bottom Line Up Front) is placing the key answer at the very beginning of content, increasing citation probability to 62%.
ChatGPT (Bing + Broad Fanout)
ChatGPT (Bing + Broad Fanout) is OpenAI's AI Search platform, which uses Bing's index with broad query fan-out and tends to prefer Wikipedia content.
Cited (Being Cited)
Cited: Second visibility level in AI Search where content gets cited as a source in AI responses with a link or brand name provided.
Claude (Own Index)
Claude (Own Index): Anthropic's AI platform with its own web index. Unlike ChatGPT, it doesn't rely on Bing for search results.
Competitive Heuristic
Competitive Heuristic — The most effective AI Search content optimization heuristic (+1.61 weight), based on competitor comparisons (X vs Y).
Content Freshness
Content Freshness measures how recently content was updated: AI Search favors current data and regular updates over outdated content.
Cross-Domain Web Entities
Cross-Domain Web Entities are unified brand representations that AI systems recognize across multiple platforms and domains.
Earned Media vs Owned Media
Earned Media vs Owned Media distinguishes content on your own channels (owned) from mentions in external sources (earned).
Featured Snippet
Featured Snippet is a highlighted answer displayed at position zero in Google results, above organic listings; tables and lists have the highest chance of being selected.
Framework Retrieved-Cited-Trusted
Framework Retrieved-Cited-Trusted is a three-stage AI Search visibility framework: being Retrieved from index, being Cited, and becoming Trusted.
Gemini (Knowledge Graph + YouTube)
Gemini is Google's AI Search platform leveraging Google Knowledge Graph and YouTube for unique entity understanding and multimodal capabilities.
Intentional Decomposition
Intentional Decomposition — A query breakdown method that maps user decision-making stages to anticipate follow-up questions throughout the customer jou...
Keyword Stuffing (in AI Context)
Keyword Stuffing (in AI Context) is the SEO technique of mechanically repeating keywords, which becomes ineffective against AI-powered search systems.
Linguistic Hedging
Linguistic Hedging is using words that weaken statement certainty (maybe, perhaps, probably), reducing likelihood of AI citations.
Minimalist Heuristic
Minimalist Heuristic: AI Search optimization strategy with negative impact (weight -1.66 in studies); reducing content depth decreases AI citation oppor...
Multisourcing
Multisourcing is establishing brand presence across multiple independent platforms so AI systems can cross-verify information and consider it credible.
Perplexity (Aggressive Fanout)
Perplexity is an AI search platform that generates multiple related queries and always shows source links with real traffic opportunities.
Query Fan-out
Query Fan-out is an AI Search mechanism that splits a single user query into 5–10 sub-queries, each searching the index independently.
Reasoning Gap (Question Coverage Matrix)
Reasoning Gap is a question coverage matrix identifying AI reasoning gaps: questions no content on the web answers well despite the need.
Retrieved (Being Retrieved from the Index)
Retrieved (Being Retrieved from the Index) is content that has been pulled from search indexes into an AI system's context window.
Semantic Decomposition
Semantic Decomposition is a type of query decomposition in AI Search that breaks questions into component parts by meaning — e.g.
Sub-queries
Sub-queries are smaller helper questions automatically generated by AI search systems, breaking complex queries into manageable parts.
Thin Content (AI-generated)
Thin Content (AI-generated) is low-quality content mass-produced by AI without unique value; algorithms detect it and lower its rankings.
Trusted (Being a Trusted Source)
Trusted (Being a Trusted Source) is the third visibility level in AI Search where a brand becomes AI's default trusted source for queries.
Verification Decomposition
Verification Decomposition breaks complex queries into verifiable sub-questions to ensure answer accuracy across multiple sources in AI Search.
Zero-Click Search
Zero-Click Search occurs when users get answers directly in search results or AI Search without clicking through to any source website.
Contextual Vector
Contextual Connections (links)
Contextual Connections are links between related content based on shared semantic attributes rather than matching keywords.
Contextual Hierarchy (H1-H2-H3)
Contextual Hierarchy (H1-H2-H3) is a heading structure that builds semantic context for AI Search — H1 defines the topic, H2 divides into chunks.
Contextual Structure (BLUF per H2)
Contextual Structure (BLUF per H2) is a principle where each H2 section starts with the key thesis in the first sentence, making the section a ready chunk for AI citation.
Contextual Vector
Contextual Vector defines three optimization dimensions for content: heading hierarchy, BLUF structure, and internal linking.
E-E-A-T
Authority
Authority: E-E-A-T component representing recognition of an author or site as an industry authority through credible backlinks and validation signals.
Cross-domain entity building
Cross-domain entity building is the practice of building entity recognition across multiple platforms simultaneously to strengthen E-E-A-T and Trust.
Experience
Experience is the E-E-A-T component representing the author's first-hand knowledge of a topic, demonstrated through case studies and original content.
Expertise
Expertise is an E-E-A-T component representing an author's professional knowledge, signaled by publications, certifications, and citations.
Trust
Trust is the credibility component of E-E-A-T that measures how reliable and secure a website appears to users and search engines.
TTS (Truthful Text Summarization)
TTS (Truthful Text Summarization) is a mechanism that verifies entity information consistency across sources—AI compares your data with network consensus.
Embeddings
Cannibalization (similarity 0.9–0.99)
Cannibalization (similarity 0.9–0.99) is when pages compete for the same queries, detected when semantic analysis shows 90-99% content similarity.
CLUSTERING
CLUSTERING is an embedding task type that optimizes vectors for topical grouping and is used in keyword clustering pipelines.
Content Pruning (outliers)
Content Pruning (outliers): identifying pages topically distant from a site's centroid (outliers) as candidates for removal or relocation.
Cosine Similarity (0–1)
Cosine Similarity (0–1) measures the cosine of the angle between two vectors to determine content similarity; it is the standard metric for comparing embeddings in SEO.
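A minimal sketch of the metric in plain Python, assuming embeddings are simple lists of floats. The function name `cosine_similarity` and the sample vectors are illustrative and do not come from any particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for three pages.
page_a = [0.2, 0.8, 0.1]
page_b = [0.4, 1.6, 0.2]  # same direction as page_a, twice the length
page_c = [0.9, 0.0, 0.1]  # points in a clearly different direction

same_direction = cosine_similarity(page_a, page_b)
different_topic = cosine_similarity(page_a, page_c)
```

Vector length does not affect the result, only direction does. Mathematically the value ranges from -1 to 1; the 0–1 range in the term name holds for vectors that do not point in opposing directions.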
Duplicate Detection (similarity 1.0)
Duplicate Detection (similarity 1.0) detects identical content using embeddings and cosine similarity near 1.0 for SEO duplicate identification.
Embedding Cache (historical)
Embedding Cache (historical) is a mechanism that stores computed embeddings in the data/embeddings directory to avoid repeated API calls.
Embedding Normalization
Embedding normalization scales embedding vectors to uniform length: 3072-dimensional embeddings are normalized 'out of the box'.
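For vectors that are not already normalized, the scaling can be sketched as follows. This assumes L2 (unit-length) normalization, which the definition does not name explicitly; `l2_normalize` is an illustrative helper.

```python
import math

def l2_normalize(vec):
    """Scale a vector so that its Euclidean length becomes 1."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vec]

unit = l2_normalize([3.0, 4.0])  # original length is 5
unit_length = math.sqrt(sum(x * x for x in unit))
```

Once every vector has length 1, the plain dot product of two vectors equals their cosine similarity, which is why normalized embeddings are convenient to compare.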
Euclidean Distance
Euclidean distance measures the straight-line distance between two points in vector space — smaller values indicate greater semantic similarity.
Generative Model
Generative Model is an AI that creates text from prompts (GPT-4, Claude, Gemini), unlike embedding models that convert text to vectors.
Internal Linking (nearest neighbors)
Internal Linking (nearest neighbors) is an internal linking strategy based on embeddings and the nearest neighbors algorithm.
MTEB Leaderboard
MTEB Leaderboard ranks embedding models on standardized benchmarks, helping SEO professionals choose the best embedding model for specific tasks.
Nearest Neighbors
Nearest Neighbors is an algorithm that finds the k closest points in vector space by measuring similarity between data points.
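A brute-force sketch of the idea, assuming Euclidean distance as the measure and a small in-memory dictionary of page vectors. The URLs, vectors, and the `nearest_neighbors` helper are hypothetical; larger sites would typically use an approximate index instead of comparing every pair.

```python
import math

def nearest_neighbors(query, items, k=3):
    """Return the k (name, distance) pairs closest to query, nearest first."""
    scored = [(name, math.dist(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical 2-dimensional page embeddings.
pages = {
    "/espresso": [0.9, 0.1],
    "/latte": [0.8, 0.2],
    "/laptops": [0.1, 0.9],
}

neighbors = nearest_neighbors([0.9, 0.12], pages, k=2)
neighbor_names = [name for name, _ in neighbors]
```

The same lookup, run once per page, yields internal-link candidates: each page links to the pages whose vectors sit closest to its own.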
Redirect Maps (migration)
Redirect Maps (migration) are automated systems that use embeddings to match old URLs to new ones during site migration based on content similarity.
RETRIEVAL_DOCUMENT
RETRIEVAL_DOCUMENT is an embedding task type that optimizes vectors for representing documents — used on the document side in RAG systems.
RETRIEVAL_QUERY
RETRIEVAL_QUERY is an embedding task type that creates vectors optimized for search queries and is used on the query side in RAG systems.
Semantic Search Engine
Semantic Search Engine is a search system based on embeddings and cosine similarity that understands query intent rather than keywords.
SEMANTIC_SIMILARITY
SEMANTIC_SIMILARITY is an embedding task type optimizing vectors for measuring similarity between texts, used in duplicate detection and cannibalization.
t-SNE and PCA
t-SNE and PCA are dimensionality reduction methods that reduce high-dimensional vectors to 2D/3D for visualizing topical clusters.
Task Type (parameter)
Task Type is an embedding model parameter that specifies the intended task (retrieval, classification, clustering) to optimize vector quality.
Tokenization
Tokenization splits text into smaller units called tokens that AI models can process, such as words, subwords, or characters.
Transformer (architecture)
Transformer (architecture) is a neural network architecture created by Google that serves as the foundation for models like BERT and GPT.
UMAP (dimensionality reduction)
UMAP (dimensionality reduction) reduces embedding dimensions (e.g., from 768 to 5 dimensions) while preserving the most important relationships.
Vector Quantization
Vector quantization compresses embedding vectors by reducing numeric precision, trading slight accuracy loss for major memory savings.
Vector Representation
Vector Representation encodes information as numerical arrays, letting AI and search engines mathematically compare text meanings.
Word2Vec (2013)
Word2Vec (2013) is a pioneering embedding model that converts words to numeric vectors; the vectors are learned from context, but each word gets one fixed vector.
Knowledge Graphs
Attribute Network (Semantic Hubs)
Attribute Network identifies attributes that connect multiple entities simultaneously, forming semantic hubs in graph analysis.
Betweenness Centrality (Hub Page)
Betweenness Centrality measures how many shortest paths between node pairs pass through a given node, identifying Hub Pages that bridge topical clusters.
Content Gaps (Graph vs Own Site)
Content Gaps (Graph vs Own Site) identifies missing content by comparing what the knowledge graph for a topic contains versus what your site covers.
Contextual Bridge
Contextual Bridge is a knowledge graph attribute linking two distant entities through shared context: e.g., 'taxation' bridging inheritance and donation.
Core Unique (Graph Layer)
Core Unique (Graph Layer) is the top tier of a knowledge graph containing unique and root attributes that form an entity's topical foundation.
Degree Centrality (Pillar Page)
Degree Centrality (Pillar Page) is a graph metric that counts a node's connections; high values indicate pillar page candidates.
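A small sketch of the count, assuming an undirected graph stored as a list of node pairs. The topic names and the `degree_centrality` helper are illustrative; the raw count is shown rather than the normalized form (degree divided by n - 1).

```python
from collections import Counter

def degree_centrality(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree

# Hypothetical topic graph.
edges = [
    ("coffee", "espresso"),
    ("coffee", "latte"),
    ("coffee", "grinder"),
    ("espresso", "grinder"),
]

degree = degree_centrality(edges)
pillar_candidate = degree.most_common(1)[0][0]
```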
EAV-to-Graph Mapping
EAV-to-Graph Mapping transforms Entity-Attribute-Value triplets into knowledge graph structure: Entity and Value become nodes.
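The mapping can be sketched in a few lines. Entity and Value become nodes as the definition states; using the Attribute as the edge label is an assumption made here for illustration, and the triplets and the `eav_to_graph` helper are hypothetical.

```python
def eav_to_graph(triplets):
    """Turn (entity, attribute, value) triplets into a node set and labeled edges."""
    nodes = set()
    edges = []
    for entity, attribute, value in triplets:
        nodes.add(entity)   # the entity becomes a node
        nodes.add(value)    # the value becomes a node
        # Assumption: the attribute labels the edge between them.
        edges.append((entity, attribute, value))
    return nodes, edges

triplets = [
    ("espresso", "brew_time", "25 seconds"),
    ("espresso", "origin", "Italy"),
    ("latte", "origin", "Italy"),
]

nodes, edges = eav_to_graph(triplets)
```

Because nodes are kept in a set, a value shared by two entities ("Italy" above) appears once and ends up connected to both of them.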
Edge (Graph Edge)
Edge (Graph Edge) is a connection between two nodes in a knowledge graph representing a relationship; each edge has a label (the relationship type).
Graph-based vs Lexical Linking
Graph-based vs Lexical Linking compares two internal linking approaches: lexical (keyword matching) vs graph-based (semantic relationships).
GraphRAG (Microsoft)
GraphRAG (Microsoft) — Microsoft's approach combining knowledge graphs with RAG, searching graph structure instead of similar chunks.
Helicopter View
Helicopter View: A knowledge graph-level perspective on an entire site or topic that reveals complete topical structure at a glance.
HOP-PAA (Query Expansion Tag)
HOP-PAA is a query expansion tag marking sub-queries discovered by hopping through People Also Ask sections across multiple SERP levels.
Iterative Graph Expansion (MERGE)
Iterative Graph Expansion (MERGE) builds knowledge graphs by incrementally adding nodes and relationships using Neo4j's MERGE operation.
JSON (Graph Transport)
JSON (Graph Transport) is a pattern that enables structure transfer between Neo4j, Python, and LLMs using universal JSON format.
Knowledge Graph
Knowledge Graph is a data structure of nodes (entities) and edges (relationships) that represents knowledge about a topic.
LLM-PREDICTED (Query Expansion Tag)
LLM-PREDICTED marks sub-queries generated purely by language models without SERP validation — hypotheses requiring verification.
MERGE (Neo4j Operation)
MERGE (Neo4j Operation) creates nodes or relationships only if they don't already exist; otherwise, it matches the existing ones, preventing duplicates. Updating properties on a match requires an explicit ON MATCH SET clause.
NODE (Details)
NODE (Details) — Node properties in knowledge graphs: type (entity/attribute), name, URR classification, layer (Core/Strong/Relevant).
Node (Graph Node)
Node (Graph Node) is a basic element of a knowledge graph representing an entity or attribute—a point from which edges connect to other nodes.
NODE Page
NODE Page is a site page that corresponds to a specific node in a knowledge graph — 1:1 mapping between nodes and URLs enables automatic.
Query Expansion (20–30 Sub-queries)
Query Expansion broadens a seed query into 20–30 sub-queries using LLM and SERP data: builds complete topical coverage of the central entity.
Relationship Label
Relationship Label names the connection between nodes in a knowledge graph, describing relationship types like HAS_ATTRIBUTE or SHARES_ATTRIBUTE.
Relationship Strength
Relationship Strength is the numeric value measuring connection strength between knowledge graph nodes — determining internal linking priority.
Relevant Contextual (Graph Layer)
Relevant Contextual is the lowest knowledge graph layer containing RARE attributes: contextual and supplementary information that supports main content.
SEED (Sub-topics)
SEED (Sub-topics) is a set of sub-topics derived from a seed query that forms the foundation for building topical clusters and knowledge graphs.
SEED Page
SEED Page is the starting page of a topical cluster corresponding to the central entity in the knowledge graph: the foundation for topical authority.
SEED-PAA (Query Expansion Tag)
SEED-PAA marks sub-queries from the People Also Ask box that appears for the seed query; it is the first and highest-priority expansion level.
SHARES_ATTRIBUTE (Shared Attributes)
SHARES_ATTRIBUTE is a knowledge graph relationship connecting entities that share the same attribute; it determines internal linking strength between subpages.
Strong Direct (Graph Layer)
Strong Direct is the middle knowledge graph layer containing ROOT attributes with high search volumes: the ranking backbone content.
Lexical Semantics
Antonyms (Comparisons)
Antonyms (Comparisons) — Lexical relationship between words with opposite meanings that activates comparison frames and strengthens content.
Boolean (Is X?)
Boolean (Is X?) is a semantic frame question type asking 'is X a Y?' that requires a yes/no answer with justification; it often appears in PAA.
Co-occurrences
Co-occurrences measure how frequently terms appear together in texts: strong co-occurrences build algorithmic expectations about content.
Comparative (X vs Y?)
Comparative (X vs Y?) is a frame question type that requires comparison with criteria and shows high citation rates in AI search results.
Cost (How Much?)
Cost (How Much?) is a semantic frame that addresses pricing questions requiring specific numerical data; AI Search favors exact amounts with context.
Definitional (What is?)
Definitional (What is?) is a semantic frame question type asking 'what is X?' that requires an entity definition, typically given at the start of an article.
Distributional Semantics
Distributional Semantics is a theory that word meaning comes from the contexts where words appear: the foundation of embeddings and semantic search.
Frame Semantics
Frame Semantics is a theory that words activate holistic conceptual frames: 'purchase' triggers buyer, seller, price, and product elements.
Grouping (What Types?)
Grouping is a semantic frame query type asking 'what are the types of X?' requiring taxonomy/classification; ideal format is lists or tables.
Hypernyms (Superordinate Categories)
Hypernyms are broader category terms in conceptual hierarchies (e.g., 'vehicle' for 'car') that help search engines understand taxonomic relationships.
Hyponyms (Entity Subtypes)
Hyponyms are entity subtypes in lexical relationships: 'espresso' is a hyponym of 'coffee'. They build topical depth and support Query Fan-out.
Meronyms (Component Parts)
Meronyms (Component Parts): Lexical relationship indicating a component part of an entity (e.g., 'keyboard' is a meronym of 'laptop').
Polysemy
Polysemy: when one word has multiple meanings (e.g., 'bank' = financial institution / riverbank) — an SEO challenge solved by contextual embeddings.
Process (How to?)
Process (How to?) is a frame question type that answers 'how to do X?' with step-by-step instructions that lower Cost of Retrieval.
Synonyms (Broader Matching)
Synonyms (Broader Matching) are words with the same or similar meaning — using synonyms in content broadens matching with user queries.
Term Distribution in Articles
Term Distribution in Articles is the strategic placement of key terms across an article to maintain semantic relevance in all sections.
Macro-semantics (site level)
Breadcrumbs (Cluster Hierarchy)
Breadcrumbs (Cluster Hierarchy) show a page's hierarchical path within a site's cluster structure, helping Google understand site architecture.
Broad Core Update and Semantic Distance
Broad Core Update and Semantic Distance is the phenomenon where Google's Core Updates alter semantic relationships between topics, shifting clusters.
Content Consolidation
Content Consolidation merges semantically similar pages that cannibalize each other into one stronger, unified content piece.
Embedding Centroid
Embedding Centroid is the center of gravity of all page embedding vectors on a site: the point that defines 'what the site is about' in semantic space.
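As a rough illustration, the centroid is the per-dimension mean of all page vectors, and each page's distance from it shows how far the page sits from the site's topical center. The URLs, vectors, and the `centroid` helper below are hypothetical.

```python
import math

def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

# Hypothetical 2-dimensional page embeddings.
pages = {
    "/espresso": [1.0, 0.0],
    "/latte": [0.8, 0.2],
    "/car-insurance": [0.0, 1.0],
}

site_centroid = centroid(list(pages.values()))
distances = {url: math.dist(vec, site_centroid) for url, vec in pages.items()}
farthest_page = max(distances, key=distances.get)
```

The page farthest from the centroid is the kind of outlier that Content Pruning flags as a candidate for removal or relocation.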
Faceted Navigation (Filters)
Faceted Navigation (Filters) is a filter system on e-commerce sites (color, size, price) generating dynamic URLs — requires careful configuration.
Google Warehouse API (Documentation Leak)
Google Warehouse API (Documentation Leak) — Internal Google documentation leak revealing metrics like Site Focus Score and Site Radius.
Navigation and Crawl Budget
Navigation and Crawl Budget describes how excessive navigation HTML consumes crawl budget, reducing resources available for content discovery.
Pillar Page
Pillar Page is the main page of a topical cluster covering the topic completely and linking to supporting pages; every CORE cluster needs one.
Server-Side Rendering (SSR)
Server-Side Rendering (SSR) generates HTML on the server instead of the browser, ensuring content is immediately available to search crawlers.
Site Architecture
Site Architecture is the hierarchical structure of a website (ROOT > SEED > NODE) that determines how crawlers and users navigate the site.
Site Focus Score
Site Focus Score is a metric from the Google Warehouse API leak measuring website topical coherence: higher focus means Google understands the site's topic better.
Site Radius
Site Radius is a metric from the Google Warehouse API leak measuring a site's topical spread—small radius means focused, large means scattered.
Site-wide N-grams
Site-wide N-grams analyze the most frequent phrases across a website, revealing dominant topics and helping assess topical coherence.
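A minimal sketch of the count, assuming page text is already extracted and that splitting on whitespace is good enough as tokenization. The sample texts and the `top_ngrams` helper are illustrative only.

```python
from collections import Counter

def top_ngrams(pages, n=2, limit=3):
    """Return the most frequent word n-grams across all page texts."""
    counts = Counter()
    for text in pages:
        words = text.lower().split()
        # zip over n shifted copies of the word list to form n-grams
        counts.update(zip(*(words[i:] for i in range(n))))
    return [(" ".join(gram), count) for gram, count in counts.most_common(limit)]

pages = [
    "Espresso machine reviews",
    "Best espresso machine for home",
    "Espresso machine cleaning guide",
]

top = top_ngrams(pages, n=2, limit=1)
```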
TTFB (Time to First Byte)
TTFB (Time to First Byte) measures the time from an HTTP request to the first byte of the server response. It matters for crawl budget and influences Core Web Vitals such as LCP, although it is not itself a Core Web Vital.
Two-wave Indexing
Two-wave Indexing is Google's two-phase indexing process: first wave analyzes raw HTML content, second wave processes JavaScript-rendered content.
Metrics & Audit
AI Citability Score (0-10)
AI Citability Score (0-10) measures how likely content is to be cited by AI systems; it evaluates chunk autonomy, BLUF, and atomic claims.
AI Presence Rate
AI Presence Rate measures how often a brand appears in answers generated by AI Search systems — the AI-era equivalent of Share of Voice.
BEFORE/AFTER (recommendations)
BEFORE/AFTER (recommendations) is an audit format that presents specific content changes with concrete before/after examples.
Citation Authority
Citation Authority measures the quality of sources citing a brand in AI Search. More credible sources mean higher citation authority.
Content Format Intelligence
Content Format Intelligence analyzes format preferences in top search results (FAQ, tables, lists) to complement traditional content gap analysis.
CQS (Content Quality Score 0-100)
CQS (Content Quality Score 0-100) is a composite quality metric: a weighted average of six dimensions including CSI, CoR, Density, SRL, TF-IDF, E-E-A-T.
CQS Formula (weighted average)
CQS Formula is a weighted scoring system combining six content quality dimensions: CSI (0.25), E-E-A-T (0.20), CoR (0.20), Density, SRL, and TF-IDF.
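The weighted average can be sketched as below. Only the CSI, E-E-A-T, and CoR weights come from the definition; the Density, SRL, and TF-IDF weights are placeholder assumptions chosen so that all six sum to 1, and the sketch further assumes every dimension is already scored on a 0-100 scale where higher is better.

```python
# Weights stated in the definition above.
SOURCE_WEIGHTS = {"CSI": 0.25, "E-E-A-T": 0.20, "CoR": 0.20}
# Hypothetical placeholders: the definition does not give these three weights.
ASSUMED_WEIGHTS = {"Density": 0.15, "SRL": 0.10, "TF-IDF": 0.10}
WEIGHTS = {**SOURCE_WEIGHTS, **ASSUMED_WEIGHTS}

def cqs(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores (each assumed to be 0-100)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items())

# Hypothetical audit scores for one page.
scores = {"CSI": 80, "E-E-A-T": 60, "CoR": 70, "Density": 50, "SRL": 90, "TF-IDF": 40}
score = cqs(scores)
```

Passing a different `weights` dictionary swaps in the real values once they are known.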
Google Quality Rater Guidelines
Google Quality Rater Guidelines: Google's official document describing how to evaluate website quality, distinguishing Main Content.
Impact × Effort (prioritization)
Impact × Effort prioritizes audit recommendations using Priority = Impact × (1/Effort), ranking from '1-NOW' (high impact, low effort) to '5-SKIP'.
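The formula can be applied directly to a list of recommendations. The 1-5 scales and the sample recommendations are assumptions for illustration; the cut-offs that map a score to labels such as '1-NOW' or '5-SKIP' are not given above, so the sketch only ranks.

```python
def priority(impact, effort):
    """Priority = Impact x (1 / Effort); effort must be positive."""
    if effort <= 0:
        raise ValueError("effort must be greater than zero")
    return impact * (1 / effort)

# Hypothetical recommendations as (name, impact, effort), each on a 1-5 scale.
recommendations = [
    ("Rewrite intro with BLUF", 5, 1),
    ("Migrate to SSR", 4, 5),
    ("Add FAQ schema", 3, 1),
]

ranked = sorted(recommendations, key=lambda rec: priority(rec[1], rec[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```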
Schema.org Markup
Schema.org markup is structured data that uses the Schema.org vocabulary to help search engines understand content type and structure.
Share of AI Conversation
Share of AI Conversation measures a brand's portion of AI-generated answers compared to competitors — the Share of Voice equivalent for AI Search.
URR Classification (Unique/Root/Rare)
URR Classification categorizes entity attributes into three tiers: UNIQUE (differentiators), ROOT (definitions), and RARE (extras).
Micro-semantics (passage level)
Agent (Action Performer)
Agent (Action Performer): Semantic role (SRL) denoting the action performer in a sentence ('who does it'); essential for unambiguous quotable content.
Atomic Claims (Verifiable)
Atomic Claims are indivisible, verifiable factual statements extracted from text: AI Search more easily cites fragments with Atomic Claims.
Beneficiary (Recipient)
Beneficiary (Recipient): Semantic role (SRL) denoting who receives an action's benefit ('for whom it is'); helps AI personalize responses.
Cost of Retrieval (CoR)
Cost of Retrieval (CoR): cost of extracting information from text fragments. The lower the better (BLUF, tables, lists, facts first).
Entity Salience
Entity Salience measures an entity's semantic prominence in text, based on SRL role and sentence position within content structure.
Fluff (Filler Words)
Fluff (filler words) are sentences with no informational value: generalities and rhetorical questions that lower Information Density.
Information Density
Information Density measures the ratio of citable information to text volume, with higher concrete data per paragraph increasing density.
Information Gain
Information Gain measures how much new, unique information a text fragment contributes compared to what already exists in the search engine's index.
Instrument (Tool)
Instrument is a semantic role (SRL) denoting the tool used to perform an action — 'with what'; specifies context and increases citability.
Location (Place)
Location (Place) is a semantic role denoting where an action takes place. Critical for local SEO and content precision in natural language.
Main Content (MC)
Main Content (MC) is the part of a page that directly helps the page achieve its purpose, as defined by Google's Quality Rater Guidelines.
Passage Embeddings
Passage Embeddings are vectors for individual page fragments, enabling Google to index and rank specific sections rather than whole documents.
Passage Ready (seated ready)
Passage Ready is the state where every sentence in a text fragment is self-contained and ready for AI citation without additional context.
Patient (Action Object)
Patient (Action Object) is the entity that receives or undergoes an action in semantic role labeling; precise identification reduces Cost of Retrieval.
Semantic Role Labels (SRL)
Semantic Role Labels (SRL) assign semantic roles to sentence elements: Agent (who), Predicate (what), Patient (what receives the action), Beneficiary, Instrument, Location.
Supplementary Content
Supplementary Content is additional page content (menus, links, ads, sidebars) that doesn't directly serve the page's main purpose.
RAG (Retrieval Augmented Generation)
Bi-encoder vs Cross-encoder
Bi-encoder vs Cross-encoder represents two distinct neural ranking approaches: fast separate vectors vs slow joint analysis for precision.
Chunk Autonomy
Chunk Autonomy is a content optimization principle ensuring every H2 section can be understood independently without context from other sections.
Chunking
Chunking splits documents into smaller fragments before vectorization in RAG systems, balancing semantic precision with context preservation.
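One common way to balance precision against context is a fixed-size window with overlap. The sketch below assumes word-based windows; the sizes and the `chunk_words` helper are illustrative, and real pipelines may split on headings, sentences, or tokens instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(12))  # "w0 w1 ... w11"
chunks = chunk_words(text, size=5, overlap=2)
```

Smaller windows give more precise matches, while the overlap keeps a sentence that straddles a boundary visible in both neighboring chunks.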
Knowledge Cut-Off
Knowledge Cut-Off is the training data boundary date for an AI model: information after this date requires RAG or web search.
RAG (Retrieval Augmented Generation)
RAG (Retrieval Augmented Generation) combines information retrieval with AI text generation to provide fresh context on demand.
Reranking
Reranking is the second retrieval stage in RAG where a cross-encoder re-sorts initially retrieved results with greater precision than embeddings alone.
Supabase (Postgres + pgvector)
Supabase (Postgres + pgvector) is a database platform that stores and searches embeddings alongside traditional relational data in one service.
Semantic Audit Pipelines
Competitor Gap Analysis
Competitor gap analysis: A stage or pipeline comparing competitor content with your site at the semantic embedding level.
Consolidated Markdown
Consolidated Markdown — a file combining multiple content sources into one consolidated Markdown document serving as input for LLM analysis.
Content Brief
Content Brief — detailed article specification generated from pipeline data, containing goal, keywords, and H2/H3 structure.
Content Format Recommendations
Content Format Recommendations are format suggestions (article, FAQ, list, infographic) based on SERP analysis and query intent.
Graceful Degradation
Graceful Degradation is a pipeline design principle where individual step failures don't halt the entire process—it continues with reduced quality.
Human in the Loop
Human in the Loop is a design pattern where humans verify and approve key decisions in AI-driven processes, balancing automation with quality control.
Jina Reader (tool)
Jina Reader is a Jina AI tool that converts a web page to clean Markdown, stripping HTML/CSS noise and producing text ready for AI analysis.
LLM for Reasoning, Python for Computation
LLM for Reasoning, Python for Computation is a principle that divides tasks between language models for reasoning and Python code for computation.
Noise Cleaning
Noise cleaning removes irrelevant data from SEO datasets by filtering out branded queries, duplicates, and low-quality keywords before analysis.
Patternless Frequency (irregular intervals)
Patternless Frequency is a publishing strategy that uses deliberately irregular intervals for content publication to avoid bot-like patterns.
Pipeline Persistence
Pipeline Persistence is the practice of saving pipeline step outputs to disk, enabling resuming after failure, debugging, and auditing.
Pipeline Resumability
Pipeline resumability enables continuing from the point of interruption: each step saves its output to files for recovery.
Plan-Audit-Improve Cycle
Plan-Audit-Improve Cycle is an iterative semantic audit workflow: plan content → audit existing → improve and fill gaps → plan next.
Publication Strategy
Publication Strategy is a content publishing plan based on Topical Map sequencing: SEED pages (pillars) first, then NODE pages (supporting content).
Quality Report
Quality Report is a validation checkpoint after each pipeline step that catches anomalies and decides whether to proceed or fix issues first.
Semantic Audit Pipeline
Semantic Audit Pipeline is a complete automated process from site crawling to audit report: combining embeddings, clustering, and gap analysis.
Semantic Audit PipelinesSERP Grounding (CONFIRMED/PREDICTED/SERP-ONLY)
SERP Grounding tags sub-queries by their source: CONFIRMED (LLM + SERP), PREDICTED (LLM only), SERP-ONLY (SERP data only) for reliability assessment.
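The three tags reduce to set membership. A minimal sketch, assuming the LLM-generated and SERP-derived sub-queries have already been normalized into two sets of strings (the function name `ground` is hypothetical):

```python
def ground(llm_queries: set[str], serp_queries: set[str]) -> dict[str, str]:
    """Tag every sub-query by where it was observed."""
    tags: dict[str, str] = {}
    for query in sorted(llm_queries | serp_queries):
        if query in llm_queries and query in serp_queries:
            tags[query] = "CONFIRMED"   # proposed by the LLM and seen in the SERP
        elif query in llm_queries:
            tags[query] = "PREDICTED"   # proposed by the LLM only
        else:
            tags[query] = "SERP-ONLY"   # seen in SERP data only
    return tags

tags = ground(
    llm_queries={"espresso grind size", "espresso water temperature"},
    serp_queries={"espresso grind size", "espresso machine price"},
)
```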
Semantic Clustering
Attribute Types (Main / Derived / Minor)
Attribute Types (Main / Derived / Minor) classify attributes in a Topical Map by priority: Main (primary), Derived (secondary), Minor (peripheral).
Semantic ClusteringCanonical Query
Canonical Query — the main query representing a thematic cluster, equivalent to a 'primary keyword' but selected based on semantics.
Semantic ClusteringCascading Clustering (E-commerce)
Cascading Clustering (E-commerce) — Multi-level clustering for e-commerce where categories, subcategories and products form cascading clusters.
Semantic ClusteringCluster Naming (Central Entity)
Cluster Naming (Central Entity) identifies the Central Entity and Canonical Query for each cluster in the third step of clustering.
Semantic ClusteringCluster Validation (SERP Overlap)
Cluster Validation (SERP Overlap) validates keyword clusters by measuring SERP overlap between canonical queries and sampled cluster keywords.
Semantic ClusteringContent Gap Detection
Content Gap Detection — The final clustering pipeline stage identifying missing topics by comparing site content against the full topical map.
Semantic ClusteringContent Gap Prioritization (P1–P4)
Content Gap Prioritization (P1–P4) ranks content gaps by their impact: P1 (critical, high volume), P2 (important), P3 (supporting), P4 (nice-to-have).
Semantic ClusteringCorrelated Queries (Associations)
Correlated Queries are queries frequently searched by the same users, indicating semantic associations between content clusters on a site.
Semantic ClusteringDBSCAN
DBSCAN is a density-based clustering algorithm that automatically detects cluster count and identifies outliers without specifying k upfront.
Semantic ClusteringHierarchical Clustering
Hierarchical Clustering creates a tree (dendrogram) of cluster relationships—useful when you don't know the optimal number of groups upfront.
Semantic ClusteringK-means
K-means is a clustering algorithm that divides data points into k groups based on distance from centroids—requires specifying cluster count upfront.
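A small pure-Python sketch of the assign-and-update loop behind K-means. It assumes points are coordinate tuples and uses the first k points as initial centroids, which is a simplification for reproducibility; library implementations typically use smarter initialization.

```python
import math

Point = tuple[float, ...]

def kmeans(points: list[Point], k: int, iterations: int = 20) -> tuple[list[int], list[Point]]:
    """Return (cluster index per point, final centroids)."""
    centroids = list(points[:k])  # naive initialization (assumption)
    assignments: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids

points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2)
```

In a keyword-clustering setting the points would be embedding vectors rather than 2D coordinates.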
Semantic ClusteringKeyword Clustering (Embeddings + K-means)
Keyword Clustering (Embeddings + K-means) — Second stage in clustering pipeline that groups keywords using embeddings and K-means algorithm.
Semantic ClusteringPeople Also Ask (PAA)
People Also Ask (PAA) is a Google SERP feature that shows questions related to a query, providing real user questions for semantic clustering.
Semantic ClusteringQuery Paths (Search Sequences)
Query Paths are search sequences showing typical user query order — they signal the need for internal linking between clusters.
Semantic ClusteringQuery Semantics Google
Query Semantics Google analyzes search query patterns including Query Paths, Correlated Queries, and Sequential Queries for behavioral insights.
Semantic ClusteringRelated Searches
Related Searches is Google's 'Related searches' section at the bottom of SERPs: a source of real user query variations for semantic expansion.
Semantic ClusteringSemantic Clustering
Semantic Clustering groups keywords into thematic clusters using embeddings and K-means algorithms — the foundation of data-driven content strategy.
Semantic ClusteringSequential Queries (Query Series)
Sequential Queries are a series of searches performed in order within a single session, revealing user journey and guiding internal linking.
Semantic ClusteringSERP Coherence
SERP Coherence measures a cluster's internal consistency by comparing SERP results of random keywords from the cluster with the canonical query.
Semantic ClusteringSERP Enrichment
SERP Enrichment augments seed keywords with real Google SERP data (PAA, Related Searches, Refine Chips, Filter Sidebar) before LLM expansion.
Semantic ClusteringSERP Hop
SERP Hop is a keyword expansion technique that extracts Related Searches from Google SERPs and uses them as seeds for the next discovery round.
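The hop loop can be sketched as a breadth-first expansion. The `fetch_related` callable is a hypothetical stand-in for whatever returns Related Searches for a keyword; here it is backed by a hard-coded dictionary instead of a real SERP request.

```python
from typing import Callable, Iterable

def serp_hop(
    seeds: Iterable[str],
    fetch_related: Callable[[str], list[str]],
    rounds: int = 2,
) -> set[str]:
    """Expand seeds by feeding each round's new keywords into the next round."""
    seen = set(seeds)
    frontier = list(seen)
    for _ in range(rounds):
        next_frontier: list[str] = []
        for keyword in frontier:
            for related in fetch_related(keyword):
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier  # only newly found keywords seed the next hop
    return seen

# Stand-in data; a real pipeline would query a SERP provider here.
FAKE_SERP = {
    "espresso": ["espresso machine", "espresso vs drip"],
    "espresso machine": ["best espresso machine", "espresso"],
    "espresso vs drip": ["drip coffee maker"],
}
found = serp_hop(["espresso"], lambda kw: FAKE_SERP.get(kw, []), rounds=2)
```

Capping `rounds` keeps the expansion from drifting too far from the seed topic.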
Semantic ClusteringSERP Intelligence
SERP Intelligence analyzes content formats preferred by Google based on SERP data — tells you WHAT KIND of content to create, not just WHAT TOPIC.
Semantic ClusteringSERP Overlap
SERP Overlap measures the percentage of shared results in the top 10 Google results for two queries—overlap above 50% indicates clusters should merge.
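A minimal sketch of the calculation, assuming each SERP is a ranked list of URLs and that the percentage is taken relative to the top-10 window (other definitions, such as Jaccard similarity, are possible). The URLs are placeholders.

```python
def serp_overlap(results_a: list[str], results_b: list[str], top_n: int = 10) -> float:
    """Percentage of the top_n window shared by two ranked result lists."""
    top_a, top_b = set(results_a[:top_n]), set(results_b[:top_n])
    return 100.0 * len(top_a & top_b) / top_n

serp_a = [f"https://example.com/{i}" for i in range(10)]
serp_b = serp_a[:6] + [f"https://other.example/{i}" for i in range(4)]

overlap = serp_overlap(serp_a, serp_b)
should_merge = overlap > 50  # threshold taken from the definition above
```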
Semantic ClusteringSilhouette Score (Selecting k)
Silhouette Score is a clustering quality metric (scale -1 to 1) used to select the number of clusters k: the candidate k with the highest score is preferred.
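A pure-Python sketch of the metric, assuming Euclidean distance and at least two clusters. For each point, a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster; the point's score is (b - a) / max(a, b).

```python
import math

Point = tuple[float, ...]

def silhouette_score(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters: dict[int, list[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: silhouette of a singleton is 0
        # The distance from the point to itself is 0, so divide by len - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in group) / len(group)
            for other, group in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters mixed across groups
```

To choose k, one would run the clustering for several candidate values and keep the one with the highest score.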
Semantic ClusteringToken Insertion
Token Insertion is a keyword expansion technique that inserts additional words or phrases into a seed phrase to generate long-tail variants.
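A minimal sketch, assuming the seed phrase is split on whitespace and each token is tried at every position. The modifier list is hypothetical.

```python
def insert_tokens(seed: str, tokens: list[str]) -> list[str]:
    """Insert each token at every word boundary of the seed phrase."""
    words = seed.split()
    variants: list[str] = []
    for token in tokens:
        for position in range(len(words) + 1):
            variant = " ".join(words[:position] + [token] + words[position:])
            if variant not in variants:  # keep order, drop duplicates
                variants.append(variant)
    return variants

variants = insert_tokens("espresso machine", ["best", "home"])
```

Not every generated variant is a real query, so the output is usually validated against search data before use.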
Semantic ClusteringTopical Mapping (CORE/OUTER)
Topical Mapping (CORE/OUTER) assigns clusters to CORE (primary) or OUTER (supporting) sections based on attribute classification.
Special Strategies
Context Engineering
Context Engineering is the art of designing in-depth context for AI agents: covering config files, skills, structured data, and memory.
Special StrategiesDomain Classification by Google
Domain Classification by Google assigns domains to topical categories, influencing ranking potential for specific query types and authority assessment.
Special StrategiesLocal SEO (law firms)
Local SEO for law firms applies semantic SEO principles to local search optimization: Google Business Profile (formerly Google My Business), NAP consistency, and client reviews.
Special StrategiesStructure.md (site map for agent)
Structure.md (site map for agent) is a descriptive file containing the site structure tree, created by an agent from the sitemap to support informed linking.
Special StrategiesYMYL (finance)
YMYL (finance): 'Your Money Your Life' content category covering finance, health, law; Google applies elevated E-E-A-T criteria and verification.
Theoretical Foundations
Attribute
Attribute is a property describing an entity in EAV: e.g., for entity 'coffee' the attribute is 'brewing method', value 'espresso' or 'drip'.
Theoretical FoundationsBM25 (Saturation and Length)
BM25 (Saturation and Length) is a lexical ranking function that accounts for term-frequency saturation and normalizes for document length when scoring documents.
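A compact sketch of the commonly cited Okapi BM25 formula, assuming whitespace tokenization and the frequently used defaults k1 = 1.5 and b = 0.75. The parameter k1 controls saturation and b controls how strongly document length is normalized.

```python
import math

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            doc_freq = sum(1 for other in tokenized if term in other)
            if doc_freq == 0:
                continue
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
            tf = doc.count(term)
            # Saturation: the gain from extra occurrences flattens out.
            # Length: longer-than-average documents are scored down.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "espresso brewing guide",
    "espresso espresso espresso espresso",
    "drip coffee brewing basics for beginners",
]
scores = bm25_scores("espresso", docs)
```

Repeating the term four times raises the score, but by far less than a factor of four, which is the saturation effect in practice.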
Theoretical FoundationsCentral Entity (CE)
Central Entity (CE) is the main entity (central theme) of a website around which the entire attribute structure is built.
Theoretical FoundationsCentral Search Intent (CSI)
Central Search Intent (CSI) is the fundamental goal driving a user's search query—what the user truly wants to accomplish.
Theoretical FoundationsCORE Section (CSI Core)
CORE Section is the main part of a Topical Map, containing topics that represent the primary entity attributes related to Central Search Intent.
Theoretical FoundationsCrawl-Index-Rank (Three-Stage Process)
Crawl-Index-Rank is Google's three-stage process: crawling pages (collection), indexing (cataloging), and ranking by quality.
Theoretical FoundationsDepth (Content Depth)
Depth (Content Depth) measures how thoroughly each aspect of a topic is covered; one of the three pillars of Topical Authority.
Theoretical FoundationsEAV Model
EAV Model is a framework describing any topic as Entity-Attribute-Value triples, forming the foundation of semantic SEO content strategy.
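As a data structure, EAV is a list of triples. A minimal sketch using the coffee examples from this glossary; the `Triple` type and `attributes_of` helper are hypothetical names.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    entity: str
    attribute: str
    value: str

triples = [
    Triple("coffee", "brewing method", "espresso"),
    Triple("coffee", "brewing method", "drip"),
    Triple("coffee", "brewing temperature", "93°C"),
]

def attributes_of(entity: str, triples: list[Triple]) -> dict[str, list[str]]:
    """Group an entity's values by attribute."""
    grouped: dict[str, list[str]] = {}
    for triple in triples:
        if triple.entity == entity:
            grouped.setdefault(triple.attribute, []).append(triple.value)
    return grouped

coffee = attributes_of("coffee", triples)
```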
Theoretical FoundationsEntity
Entity is a distinct thematic object (person, thing, concept) that search engines recognize as an independent unit of knowledge with attributes.
Theoretical FoundationsHistorical Data (User Signals)
Historical Data refers to user behavior metrics (CTR, time on page, returns) that Google uses to evaluate content quality and determine rankings.
Theoretical FoundationsHybrid Retrieval
Hybrid Retrieval, the approach used by Google, combines lexical retrieval (BM25, word matching) with semantic retrieval (embeddings, meaning understanding).
Theoretical FoundationsInformation Retrieval
Information Retrieval is the study of finding relevant information in large document collections, forming the foundation of search engines and AI Search.
Theoretical FoundationsInverted Index
Inverted Index is a data structure that maps each word to documents containing it — the foundation enabling fast search retrieval.
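A minimal sketch of the structure, assuming lowercase whitespace tokenization and a dictionary from word to the set of document ids that contain it. Document ids and texts are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "espresso brewing guide",
    "d2": "drip coffee brewing guide",
    "d3": "espresso machine review",
}
index = build_inverted_index(docs)
hits = search(index, "espresso guide")
```

A lookup touches only the posting sets for the query words instead of scanning every document, which is what makes retrieval fast.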
Theoretical FoundationsLexical Retrieval
Lexical Retrieval is a search method based on exact word and phrase matching (BM25, TF-IDF) — fast and cheap but can't understand synonyms.
Theoretical FoundationsMomentum (Publishing Pace)
Momentum (Publishing Pace): The pace and regularity of new content publication. Google rewards sites that actively expand topical coverage.
Theoretical FoundationsMulti-Entity
Multi-Entity is a situation where a website covers multiple related entities, each requiring separate EAV analysis and attribute coverage.
Theoretical FoundationsMUM (Multimodal)
MUM (Multimodal) is Google's Multitask Unified Model that processes text, images, and video content while supporting 75 languages.
Theoretical FoundationsNeural Matching (2018)
Neural Matching (2018) uses neural networks to match search queries with relevant pages based on conceptual meaning, not just keyword matching.
Theoretical FoundationsNLP (Natural Language Processing)
NLP (Natural Language Processing) is the AI field that enables machines to understand human language — the foundation of search engines and AI Search.
Theoretical FoundationsOUTER Section (Supporting Topics)
OUTER Section (Supporting Topics) — Part of Topical Map containing supporting topics indirectly related to Central Search Intent.
Theoretical FoundationsPassage Ranking
Passage Ranking is Google's mechanism that identifies and ranks specific page fragments independently from the overall document.
Theoretical FoundationsPopularity
Popularity measures search demand for attributes: High Popularity indicates an attribute warrants dedicated content that responds to real user needs.
Theoretical FoundationsPredicate (CSI Action Verb)
Predicate is the action verb that defines what users want to do with an entity: for example, in 'how to brew coffee' the predicate is 'brew'.
Theoretical FoundationsProminence
Prominence is a metric that measures how often an attribute appears in top search results for queries related to a Central Entity.
Theoretical FoundationsRankBrain (2015)
RankBrain (2015) is Google's first machine learning system that interprets new queries by matching them to similar past searches.
Theoretical FoundationsRARE (Additional Value)
RARE (Additional Value) is an additional attribute type in the EAV model that gives a competitive advantage because most competitors skip it.
Theoretical FoundationsRelevance
Relevance is the primary criterion in attribute filtering that measures how strongly an attribute relates to the Central Search Intent in semantic SEO.
Theoretical FoundationsROOT (Entity Definition)
ROOT (Entity Definition) — An essential attribute type in the EAV model that defines core characteristics without which an entity loses meaning.
Theoretical FoundationsRPP Attribute Filtering
RPP Attribute Filtering selects entity attributes using three criteria: Relevance, Prominence, and Popularity—only those meeting all qualify.
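A minimal sketch of the all-three-criteria rule. It assumes each criterion has already been scored on a 0 to 1 scale and uses a single shared threshold of 0.5; both the scale and the threshold are illustrative assumptions, as are the sample scores.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    relevance: float
    prominence: float
    popularity: float

def rpp_filter(attributes: list[Attribute], threshold: float = 0.5) -> list[Attribute]:
    """Keep only attributes that meet the threshold on all three criteria."""
    return [
        attr
        for attr in attributes
        if min(attr.relevance, attr.prominence, attr.popularity) >= threshold
    ]

candidates = [
    Attribute("brewing method", relevance=0.9, prominence=0.8, popularity=0.7),
    Attribute("cup color", relevance=0.6, prominence=0.2, popularity=0.1),
    Attribute("brewing temperature", relevance=0.8, prominence=0.6, popularity=0.5),
]
selected = [attr.name for attr in rpp_filter(candidates)]
```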
Theoretical FoundationsSemantic Retrieval
Semantic Retrieval is a search method based on embeddings and meaning understanding — it matches documents based on concepts.
Theoretical FoundationsSemantic SEO
Semantic SEO optimizes websites by assigning meaning to entire site structure rather than targeting individual keywords.
Theoretical FoundationsSource Context (SC)
Source Context (SC) is the perspective from which a website presents its topic: influences classification of attributes as Main.
Theoretical FoundationsTF-IDF (Term Frequency)
TF-IDF (Term Frequency) measures word importance in a document: the score rises the more frequent the word is in that document (TF) and the rarer it is across the entire collection (IDF).
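A small sketch of one common TF-IDF variant, assuming pre-tokenized documents, TF as the term's share of the document's words, and IDF as the natural log of (number of documents / documents containing the term). Other weighting variants exist.

```python
import math

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF of a term in one document, relative to the whole corpus."""
    tf = doc.count(term) / len(doc)
    doc_freq = sum(1 for other in corpus if term in other)
    if doc_freq == 0:
        return 0.0
    return tf * math.log(len(corpus) / doc_freq)

corpus = [
    ["espresso", "coffee", "guide"],
    ["drip", "coffee", "guide"],
    ["coffee", "grinder", "review"],
]
espresso = tf_idf("espresso", corpus[0], corpus)  # appears in 1 of 3 documents
guide = tf_idf("guide", corpus[0], corpus)        # appears in 2 of 3 documents
coffee = tf_idf("coffee", corpus[0], corpus)      # appears in every document
```

A word found in every document gets an IDF of zero, so it carries no weight no matter how often it appears.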
Theoretical FoundationsTopical Authority
Topical Authority measures a website's thematic expertise: Google's evaluation of whether a site fully covers a topic (breadth + depth + pace).
Theoretical FoundationsTopical Coverage
Topical Coverage is how thoroughly a website covers a topic: measured by breadth (Vastness), depth (Depth), and publishing pace (Momentum).
Theoretical FoundationsTopical Map
Topical Map: A content organization structure divided into CORE sections (main topics) and OUTER sections (supporting topics).
Theoretical FoundationsUNIQUE (Entity Differentiator)
UNIQUE (Entity Differentiator): An attribute that distinguishes an entity from similar ones, making it unique and recognizable.
Theoretical FoundationsValue
Value is the specific data assigned to an entity's attribute in the EAV model: e.g., the 'brewing temperature' attribute has the value '93°C'.
Theoretical FoundationsVastness (Coverage Breadth)
Vastness (Coverage Breadth): How many different aspects (attributes) of a topic are covered; one of three Topical Coverage pillars.
Theoretical FoundationsVocabulary Mismatch
Vocabulary Mismatch is the problem where users and authors use different words for the same concept, a limitation of lexical retrieval.
Theoretical FoundationsWeb Entity
Web Entity is an entity existing in internet space: a topic or brand representation recognizable by Google and AI Search engines.
Tools & Environment
Agent Swarm
Agent Swarm is an AI architecture that coordinates multiple specialized agents through a central orchestrator instead of one multi-purpose agent.
Tools & EnvironmentBright Data Site Unlocker
Bright Data Site Unlocker is a proxy infrastructure service that enables web scraping by bypassing anti-bot protection to automate data collection.
Tools & EnvironmentCohere (reranking)
Cohere (reranking): AI platform offering reranking models (Cohere Rerank), specialized cross-encoders that improve search result accuracy.
Tools & EnvironmentCypher (query language)
Cypher is Neo4j's graph database query language — the SQL equivalent for knowledge graphs, enabling relationship analysis and node centrality metrics.
Tools & EnvironmentDify (linear automation)
Dify is a no-code platform for building linear AI pipelines with a visual drag-and-drop interface; it is simpler than agent swarms for repeatable workflows.
Tools & EnvironmentEdge Functions (Supabase)
Edge Functions (Supabase) are serverless functions running on the network edge in Supabase: enabling embedding processing and RAG logic.
Tools & EnvironmentFlashRank
FlashRank is a lightweight, fast reranking model that runs locally — an alternative to Cohere Rerank when speed and API independence are priorities.
Tools & EnvironmentLLM Temperature
LLM Temperature controls response randomness: Temperature 0 produces deterministic results, higher values increase creativity and unpredictability.
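The effect can be sketched with the usual softmax-with-temperature formulation, in which logits are divided by the temperature before normalization. This is a simplified model of sampling, not any provider's actual implementation; temperature 0 is treated as always picking the highest-scoring token, and the logits are made up.

```python
import math

def token_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities at a given temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the top-scoring token.
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
greedy = token_probabilities(logits, 0)     # deterministic choice
focused = token_probabilities(logits, 0.5)  # sharp distribution
diverse = token_probabilities(logits, 2.0)  # flatter distribution
```

Raising the temperature flattens the distribution, so lower-ranked tokens are sampled more often.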
Tools & EnvironmentMCP (tool integration)
MCP (Model Context Protocol) is an open protocol that enables AI agents to connect with external tools such as databases, APIs, and crawlers.
Tools & EnvironmentOpenAI Assistant API
OpenAI Assistant API (officially the Assistants API) is an interface for building AI assistants with built-in RAG, code interpreter, and tools, offered as an alternative to custom RAG pipelines.
Tools & EnvironmentOpenRouter (multi-model routing)
OpenRouter (multi-model routing) is a routing platform that provides access to multiple AI models (GPT-4, Claude, Gemini, Llama) through one API.
Tools & EnvironmentOrchestrator
Orchestrator is the central component in agent swarm architecture that coordinates specialized agents, allocates tasks, and handles errors.
Tools & Environmentpgvector
pgvector: PostgreSQL extension adding vector support and nearest neighbors search, used in Supabase as embedding database for RAG and analysis.
Tools & EnvironmentQdrant (vector specialist)
Qdrant is a specialized vector database designed exclusively for storing and searching embeddings with optimized vector operations.
Tools & EnvironmentSemantic Crawling
Semantic Crawling is a web crawling technique that analyzes content meaning and context rather than just HTML structure for deeper insights.