What a ranking factor means in AI search
A ranking factor in AI search is a signal the citation engine uses to decide which sources to lift into the synthesized answer. Unlike the classic Google SERP (one ranking system, one index), AI search runs on four engines with different architectures: ChatGPT browses Bing's index, Perplexity blends a custom crawler with Brave and Sonar, Claude uses live web fetch with no persistent index, and Gemini sits on Google's index with a multi-stage retriever. Across all four, four signal families dominate:
- Index foundation. Is the page in the engine's source corpus at all?
- Freshness. dateModified discipline and content cadence relative to query recency.
- Entity authority. sameAs ledger, schema-typed Person and Organization, AggregateRating with backing reviews.
- Answer-shape match. Definition-first capsule, FAQPage extractability, HowTo step parsing, ItemList ranking discipline.
A page can rank #1 on Google and never appear in a ChatGPT citation if it is not in the Bing index or has a malformed JSON-LD graph. AI search ranking is not classic SEO with a re-skinned re-ranker. It is a different architecture.
This guide maps the 4 engines, the 4 signal families, the 10 most common ranking myths to retire, and a 15-point audit checklist that covers every signal family. The companion AEO strategy guide covers measurement, the 30-day audit sprint, and per-LLM calibration. The companion structured data spoke covers the copy-pasteable JSON-LD that satisfies the answer-shape family.
The 4 engines: per-pipeline architecture
Every ranking factor that follows is filtered through the engine architecture below. Optimizing without knowing which pipeline you are optimizing for produces noise. Per-engine wins look like per-engine signals; cross-engine wins look like cross-pipeline invariants.
ChatGPT (browse + search). Uses Bing's production index as its source corpus. The browse tool issues queries to Bing, retrieves a top-N candidate set, then re-ranks with an internal model that weighs JSON-LD validity, dateModified, and answer-shape match. Pages absent from the Bing index are invisible regardless of Google ranking. IndexNow pings from the site directly populate Bing within minutes, which is why IndexNow-equipped sites surface in ChatGPT citations 6 to 12 hours after publishing where non-IndexNow sites take 48 to 96 hours.
Perplexity (hybrid index + Sonar). Runs a custom web crawler that feeds a proprietary index, blends results from the Brave Search API for breadth, and uses the Sonar model on the indexed-search layer. The re-ranker weighs citation lineage heavily; pages with sameAs entity wiring and verifiable claims (Dataset schema with a declared license, AggregateRating with backing Review nodes) rank higher. The Perplexity citation transparency layer reads Dataset.license directly, which is why CC-BY-licensed datasets cite at materially higher rates than unlicensed ones.
Claude (live web fetch). Does not maintain a persistent web index. The web fetch tool issues HTTP requests at query time against URLs surfaced by the model's internal knowledge graph or by sibling tools. Because there is no background re-crawl, dateModified on the live page is the dominant freshness signal. A clean Article graph with dateModified inside the last 90 days surfaces preferentially on time-sensitive analytical queries.
Gemini (Google index + multi-stage retriever). AI Overviews and Gemini chat both sit on Google's production index with a multi-stage retriever (BM25 retrieval to dense vector re-ranking to schema-aware extraction, per Google patent US-11769017). Backlinks dominate retrieval; schema dominates extraction. A page that ranks #3 organically can win the Overview citation against the page ranked #1 if its schema graph is materially more complete, because the Overview is generated by a separate Gemini-class model conditioned on the top-N retrieval and weighted toward extractable answer blocks.
The cross-engine invariant: every pipeline weighs the 4 signal families. The relative weighting differs, the families do not.
Signal family 1: Index foundation
Before any ranking factor matters, the page has to be in the engine's source corpus. Index foundation is the most frequently overlooked AEO precondition because most SEO teams treat "indexed on Google" as the universal proxy. It is not.
Bing index for ChatGPT. Verify with site:yourdomain.com in Bing. Expected page count should match Google's within 20 percent. If the gap is wider, investigate. Common Bing index gaps include: no XML sitemap submitted to Bing Webmaster Tools, IndexNow not configured (Bing weights IndexNow pings heavily for freshness), robots.txt blocking bingbot specifically, and slow server response times on first crawl.
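One of the gaps listed above, a robots.txt rule that blocks bingbot specifically, can be checked offline. The sketch below uses Python's standard urllib.robotparser. The sample robots.txt and the example.com URL are illustrative placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks bingbot but allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/guides/page"
    for agent in ("bingbot", "Googlebot"):
        print(agent, "allowed:", is_allowed(SAMPLE_ROBOTS, agent, page))
```

In practice the robots.txt text would come from the live site. A page that passes for Googlebot but fails for bingbot is exactly the Bing-only gap described above.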
Google index for Gemini and AI Overviews. Verify with Google Search Console URL Inspection on a representative page from each cluster. URL Inspection reports the indexed version (which may differ from the deployed version if the page has rendering issues or a stale cache). AI Overviews use the indexed version, not the live version, so a Vercel-side CDN transform that strips JSON-LD breaks Overview eligibility even if the deployed page is correct.
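Because a CDN transform can strip or corrupt JSON-LD, it can help to confirm that the HTML actually served still carries parseable JSON-LD blocks. The following is a minimal sketch using only the standard library. The sample HTML is hypothetical, and fetching the served or indexed HTML is deliberately left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_jsonld(html: str):
    """Return (parsed_blocks, error_messages) for the JSON-LD in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    parsed, errors = [], []
    for raw in extractor.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(str(exc))
    return parsed, errors


# Hypothetical page: one valid block, one malformed (trailing comma).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article"}</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
</head></html>
"""

if __name__ == "__main__":
    good, bad = extract_jsonld(SAMPLE_HTML)
    print(len(good), "valid block(s),", len(bad), "malformed block(s)")
```

An empty result on the served HTML, when the source template does contain JSON-LD, points at a transform somewhere between build and delivery.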
Brave + Perplexity custom index. Brave honors standard SEO conventions (sitemap, robots, canonical). The Perplexity custom crawler weights backlinks from technical communities (Hacker News, niche subreddits, GitHub README files) heavily for crawl priority. A page with a Hacker News front-page link or a GitHub repo with a stars-positive trend tends to enter the Perplexity index within hours.
Claude live fetch. No index to seed. The dependency is canonical URL stability. Claude follows internal links surfaced by the model and by tool calls. A page that changes URL on every deploy or that has a broken canonical chain gets fetched once and never re-cited.
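Canonical stability can be spot-checked by comparing the canonical tag with the URL the page is served from. A minimal sketch, assuming exactly one canonical tag is expected and that a trailing slash difference is acceptable (both are assumptions for illustration); the example.com URLs are placeholders.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_matches(html: str, page_url: str) -> bool:
    """True if the page declares exactly one canonical URL equal to page_url."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    if len(parser.canonicals) != 1:
        return False
    return parser.canonicals[0].rstrip("/") == page_url.rstrip("/")


if __name__ == "__main__":
    html = '<head><link rel="canonical" href="https://example.com/guides/page/"></head>'
    print(canonical_matches(html, "https://example.com/guides/page"))
```

Running the same check after each deploy would reveal a URL that drifts between releases.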
Distribution levers that drive index inclusion: high-DR backlinks (Bing weights backlink graph for crawl priority more than Google does), X/Twitter velocity (Bing crawls high-velocity threads as freshness signals), Reddit thread surface (the Perplexity custom crawler treats Reddit as a curated source), and GitHub README inclusion (especially for technical tools and dev infrastructure).
Signal family 2: Freshness
Freshness is the most-misunderstood signal because each engine measures it differently. The shared rule: dateModified hygiene is mandatory; content cadence on the cluster matters; brand-new posts on stale clusters cite less than refreshed posts on active ones.
dateModified discipline. Bump dateModified whenever the content meaningfully changes. Add a new section, replace stale data, expand a table, swap an outdated screenshot. Do not bump on a typo fix or a cosmetic tweak. LLM crawlers cross-check modification frequency against content delta and de-rank pages that game the signal. Claude in particular discounts pages with a recent dateModified but unchanged content body, because the live fetch comparison is cheap.
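The bump-or-not decision can be automated in a publishing pipeline by measuring how much the body actually changed. The sketch below compares word sequences with difflib. The 2 percent threshold and the function name are assumptions chosen for illustration, not values published by any engine.

```python
from difflib import SequenceMatcher


def should_bump_date_modified(old_body: str, new_body: str,
                              min_change: float = 0.02) -> bool:
    """Return True when the word-level change is large enough to count
    as a meaningful edit (assumed threshold: 2 percent)."""
    old_words = old_body.split()
    new_words = new_body.split()
    similarity = SequenceMatcher(None, old_words, new_words,
                                 autojunk=False).ratio()
    return (1.0 - similarity) >= min_change


if __name__ == "__main__":
    words = [f"w{i}" for i in range(200)]
    original = " ".join(words)
    # A one-word typo fix should not trigger a bump.
    typo_fix = " ".join(words[:17] + ["typo"] + words[18:])
    # A new 50-word section should.
    expanded = original + " " + " ".join(f"n{i}" for i in range(50))
    print("typo fix:", should_bump_date_modified(original, typo_fix))
    print("new section:", should_bump_date_modified(original, expanded))
```

A word-level diff is a crude proxy: it ignores swapped images and table data, so a manual override is still needed for those cases.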
Content cadence on the cluster. A guide cluster with three posts updated in the last 30 days cites preferentially over a cluster with one post updated and two posts untouched for a year. Bing and Perplexity both weight cluster freshness as a topical-authority proxy. The implication for blog cadence: a weekly refresh of an existing post does more for cluster authority than a new post on an unrelated topic.
The evergreen-rewrite multiplier. Refreshing an existing high-authority page (replacing the 2025 examples with 2026 examples, expanding the FAQPage block, adding a new HowTo step) lifts citation rate at roughly 3x the rate of publishing a net-new post on the same topic. The Ahrefs 2025 study and the FORKOFF audit ledger both confirm this pattern across cluster types. The mechanism: existing pages already carry backlinks, internal links, and entity authority; a refresh stacks freshness on top of those signals.
IndexNow as the freshness accelerator. IndexNow pings inform Bing and (transitively) ChatGPT of changes within minutes. A site with IndexNow configured surfaces in ChatGPT citations 6 to 12 hours after publish, whereas a site without it takes 48 to 96 hours. Setup is one key file plus a webhook on every deploy. For the FORKOFF Website repo the IndexNow ping fires automatically on every Vercel deploy via the post-build hook.
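For readers wiring this up themselves, the ping is a single JSON POST. The sketch below builds the request without sending it. The key value and URLs are placeholders, and the default key-file location (a text file named after the key at the site root) is an assumption to adjust if the key is hosted elsewhere. This is not the FORKOFF post-build hook, which is not shown in this guide.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(urls, key, key_location=None):
    """Build (but do not send) an IndexNow POST for a batch of URLs.

    All URLs are assumed to belong to the same host as the first one.
    """
    host = urlparse(urls[0]).netloc
    payload = {
        "host": host,
        "key": key,
        # Assumed default: key file served from the site root.
        "keyLocation": key_location or f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_indexnow_request(
        ["https://example.com/guides/a", "https://example.com/guides/b"],
        key="abc123",  # placeholder key
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

A deploy hook would pass the request to urllib.request.urlopen and log the response status. Sending only the URLs that changed in the deploy keeps the ping consistent with the dateModified discipline above.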
Signal family 3: Entity authority
Entity authority is the AI-search equivalent of E-E-A-T in classic SEO. The engine answers two questions about every cited source: is the author a real, identifiable expert, and is the publisher a real, identifiable organization. The schema graph is how you answer those questions in machine-readable form.
sameAs ledger on Person and Organization. sameAs is the entity-disambiguation signal that links the on-site author/publisher node to off-site verifiable profiles. For Person: LinkedIn, X, GitHub, Crunchbase, Wellfound, conference speaker pages, podcast appearances, Featured.com expert profile. For Organization: LinkedIn company page, Crunchbase, Wellfound, GitHub org, official social handles, accelerator portfolio listings. The richer the sameAs array, the higher the entity authority signal.
Person schema with jobTitle and worksFor. Bare Person nodes with just a name do less than Person nodes with jobTitle, worksFor (linked to Organization @id), and knowsAbout (an array of topics). knowsAbout is the topical-expertise signal; it tells the engine which queries this expert is a credible source for.
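A sketch of how the linked Person and Organization nodes might be generated follows. Every name, URL, and @id below is a hypothetical placeholder; the property names (sameAs, jobTitle, worksFor, knowsAbout) are standard schema.org vocabulary.

```python
import json


def build_entity_graph(person: dict, org: dict) -> dict:
    """Return a JSON-LD @graph with an Organization and a Person whose
    worksFor points back at the Organization @id."""
    org_node = {
        "@type": "Organization",
        "@id": org["id"],
        "name": org["name"],
        "url": org["url"],
        "sameAs": org["same_as"],
    }
    person_node = {
        "@type": "Person",
        "@id": person["id"],
        "name": person["name"],
        "jobTitle": person["job_title"],
        "worksFor": {"@id": org["id"]},
        "knowsAbout": person["knows_about"],
        "sameAs": person["same_as"],
    }
    return {"@context": "https://schema.org", "@graph": [org_node, person_node]}


# Hypothetical entities for illustration only.
ORG = {
    "id": "https://example.com/#org",
    "name": "Example Co",
    "url": "https://example.com/",
    "same_as": [
        "https://www.linkedin.com/company/example-co",
        "https://github.com/example-co",
    ],
}
PERSON = {
    "id": "https://example.com/#jane-doe",
    "name": "Jane Doe",
    "job_title": "Head of Research",
    "knows_about": ["answer engine optimization", "structured data"],
    "same_as": ["https://www.linkedin.com/in/jane-doe-example"],
}

if __name__ == "__main__":
    print(json.dumps(build_entity_graph(PERSON, ORG), indent=2))
```

Linking by @id rather than nesting a full Organization inside each Person keeps the entity defined once, so every author on the site resolves to the same publisher node.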
AggregateRating with backing Review nodes. AggregateRating without backing Review entities reads as a thin claim. AggregateRating with linked Review nodes (each with author, datePublished, and reviewBody) is verifiable and weights materially higher. The same logic applies to client testimonials: schema-typed Review with author Person sameAs back to LinkedIn beats a bare quote in HTML by a wide margin.
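One way to keep the aggregate honest is to derive it from the Review nodes instead of typing it by hand. The sketch below assumes reviews arrive as simple dicts; the review data and the choice of Organization as the rated type are illustrative assumptions.

```python
def build_rated_organization(name: str, reviews: list) -> dict:
    """Build an Organization node whose AggregateRating is computed
    from the Review nodes that back it."""
    if not reviews:
        raise ValueError("AggregateRating needs at least one backing review")
    ratings = [r["rating"] for r in reviews]
    review_nodes = [
        {
            "@type": "Review",
            "author": {"@type": "Person", "name": r["author"]},
            "datePublished": r["date"],
            "reviewBody": r["body"],
            "reviewRating": {"@type": "Rating", "ratingValue": r["rating"]},
        }
        for r in reviews
    ]
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "reviewCount": len(ratings),
        },
        "review": review_nodes,
    }


# Hypothetical reviews for illustration only.
SAMPLE_REVIEWS = [
    {"author": "A. Client", "date": "2026-01-10", "rating": 5, "body": "Clear audit."},
    {"author": "B. Client", "date": "2026-02-02", "rating": 4, "body": "Useful patch list."},
    {"author": "C. Client", "date": "2026-03-15", "rating": 5, "body": "Fast turnaround."},
]

if __name__ == "__main__":
    node = build_rated_organization("Example Co", SAMPLE_REVIEWS)
    print(node["aggregateRating"])
```

Because ratingValue and reviewCount are computed from the same list that produces the Review nodes, the claim and its evidence cannot drift apart.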
The reviewedBy add for Article. The reviewedBy field points at a second Person @id, distinct from author. Note that schema.org formally defines reviewedBy on WebPage rather than on Article, so it is safest on the WebPage node that hosts the Article. The reviewedBy signal is the strongest single E-E-A-T lift available because it asserts a second verifiable human checked the content. On FORKOFF guides every spoke ships with author=cofounder and reviewedBy=founder for exactly this reason.
Cross-property entity consistency. If the Organization node on forkoff.xyz declares sameAs to a LinkedIn page, and the LinkedIn page declares the company website as forkoff.xyz, the engine resolves the entity once and cites consistently. If the LinkedIn page declares a different URL or the X handle does not back-link to the website, the engine may split the entity across two nodes and cite the wrong one.
Signal family 4: Answer-shape match
Answer-shape match is the most directly actionable signal family because it is entirely under the operator's control. Every engine's synthesis step assembles a typed answer; pages that ship the answer pre-shaped get lifted verbatim.
Definition-first answer capsules. The first 40 to 180 words after every H2 should answer the H2 question directly, in declarative form, with the canonical definition or number up front. The Princeton Generative Engine Optimization paper (arXiv 2024) measured a 30 to 40 percent citation lift across all engines when sources used quotation marks, statistics, and a definition-first lead versus a meandering intro. The FORKOFF V32 preflight gate enforces this on every page.
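The 40 to 180 word window can be linted mechanically. The sketch below treats the first paragraph after each H2 as the capsule, which is a simplifying assumption; it is not the FORKOFF preflight gate, whose implementation is not described here. The sample HTML is hypothetical.

```python
from html.parser import HTMLParser


class CapsuleParser(HTMLParser):
    """Pair each <h2> with the text of the first <p> that follows it."""

    def __init__(self):
        super().__init__()
        self.capsules = []        # list of [heading_text, first_paragraph_text]
        self._mode = None         # "h2", "p", or None
        self._awaiting_p = False  # True until the first <p> after an <h2> closes

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.capsules.append(["", ""])
            self._mode = "h2"
            self._awaiting_p = True
        elif tag == "p" and self._awaiting_p:
            self._mode = "p"

    def handle_endtag(self, tag):
        if tag == "h2" and self._mode == "h2":
            self._mode = None
        elif tag == "p" and self._mode == "p":
            self._mode = None
            self._awaiting_p = False

    def handle_data(self, data):
        if self._mode == "h2":
            self.capsules[-1][0] += data
        elif self._mode == "p":
            self.capsules[-1][1] += data


def capsule_report(html: str, min_words: int = 40, max_words: int = 180) -> dict:
    """Map each H2 heading to True if its capsule length is in range."""
    parser = CapsuleParser()
    parser.feed(html)
    parser.close()
    return {
        heading.strip(): min_words <= len(body.split()) <= max_words
        for heading, body in parser.capsules
    }


if __name__ == "__main__":
    good = " ".join(["word"] * 50)
    html = (f"<h2>What is X?</h2><p>{good}</p><p>More detail.</p>"
            "<h2>Short one</h2><p>Too short.</p>")
    print(capsule_report(html))
```

Word count is only the mechanical half of the rule. Whether the capsule actually leads with the definition or number still needs a human read.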
FAQPage schema with 5+ entries. The single highest-yield answer-shape add. Every Question + acceptedAnswer pair becomes a citation-ready snippet that LLMs lift verbatim into answers. The companion structured data spoke ships the copy-pasteable JSON-LD. A FAQPage with fewer than 5 entries fails the FORKOFF V29 preflight gate; a FAQPage with thin answers (1-line scrapes of the H1) cites poorly. A substantive answer of 40 to 120 words per question is the floor.
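The entry-count and answer-length floors can be enforced at build time. The following sketch generates FAQPage JSON-LD from question and answer pairs and rejects input that misses the thresholds stated above. The function name and error messages are illustrative; this is not the FORKOFF preflight gate.

```python
def build_faq_page(pairs, min_entries=5, min_words=40, max_words=120):
    """Build FAQPage JSON-LD from (question, answer) pairs, enforcing
    the minimum entry count and the per-answer word range."""
    if len(pairs) < min_entries:
        raise ValueError(f"need at least {min_entries} entries, got {len(pairs)}")
    entities = []
    for question, answer in pairs:
        word_count = len(answer.split())
        if not min_words <= word_count <= max_words:
            raise ValueError(
                f"answer to {question!r} has {word_count} words, "
                f"expected {min_words} to {max_words}"
            )
        entities.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": entities,
    }


if __name__ == "__main__":
    # Placeholder content: five questions with 50-word filler answers.
    filler = " ".join(["word"] * 50)
    faq = build_faq_page([(f"Question {i}?", filler) for i in range(1, 6)])
    print(len(faq["mainEntity"]), "entries")
```

Raising an error rather than silently dropping thin entries keeps the visible FAQ and the schema in sync, since the author has to fix the content itself.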
HowTo schema for procedural content. When the user's query starts with "how do I" or "how to", the engine prefers HowTo-typed answers over Article prose on the same topic. Each HowToStep needs a position number, a name, and a text body. Optional but recommended: image (for rich result eligibility) and totalTime (for time-filter queries).
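Generating the steps from an ordered list guarantees that every HowToStep carries position, name, and text. A minimal sketch with placeholder step content; totalTime, when supplied, is assumed to be an ISO 8601 duration such as PT30M.

```python
def build_howto(name, steps, total_time=None):
    """Build HowTo JSON-LD from an ordered list of (step_name, step_text)."""
    node = {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {
                "@type": "HowToStep",
                "position": position,
                "name": step_name,
                "text": step_text,
            }
            for position, (step_name, step_text) in enumerate(steps, start=1)
        ],
    }
    if total_time:
        node["totalTime"] = total_time  # ISO 8601 duration, e.g. "PT30M"
    return node


if __name__ == "__main__":
    howto = build_howto(
        "How to verify Bing indexing",  # placeholder title
        [
            ("Search", "Run a site: query for the page in Bing."),
            ("Compare", "Compare the result count against Google."),
            ("Investigate", "Check sitemap, IndexNow, and robots.txt if counts differ."),
        ],
        total_time="PT30M",
    )
    print([step["position"] for step in howto["step"]])
```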
ItemList for ranked or comparative content. Vendor comparisons, top-N rankings, ranked playbooks. ItemList tells the engine the order is editorial, not arbitrary, which is critical when the LLM is composing a vendor-list answer from your page. Positions must be 1..N, unique, sequential.
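The 1..N rule is easy to verify before publishing. The sketch below checks an ItemList node for unique, gap-free integer positions starting at 1; the sample lists are hypothetical.

```python
def positions_valid(item_list: dict) -> bool:
    """True if itemListElement positions are exactly 1..N with no
    duplicates, gaps, or missing values."""
    elements = item_list.get("itemListElement", [])
    positions = [element.get("position") for element in elements]
    if not positions or not all(isinstance(p, int) for p in positions):
        return False
    return sorted(positions) == list(range(1, len(positions) + 1))


# Hypothetical lists for illustration only.
GOOD_LIST = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Vendor A"},
        {"@type": "ListItem", "position": 2, "name": "Vendor B"},
        {"@type": "ListItem", "position": 3, "name": "Vendor C"},
    ],
}
BAD_LIST = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Vendor A"},
        {"@type": "ListItem", "position": 3, "name": "Vendor B"},
        {"@type": "ListItem", "position": 3, "name": "Vendor C"},
    ],
}

if __name__ == "__main__":
    print(positions_valid(GOOD_LIST), positions_valid(BAD_LIST))
```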
Dataset for first-party data. Underused by 90+ percent of marketing sites, which is why a single well-marked Dataset page can out-cite an entire competitor blog on the same topic. The license field matters: Google AI Overviews and Perplexity both weight CC-BY-licensed datasets higher than unlicensed ones. The distribution field with a downloadable DataDownload signals the claim is verifiable.
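A minimal Dataset node with the license and distribution fields discussed above might be generated as follows. The dataset name, description, and download URL are placeholders; the license URL is the public Creative Commons Attribution 4.0 deed.

```python
import json

CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


def build_dataset(name, description, download_url,
                  license_url=CC_BY_4, encoding_format="text/csv"):
    """Build Dataset JSON-LD with a declared license and one download."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


if __name__ == "__main__":
    dataset = build_dataset(
        "Example citation-rate dataset",                      # placeholder
        "Weekly citation rates on a fixed query bank.",       # placeholder
        "https://example.com/data/citation-rates.csv",        # placeholder
    )
    print(json.dumps(dataset, indent=2))
```

The contentUrl should point at a file that actually downloads, since the verifiability argument depends on the data being retrievable.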
Distribution signals that lift AI citation
The 4 signal families cover the page-level levers. Distribution signals cover the off-page levers that influence index inclusion and re-ranker confidence. Each distribution channel maps to a specific engine pipeline.
- High-DR backlinks → Bing index priority. Bing weighs backlink graph for crawl priority more than Google does. A new page with 5 backlinks from DR-70+ domains enters the Bing index within hours instead of days, which compresses the ChatGPT citation lag accordingly.
- X/Twitter velocity → Bing freshness. Bing crawls high-velocity X threads and surface-ranks pages linked from them in the freshness layer. The mechanism: Bing treats X engagement as an editorial-curation signal. A thread with 100+ replies linking back to a guide measurably accelerates Bing index inclusion and ChatGPT citation.
- Reddit thread surface → Perplexity custom crawler. The Perplexity custom crawler treats Reddit as a curated source. A page linked from a high-upvote thread on a niche subreddit (r/SEO, r/marketing, r/SaaS, r/startups) enters the Perplexity index quickly and ranks well on related queries.
- GitHub README inclusion → Perplexity + Claude. Especially for technical tools and dev infrastructure. README files are crawled by Perplexity and surfaced by Claude through its developer-tooling knowledge graph. A tool linked from a README with 500+ stars is materially more likely to be cited than a tool only linked from marketing pages.
- Hacker News front page → multi-engine spike. A front-page HN link drives a 24 to 72 hour citation spike across Bing, Perplexity, and Google. The mechanism is partly backlink graph and partly the editorial-curation signal that HN front-page implies. The spike decays unless the page captures meaningful backlinks during the window.
The pattern across distribution channels: an editorial-curation signal from a community the engine treats as authoritative. Paid placement, link farms, and engagement pods produce the opposite of the desired effect; modern re-rankers detect and discount them.
10 ranking myths to retire in 2026
Most of the AEO advice circulating on LinkedIn and on agency blogs is wrong about something material. The 10 below are the most common, the most consequential, and the most likely to waste a quarter of operator time.
- Myth: llms.txt is a ranking factor. Reality: llms.txt is an agent-readiness manifest. Useful for agentic crawlers and downstream tool catalogs. None of the four major AI search engines treat it as a ranking signal as of June 2026.
- Myth: Higher word count means better citation. Reality: The ALMcorp 325K-prompt study found the citation sweet spot is 800 to 1500 words. 3000+ word essays cite less because the answer is buried. The companion guide pattern at FORKOFF targets 2500 to 4000 because guides serve a buyer-research workflow distinct from citation, but for blog posts written purely for AI citation, 800 to 1500 wins.
- Myth: AI-generated content gets penalized. Reality: Google, Bing, and Perplexity all state content quality is judged independent of authorship. Low-quality AI content loses citations because it is low quality. High-quality AI-assisted content with verifiable claims, complete schema, and first-party data cites at the same rate as human-written.
- Myth: Exact-match domains help AI citation. Reality: AI engines re-rank on content quality and schema; domain-level signals matter much less than for classic SERP. EMD lift on AI surfaces is statistically indistinguishable from zero.
- Myth: Keyword density is a ranking factor. Reality: None of the four engines have keyword-density scoring. Semantic embedding handles topical match. Writing for keyword density actively degrades answer-shape match.
- Myth: Link velocity hacks accelerate citations. Reality: Bing's web spam team flags velocity anomalies; ChatGPT inherits the demotion. Sustained, organic backlink growth from editorial sources beats velocity hacks by a wide margin on every engine.
- Myth: More schema is better schema. Reality: Bad schema (duplicate FAQPage, malformed Dataset, ItemList with non-sequential positions) actively de-ranks. Schema quality dominates schema quantity. Validate every graph in both the Schema Markup Validator and Google Rich Results Test.
- Myth: AI Overviews use the same ranking as classic SERP. Reality: The multi-stage retriever uses different feature weights. Backlinks dominate retrieval; schema dominates extraction. A page can rank #1 organically and never appear in the Overview if its schema graph is thin.
- Myth: Engagement pods boost AI citation. Reality: False, and a penalty risk. LinkedIn pod detection runs at roughly 97 percent accuracy as of 2026. Even setting that aside, none of the four AI engines use LinkedIn engagement as a ranking input; pods produce zero citation lift even when undetected.
- Myth: Parasite SEO on Medium / LinkedIn Pulse still works. Reality: Dead. Google's Site Reputation Abuse policy (March 2024 onward, fully enforced by 2026-Q1) took LinkedIn Pulse from 25.8M to 3.9M monthly visits (a drop of 85 percent), with 92 percent of indexed Pulse pages de-indexed. Medium and Pulse no longer rank, and the AI engines downstream of those indexes no longer surface them.
The pattern across all 10 myths: they treat AI search as a re-skinned version of classic SEO with new buzzwords. AI search is a different architecture with different feature weights. Optimizing as if it were SEO 2018 produces the operator equivalent of doing aerobics in a swim meet.
15-point audit checklist
The checklist below maps each of the 4 signal families to concrete verification steps. Score is binary per item; total of 15 means the page is shipped against every known lever. Most guides score 6 to 9 on first audit. Most reach 13 to 15 within two refresh cycles.
Index foundation (4 checks)
- site:yourdomain.com/path/to/page in Bing returns the page within the first page of results.
- Google Search Console URL Inspection reports the page as indexed and the indexed version matches the deployed version.
- IndexNow ping fires on every deploy (verifiable in Bing Webmaster Tools URL submission log).
- Canonical URL is stable and the canonical tag matches the self-URL.
Freshness (3 checks)
- dateModified on the page is within 90 days, or content is evergreen by design with a publish-date older than 1 year and no time-sensitive claims.
- The cluster (sibling pages) has at least 3 posts updated in the last 60 days.
- Any numeric claim in the page is dated (e.g. "as of June 2026") so the engine can match it to query recency.
Entity authority (4 checks)
- Article author is a Person node with @id, sameAs to LinkedIn + X (minimum), and jobTitle.
- Article reviewedBy is a second Person @id distinct from author.
- Organization node has sameAs to at least 5 verifiable off-site profiles (LinkedIn, Crunchbase, GitHub, X, Wellfound).
- AggregateRating, if rendered, is backed by at least 3 typed Review nodes with author and datePublished.
Answer-shape match (4 checks)
- Every H2 has a definition-first answer capsule of 40 to 180 words.
- FAQPage schema with at least 5 substantive Q-and-A entries.
- For procedural content: HowTo schema with at least 3 typed HowToStep nodes, each with position, name, and text.
- For ranked or comparative content: ItemList with unique, sequential position values.
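The binary scoring can be tallied with a few lines of code. The sketch below is not the FORKOFF AEO Checker; the check identifiers are hypothetical names invented here to mirror the 15 items above, and the pass/fail inputs are assumed to come from a manual or scripted review.

```python
# Hypothetical identifiers, one per checklist item (4 + 3 + 4 + 4 = 15).
CHECKLIST = {
    "index_foundation": [
        "bing_indexed",
        "google_indexed_matches_deploy",
        "indexnow_on_deploy",
        "canonical_stable",
    ],
    "freshness": [
        "date_modified_recent_or_evergreen",
        "cluster_recently_updated",
        "numeric_claims_dated",
    ],
    "entity_authority": [
        "author_person_typed",
        "reviewed_by_distinct",
        "org_same_as_five_profiles",
        "ratings_backed_by_reviews",
    ],
    "answer_shape": [
        "h2_answer_capsules",
        "faq_five_entries",
        "howto_steps_typed",
        "itemlist_sequential",
    ],
}


def score_audit(results: dict) -> dict:
    """Score a page: one point per passed check, missing checks count as failed."""
    per_family = {
        family: sum(1 for check in checks if results.get(check, False))
        for family, checks in CHECKLIST.items()
    }
    failed = [
        check
        for checks in CHECKLIST.values()
        for check in checks
        if not results.get(check, False)
    ]
    return {
        "total": sum(per_family.values()),
        "per_family": per_family,
        "failed": failed,
    }


if __name__ == "__main__":
    results = {check: True for checks in CHECKLIST.values() for check in checks}
    results["indexnow_on_deploy"] = False
    results["faq_five_entries"] = False
    print(score_audit(results))
```

The failed list doubles as a patch list: each entry names the check to fix before the next refresh cycle.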
To run this audit automatically against any URL, the FORKOFF AEO Checker scans the URL you submit, scores each item, and returns the patch list along with the per-signal-family scorecard.
Where to go deeper inside FORKOFF
This guide is the technical reference. The strategic, schema, and engine-specific layers live on the adjacent pages below.
- The AEO strategy guide covers measurement, the 30-day audit sprint, and per-LLM calibration. Pair with this guide for the full stack.
- Structured data for AI search ships the copy-pasteable JSON-LD for Article, FAQPage, HowTo, Dataset, and ItemList.
- The ChatGPT citation guide covers the ChatGPT-specific browsing-tool behavior and the Bing index foundation in depth.
- AEO vs SEO: the differences explained covers the definitional and historical context for buyers evaluating AEO as a service line.
- AEO vs GEO: the difference explained disambiguates answer engine optimization from generative engine optimization by surface, tactic, and measurement.
- /guides hub lists every spoke in the AEO cluster.
- The agent-ready site audit covers the broader site-readiness checklist (llms.txt, sameAs ledger, entity graph).
- The 2026 AEO playbook covers the operational sprint shape and the founder-facing decisions.
- How to get cited by ChatGPT in 2026 covers the ChatGPT-specific browsing tool and recency signals.
- Best AI visibility tools vs FORKOFF methodology covers the tool-stack comparison for buyers evaluating vendors.
- How AI Overviews rank brands covers the Overviews-specific selection signals and the schema role inside them.
For the operator engagement see /services/answer-engine-optimization, or the Perplexity-first lane where Dataset and ItemList schema drive disproportionate citation lift.
About the numbers in this guide
The per-engine ranking factor map (4 engines, 4 signal families) draws on three independent source layers:
- Vendor documentation. Bing Webmaster Guidelines, OpenAI ChatGPT Search system card, Perplexity Engineering blog (citation transparency layer), Anthropic Claude documentation (web fetch tool behavior), Google Search Central AI Features documentation, and Google patent US-11769017 on multi-stage retrieval.
- Independent quantitative studies. Princeton Generative Engine Optimization paper (arXiv:2405.20708, 2024) on per-engine sensitivity to citations, quotation marks, and statistics. Ahrefs 2025 AI Overviews ranking study (Patrick Stox) on the backlink correlation with Overview inclusion. ALMcorp 325K-prompt AI citation study (2026-Q1) on word-count sweet spot and freshness signals. Growtika LinkedIn Pulse traffic-loss study (2026-Q1) on the Site Reputation Abuse downstream effect.
- FORKOFF first-party data. The 2 to 3x FAQPage citation lift, the 6 to 12 hour IndexNow surfacing window, and the 15-point audit-checklist baseline come from the FORKOFF verified proof across 28 client engagements over the 12 months ending May 2026. Each engagement tracks per-LLM citation rate on a fixed query bank weekly. The aggregate numbers are operator observation across that proof, not a peer-reviewed study.
Reproducibility notes: the lift figures are sensitive to baseline schema completeness on the site, domain authority, and query-bank composition. Treat the figures as ranges, not point estimates. The relative ordering of signal-family importance (index foundation, then freshness, then entity authority, then answer-shape match) has been stable across the 12-month observation window; the absolute magnitudes vary.
Vendor-architecture details (Bing-as-index for ChatGPT, Perplexity custom crawler plus Brave plus Sonar, Claude live fetch, Gemini multi-stage retriever) are sourced from public vendor documentation and the named patent. Where the vendor does not publish a specific weighting magnitude, the guide flags the claim as inferred from observed behavior rather than asserting a precise figure.





