Methodology · BETA

How the analysis works.

Every Brandioz score is the direct sum of measurable signals extracted from your site's HTML. No black boxes. No ML guesswork. Every point is traceable to a specific signal.

Score components: 5

Training sites: 71

PCA variance: 71.3%

Coherence rules: up to 9 caps

Plain English: We check 11 things about your site, such as whether it has a clear title, enough words, and whether an AI reading it understands it progressively better. Those signals feed directly into a score out of 100. A separate PCA model then identifies your site's dominant weakness across 4 dimensions.

Score Architecture

Component weights

Max points per dimension — total 100

Plain English: Structure (how clearly the site identifies itself) carries the most weight at 33 pts. Content Depth and Semantic Quality are next at 22 pts each. All weights are fixed: the same formula runs for every site.

Structure: 33 pts

Content Depth: 22 pts

Semantic Quality: 22 pts

Value Proposition: 18 pts

Hierarchy: 5 pts

Signal sub-weights

Contribution of each signal to total score

Signal	Component	Points
Title	Structure	10
Description	Structure	10
Hero Section	Structure	8
Headings	Structure	5
Word Count	Content Depth	22
Informativeness	Semantic Quality	12
Readability	Semantic Quality	4
Topic Diversity	Semantic Quality	6
Value Prop	Value Proposition	18
Hierarchy	Hierarchy	5
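As a quick consistency check, the sub-weights listed above can be summed per component. This is a minimal sketch; the dictionary layout and the names `SUB_WEIGHTS`, `COMPONENT_MAX` and `TOTAL_MAX` are illustrative and not taken from the Brandioz codebase.

```python
# Sub-weights copied from the list above, grouped by score component.
SUB_WEIGHTS = {
    "Structure": {"Title": 10, "Description": 10, "Hero Section": 8, "Headings": 5},
    "Content Depth": {"Word Count": 22},
    "Semantic Quality": {"Informativeness": 12, "Readability": 4, "Topic Diversity": 6},
    "Value Proposition": {"Value Prop": 18},
    "Hierarchy": {"Hierarchy": 5},
}

# Each component maximum is the sum of its signal sub-weights.
COMPONENT_MAX = {name: sum(signals.values()) for name, signals in SUB_WEIGHTS.items()}
TOTAL_MAX = sum(COMPONENT_MAX.values())

for name, points in COMPONENT_MAX.items():
    print(f"{name}: {points} pts")
print(f"Total: {TOTAL_MAX} pts")
```

The per-component sums reproduce the 33 / 22 / 22 / 18 / 5 split, and the total is 100.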

Scoring Pipeline

The heuristic scorer computes every point directly from HTML signals — title confidence, word count, semantic density, value proposition patterns, and understanding curve scores. No model, no approximation. The formula is deterministic and fully auditable.

Plain English: Think of it as a structured checklist run against your HTML. Each item has a fixed point value. The total is your score. A separate coherence engine then checks whether the score makes sense: if signals contradict each other, it caps the score and tells you why.

Pipeline

1. Render page → extract signals

2. Run heuristic formula → raw score

3. Apply coherence rules → final score

4. PCA transform → dominant weakness

Score components: structure to hierarchy

Coherence rules: internal consistency

PCA components: 71.3% variance

Extraction caps: high / medium / low / failed

Signal importance in PCA

Which signals drive the most variance across sites

Plain English: This shows which signals explain the most variation across the 71 training sites. deep_score_norm, scroll_score_norm, and imm_score_norm together account for nearly half of all PCA-explained variance.

Signal	Share of PCA variance
deep_score_norm	18.4%
scroll_score_norm	16.2%
imm_score_norm	15.1%
hero_confidence	13.8%
value_prop_confidence	10.5%
title_confidence	9.2%
heading_confidence	6.7%
high_info_ratio	4.8%
semantic_density_norm	3.2%
word_count_norm	1.6%
description_confidence	0.5%

Legend (signal groups): understanding curve, identity signals, structural signals

Understanding curve scores (deep/scroll/imm) account for 49.7% of PCA variance. Sites that improve progressive clarity gain the most across all 4 dimensions.
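The 49.7% figure can be reproduced by adding the three understanding-curve shares. A minimal sketch follows, with the per-signal shares copied from this section; the variable names are illustrative.

```python
# Share of PCA-explained variance per signal, in percent (values from this page).
PCA_VARIANCE_SHARE = {
    "deep_score_norm": 18.4,
    "scroll_score_norm": 16.2,
    "imm_score_norm": 15.1,
    "hero_confidence": 13.8,
    "value_prop_confidence": 10.5,
    "title_confidence": 9.2,
    "heading_confidence": 6.7,
    "high_info_ratio": 4.8,
    "semantic_density_norm": 3.2,
    "word_count_norm": 1.6,
    "description_confidence": 0.5,
}

UNDERSTANDING_CURVE = ("deep_score_norm", "scroll_score_norm", "imm_score_norm")

# Round to one decimal to avoid floating-point noise in the printed sums.
curve_share = round(sum(PCA_VARIANCE_SHARE[s] for s in UNDERSTANDING_CURVE), 1)
total_share = round(sum(PCA_VARIANCE_SHARE.values()), 1)

print(f"Understanding curve: {curve_share}% of {total_share}%")
```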

Content depth curve

Points earned vs word count — climbs fast at first, then slows down

Plain English: Short pages are penalised heavily. A 300-word page earns 11/22 pts. You need 1,000 words to reach the max 22 pts. After that, adding more words doesn't help because the curve flattens. Note: only paragraph words count; schema and FAQ JSON-LD are excluded.
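The exact production curve is not published on this page. Purely as an illustration, the sketch below shows one logarithmic curve that passes through both quoted points (11 pts at 300 words, 22 pts at 1,000 words) and flattens afterwards. The 90-word zero point is an assumption derived from those two points, and `content_depth_points` is a hypothetical name.

```python
import math

MAX_DEPTH_POINTS = 22.0
FULL_CREDIT_WORDS = 1000  # quoted above: 1,000 words reach the maximum
HALF_CREDIT_WORDS = 300   # quoted above: 300 words earn 11/22

# Assumption: the word count that earns zero points is not published.
# A pure log curve through both quoted points must cross zero at
# 300**2 / 1000 = 90 words.
ZERO_CREDIT_WORDS = HALF_CREDIT_WORDS ** 2 / FULL_CREDIT_WORDS


def content_depth_points(paragraph_words: int) -> float:
    """Illustrative log-shaped depth curve, clamped to the range 0..22."""
    if paragraph_words <= ZERO_CREDIT_WORDS:
        return 0.0
    fraction = math.log(paragraph_words / ZERO_CREDIT_WORDS) / math.log(
        FULL_CREDIT_WORDS / ZERO_CREDIT_WORDS
    )
    return round(MAX_DEPTH_POINTS * min(fraction, 1.0), 2)


for words in (100, 300, 600, 1000, 5000):
    print(words, content_depth_points(words))
```

Under this assumed curve, going from 300 to 600 words gains more points than going from 600 to 1,000, which matches the "climbs fast at first, then slows down" description.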

PCA Latent Dimensions

Plain English: PCA takes the 11 raw signals and finds 4 combined dimensions that best explain why sites differ from each other. Your score on each tells you which fundamental problem to fix first. The dimension with your lowest score is your dominant weakness.

PC0 · 37.9%

understanding_depth

Does the site get clearer the more you read?

How deeply an AI can understand the site across progressive reading depths.

↑ deep_score_norm, scroll_score_norm, imm_score_norm

PC1 · 13.7%

signal_imbalance

Are the site's strongest signals balanced or lopsided?

Tension between title clarity and hero/value-prop quality. Negative = hero and VP outweigh title.

↑ title_confidence, hero_confidence, value_prop_confidence

PC2 · 10.2%

density_vs_description

Is the site dense with facts, or clear in its descriptions?

Trade-off between information density and descriptive meta content.

↑ semantic_density_norm, description_confidence, word_count_norm

PC3 · 9.4%

value_prop_speed

How fast does an AI grasp what the site offers?

How quickly value proposition signals appear relative to understanding curve scores.

↑ value_prop_confidence, high_info_ratio, title_confidence

4 components → 71.3% variance explained

remaining 28.7% = site-specific noise

understanding_depth 37.9%

signal_imbalance 13.7%

density_vs_description 10.2%

value_prop_speed 9.4%

Scree plot — how much each component adds

We chose 4 components because the 5th barely adds anything (7.1%). The elbow is at 4.
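The cumulative variance behind the scree plot can be tallied from the per-component figures quoted on this page. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

# Explained variance per component in percent, PC0 through the 5th component,
# as quoted on this page.
EXPLAINED_VARIANCE = [37.9, 13.7, 10.2, 9.4, 7.1]

# Running total, rounded to one decimal to avoid floating-point noise.
cumulative = [round(value, 1) for value in accumulate(EXPLAINED_VARIANCE)]

for index, (share, total) in enumerate(zip(EXPLAINED_VARIANCE, cumulative)):
    print(f"component {index}: +{share}% -> {total}% cumulative")
```

The four retained components sum to 71.2% from these rounded figures; the small gap to the quoted 71.3% is presumably rounding in the per-component values.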

Plain English: Each bar shows how much more variation the next component explains. The jump from bar 4 to bar 5 is small. That's the elbow, where adding more components stops being meaningful.

Feature loadings matrix

How much each signal pushes each dimension up (+) or down (−)

Plain English: A loading close to +1 means the signal strongly drives that dimension up. Close to −1 means it drives it down. Near 0 means little effect. Example: deep_score_norm has +0.460 on PC0, making it the biggest driver of understanding_depth.

Signal	PC0 (understanding_depth)	PC1 (signal_imbalance)	PC2 (density_vs_description)	PC3 (value_prop_speed)
deep_score_norm	+0.460	-0.180	-0.140	-0.294
scroll_score_norm	+0.441	-0.190	-0.160	-0.270
imm_score_norm	+0.400	-0.200	-0.170	-0.294
hero_confidence	+0.338	-0.464	+0.120	+0.100
heading_confidence	+0.313	+0.200	-0.080	+0.150
title_confidence	+0.200	+0.528	-0.246	+0.343
value_prop_confidence	+0.180	-0.430	+0.157	+0.639
high_info_ratio	+0.150	+0.120	+0.100	+0.380
word_count_norm	+0.100	+0.200	-0.398	-0.150
semantic_density_norm	+0.090	+0.300	+0.625	+0.100
description_confidence	+0.080	+0.150	-0.567	+0.298

Understanding curve scores (imm/scroll/deep) dominate PC0 with loadings of 0.40 or higher. The strongest PCA dimension is driven mainly by progressive AI comprehension rather than metadata, although hero_confidence (+0.338) and heading_confidence (+0.313) also contribute.
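To make the loadings concrete, the sketch below projects a site's signals onto the four dimensions with a plain dot product and picks the lowest one as the dominant weakness. It assumes the inputs are already standardized against the training set (the means and standard deviations are not published here), and the function names `project` and `dominant_weakness` are hypothetical.

```python
DIMENSIONS = (
    "understanding_depth",
    "signal_imbalance",
    "density_vs_description",
    "value_prop_speed",
)

# Loadings copied from the matrix above, one row per signal (PC0..PC3).
LOADINGS = {
    "deep_score_norm": (0.460, -0.180, -0.140, -0.294),
    "scroll_score_norm": (0.441, -0.190, -0.160, -0.270),
    "imm_score_norm": (0.400, -0.200, -0.170, -0.294),
    "hero_confidence": (0.338, -0.464, 0.120, 0.100),
    "heading_confidence": (0.313, 0.200, -0.080, 0.150),
    "title_confidence": (0.200, 0.528, -0.246, 0.343),
    "value_prop_confidence": (0.180, -0.430, 0.157, 0.639),
    "high_info_ratio": (0.150, 0.120, 0.100, 0.380),
    "word_count_norm": (0.100, 0.200, -0.398, -0.150),
    "semantic_density_norm": (0.090, 0.300, 0.625, 0.100),
    "description_confidence": (0.080, 0.150, -0.567, 0.298),
}


def project(standardized: dict[str, float]) -> dict[str, float]:
    """Dot product of standardized signals with each component's loadings."""
    scores = dict.fromkeys(DIMENSIONS, 0.0)
    for signal, row in LOADINGS.items():
        value = standardized.get(signal, 0.0)  # missing signal = training mean
        for dimension, loading in zip(DIMENSIONS, row):
            scores[dimension] += loading * value
    return scores


def dominant_weakness(standardized: dict[str, float]) -> str:
    """Lowest-scoring dimension, following the rule stated on this page."""
    scores = project(standardized)
    return min(scores, key=scores.get)


# Hypothetical site: one standard deviation below average on all three
# understanding-curve signals, average everywhere else.
example = {"deep_score_norm": -1.0, "scroll_score_norm": -1.0, "imm_score_norm": -1.0}
print(project(example))
print(dominant_weakness(example))
```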

Site Landscape

Plain English: This maps training sites by their two biggest PCA dimensions. Sites to the right have deep AI understanding across all reading depths. Sites higher up have better signal balance. Best sites are top-right.

High visibility: score > 70

Mid visibility: score 50–70

Low visibility: score < 50

Site	Score	Weakness
Stripe	85	value prop speed
Basecamp	83	value prop speed
Linear	82	value prop speed
Sentry	80	signal imbalance
Mailchimp	78	signal imbalance
Plaid	76	density vs description
Brex	76	understanding depth
Zendesk	72	signal imbalance
Segment	68	understanding depth
Loom	65	density vs description
Framer	58	signal imbalance
Mistral	53	signal imbalance
Perplexity	44	understanding depth
Miro	39	understanding depth
Airbnb	31	understanding depth

PC0 vs PC1 — training site positions

understanding_depth (x-axis) vs signal_imbalance (y-axis)

Low PC0 = AI struggles even after full read. High PC1 = title dominates over hero/value-prop. Both axes matter for strong visibility.

Heuristic scores across the range

15 representative sites — from excellent to poor AI visibility

Plain English: Every score is computed the same way, and no site gets special treatment. The gap between Stripe (84.97) and Airbnb (31.2) comes entirely from measurable signal differences in the HTML.

75+ Excellent

50–74 Good

<50 Needs work

Scores above 75 typically have high-quality hero sections, clear meta descriptions, and strong progressive understanding. Sites below 50 usually have failed hero extraction or thin content.

Signal Comparison

Plain English: Comparing Stripe (top scorer) vs Basecamp (strong narrative site). The bigger the shape, the better. Stripe leads on description and hero. Basecamp leads on title and headings. The two have different strengths, and both score above 75.

Stripe vs Basecamp — signal radar

How two high-scoring sites compare signal by signal

Stripe · 84.97

Basecamp · 83.2

Stripe's meta description and hero are near-perfect. Basecamp compensates with very clear title and heading structure — showing multiple paths to a strong score.

Dominant weakness distribution

Most common PCA weakness across 15 representative sites

Plain English: For each site, we find its lowest PCA dimension, the one it needs to fix most. understanding_depth and signal_imbalance are tied as the most common problem at 5 sites each. With an understanding_depth weakness, AI can read the site but doesn't progressively get a clearer picture.

signal_imbalance: 5 sites

understanding_depth: 5 sites

value_prop_speed: 3 sites

density_vs_description: 2 sites

understanding_depth ties with signal_imbalance as the most common weakness (5 of 15 sites each). Sites with an understanding_depth weakness tend to have decent metadata, but AI struggles to build a clear picture through progressive reading.
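The distribution can be re-derived by tallying the Weakness column of the 15 representative sites listed in the Site Landscape section. A minimal sketch, with the rows copied from that list and illustrative variable names:

```python
from collections import Counter

# (site, rounded score, dominant weakness) for the 15 representative sites.
SITES = [
    ("Stripe", 85, "value_prop_speed"),
    ("Basecamp", 83, "value_prop_speed"),
    ("Linear", 82, "value_prop_speed"),
    ("Sentry", 80, "signal_imbalance"),
    ("Mailchimp", 78, "signal_imbalance"),
    ("Plaid", 76, "density_vs_description"),
    ("Brex", 76, "understanding_depth"),
    ("Zendesk", 72, "signal_imbalance"),
    ("Segment", 68, "understanding_depth"),
    ("Loom", 65, "density_vs_description"),
    ("Framer", 58, "signal_imbalance"),
    ("Mistral", 53, "signal_imbalance"),
    ("Perplexity", 44, "understanding_depth"),
    ("Miro", 39, "understanding_depth"),
    ("Airbnb", 31, "understanding_depth"),
]

weakness_counts = Counter(weakness for _, _, weakness in SITES)

# Collect every weakness that reaches the highest count, so ties stay visible.
top_count = max(weakness_counts.values())
most_common = sorted(w for w, count in weakness_counts.items() if count == top_count)

print(dict(weakness_counts))
print(most_common, top_count)
```

In these 15 rows, signal_imbalance and understanding_depth each appear five times. All five understanding_depth sites score 76 or lower, while the three value_prop_speed sites are the three highest scorers.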

Signal Definitions

Plain English: Every signal is extracted directly from your site's HTML, with no guessing and no AI interpretation at score time. Here's exactly what each one measures and why it matters.

title_confidence · Structure · 0–1

Is the page title descriptive enough for an AI to know what you do?

Word count, entity words (platform, tool, ai, clinic, school), benefit descriptors, and H1 compensation if title is brand-only.

description_confidence · Structure · 0–1

Does the meta description explain your product clearly?

Scored for capability language, audience terms, sentence structure patterns, and optimal length (50–160 chars).

heading_confidence · Structure · 0–1

Do the headings form a logical hierarchy and cover diverse topics?

H1 presence, no level skips, capability heading ratio, plus semantic diversity via TF-IDF cosine across all headings.

hero_confidence · Structure · 0–1

Does the hero section explain what you do, or is it just a tagline?

Hero word count, entity and benefit language, compositional VP score. Penalises nav-polluted and repetitive text.

value_prop_confidence · Value Proposition · 0–1

Can an AI identify your product type, who it's for, and why it's useful?

Pattern-matches product type, target audience, and benefits. Hero and title carry 3× weight vs full page text.

high_info_ratio · Semantic Quality · 0–1

What fraction of sentences actually say something useful vs filler?

Fraction of sentences above informativeness threshold. Scored on action verbs, numbers, AI phrases, unique word ratio. UI chrome filtered out.

breadth_score · Content Depth · 0–1

Does the page cover multiple distinct topics, or just repeat the same idea?

KMeans clustering on paragraph TF-IDF vectors. Counts distinct topic clusters, normalised to expected count.

avg_sentence_length · Semantic Quality · words

Are sentences the right length? Too short = choppy. Too long = hard to parse.

Mean words per sentence. Ideal 15–20 words. Penalties applied above 25 and below 12.
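A rough sketch of the measurement follows, using only the thresholds stated above. The regex sentence splitter and the band labels are assumptions, and the size of the actual penalties is not published here, so the code only reports which band a page falls into.

```python
import re


def avg_sentence_length(text: str) -> float:
    """Mean words per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def length_band(avg_words: float) -> str:
    """Classify against the stated thresholds (labels are illustrative)."""
    if avg_words < 12 or avg_words > 25:
        return "penalised"
    if 15 <= avg_words <= 20:
        return "ideal"
    return "neutral"  # between the ideal range and the penalty thresholds


sample = "One two three. Four five."
print(avg_sentence_length(sample), length_band(avg_sentence_length(sample)))
```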

imm_score · Understanding · 0–100

First impression — how much does an AI understand from just the hero section?

BFS traversal score at hero/top-level depth. Based on value prop signal presence and section quality.

scroll_score · Understanding · 0–100

Mid-read — does understanding improve as the AI reads further down the page?

BFS traversal score at mid-depth sections. Monotonically enforced — can only equal or exceed imm_score.

deep_score · Understanding · 0–100

Full-read — at peak comprehension, how well does an AI understand the site?

BFS traversal score at full depth. Single most important signal (18.4% of PCA variance).
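The monotonic rule described for scroll_score amounts to a running maximum. The sketch below applies it to all three depths; extending the rule to deep_score is an assumption (the page only states it for scroll_score), and `enforce_monotonic` is a hypothetical name.

```python
def enforce_monotonic(imm: float, scroll: float, deep: float) -> tuple[float, float, float]:
    """Make each deeper reading score at least as high as the previous one."""
    scroll = max(scroll, imm)  # stated rule: scroll_score >= imm_score
    deep = max(deep, scroll)   # assumption: the same rule applies to deep_score
    return imm, scroll, deep


print(enforce_monotonic(60.0, 55.0, 70.0))
```

A raw mid-depth score of 55 is lifted to the 60 already reached at the hero section, while scores that already increase with depth pass through unchanged.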

No black boxes.

Every number in Brandioz is computable from the raw HTML of your site. The weights on this page are the exact weights in production. No ML model adjusting scores behind the scenes.

If a score is worth trusting, you should be able to understand exactly how it was computed — signal by signal, weight by weight.

Production formula

score = (
  title_conf     × 10   # structure
  + desc_conf    × 10
  + hero_conf    ×  8
  + heading_conf ×  5
  + log(words, 22)      # content depth
  + info_score   × 12   # semantic quality
  + sent_score   ×  4
  + diversity    ×  6
  + value_prop   × 18   # value prop
  + hierarchy    ×  5
  - cross_penalties     # internal consistency
  - curve_penalty       # understanding curve
)

# coherence engine applies up to 9 caps
# PCA identifies dominant_weakness
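For readers who want to run the formula, here is a minimal Python sketch that uses the weights above. Several parts are assumptions rather than the production code: the `Signals` container and the `raw_score` name are hypothetical, the content-depth term is passed in as ready-made points (0–22) because its exact curve is not given, the penalties are plain numbers supplied by the caller, and the clamp to 0–100 is added for safety. The coherence caps and the PCA step are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Inputs to the heuristic score; all values are 0-1 except depth_points."""
    title_conf: float
    desc_conf: float
    hero_conf: float
    heading_conf: float
    depth_points: float  # content depth, already mapped to 0-22 points
    info_score: float
    sent_score: float
    diversity: float
    value_prop: float
    hierarchy: float


def raw_score(s: Signals, cross_penalties: float = 0.0, curve_penalty: float = 0.0) -> float:
    """Weighted sum from the production formula, before coherence caps."""
    total = (
        s.title_conf * 10      # structure
        + s.desc_conf * 10
        + s.hero_conf * 8
        + s.heading_conf * 5
        + s.depth_points       # content depth
        + s.info_score * 12    # semantic quality
        + s.sent_score * 4
        + s.diversity * 6
        + s.value_prop * 18    # value proposition
        + s.hierarchy * 5
        - cross_penalties      # internal consistency
        - curve_penalty        # understanding curve
    )
    return max(0.0, min(100.0, total))  # assumption: clamp to the 0-100 scale


perfect = Signals(1.0, 1.0, 1.0, 1.0, 22.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(raw_score(perfect))
print(raw_score(perfect, cross_penalties=5.0, curve_penalty=2.5))
```

With every signal at its maximum the sum is 100, matching the component weights, and each penalty point is subtracted one for one.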