Aligning technology origins and adoption metric between the crux and the crawl data #42

max-ostapenko · 2025-01-08T21:28:00Z

The number seemed high because I anecdotally know that WordPress's share of the CMS market is around 75%, but we're only showing 3.2M WordPress origins in the CMS report.
So if there are 8.8M sites that use a CMS, that puts WordPress's market share at 36%, which is way too low.
The issue seems to be that the WordPress count is taken after joining with the CrUX dataset, and many sites have fallen out of CrUX.
Modifying your query to count WordPress sites in November:

-- Count distinct WordPress root pages in the November 2024 mobile crawl.
SELECT
  COUNT(DISTINCT root_page) AS wordpress_sites
FROM crawl.pages
WHERE date = '2024-11-01'
  AND client = 'mobile'
  AND 'WordPress' IN UNNEST(technologies.technology)

The result is 5785472, which gets us much closer to the expected market share: 65%.
So there are about 2.5M WordPress sites that we're counting in the category total but not in the technology total.
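
The arithmetic behind these figures can be checked directly. This is a minimal sketch that uses only the rounded 3.2M and 8.8M totals quoted in this thread plus the exact crawl count; the column aliases are illustrative and not part of any report query.

```sql
-- Sanity check of the shares quoted in this thread (rounded totals, so results are approximate).
SELECT
  ROUND(100 * 3200000 / 8800000, 1) AS share_after_crux_join_pct,  -- WordPress counted after the CrUX join
  ROUND(100 * 5785472 / 8800000, 1) AS share_from_crawl_pct,       -- WordPress counted from the crawl alone
  5785472 - 3200000 AS wordpress_sites_not_counted                 -- gap between the two counts
```

With these inputs the shares come out at 36.4% and 65.7%, and the gap at 2,585,472 sites, in line with the "about 2.5M" above given that 3.2M is itself rounded.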
Open to suggestions on how to fix this. One idea is to remove the CrUX join (or do some sort of outer join) when calculating origin counts.
Yeah, we subtly changed the name from "CWV Tech Report" to "HTTP Archive Tech Report" so that we could lean more heavily on the adoption side, so joining forces makes a lot of sense.

I see 2 issues here:

1. We use different URL sets in the report: the November crawl is based on October CrUX, but we JOIN it with November CrUX. The result can be either complete or timely, not both. (0.6M discrepancy)
2. We don't use the tablet and NULL clients from CrUX, which leaves more origins unmatched (1.9M). For those origins, no geo or rank data is available for aggregation.

A promising analysis approach

Calculate adoption from the crawl data, as it's the original source.
This lets us compute adoption over the most complete set of origins, including those that CrUX reports only under the tablet and NULL clients.
This works only for the global numbers, though: the geo dimension is part of CrUX and thus unavailable from the crawl. We could still use an INNER JOIN there.
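
One way to picture this is a LEFT JOIN that keeps every crawled origin and only annotates it with CrUX presence. This is a rough sketch, not the report's actual query. The CrUX table `chrome-ux-report.materialized.device_summary`, its `origin` and `date` columns, the trailing-slash join key, and all aliases are assumptions made for illustration. Which CrUX month to join against is the open question from issue 1.

```sql
-- Sketch only: the CrUX table name, its columns, and the trailing-slash join key are assumptions.
WITH wordpress AS (
  -- Adoption comes from the crawl alone, so no origin is dropped at this step.
  SELECT DISTINCT root_page
  FROM crawl.pages
  WHERE date = '2024-11-01'
    AND client = 'mobile'
    AND 'WordPress' IN UNNEST(technologies.technology)
),
crux AS (
  -- Assumed CrUX source; device is not filtered, so tablet and NULL rows can still match.
  SELECT DISTINCT CONCAT(origin, '/') AS root_page
  FROM `chrome-ux-report.materialized.device_summary`
  WHERE date = '2024-11-01'
)
SELECT
  COUNT(DISTINCT wordpress.root_page) AS origins_in_crawl,     -- adoption count
  COUNT(DISTINCT crux.root_page) AS origins_also_in_crux       -- subset that has CrUX data
FROM wordpress
LEFT JOIN crux
  ON wordpress.root_page = crux.root_page
```

The first column would feed the adoption metric. The second shows how many of those origins an INNER JOIN would retain for CrUX-dependent dimensions such as geo.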

tunetheweb · 2025-01-08T21:33:40Z

What would the adoption share be if we included CrUX data in the total, i.e. if we required an origin to be present in both datasets?
