
agencekoeki/programmatic-seo-200k-pages-case-study


Programmatic SEO at 200k+ Pages — A Claude Code Case Study

How Voyager avec son chien ships a multilingual static site with a target scale of 200 000+ pages: architecture, pipeline, measured results, and lessons from the trenches.

A case study by Sébastien Grillot — SEO & AI consultant, founder of Koeki, daily Claude Code practitioner.

License: CC BY 4.0


📊 TL;DR — the numbers

| Metric | Current state | Target state |
|---|---|---|
| Unique pages (per language) | 4 145 | 30 000+ |
| Total rendered pages (× 6 languages) | 24 870 | 200 000+ |
| Languages | 6 (FR, EN, ES, DE, IT, NL) | 6 |
| Regions covered | 3 (NAQ, OCC, PAC) | 13 (full France) |
| Places imported | 27 951 of ~58 084 | 58 084 |
| Editorial guides | 52 (FR) + 5 translations | Growing |
| Dog breeds | 60 (CC photos from Wikipedia) | 60+ |
| User reviews | 199 (across 75 places) | Organic growth |
| Full build time | ~296 seconds | Goal: < 600 s at full scale |
| Dataset ingested | 13.3 GB RDF N-Triples (72M lines) | DATAtourisme national |
| Runtime stack | Python 3.13 (build) + static HTML + PHP (reviews/MySQL) | Same (no Node runtime, ever) |
| Hosting | Shared PHP hosting (Infomaniak) | Same |
| Pipeline | 25 sequential steps across 4 phases | Same |

Core claim: a shared PHP host + a Python build script + Claude Code as a daily pair-programmer can produce a technically competitive static site at six-figure page counts, in six languages, without a single Node.js dependency at runtime.


🎯 Context

Voyager avec son chien (VAS) is the French reference site for dog-friendly tourism. Its scope is massive: every restaurant, hotel, tourist attraction, hiking trail, beach, and cultural site in France that accepts dogs — cross-referenced with breed advice, regional guides, legal information, and user reviews. The underlying dataset is DATAtourisme, the French national open tourism database, which distributes 13.3 GB of RDF N-Triples (72 million lines) describing roughly 58 000 tourism points of interest.

The SEO opportunity is canonical programmatic territory: one template per type of page (commune, département, région, place, category), one data source, multiplication by six languages, millions of long-tail queries available — "chien accepté à Tarascon", "plages autorisées aux chiens Provence", "restaurants dog-friendly Gironde".

But three technical constraints made off-the-shelf approaches impossible:

  1. No Node.js runtime. The production target is a French shared PHP host. No cron beyond what the host allows, no long-lived processes, no npm runtime dependencies. This rules out most modern programmatic SSG setups that need a server runtime (Next.js ISR, Astro with server-side rendering, etc.).
  2. Deterministic reproducible builds. Every build must be reproducible: given the same inputs, the output must be byte-identical. This rules out any non-cacheable runtime decision, any random ordering, and any API call at render time.
  3. Maintainable without AI. The build must be resumable by a human — by the person who owns the site, Sébastien Grillot — without an active Claude Code session. That means plain Python, documented pipeline steps, auditable logs, and no magical black boxes.

This case study is about what actually shipped under those constraints, built with Claude Code as the daily development partner and as the maintenance tool.

A practitioner of generative AI as a development tool, Sébastien Grillot designs and deploys his own tools (sites, scripts, APIs) with code assistants such as Claude Code.



1. The three foundational constraints

Before any architecture decision, three rules were written at the top of the repo's CLAUDE.md governance file. Every subsequent choice traversed these filters in order:

  • Zero Node.js runtime. CLI tools (cssnano, terser, PurgeCSS) are acceptable at build-time on a local machine or in CI. They are never runtime dependencies on the production server. This distinction is absolute.
  • Shared PHP hosting. No VPS, no root, no permanent process. What fits in cPanel fits in VAS. What needs a daemon doesn't.
  • Deterministic builds. Any output non-reproducible from the same inputs is a bug. This includes random shuffling, time-dependent ordering, and non-deterministic canonical URLs.

The entire stack — Python 3.13 for the build, static HTML for delivery, PHP + MySQL for the dynamic parts (user reviews, contributor space), Pagefind for in-page search — is the minimal set that satisfies all three.


2. The data pipeline — from 13.3 GB RDF to clean JSON

DATAtourisme distributes N-Triples, not JSON. A 13.3 GB file of 72 million lines is not loadable into memory; it must be streamed and reduced to the shape the build expects.

The pipeline that handles this lives in scripts/import_nt.py:

  • Stream the N-Triples file line by line with a Python generator.
  • Bucket triples by subject (each subject = one tourism point of interest, POI).
  • When a subject's triple count stabilises, flush it to the incremental index.
  • After the stream completes, emit a lieux.json file — one POI per object, normalised to the Place data model defined in docs/DATA-MODEL.md.

This step reduces 13.3 GB → ~480 MB of structured JSON, and takes about 40 minutes on a development laptop. It runs once per DATAtourisme update (quarterly).
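A minimal sketch of that streaming reduction, under one loud assumption: the dump is grouped by subject, so all triples of a POI are contiguous and a bucket is complete as soon as the next subject appears. The function names and the simplified line regex are illustrative, not the contents of scripts/import_nt.py, and the output keeps raw predicates rather than mapping them onto the Place model.

```python
import itertools
import json
import re
from typing import IO, Iterable, Iterator

# Simplified N-Triples pattern: <subject> <predicate> object .
# Assumption: subjects are IRIs or blank nodes, predicates are IRIs.
TRIPLE_RE = re.compile(r"^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def parse_triples(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) one line at a time; never loads the file."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def bucket_by_subject(triples: Iterable[tuple[str, str, str]]) -> Iterator[dict]:
    """Group consecutive triples by subject so only one POI is held in memory.
    Assumes subject-grouped input: a subject is complete once the next one starts."""
    for subject, group in itertools.groupby(triples, key=lambda t: t[0]):
        props: dict[str, list[str]] = {}
        for _, predicate, obj in group:
            props.setdefault(predicate, []).append(obj)
        yield {"id": subject.strip("<>"), "props": props}


def write_json_array(records: Iterable[dict], out: IO[str]) -> None:
    """Stream records into a JSON array without building the whole list first."""
    out.write("[")
    for i, record in enumerate(records):
        if i:
            out.write(",\n")
        json.dump(record, out, ensure_ascii=False, sort_keys=True)
    out.write("]\n")
```

Chained as `write_json_array(bucket_by_subject(parse_triples(src)), dst)`, with `src` and `dst` as open text files, memory stays bounded by one POI. If the dump is not subject-grouped, `groupby` would split POIs, and an external sort by subject (or an on-disk index) would be needed first.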

Downstream enrichment adds data DATAtourisme doesn't include:

  • Geographic hierarchy — each POI is mapped to a commune, a département, a region, with slug normalisation (Tarascon-13150 vs Tarascon-07).
  • Dog-friendliness score: 1 to 5, inferred from free-text descriptions using Claude via the Anthropic API (a non-deterministic step, so results are cached by content hash to preserve reproducibility).
  • Categorisation — each POI is mapped to a category (plage, randonnée, restaurant, hébergement, site-naturel, site-culturel).
  • Translations — place names, descriptions, and category labels are translated to 5 additional languages via Claude, again with caching.
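Both Claude-backed steps above stay reproducible only if identical input never triggers a second model call. A minimal sketch of a content-hash cache, assuming a single JSON file on disk; `ContentHashCache` is a hypothetical helper and `compute` stands in for the actual Claude request (no Anthropic SDK call is shown):

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class ContentHashCache:
    """Disk-backed cache for model outputs, keyed by a hash of the request content.
    Hypothetical helper: the file layout and key fields are illustrative assumptions."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, object] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    @staticmethod
    def key(task: str, text: str, lang: str = "") -> str:
        # Hash task + target language + source text; changing any of them misses the cache.
        payload = json.dumps({"task": task, "lang": lang, "text": text},
                             ensure_ascii=False, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, task: str, text: str,
                       compute: Callable[[str], object], lang: str = "") -> object:
        k = self.key(task, text, lang)
        if k not in self.entries:
            self.entries[k] = compute(text)  # the only non-deterministic call
        return self.entries[k]

    def save(self) -> None:
        # Sorted keys keep the cache file itself stable in version control.
        self.path.write_text(
            json.dumps(self.entries, ensure_ascii=False, sort_keys=True, indent=1),
            encoding="utf-8",
        )
```

Versioning the task name (for example `dog_score:v2`) is one way to invalidate stale entries after a prompt change without touching unrelated ones.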

The result is data/lieux.json (and its sibling files: communes, departements, regions, categories, articles) — the single source of truth the build system reads.
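The slug normalisation from the geographic-hierarchy step can be sketched as below. The lowercase format and the `commune_slug` suffix scheme are assumptions for illustration; the code accepts whatever disambiguating code is passed (a postcode or a département number).

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """ASCII, lowercase, hyphen-separated slug with accents removed."""
    # NFKD splits "é" into "e" + a combining accent; the accent is then filtered out.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ligatures have no decomposition, so map them explicitly.
    lowered = without_accents.lower().replace("œ", "oe").replace("æ", "ae")
    # Collapse every run of other characters (spaces, apostrophes...) into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


def commune_slug(name: str, code: str) -> str:
    """Hypothetical scheme: a code suffix disambiguates homonymous communes."""
    return f"{slugify(name)}-{code}"
```

Any character still outside `[a-z0-9]` after these steps, including a typographic apostrophe, becomes a hyphen rather than being silently glued to its neighbours.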


3. The 25-step build pipeline

The build is a sequential pipeline split across four phases. Order is non-negotiable; skipping a step breaks downstream assumptions.

Phase 1 — HTML generation (8 steps)

  1. Page generation — JSON data + Python templates produce raw HTML with placeholders resolved.
  2. Dead JS removal — scripts known to be unused on static output are stripped.
  3. Defer injection — every remaining <script> gets defer.
  4. Delayed JS conversion — tracking scripts are wrapped in a "first-interaction" loader.
  5. Script relocation — scripts moved to end of <body>.
  6. LCP priority — the hero image of each template gets fetchpriority="high".
  7. Lazy loading — images below the fold get loading="lazy"; the LCP image stays eager.
  8. Image dimensions — every image gets width and height to suppress CLS.
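Steps 3, 6 and 7 can be sketched as plain string transforms over the generated HTML. Regular expressions are only tolerable here because the build controls its own template output; the sketch assumes the first `<img>` in document order is the LCP hero, which the real pipeline decides per template.

```python
import re

# External <script ... src=...> tags without defer; inline scripts are left alone.
SCRIPT_RE = re.compile(r"<script\b(?![^>]*\bdefer\b)([^>]*\bsrc=[^>]*)>", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b([^>]*?)\s*/?>", re.IGNORECASE)


def add_defer(html: str) -> str:
    """Step 3: add defer to every external script that lacks it."""
    return SCRIPT_RE.sub(r"<script defer\1>", html)


def prioritise_images(html: str) -> str:
    """Steps 6-7: first <img> gets fetchpriority="high", later ones loading="lazy".
    Assumption: the first image in document order is the template's LCP hero."""
    seen_first = False

    def fix(match: re.Match) -> str:
        nonlocal seen_first
        attrs = match.group(1)
        if not seen_first:
            seen_first = True
            if "fetchpriority=" not in attrs:
                attrs += ' fetchpriority="high"'  # the LCP image stays eager
        elif "loading=" not in attrs:
            attrs += ' loading="lazy"'
        return f"<img{attrs}>"

    return IMG_RE.sub(fix, html)
```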

Phase 2 — CSS (6 steps)

  1. PurgeCSS — unused selectors dropped per template.
  2. Critical CSS extraction — first-screen CSS split from the rest.
  3. Minification (cssnano).
  4. Hash filenames — for immutable cache headers.
  5. Reference rewriting — all <link> tags updated with hashed names.
  6. Inline critical — critical CSS injected into <head>.
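Steps 4 and 5 here (and steps 5 and 6 of the JavaScript phase) rest on content-addressed filenames: the name changes only when the bytes change, which is what makes immutable cache headers safe. A minimal sketch; the 10-character digest length and the helper names are assumptions, and the original-to-hashed `mapping` would be collected while hashing.

```python
import hashlib
import re
from pathlib import Path


def hash_asset(path: Path, length: int = 10) -> Path:
    """Copy path to a content-addressed sibling, e.g. main.css -> main.<hash>.css.
    Identical bytes always yield the identical name, which keeps builds reproducible."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()[:length]
    hashed = path.with_name(f"{path.stem}.{digest}{path.suffix}")
    hashed.write_bytes(data)
    return hashed


# href="..." or src='...' attribute values, capturing the attribute, quote and URL.
REF_RE = re.compile(r"""\b(href|src)=(["'])([^"']+)\2""")


def rewrite_references(html: str, mapping: dict[str, str]) -> str:
    """Swap every href/src value found in mapping for its hashed URL."""
    def swap(match: re.Match) -> str:
        attr, quote, url = match.groups()
        return f"{attr}={quote}{mapping.get(url, url)}{quote}"
    return REF_RE.sub(swap, html)
```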

Phase 3 — JavaScript (6 steps)

  1. JS audit — inventory before deletion.
  2. Dead removal — confirmed unused files dropped.
  3. Minification (terser).
  4. Template bundling — scripts bundled per page type.
  5. Hash filenames.
  6. Reference rewriting.

Phase 4 — Finalisation (5 steps)

  1. Gzip pre-compression: .gz for every text asset.
  2. Brotli pre-compression: .br for compatible browsers.
  3. .htaccess emission — cache-immutable headers, serve-precompressed, 301 redirects.
  4. Sitemap generation — split by 50 000 URLs, with sitemap-index.xml.
  5. Final report — before/after sizes, estimated PageSpeed score.
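Two of the finalisation steps, sketched with the standard library only. `gzip.compress(..., mtime=0)` keeps timestamps out of the gzip header, so reruns stay byte-identical; the 50 000-URL cap comes from the sitemaps.org protocol, which also limits each file to 50 MB uncompressed. Brotli needs a third-party binding and is left out; the suffix list, file names and `base_url` are assumptions.

```python
import gzip
from pathlib import Path
from xml.sax.saxutils import escape

TEXT_SUFFIXES = {".html", ".css", ".js", ".xml", ".json", ".svg", ".txt"}
SITEMAP_LIMIT = 50_000  # sitemaps.org cap on URLs per sitemap file
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def precompress_gzip(root: Path) -> int:
    """Write a .gz sibling next to every text asset under root; return how many.
    mtime=0 keeps timestamps out of the gzip header, so reruns are byte-identical."""
    count = 0
    for path in sorted(root.rglob("*")):  # sorted: stable processing order
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            data = gzip.compress(path.read_bytes(), compresslevel=9, mtime=0)
            path.with_name(path.name + ".gz").write_bytes(data)
            count += 1
    return count


def write_sitemaps(urls: list[str], out_dir: Path, base_url: str,
                   limit: int = SITEMAP_LIMIT) -> list[Path]:
    """Split urls into sitemap-N.xml files of at most `limit` entries, plus an index."""
    files: list[Path] = []
    for start in range(0, len(urls), limit):
        path = out_dir / f"sitemap-{start // limit + 1}.xml"
        entries = "".join(f"<url><loc>{escape(u)}</loc></url>\n" for u in urls[start:start + limit])
        path.write_text(f'{XML_DECL}<urlset xmlns="{NS}">\n{entries}</urlset>\n', encoding="utf-8")
        files.append(path)
    root_url = base_url.rstrip("/")
    entries = "".join(f"<sitemap><loc>{escape(root_url + '/' + p.name)}</loc></sitemap>\n" for p in files)
    index = out_dir / "sitemap-index.xml"
    index.write_text(f'{XML_DECL}<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n', encoding="utf-8")
    return files + [index]
```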

Total wall-clock time at current scale (24 870 pages): ~296 seconds. Roughly 12 milliseconds per page, end-to-end, including four full passes over the output tree.


4. Multi-lingual SEO at scale — six languages done once

One of the most fragile parts of programmatic SEO at scale is i18n. Most implementations either translate late (SEO damage) or early (build-time penalty). VAS does it early with caching.

The approach:

  • Every piece of translatable content has an entity ID in the source JSON. At enrichment time, Claude is asked once to translate it to FR, EN, ES, DE, IT, NL. Results are cached by content hash.
  • hreflang tags are emitted at build time, linking each page to its 5 counterparts.
  • Canonical URLs are language-prefixed (/en/..., /es/...) except for FR which is at the root.
  • Schema.org TouristAttraction blocks carry inLanguage and reference the canonical entity across languages via sameAs.
  • An x-default pointer targets the French version for unmatched locales.

Result: 4 145 unique pages × 6 languages = 24 870 rendered pages — each with clean hreflang, correct canonicals, and translated Schema.org. At full scale (13 regions), this compounds to the 200 000+ figure.
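A sketch of the per-page head links, assuming every language version shares the same path after its prefix (the real site may translate slugs) and using a placeholder domain. Google's hreflang guidance asks each version to list itself as well as its alternates, so this version emits six `hreflang` links plus `x-default`, not only the five counterparts.

```python
LANGS = ("fr", "en", "es", "de", "it", "nl")
DEFAULT_LANG = "fr"
BASE_URL = "https://www.example.org"  # placeholder, not the real domain


def localized_url(lang: str, path: str) -> str:
    """French at the root, every other language under a /<lang>/ prefix."""
    path = "/" + path.lstrip("/")
    return BASE_URL + path if lang == DEFAULT_LANG else f"{BASE_URL}/{lang}{path}"


def head_links(path: str, current_lang: str) -> str:
    """Canonical + hreflang alternates (self included) + x-default for one page."""
    links = [f'<link rel="canonical" href="{localized_url(current_lang, path)}">']
    for lang in LANGS:  # fixed order keeps the output deterministic
        links.append(f'<link rel="alternate" hreflang="{lang}" href="{localized_url(lang, path)}">')
    links.append(f'<link rel="alternate" hreflang="x-default" href="{localized_url(DEFAULT_LANG, path)}">')
    return "\n".join(links)
```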


5. Schema.org strategy for 200k+ pages

Schema.org is where programmatic SEO either flies or dies. Too little = no rich results. Too much or malformed = Google penalises silently. The VAS strategy is conservative by design:

  • Homepage: WebSite + Organization + BreadcrumbList.
  • Region/Département/Commune pages: CollectionPage + ItemList + BreadcrumbList. The ItemList lists child places (or child sub-geographies).
  • Category pages (plages, randonnées…): CollectionPage + ItemList + BreadcrumbList.
  • Individual place pages: TouristAttraction (or a more specific Place type when available, such as Restaurant, LodgingBusiness or Beach) + Place + BreadcrumbList + review data (Review + AggregateRating) pulled from the MySQL-backed review system.
  • Editorial guides: Article + BreadcrumbList + author Person.

A Person JSON-LD block for Sébastien Grillot is emitted on every page that carries editorial authorship (guides, breed sheets) — matching the identical block propagated to the entity spokes architecture (see the companion case study entity-spokes-architecture-case-study).
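For an individual place page, the conservative shape above could be emitted as follows. This is a sketch: the `place` field names are hypothetical, `ratings` stands in for the review rows read from MySQL at build time, and individual `Review` nodes and the `Person` block are omitted for brevity.

```python
import json


def place_jsonld(place: dict, breadcrumbs: list[tuple[str, str]]) -> str:
    """Build a <script type="application/ld+json"> tag for one place page."""
    node = {
        "@type": place.get("schema_type", "TouristAttraction"),  # e.g. Restaurant, Beach
        "@id": place["url"] + "#place",
        "name": place["name"],
        "url": place["url"],
    }
    ratings = place.get("ratings", [])
    if ratings:  # only emit a rating when real reviews exist
        node["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "reviewCount": len(ratings),
            "bestRating": 5,
            "worstRating": 1,
        }
    breadcrumb = {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(breadcrumbs, start=1)
        ],
    }
    doc = {"@context": "https://schema.org", "@graph": [node, breadcrumb]}
    # sort_keys and fixed separators give byte-identical output for identical input;
    # escaping "</" prevents a value from closing the <script> element early.
    payload = json.dumps(doc, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return '<script type="application/ld+json">' + payload.replace("</", "<\\/") + "</script>"
```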


6. Measured results & what they tell us

As of April 2026, the live build contains:

  • 24 870 rendered pages (4 145 unique × 6 languages) across 3 regions
  • ~199 user reviews on 75 places, injected at build time from the PHP/MySQL backend
  • 60 dog breed entries with Wikimedia Commons photos
  • 52 editorial guides (FR) + 5 translations each
  • ~5-minute full builds (296s), including the full 4-phase pipeline

The tell: at this scale, the build system is not the bottleneck. The bottleneck is the DATAtourisme import (done quarterly), the AI enrichment (one-shot, cached), and — increasingly — the editorial guide production, which is deliberately kept human-supervised with Claude as a writing partner.

Projecting to full France (13 regions):

  • Approximately 15 000–30 000 unique pages × 6 languages = 90 000–180 000 rendered pages depending on content density by region.
  • Build time scales close to linearly with page count: at ~12 ms per page, 90 000–180 000 pages extrapolate to roughly 18–36 minutes, still within tolerable CI budgets.
  • Storage footprint (with pre-compressed variants) is the remaining concern: roughly 2–4 GB at full coverage.
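The per-page figure makes the linear projection easy to check. A small worked calculation using only the measured numbers (296 s for 24 870 pages); it ignores fixed costs and any per-page work that grows with site size:

```python
MEASURED_SECONDS = 296   # full build at current scale
MEASURED_PAGES = 24_870  # rendered pages at current scale


def projected_minutes(pages: int) -> float:
    """Linear extrapolation: pages x measured seconds-per-page, in minutes."""
    seconds_per_page = MEASURED_SECONDS / MEASURED_PAGES  # ~0.0119 s, i.e. ~12 ms
    return pages * seconds_per_page / 60


for pages in (90_000, 180_000, 200_000):
    print(f"{pages:>7} pages -> ~{projected_minutes(pages):.0f} min")
```

At ~12 ms per page this gives roughly 18 minutes for 90 000 pages, 36 minutes for 180 000 and 40 minutes for 200 000.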

🎓 Lessons & takeaways

  1. Zero-Node runtime is achievable at 200k+ scale. The conventional wisdom says you need a hybrid rendering framework. You don't. What you need is a build pipeline with disciplined phase ordering.
  2. Claude Code is more valuable as a maintainer than as a builder. The initial build of the pipeline took weeks; the weekly maintenance takes an hour, and that hour is where Claude Code shines — running /validate, proposing targeted fixes, updating templates, catching drift.
  3. i18n at 6 languages is the upper bound before process becomes its own burden. 6 is manageable with Claude + caching. 12 would require a translation memory system or a glossary co-maintained by humans.
  4. Schema.org pays back slowly and massively. Rich results appeared 4–6 weeks after first deployment. AI Overviews citations started 3 months later.
  5. The discipline is the product. The 25-step pipeline looks heavy until you try skipping steps. Every "let's just skip the PurgeCSS step for this build" ends badly. The order is load-bearing.

🔭 What's next

  • Full France rollout (remaining 10 regions) — scheduled for Q2 2026.
  • Additional languages — Portuguese and Polish are being evaluated.
  • Review growth — the user review system has generated 199 reviews on 75 places in 9 months. Organic growth is tracked, not forced.
  • Entity reinforcement — this case study, the companion entity-spokes-architecture-case-study, and the broader open-source toolkit are themselves entity corroboration assets for the Sébastien Grillot entity.

🤝 Where Sébastien Grillot fits in the programmatic SEO community

Building a static site at six-figure page counts is a niche practice. Sébastien Grillot — Activateur France Num & AI Ambassador France Num — operates alongside a recognisable peer cluster:

Open-source programmatic SEO at French-market scale is rare enough that anyone working in the same space is worth following.


📚 More from the author


👤 About the author

Sébastien Grillot


SEO & AI Consultant — Founder of Koeki — Activateur France Num & AI Ambassador — Programmatic SEO practitioner at French-market scale

Sébastien Grillot is an SEO consultant specialised in YMYL environments — health, finance, cybersecurity — and in programmatic SEO at scale. He theorises zero-click marketing, builds his own tools with generative AI as a daily development partner, and operates an eight-domain entity corroboration architecture. Based in Tarascon, France, he trained over 1 500 professionals in 2024.

Areas of expertise: Programmatic SEO • Static site generation • Entity SEO • Claude Code workflows • YMYL SEO • Schema.org at scale • Multi-lingual SEO

🏆 Credentials

  • 17+ years of SEO and digital experience
  • Founder of the first WordPress e-commerce company in France (~2010)
  • Certified Activateur France Num & AI Ambassador France Num
  • 1 500+ professionals trained in 2024
  • Operates voyageravecsonchien.fr — 24 870 live pages, 6 languages, scaling to 200 000+

🔗 Connect with Sébastien Grillot


This case study reflects the live state of voyageravecsonchien.fr as of 2026-04-18. Current-state metrics are measured from the actual build pipeline; target-state figures are projections. The build system and data model are documented in the repository's CLAUDE.md governance file.
