How Voyager avec son chien ships a multi-lingual static site built for a target scale of 200 000+ pages: architecture, pipeline, measured results, and lessons from the trenches.
A case study by Sébastien Grillot — SEO & AI consultant, founder of Koeki, daily Claude Code practitioner.
| Metric | Current state | Target state |
|---|---|---|
| Unique pages (per language) | 4 145 | 30 000+ |
| Total rendered pages (× 6 languages) | 24 870 | 200 000+ |
| Languages | 6 (FR, EN, ES, DE, IT, NL) | 6 |
| Regions covered | 3 (NAQ, OCC, PAC) | 13 (full France) |
| Places imported | 27 951 of ~58 084 | 58 084 |
| Editorial guides | 52 (FR) + 5 translations | Growing |
| Dog breeds | 60 (CC photos from Wikipedia) | 60+ |
| User reviews | 199 (across 75 places) | Organic growth |
| Full build time | ~296 seconds | Goal: < 600s at full scale |
| Dataset ingested | 13.3 GB RDF N-Triples (72M lines) | DATAtourisme national |
| Runtime stack | Python 3.13 (build) + static HTML + PHP (reviews/MySQL) | Same — no Node runtime, ever |
| Hosting | Shared PHP hosting (Infomaniak) | Same |
| Pipeline | 25 sequential steps across 4 phases | Same |
Core claim: a shared PHP host + a Python build script + Claude Code as a daily pair-programmer can produce a technically competitive static site at six-figure page counts, in six languages, without a single Node.js dependency at runtime.
Voyager avec son chien (VAS) is the French reference site for dog-friendly tourism. Its scope is massive: every restaurant, hotel, tourist attraction, hiking trail, beach, and cultural site in France that accepts dogs — cross-referenced with breed advice, regional guides, legal information, and user reviews. The underlying dataset is DATAtourisme, the French national open tourism database, which distributes 13.3 GB of RDF N-Triples (72 million lines) describing roughly 58 000 tourism points of interest.
The SEO opportunity is canonical programmatic territory: one template per type of page (commune, département, région, place, category), one data source, multiplication by six languages, millions of long-tail queries available — "chien accepté à Tarascon", "plages autorisées aux chiens Provence", "restaurants dog-friendly Gironde".
But three technical constraints made off-the-shelf approaches impossible:
- No Node.js runtime. The production target is a French shared PHP host. No cron beyond what the host allows, no long-lived processes, no npm runtime dependencies. This rules out most modern programmatic SSGs (Next.js ISR, Astro with server components, etc.).
- Deterministic reproducible builds. Every build must be idempotent — given the same inputs, the output must be byte-identical. This rules out any non-cacheable runtime decision, any random ordering, any API call at render time.
- Maintainable without AI. The build must be resumable by a human — by the person who owns the site, Sébastien Grillot — without an active Claude Code session. That means plain Python, documented pipeline steps, auditable logs, and no magical black boxes.
This case study is about what actually shipped under those constraints, built with Claude Code as the daily development partner and as the maintenance tool.
A practitioner of generative AI as a development tool, Sébastien Grillot designs and deploys his own tools (sites, scripts, APIs) with the help of code assistants such as Claude Code.
- 1. The three foundational constraints
- 2. The data pipeline — from 13.3 GB RDF to clean JSON
- 3. The 25-step build pipeline
- 4. Multi-lingual SEO at scale — six languages done once
- 5. Schema.org strategy for 200k+ pages
- 6. Measured results & what they tell us
- 🎓 Lessons & takeaways
- 🔭 What's next
Before any architecture decision, three rules were written at the top of the repo's CLAUDE.md governance file. Every subsequent choice had to pass these filters, in order:
- Zero Node.js runtime. CLI tools (cssnano, terser, PurgeCSS) are acceptable at build-time on a local machine or in CI. They are never runtime dependencies on the production server. This distinction is absolute.
- Shared PHP hosting. No VPS, no root, no permanent process. What fits in cPanel fits in VAS. What needs a daemon doesn't.
- Deterministic builds. Any output non-reproducible from the same inputs is a bug. This includes random shuffling, time-dependent ordering, and non-deterministic canonical URLs.
The entire stack is the minimal set that satisfies all three: Python 3.13 for the build, static HTML for delivery, PHP + MySQL for the dynamic parts (user reviews, contributor space), and Pagefind for on-site search.
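The third rule can be checked mechanically. The sketch below is one possible way to do it, not the check VAS actually runs: it builds a single digest over an output tree so that two builds from the same inputs can be compared. The function names `tree_digest` and `builds_match` are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return one SHA-256 digest covering every file path and file body under root."""
    digest = hashlib.sha256()
    # Sort the paths so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        # Length prefix keeps file boundaries unambiguous inside the stream.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two build output directories are byte-identical."""
    return tree_digest(first) == tree_digest(second)
```

Running the build twice into two directories and asserting `builds_match(...)` turns "any non-reproducible output is a bug" into a test that can fail.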
DATAtourisme distributes N-Triples, not JSON. A 13.3 GB file of 72 million lines is not loadable into memory; it must be streamed and reduced to the shape the build expects.
The pipeline that handles this lives in scripts/import_nt.py:
- Stream the N-Triples file line by line with a Python generator.
- Bucket triples by subject (each subject = one tourism point of interest, POI).
- When a subject's triple count stabilises, flush it to the incremental index.
- After the stream completes, emit a `lieux.json` file, one POI per object, normalised to the `Place` data model defined in `docs/DATA-MODEL.md`.
This step reduces 13.3 GB → ~480 MB of structured JSON, and takes about 40 minutes on a development laptop. It runs once per DATAtourisme update (quarterly).
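The streaming and bucketing steps can be sketched as follows. This is an illustration, not the contents of `scripts/import_nt.py`: it makes the simplifying assumption that all triples for a subject are contiguous in the dump, so a bucket is flushed as soon as the subject changes. The names `iter_triples` and `iter_subjects` are hypothetical, and the regular expression only covers simple, well-formed N-Triples lines.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator

# "<subject> <predicate> object ." where object is an IRI, a literal or a blank node.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def iter_triples(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) lazily; skip blanks, comments, malformed lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def iter_subjects(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, list[str]]]]:
    """Yield (subject, {predicate: [objects]}) one subject at a time.

    Assumption: triples for a given subject are contiguous in the input.
    """
    for subject, group in groupby(iter_triples(lines), key=lambda t: t[0]):
        props: dict[str, list[str]] = {}
        for _, predicate, obj in group:
            props.setdefault(predicate, []).append(obj)
        yield subject, props
```

Because both functions are generators, passing an open file handle keeps memory use proportional to one POI rather than to the whole 13.3 GB file. If the dump interleaves subjects, the bucket has to be kept open longer, which is what the "triple count stabilises" rule above addresses.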
Downstream enrichment adds data DATAtourisme doesn't include:
- Geographic hierarchy: each POI is mapped to a commune, a département, and a region, with slug normalisation (Tarascon-13150 vs Tarascon-07).
- Dog-friendliness score: 1 to 5, inferred from free-text descriptions using Claude via the Anthropic API (a non-deterministic step, cached by content hash to preserve reproducibility).
- Categorisation: each POI is mapped to a category (plage, randonnée, restaurant, hébergement, site-naturel, site-culturel).
- Translations: place names, descriptions, and category labels are translated into 5 additional languages via Claude, with the same content-hash caching.
The result is data/lieux.json (and its sibling files: communes, departements, regions, categories, articles) — the single source of truth the build system reads.
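Caching by content hash is what lets a model call sit inside a reproducible pipeline: the call runs once per distinct input, and every later build reads the stored answer. A minimal sketch follows, assuming a JSON file as the cache store. The class name `ContentHashCache` and the `compute` callback (a stand-in for the real API call) are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


class ContentHashCache:
    """Persist results of a non-deterministic call, keyed by a hash of its input."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, Any] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    @staticmethod
    def key(task: str, text: str) -> str:
        # The task name is part of the key so "score" and "translate:en" never collide.
        return hashlib.sha256(f"{task}\n{text}".encode("utf-8")).hexdigest()

    def get_or_compute(self, task: str, text: str, compute: Callable[[str], Any]) -> Any:
        k = self.key(task, text)
        if k not in self.entries:
            self.entries[k] = compute(text)  # only reached on a cache miss
        return self.entries[k]

    def save(self) -> None:
        # sort_keys keeps the cache file itself stable from one run to the next.
        payload = json.dumps(self.entries, ensure_ascii=False, sort_keys=True, indent=1)
        self.path.write_text(payload, encoding="utf-8")
```

With this shape, editing a description changes its hash and triggers exactly one new call, while untouched POIs never reach the API again.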
The build is a sequential pipeline split across four phases. Order is non-negotiable; skipping a step breaks downstream assumptions.
- Page generation: JSON data + Python templates produce raw HTML with placeholders resolved.
- Dead JS removal: scripts known to be unused on static output are stripped.
- Defer injection: every remaining `<script>` gets `defer`.
- Delayed JS conversion: tracking scripts are wrapped in a "first-interaction" loader.
- Script relocation: scripts moved to end of `<body>`.
- LCP priority: the hero image of each template gets `fetchpriority="high"`.
- Lazy loading: images below the fold get `loading="lazy"`; the LCP image stays `eager`.
- Image dimensions: every image gets `width` and `height` to suppress CLS.
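Two of the HTML passes above (defer injection and the LCP/lazy-loading split) can be illustrated with a small regex-based sketch. It is deliberately simplified and is not the VAS implementation: it treats the first `<img>` in the document as the LCP image, which is an assumption, and a production pass would more likely use a real HTML parser. The function names are hypothetical.

```python
import re

SCRIPT_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)
IMG_TAG = re.compile(r"<img\b([^>]*?)\s*/?>", re.IGNORECASE)
QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')


def add_defer(html: str) -> str:
    """Add defer to external scripts that are not already deferred or async."""
    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        bare = QUOTED.sub("", attrs)  # ignore attribute values such as URLs
        # defer only affects scripts loaded through src; inline scripts are left alone.
        if not re.search(r"\ssrc\s*=", attrs) or re.search(r"\b(defer|async)\b", bare):
            return match.group(0)
        return f"<script{attrs} defer>"
    return SCRIPT_TAG.sub(fix, html)


def set_image_loading(html: str) -> str:
    """Mark the first <img> as high priority and lazy-load every later one."""
    seen = 0

    def fix(match: re.Match) -> str:
        nonlocal seen
        seen += 1
        attrs = match.group(1)
        if re.search(r"\b(loading|fetchpriority)=", attrs):
            return match.group(0)  # already handled, keeps the pass idempotent
        extra = 'fetchpriority="high" loading="eager"' if seen == 1 else 'loading="lazy"'
        return f"<img{attrs} {extra}>"
    return IMG_TAG.sub(fix, html)
```

Both passes leave already-processed tags untouched, so running them twice yields the same HTML, which matters for the deterministic-build rule.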
- PurgeCSS: unused selectors dropped per template.
- Critical CSS extraction: first-screen CSS split from the rest.
- Minification (cssnano).
- Hash filenames: for immutable cache headers.
- Reference rewriting: all `<link>` tags updated with hashed names.
- Inline critical: critical CSS injected into `<head>`.
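The "hash filenames" and "reference rewriting" steps work as a pair: the hash comes from the file contents, so a filename only changes when the file does, and every page must then be pointed at the new name. A minimal sketch follows, assuming asset file names are unique across the tree and that the pass runs once per build. The names `hash_assets` and `rewrite_references` are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def hash_assets(root: Path, pattern: str = "*.css") -> dict[str, str]:
    """Rename each matching asset to name.<hash>.ext and return {old_name: new_name}."""
    renamed: dict[str, str] = {}
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(root.rglob(pattern)):
        short = hashlib.sha256(path.read_bytes()).hexdigest()[:10]
        new_name = f"{path.stem}.{short}{path.suffix}"
        path.rename(path.with_name(new_name))
        renamed[path.name] = new_name
    return renamed


def rewrite_references(root: Path, renamed: dict[str, str]) -> None:
    """Point every HTML reference at the hashed filename."""
    if not renamed:
        return
    # Longest names first, and never match in the middle of a longer filename.
    alternatives = "|".join(re.escape(n) for n in sorted(renamed, key=len, reverse=True))
    names = re.compile(rf"(?<![\w.-])(?:{alternatives})")
    for page in sorted(root.rglob("*.html")):
        html = page.read_text(encoding="utf-8")
        updated = names.sub(lambda m: renamed[m.group(0)], html)
        if updated != html:
            page.write_text(updated, encoding="utf-8")
```

Since the hashed name changes whenever the content changes, the files can be served with long-lived immutable cache headers without risking stale CSS.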
- JS audit — inventory before deletion.
- Dead removal — confirmed unused files dropped.
- Minification (terser).
- Template bundling — scripts bundled per page type.
- Hash filenames.
- Reference rewriting.
- Gzip pre-compression: `.gz` for every text asset.
- Brotli pre-compression: `.br` for compatible browsers.
- `.htaccess` emission: cache-immutable headers, serve-precompressed, 301 redirects.
- Sitemap generation: split by 50 000 URLs, with `sitemap-index.xml`.
- Final report: before/after sizes, estimated PageSpeed score.
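The sitemap step is mostly a chunking problem: the sitemap protocol caps each file at 50 000 URLs, so the URL list is cut into files and an index points at each of them. The sketch below shows that logic under stated assumptions: the chunk names `sitemap-1.xml`, `sitemap-2.xml`, ... and the function name `build_sitemaps` are hypothetical, and only `sitemap-index.xml` comes from the pipeline description.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'
MAX_URLS = 50_000  # per-file limit in the sitemap protocol


def build_sitemaps(urls: list[str], base_url: str, max_urls: int = MAX_URLS) -> dict[str, str]:
    """Return {filename: xml} for every sitemap chunk plus sitemap-index.xml."""
    ordered = sorted(set(urls))  # stable order keeps the output reproducible
    files: dict[str, str] = {}
    for start in range(0, len(ordered), max_urls):
        name = f"sitemap-{start // max_urls + 1}.xml"
        chunk = ordered[start:start + max_urls]
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        files[name] = f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">{body}</urlset>'
    # Build the index from the chunk names before adding the index itself.
    index = "".join(
        f"<sitemap><loc>{escape(base_url + '/' + name)}</loc></sitemap>" for name in files
    )
    files["sitemap-index.xml"] = f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">{index}</sitemapindex>'
    return files
```

At the current 24 870 pages this produces a single chunk plus the index. The split only starts to matter once the rollout passes 50 000 rendered URLs.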
Total wall-clock time at current scale (24 870 pages): ~296 seconds. Roughly 12 milliseconds per page, end-to-end, including four full passes over the output tree.
One of the most fragile parts of programmatic SEO at scale is i18n. Most implementations either translate late (SEO damage) or early (build-time penalty). VAS does it early with caching.
The approach:
- Every piece of translatable content has an entity ID in the source JSON. At enrichment time, Claude is asked once to translate it to FR, EN, ES, DE, IT, NL. Results are cached by content hash.
- `hreflang` tags are emitted at build time, linking each page to its 5 counterparts.
- Canonical URLs are language-prefixed (`/en/...`, `/es/...`) except for FR, which is at the root.
- Schema.org `TouristAttraction` blocks carry `inLanguage` and reference the canonical entity across languages via `sameAs`.
- An `x-default` pointer targets the French version for unmatched locales.
Result: 4 145 unique pages × 6 languages = 24 870 rendered pages — each with clean hreflang, correct canonicals, and translated Schema.org. At full scale (13 regions), this compounds to the 200 000+ figure.
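The URL rules above (FR at the root, a prefix for every other language, `x-default` pointing at French) are compact enough to express as one function. The sketch below is an illustration under assumptions, not the VAS template code: it assumes the path is identical across languages, uses `https://example.com` as a placeholder base URL in the usage example, and the function names are hypothetical.

```python
LANGS = ("fr", "en", "es", "de", "it", "nl")
DEFAULT_LANG = "fr"  # served at the root and used for x-default


def localized_url(base_url: str, lang: str, path: str) -> str:
    """FR lives at the root; every other language gets a /<lang> prefix."""
    prefix = "" if lang == DEFAULT_LANG else f"/{lang}"
    return f"{base_url}{prefix}{path}"


def head_links(base_url: str, lang: str, path: str) -> list[str]:
    """Return the canonical and hreflang <link> tags for one rendered page."""
    tags = [f'<link rel="canonical" href="{localized_url(base_url, lang, path)}">']
    for code in LANGS:
        href = localized_url(base_url, code, path)
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{href}">')
    fallback = localized_url(base_url, DEFAULT_LANG, path)
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{fallback}">')
    return tags
```

For example, `head_links("https://example.com", "en", "/plages/gironde/")` returns eight tags: one canonical, six language alternates, and one `x-default`. The sketch emits a self-referencing alternate alongside the 5 counterparts, since hreflang sets are generally expected to list every version, including the current one.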
Schema.org is where programmatic SEO either flies or dies. Too little = no rich results. Too much or malformed = Google penalises silently. The VAS strategy is conservative by design:
- Homepage: `WebSite` + `Organization` + `BreadcrumbList`.
- Region/Département/Commune pages: `CollectionPage` + `ItemList` + `BreadcrumbList`. The `ItemList` lists child places (or child sub-geographies).
- Category pages (plages, randonnées…): `CollectionPage` + `ItemList` + `BreadcrumbList`.
- Individual place pages: `TouristAttraction` (or a more specific subtype when available, such as `Restaurant`, `LodgingBusiness`, or `Beach`) + `Place` + `BreadcrumbList` + review data (`Review` + `AggregateRating`) pulled from the MySQL-backed review system.
- Editorial guides: `Article` + `BreadcrumbList` + author `Person`.
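For the place-page case, the conservative approach can be sketched as a function that only emits what the data supports: the subtype when one is known, and `AggregateRating` only when reviews exist. This is an illustration, not the VAS template. The dictionary keys `url`, `name`, `schema_type`, and `reviews` are hypothetical field names, not the `Place` data model.

```python
import json


def place_jsonld(place: dict, lang: str, breadcrumbs: list[tuple[str, str]]) -> str:
    """Serialise a place page as a JSON-LD @graph; ratings appear only when reviews exist."""
    node = {
        "@type": place.get("schema_type", "TouristAttraction"),
        "@id": place["url"] + "#place",
        "name": place["name"],
        "inLanguage": lang,
    }
    reviews = place.get("reviews", [])
    if reviews:  # never emit an empty or zero-count AggregateRating
        average = sum(r["rating"] for r in reviews) / len(reviews)
        node["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": round(average, 1),
            "reviewCount": len(reviews),
        }
    crumbs = {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(breadcrumbs, start=1)
        ],
    }
    graph = {"@context": "https://schema.org", "@graph": [node, crumbs]}
    # sort_keys keeps the serialised block byte-stable between builds.
    return json.dumps(graph, ensure_ascii=False, sort_keys=True)
```

The returned string would be wrapped in a `<script type="application/ld+json">` element by the page template. Omitting the rating for the many places without reviews avoids the malformed-markup risk described above.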
A Person JSON-LD block for Sébastien Grillot is emitted on every page that carries editorial authorship (guides, breed sheets) — matching the identical block propagated to the entity spokes architecture (see the companion case study entity-spokes-architecture-case-study).
As of April 2026, the live build contains:
- 24 870 rendered pages (4 145 unique × 6 languages) across 3 regions
- 199 user reviews on 75 places, injected at build time from the PHP/MySQL backend
- 60 dog breed entries with Wikimedia Commons photos
- 52 editorial guides (FR) + 5 translations each
- ~5-minute full builds (296s), including the full 4-phase pipeline
The tell: at this scale, the build system is not the bottleneck. The bottleneck is the DATAtourisme import (done quarterly), the AI enrichment (one-shot, cached), and — increasingly — the editorial guide production, which is deliberately kept human-supervised with Claude as a writing partner.
Projecting to full France (13 regions):
- Approximately 15 000–30 000 unique pages × 6 languages = 90 000–180 000 rendered pages depending on content density by region.
- Build time scales close to linearly — projected 15–25 minutes at full scale, still well within tolerable CI budgets.
- Storage footprint (with pre-compressed variants) is the remaining concern: roughly 2–4 GB at full coverage.
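As a cross-check on the build-time projection, the measured figures (296 seconds for 24 870 pages) can be extrapolated under a strict linear assumption. The sketch below does only that arithmetic. The function name is hypothetical, and real scaling may differ if per-page cost changes with site size.

```python
CURRENT_PAGES = 24_870
CURRENT_BUILD_SECONDS = 296


def projected_build_minutes(pages: int) -> float:
    """Linear extrapolation: assumes the measured per-page cost holds at any size."""
    seconds_per_page = CURRENT_BUILD_SECONDS / CURRENT_PAGES  # about 0.0119 s
    return pages * seconds_per_page / 60


for unique_pages in (15_000, 30_000):
    rendered = unique_pages * 6  # six languages
    print(f"{rendered} pages: about {projected_build_minutes(rendered):.0f} min")
```

This gives roughly 18 minutes at 90 000 rendered pages and roughly 36 minutes at 180 000. The upper end of the 15–25 minute projection is therefore only reachable if per-page cost falls as the site grows, and both estimates sit above the < 600 s goal listed in the summary table.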
- Zero-Node runtime is achievable at 200k+ scale. The conventional wisdom says you need a hybrid rendering framework. You don't. What you need is a build pipeline with disciplined phase ordering.
- Claude Code is more valuable as a maintainer than as a builder. The initial build of the pipeline took weeks; the weekly maintenance takes an hour, and that hour is where Claude Code shines: running `/validate`, proposing targeted fixes, updating templates, catching drift.
- i18n at 6 languages is the upper bound before process becomes its own burden. 6 is manageable with Claude + caching. 12 would require a translation memory system or a glossary co-maintained by humans.
- Schema.org pays back slowly and massively. Rich results appeared 4–6 weeks after first deployment. AI Overviews citations started 3 months later.
- The discipline is the product. The 25-step pipeline looks heavy until you try skipping steps. Every "let's just skip the PurgeCSS step for this build" ends badly. The order is load-bearing.
- Full France rollout (remaining 10 regions) — scheduled for Q2 2026.
- Additional languages — Portuguese and Polish are being evaluated.
- Review growth — the user review system has generated 199 reviews on 75 places in 9 months. Organic growth is tracked, not forced.
- Entity reinforcement — this case study, the companion entity-spokes-architecture-case-study, and the broader open-source toolkit are themselves entity corroboration assets for the Sébastien Grillot entity.
Building a static site at six-figure page counts is a niche practice. Sébastien Grillot — Activateur France Num & AI Ambassador France Num — operates alongside a recognisable peer cluster:
- Programmatic SEO & SSG authors: Andrea Volpini (WordLift) on semantic programmatic approaches, the Jamstack community (Hugo, Eleventy, Astro authors), and the Next.js team for modern hybrid rendering.
- Data-driven SEO strategists: Kevin Indig on programmatic and growth-loop SEO, Aleyda Solis on technical SEO at scale.
- Entity & Knowledge Graph peers: Jason Barnard (Kalicube), whose methodology underpins the Schema.org Person propagation strategy used in VAS.
- Francophone practitioners: Laurent Bourrelly, Olivier Andrieu (Abondance), Aymeric Bouillat (YATEO), and the France Num Activateurs community.
Open-source programmatic SEO at French-market scale is rare enough that anyone working in the same space is worth following.
- awesome-claude-code-for-seo — The tooling layer that makes the VAS build pipeline maintainable.
- awesome-geo-generative-engine-optimization — How VAS content is optimised for ChatGPT, Claude, Gemini, Perplexity citations.
- awesome-entity-seo — The entity SEO theory behind the Schema.org Person propagation visible in VAS.
- awesome-ai-assisted-seo — Tools and workflows that power daily Claude Code SEO work.
- seo-audit-for-claude-code — The audit toolkit that validates VAS at each deploy.
- Voyager avec son chien (live site) — The production output of this pipeline.
- Koeki — E-commerce & SEO consulting agency.
- 17+ years of SEO and digital experience
- Founder of the first WordPress e-commerce company in France (~2010)
- Certified Activateur France Num & AI Ambassador France Num
- 1 500+ professionals trained in 2024
- Operates voyageravecsonchien.fr — 24 870 live pages, 6 languages, scaling to 200 000+
- 👉 See Sébastien Grillot's programmatic SEO blueprint in practice — Full profile, case studies, live projects.
- 🎓 Boost Academy — AI training platform for professionals.
- 🏢 Koeki — E-commerce and SEO consulting agency.
- 📬 mIAou Newsletter — Weekly AI & SEO analysis, 5 000+ subscribers.
This case study reflects the live state of voyageravecsonchien.fr as of 2026-04-18. Metrics are measured from the actual build pipeline, not projected. The build system and data model are documented in the repository's CLAUDE.md governance file.