feat(chief-data-officer-advisor): decision-driven CDO skill (v2.5.2) #620
Opinionated CDO skill covering 4 specific decisions, not a generic data governance survey:

1. Can we train our model on this data? (training rights matrix)
2. Warehouse / lakehouse / mesh + build-vs-buy? (data product strategy)
3. What is our customer data worth? (B2B customer-data-as-asset)
4. What data role do we hire next? (data team org evolution)

Built under explicit karpathy-coder discipline:

- Assumptions surfaced upfront before code (principle 1)
- Each tool/reference covers ONE decision; rejected generic-survey scope (#2)
- Surgical changes only; caught and reverted scope creep (cs-gc voice spec) before commit (#3)
- Verifiable success criteria locked before code; all 3 tools smoke-tested with embedded samples (#4)
- `karpathy-coder/complexity_checker.py`: 0 findings on 3 new tools
- `karpathy-coder/diff_surgeon.py`: 0 findings on staged diff

3 stdlib Python tools with deterministic logic (not pattern-match prose):

- `ai_training_data_audit.py` — 3-dimension matrix (origin × class × use case) with GDPR Art. 6 + EU AI Act + US state citations. Embedded sample tests 7 sources spanning all 3 verdicts (2 NO-GO / 2 MITIGATE / 3 GO).
- `data_product_strategy_picker.py` — Picks warehouse/lakehouse/mesh from a company profile, returns 6-layer build-vs-buy + 12-month sequencing. Series A sample (8 consumers, 4.5 TB, 1 ML model) → LAKEHOUSE.
- `data_asset_valuator.py` — Strategic value 0-10 from 4 components (exclusivity, freshness, cohort, history), moat strength, M&A multiplier (1.0x-1.7x ARR with carve-out penalties), 3 ranked productization paths. Sample (B2B sales engagement, 380 customers, 47 carve-outs) → 8.2/10 STRONG moat, 1.33-1.61x multiplier, recommends benchmark report first.
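To make the deterministic-matrix claim concrete, here is a minimal sketch of the kind of origin × class × use-case rule table `ai_training_data_audit.py` implements. The rule set, category labels, and the `audit_source` function name are hypothetical; the real tool attaches GDPR Art. 6 / EU AI Act / US state citations to each verdict.

```python
# Toy sketch of a 3-dimension training-rights matrix (origin x class x
# use case). Rules and labels are illustrative, not the tool's actual table.
def audit_source(origin: str, data_class: str, use_case: str) -> tuple[str, str]:
    """Return a (verdict, rationale) pair for one data source."""
    if origin == "scraped" and data_class == "personal":
        # Scraped personal data: no plausible lawful basis for training
        return "NO-GO", "no lawful basis for scraped personal data"
    if data_class == "anonymous-aggregate":
        # Aggregated, non-identifiable data clears regardless of origin
        return "GO", "aggregated, non-identifiable data"
    if origin == "first-party" and use_case == "internal-model":
        return "MITIGATE", "confirm consent scope covers model training"
    return "MITIGATE", "manual review required"

print(audit_source("scraped", "personal", "model-training")[0])  # NO-GO
```

Seven sample sources fed through a table like this would yield the kind of NO-GO/MITIGATE/GO split described above.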
4 references, each answering ONE decision:

- `ai_training_data_rights.md` — Training rights matrix + GDPR decision tree + EU AI Act + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
- `data_product_strategy.md` — Architecture kill criteria + 6-layer build-vs-buy + sequencing pattern + anti-patterns
- `customer_data_as_asset.md` — Valuation framework + 3 productization paths + 10-item M&A diligence checklist + contractual constraint audit
- `data_team_org_evolution.md` — 5-stage role map + centralize-vs-embed trigger + 6 anti-patterns (e.g., "hiring data scientist as first hire")

cs-cdo-advisor agent (`c-level-agents/agents/cs-cdo-advisor.md`):

- Decision-driven realist voice
- Hard rule: does not duplicate engineering data skills (database-designer, observability-designer, rag-architect, llm-cost-optimizer)
- Refuses to recommend tooling before naming the consumer

`/cs:cdo-review` slash command:

- 6-question forcing interrogation matching the `/cs:cfo-review` pattern
- Routes to `/cs:gc-review`, `/cs:ciso-review`, `/cs:cfo-review`, `/cs:chro-review`

cs-cdo-advisor voice spec added to persona-voices.md.

Known follow-up (out of scope this PR): cs-general-counsel-advisor voice spec is missing from persona-voices.md (gap from v2.5.1); separate small PR.

Updates:

- c-level plugin.json: v2.5.1 → v2.5.2 (30 skills, 10 cs-* agents)
- c-level-agents plugin.json: v1.1.0 → v1.2.0 (10 agents, 18 commands)
- marketplace.json: both c-level entries; new CDO keywords (chief-data-officer, cdo, ai-training-data, data-product-strategy, data-as-asset)
- c-level CLAUDE.md: CDO row added; agent + count tables updated
- Root CLAUDE.md: 264 → 265 skills, 29 → 30 cs-* agents, 361 → 364 tools, 490 → 494 references, 50 → 51 commands; v2.5.2 highlight added
- CHANGELOG.md: v2.5.2 entry with karpathy-discipline rationale

Disclaimer in every output: not legal advice; not a replacement for outside counsel on productization/licensing; not a tactical data engineering skill.
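As a sketch of how the valuator's numbers could be composed: the equal weights, the 25% carve-out cap, and both helper names below are assumptions, so a toy version like this will not reproduce the real sample's 8.2/10 and 1.33-1.61x figures, only the shape of the calculation.

```python
# Toy composition of a data-asset valuation in the spirit of
# data_asset_valuator.py. Equal weights and the carve-out cap are assumed.
def strategic_value(components: dict[str, float]) -> float:
    """Average four 0-10 component scores into one 0-10 strategic value."""
    keys = ("exclusivity", "freshness", "cohort", "history")
    return round(sum(components[k] for k in keys) / len(keys), 1)

def ma_multiplier(value: float, carve_out_ratio: float) -> tuple[float, float]:
    """Map strategic value onto a 1.0x-1.7x ARR band, haircut by carve-outs."""
    base = 1.0 + 0.7 * (value / 10)       # 1.0x at 0/10, 1.7x at 10/10
    penalty = min(carve_out_ratio, 0.25)  # cap the carve-out haircut
    return round(base * (1 - penalty), 2), round(base, 2)

scores = {"exclusivity": 9, "freshness": 8, "cohort": 8, "history": 7}
value = strategic_value(scores)             # 8.0
low, high = ma_multiplier(value, 47 / 380)  # carve-outs / total customers
print(value, low, high)
```

Expressing the carve-out penalty as a ratio of affected customers is one plausible way to get a low-high multiplier band rather than a single point estimate.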
https://claude.ai/code/session_012WtZMm5NJHqkYoRqA9fHMN
## Code Review — PR 620: chief-data-officer-advisor (v2.5.2)

Overall: Well-scoped, high-quality addition. Deterministic logic, stdlib-only, embedded smoke-test samples, explicit kill criteria, and decision-driven framing (not generic survey). Counting updates are consistent across all 17 changed files. Ready for merge after addressing the issues below.

### Bugs / Logic Issues

1. Dead code in Rule 1: the `data_class == "anonymous-aggregate"` branch contains

   ```python
   if origin == "scraped":
       pass  # Already handled above
   return ("GO", ...)
   ```

   The comment "Already handled above" is accurate, but the block is misleading — a reader might wonder why scraped + anonymous-aggregate yields GO. Remove the dead block entirely or collapse it to just the `return`.

2. Domain-count criterion missing from the data mesh logic (…). The comment says: …

### Style / Quality

3. Both scripts import …

4. …

5. Emoji in stdout output (…): the text renderer uses colored-circle emoji as verdict markers. These render fine in most modern terminals but will produce garbled output in older CI/CD log viewers and Windows cmd. A plain-text fallback (…) would avoid this.

### Documentation / Counts

Counts are consistent across all 17 files.
Known follow-up acknowledged.

### Security & Safety

No issues found.
### Skill Quality

The decision-framing (four named decisions vs. topic survey) is the right approach for a strategic skill, and mirrors the pattern established by …

Blockers before merge: …

Non-blocking (follow-up PRs): …
## Summary
Opinionated, decision-driven Chief Data Officer skill — refuses to be a generic data-governance survey. Answers exactly 4 specific decisions every B2B SaaS founder is asking by Series A/B:
- `ai_training_data_audit.py`
- `data_product_strategy_picker.py`
- `data_asset_valuator.py`
- `data_team_org_evolution.md` reference

This is the third "gstack-can't-touch" plugin in the founder-mode lineup (after c-level-agents v2.5.0 and general-counsel-advisor v2.5.1). gstack covers software-shipping personas only — modern data strategy + AI training rights + M&A data diligence are nowhere in its surface.
## Built Under Explicit Karpathy-Coder Discipline
This was the first PR in this repo built under explicit karpathy-coder guidance. Self-audit results:
- `complexity_checker.py` on the 3 new Python tools: 0 findings
- `diff_surgeon.py` on the staged diff: 0 findings (surgical changes confirmed)

How the principles shaped the PR:
## What's New (17 files, +2471 / -19 lines)
## New skill — `chief-data-officer-advisor`

`c-level-advisor/skills/chief-data-officer-advisor/SKILL.md` — 4 workflows (training go/no-go, architecture, asset valuation, org roadmap), CDO-specific keywords, hard rule against duplicating engineering data skills.

3 stdlib Python tools with deterministic logic (not pattern-match prose):
- `ai_training_data_audit.py`
- `data_product_strategy_picker.py`
- `data_asset_valuator.py`

4 references answering one decision each:
- `ai_training_data_rights.md` — Training rights matrix + GDPR Art. 6 lawful basis decision tree + EU AI Act high-risk triggers + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
- `data_product_strategy.md` — Architecture kill criteria + 6-layer build-vs-buy decision tree + sequencing pattern + 5 anti-patterns
- `customer_data_as_asset.md` — 5-component valuation framework + 3 productization paths (benchmark / embedding / license) + 10-item M&A diligence checklist + quarterly contractual constraint audit
- `data_team_org_evolution.md` — 5-stage role map (seed → late-stage) + centralize-vs-embed-vs-federated triggers + 6 anti-patterns

## New agent
`cs-cdo-advisor` — decision-driven realist. Voice: "What decision does this data drive?" Hard rule: never recommend tooling before naming the consumer.
## New slash command — `/cs:cdo-review`

6-question forcing interrogation matching the `/cs:cfo-review`, `/cs:gc-review` pattern. Routes to `/cs:gc-review`, `/cs:ciso-review`, `/cs:cfo-review`, `/cs:chro-review` for cross-functional concerns.

## Updates
- `c-level-advisor/c-level-agents/references/persona-voices.md` — added cs-cdo-advisor voice spec
- `c-level-advisor/.claude-plugin/plugin.json` — v2.5.1 → v2.5.2 (30 skills, 10 cs-* agents)
- `c-level-advisor/c-level-agents/.claude-plugin/plugin.json` — v1.1.0 → v1.2.0 (10 agents, 18 commands)
- `marketplace.json` — both c-level entries bumped; new CDO keywords
- `c-level-advisor/CLAUDE.md` — CDO row added to roles table; agents/counts updated
- `CLAUDE.md` — 264 → 265 skills, 29 → 30 cs-* agents, 361 → 364 Python tools, 490 → 494 references, 50 → 51 commands
- `CHANGELOG.md` — v2.5.2 entry with karpathy-discipline rationale

## Counts Δ

| Count | Before | After |
| --- | --- | --- |
| Skills | 264 | 265 |
| cs-* agents | 29 | 30 |
| Python tools | 361 | 364 |
| References | 490 | 494 |
| Commands | 50 | 51 |
## Verifiable Success Criteria (all met)
- `ai_training_data_audit.py` returns GO/MITIGATE/NO-GO per source with specific risk + remediation
- `data_product_strategy_picker.py` outputs specific role + sequencing recommendations
- All tools: `--help`, JSON output, exit 0 on sample
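A minimal sketch of the deterministic architecture pick behind the second criterion; the thresholds, the `pick_architecture` name, and the domain-count guard are illustrative assumptions, not the tool's actual cutoffs.

```python
# Hypothetical sketch of the warehouse/lakehouse/mesh decision inside
# data_product_strategy_picker.py. All thresholds are illustrative.
def pick_architecture(consumers: int, tb: float, ml_models: int,
                      domains: int = 1) -> str:
    if domains >= 4 and consumers >= 50:
        return "MESH"        # only with real domain ownership at scale
    if ml_models >= 1 or tb >= 1.0:
        return "LAKEHOUSE"   # ML workloads or sizable semi-structured data
    return "WAREHOUSE"       # default for BI-first, small-data teams

# The Series A sample profile from the PR description:
print(pick_architecture(consumers=8, tb=4.5, ml_models=1))  # LAKEHOUSE
```

Gating MESH on an explicit domain count keeps the sketch honest about mesh's organizational precondition, which is the kind of criterion the review flagged.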
## Test plan

- Ran all 3 tools with `--help` and `--output json`
- Lint, Tests, Docs, Security: passes
- claude-review: passes
- VirusTotal: passes (stdlib-only, no new deps)
- Detect changed skills: passes
`cs-general-counsel-advisor` voice spec is missing from `persona-voices.md` (a gap introduced in v2.5.1 but not added to the voice reference). I caught myself adding it during this PR and reverted — it belongs in a separate small PR alongside any other voice-reference cleanup.
After this PR, 4 more C-roles remain from the original Phase 2 plan:
Plus the broken-paths cleanup in pre-existing cs-ceo-advisor.md / cs-cto-advisor.md.
## Disclaimer
The `chief-data-officer-advisor` skill surfaces strategic decisions but is not legal advice for AI training, not a replacement for outside counsel for productization/licensing, and not a tactical data engineering skill. For tactical data engineering, see `engineering/database-designer/`, `engineering/observability-designer/`, `engineering/data-quality-auditor/`, `engineering/sql-database-assistant/`, `engineering/rag-architect/`, `engineering/llm-cost-optimizer/`.
Generated by Claude Code