
alirezarezvani · 2026-05-12T15:21:40Z

Summary

Opinionated, decision-driven Chief Data Officer skill — refuses to be a generic data-governance survey. Answers exactly 4 specific questions that every B2B SaaS founder is asking by Series A/B:

Can we train our model on this data? → ai_training_data_audit.py
Warehouse, lakehouse, or mesh — and build vs buy? → data_product_strategy_picker.py
What is our customer data worth? (M&A, productization) → data_asset_valuator.py
What data role do we hire next? → data_team_org_evolution.md reference

This is the third "gstack-can't-touch" plugin in the founder-mode lineup (after c-level-agents v2.5.0 and general-counsel-advisor v2.5.1). gstack covers software-shipping personas only — modern data strategy + AI training rights + M&A data diligence are nowhere in its surface.

Built Under Explicit Karpathy-Coder Discipline

This was the first PR in this repo built under explicit karpathy-coder guidance. Self-audit results:

complexity_checker.py on the 3 new Python tools: 0 findings
diff_surgeon.py on the staged diff: 0 findings (surgical changes confirmed)

How the principles shaped the PR:

#1 Think before coding: Assumptions surfaced upfront in chat before any file was written. User pushback expanded scope from 2 framings → 4, and direction was locked before implementation.
#2 Simplicity first: Each tool/reference covers ONE decision. Explicitly rejected generic-survey framing (e.g., "data product strategy picker" doesn't try to also do data quality, RAG, or schema design).
#3 Surgical changes: Caught and reverted one scope-creep attempt mid-build (adding cs-general-counsel-advisor voice spec while editing persona-voices.md). That gap from v2.5.1 will be addressed in a separate small PR.
#4 Goal-driven execution: Verifiable success criteria locked before code (see PR comment thread for the criteria table). All 3 Python tools smoke-tested with embedded samples before commit.

What's New (17 files, +2471 / -19 lines)

New skill — `c-level-advisor/skills/chief-data-officer-advisor/`

SKILL.md — 4 workflows (training go/no-go, architecture, asset valuation, org roadmap), CDO-specific keywords, hard rule against duplicating engineering data skills.

3 stdlib Python tools with deterministic logic (not pattern-match prose):

| Tool | What it does | Embedded sample result |
|---|---|---|
| `ai_training_data_audit.py` | 3-dimension matrix (origin × class × use case) → GO/MITIGATE/NO-GO per source with GDPR Art. 6 + EU AI Act + US state citations | 7 sources → 2 NO-GO / 2 MITIGATE / 3 GO |
| `data_product_strategy_picker.py` | Architecture (warehouse/lakehouse/mesh) + 6-layer build-vs-buy + 12-month sequencing | Series A SaaS, 8 consumers, 4.5TB → LAKEHOUSE |
| `data_asset_valuator.py` | Strategic value 0-10, moat strength, M&A multiplier (with carve-out + anonymization penalties), 3 ranked productization paths | B2B sales engagement, 380 customers, 47 carve-outs → 8.2/10 STRONG moat, 1.33-1.61x multiplier, recommends benchmark report first |
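The PR does not show how `data_asset_valuator.py` weights its inputs, so the following is only a hypothetical sketch of the shape of that calculation. It averages the four components named in the commit message (exclusivity, freshness, cohort, history) into a 0-10 strategic value, maps that linearly onto the quoted 1.0x-1.7x ARR band, and shrinks the uplift by the share of carved-out customers and an assumed anonymization penalty. `DataAssetProfile`, the equal weights, and the linear mapping are assumptions, and the sketch is not calibrated to reproduce the sample's 8.2/10 or 1.33-1.61x figures.

```python
from dataclasses import dataclass


@dataclass
class DataAssetProfile:
    # Hypothetical schema: each component is scored 0-10.
    exclusivity: float
    freshness: float
    cohort: float
    history: float
    customers: int
    carve_outs: int  # customers whose contracts exclude their data
    anonymization_penalty: float = 0.0  # assumed 0-1 fraction of value lost


def strategic_value(p: DataAssetProfile) -> float:
    # Equal weights are an assumption, not the tool's actual weighting.
    return round((p.exclusivity + p.freshness + p.cohort + p.history) / 4, 1)


def ma_multiplier(p: DataAssetProfile) -> float:
    # Map 0..10 linearly onto the 1.0x..1.7x ARR band quoted in the PR.
    base = 1.0 + 0.7 * strategic_value(p) / 10
    # Only data from non-carved-out customers contributes to the uplift.
    usable = max(0.0, 1 - p.carve_outs / p.customers) if p.customers else 0.0
    uplift = (base - 1.0) * usable * (1 - p.anonymization_penalty)
    return round(1.0 + uplift, 2)
```

Because penalties only scale the uplift above 1.0x, the result stays inside the 1.0x-1.7x band for any penalty between 0 and 1.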

4 references answering one decision each:

ai_training_data_rights.md — Training rights matrix + GDPR Art. 6 lawful basis decision tree + EU AI Act high-risk triggers + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
data_product_strategy.md — Architecture kill criteria + 6-layer build-vs-buy decision tree + sequencing pattern + 5 anti-patterns
customer_data_as_asset.md — 5-component valuation framework + 3 productization paths (benchmark / embedding / license) + 10-item M&A diligence checklist + quarterly contractual constraint audit
data_team_org_evolution.md — 5-stage role map (seed → late-stage) + centralize-vs-embed-vs-federated triggers + 6 anti-patterns

New agent — `cs-cdo-advisor`

Decision-driven realist. Voice: "What decision does this data drive?" Hard rule: never recommend tooling before naming the consumer.

New slash command — `/cs:cdo-review`

6-question forcing interrogation matching the /cs:cfo-review, /cs:gc-review pattern. Routes to /cs:gc-review, /cs:ciso-review, /cs:cfo-review, /cs:chro-review for cross-functional concerns.

Updates

c-level-advisor/c-level-agents/references/persona-voices.md — added cs-cdo-advisor voice spec
c-level-advisor/.claude-plugin/plugin.json — v2.5.1 → v2.5.2 (30 skills, 10 cs-* agents)
c-level-advisor/c-level-agents/.claude-plugin/plugin.json — v1.1.0 → v1.2.0 (10 agents, 18 commands)
marketplace.json — both c-level entries bumped; new CDO keywords
c-level-advisor/CLAUDE.md — CDO row added to roles table; agents/counts updated
Root CLAUDE.md — 264 → 265 skills, 29 → 30 cs-* agents, 361 → 364 Python tools, 490 → 494 references, 50 → 51 commands
CHANGELOG.md — v2.5.2 entry with karpathy-discipline rationale

Counts Δ

| Count | v2.5.1 | v2.5.2 | Δ |
|---|---|---|---|
| c-level skills | 29 | 30 | +1 |
| cs-* agents (c-level-agents plugin) | 9 | 10 | +1 |
| /cs:* slash commands | 17 | 18 | +1 |
| c-level Python tools | 27 | 30 | +3 |
| c-level references | 57 | 61 | +4 |
| Total repo skills | 264 | 265 | +1 |
| Total cs-* agents | 29 | 30 | +1 |

Verifiable Success Criteria (all met)

| # | Criterion | Verified by |
|---|---|---|
| 1 | `ai_training_data_audit.py` returns GO/MITIGATE/NO-GO per source with specific risk + remediation | 7 sources → 2/2/3 verdict split with GDPR + AI Act citations |
| 2 | `data_product_strategy_picker.py` outputs specific role + sequencing recommendations | Series A → LAKEHOUSE, 6 layers, 4 quarters |
| 3 | Each of 4 references covers one specific decision | Headers per reference describe a decision, not a topic |
| 4 | SKILL.md keywords are CDO-specific, not data-generic | "AI training rights", "consent provenance", "centralize-vs-embed trigger" |
| 5 | cs-cdo-advisor voice differentiates from cs-cto / cs-ciso / cs-cpo | Voice spec added |
| 6 | All 3 tools: stdlib-only, `--help`, JSON output, exit 0 on sample | Smoke test confirmed |
| 7 | karpathy complexity_checker: 0 findings | Run before commit |
| 8 | karpathy diff_surgeon: 0 findings | Run on staged diff before commit |
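Criterion 6 describes a CLI contract (stdlib-only, `--help`, JSON output, exit 0 on the embedded sample). A minimal `argparse` sketch of that contract follows; `--output {text,json}` comes from the test plan, while `--input`, `EMBEDDED_SAMPLE`, and `analyze()` are hypothetical placeholders rather than the tools' actual code.

```python
import argparse
import json

# Hypothetical embedded sample; the real tools ship their own.
EMBEDDED_SAMPLE = [
    {"source": "product_events", "origin": "first-party", "data_class": "anonymous-aggregate"},
]


def analyze(sources):
    # Placeholder for each tool's deterministic decision logic.
    return {"sources": len(sources), "results": []}


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical CDO tool skeleton")
    parser.add_argument("--input", help="JSON file to analyze; omit to run the embedded sample")
    parser.add_argument("--output", choices=["text", "json"], default="text")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.input:
        with open(args.input, encoding="utf-8") as fh:
            sources = json.load(fh)
    else:
        sources = EMBEDDED_SAMPLE
    report = analyze(sources)
    if args.output == "json":
        print(json.dumps(report, indent=2))
    else:
        print(f"Analyzed {report['sources']} source(s)")
    return 0  # exit code 0 on success, as criterion 6 requires
```

As a script, this would be wired up with the usual `if __name__ == "__main__": sys.exit(main())` guard; `--help` then comes for free from `argparse`.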

Test plan

JSON validates (plugin.json × 2, marketplace.json)
All 3 Python tools run with --help and --output json
Embedded samples produce expected outputs
karpathy-coder/complexity_checker: 0 findings
karpathy-coder/diff_surgeon: 0 findings
CI: Lint, Tests, Docs, Security passes
CI: claude-review passes
CI: VirusTotal passes (stdlib-only, no new deps)
CI: Detect changed skills passes
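The "--help and --output json" items in the test plan can be automated with a small harness. This sketch assumes each tool accepts those flags and prints JSON to stdout; `smoke_test()` is a hypothetical helper, not part of the repo, and a hung tool raises `subprocess.TimeoutExpired`.

```python
import json
import subprocess
import sys


def _run(script, *args, timeout=30):
    # Run the tool with the current interpreter, capturing stdout/stderr as text.
    return subprocess.run(
        [sys.executable, script, *args],
        capture_output=True, text=True, timeout=timeout,
    )


def smoke_test(script):
    """Return failure messages for one tool; an empty list means it passed."""
    failures = []
    help_run = _run(script, "--help")
    if help_run.returncode != 0:
        failures.append(f"{script} --help exited {help_run.returncode}")
    json_run = _run(script, "--output", "json")
    if json_run.returncode != 0:
        failures.append(f"{script} --output json exited {json_run.returncode}")
    else:
        try:
            json.loads(json_run.stdout)  # output must parse as JSON
        except json.JSONDecodeError as exc:
            failures.append(f"{script} --output json is not valid JSON: {exc}")
    return failures
```

Running `smoke_test()` over the three tool paths and failing CI on any non-empty result covers both test-plan items in one step.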

Known follow-up (out of scope this PR per karpathy principle #3)

The cs-general-counsel-advisor voice spec is missing from persona-voices.md (a gap introduced in v2.5.1 but not added to the voice reference). I caught myself adding it during this PR and reverted — it belongs in a separate small PR alongside any other voice-reference cleanup.

Phase 2 remaining (deferred to v2.5.3+)

After this PR, 4 more C-roles remain from the original Phase 2 plan:

Chief AI Officer (CAIO)
Chief Customer Officer (CCO-customer)
VP of Engineering (execution layer below CTO)
Chief Communications Officer (CCO-comms)

Plus the broken-paths cleanup in pre-existing cs-ceo-advisor.md / cs-cto-advisor.md.

Disclaimer

The chief-data-officer-advisor skill surfaces strategic decisions but is not legal advice for AI training, not a replacement for outside counsel for productization/licensing, and not a tactical data engineering skill. For tactical data engineering, see engineering/database-designer/, engineering/observability-designer/, engineering/data-quality-auditor/, engineering/sql-database-assistant/, engineering/rag-architect/, engineering/llm-cost-optimizer/.

https://claude.ai/code/session_012WtZMm5NJHqkYoRqA9fHMN

Generated by Claude Code

Opinionated CDO skill covering 4 specific decisions, not a generic data governance survey:
1. Can we train our model on this data? (training rights matrix)
2. Warehouse / lakehouse / mesh + build-vs-buy? (data product strategy)
3. What is our customer data worth? (B2B customer-data-as-asset)
4. What data role do we hire next? (data team org evolution)

Built under explicit karpathy-coder discipline:
- Assumptions surfaced upfront before code (principle 1)
- Each tool/reference covers ONE decision; rejected generic-survey scope (#2)
- Surgical changes only; caught and reverted scope creep (cs-gc voice spec) before commit (#3)
- Verifiable success criteria locked before code; all 3 tools smoke-tested with embedded samples (#4)
- karpathy-coder/complexity_checker.py: 0 findings on 3 new tools
- karpathy-coder/diff_surgeon.py: 0 findings on staged diff

3 stdlib Python tools with deterministic logic (not pattern-match prose):
- ai_training_data_audit.py — 3-dimension matrix (origin x class x use case) with GDPR Art. 6 + EU AI Act + US state citations. Embedded sample tests 7 sources spanning all 3 verdicts (2 NO-GO / 2 MITIGATE / 3 GO).
- data_product_strategy_picker.py — Picks warehouse/lakehouse/mesh from profile, returns 6-layer build-vs-buy + 12-month sequencing. Series A sample (8 consumers, 4.5TB, 1 ML model) -> LAKEHOUSE.
- data_asset_valuator.py — Strategic value 0-10 from 4 components (exclusivity, freshness, cohort, history), moat strength, M&A multiplier (1.0x-1.7x ARR with carve-out penalties), 3 ranked productization paths. Sample (B2B sales engagement, 380 customers, 47 carve-outs) -> 8.2/10 STRONG moat, 1.33-1.61x multiplier, recommends benchmark report first.

4 references, each answering ONE decision:
- ai_training_data_rights.md — Training rights matrix + GDPR decision tree + EU AI Act + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
- data_product_strategy.md — Architecture kill criteria + 6-layer build-vs-buy + sequencing pattern + anti-patterns
- customer_data_as_asset.md — Valuation framework + 3 productization paths + 10-item M&A diligence checklist + contractual constraint audit
- data_team_org_evolution.md — 5-stage role map + centralize-vs-embed trigger + 6 anti-patterns (e.g., "hiring data scientist as first hire")

cs-cdo-advisor agent (c-level-agents/agents/cs-cdo-advisor.md):
- Decision-driven realist voice
- Hard rule: does not duplicate engineering data skills (database-designer, observability-designer, rag-architect, llm-cost-optimizer)
- Refuses to recommend tooling before naming the consumer

/cs:cdo-review slash command:
- 6-question forcing interrogation matching /cs:cfo-review pattern
- Routes to /cs:gc-review, /cs:ciso-review, /cs:cfo-review, /cs:chro-review

cs-cdo-advisor voice spec added to persona-voices.md.

Known follow-up (out of scope this PR): cs-general-counsel-advisor voice spec is missing from persona-voices.md (gap from v2.5.1); separate small PR.

Updates:
- c-level plugin.json: v2.5.1 -> v2.5.2 (30 skills, 10 cs-* agents)
- c-level-agents plugin.json: v1.1.0 -> v1.2.0 (10 agents, 18 commands)
- marketplace.json: both c-level entries; new CDO keywords (chief-data-officer, cdo, ai-training-data, data-product-strategy, data-as-asset)
- c-level CLAUDE.md: CDO row added; agent + count tables updated
- Root CLAUDE.md: 264 -> 265 skills, 29 -> 30 cs-* agents, 361 -> 364 tools, 490 -> 494 references, 50 -> 51 commands; v2.5.2 highlight added
- CHANGELOG.md: v2.5.2 entry with karpathy-discipline rationale

Disclaimer in every output: not legal advice; not a replacement for outside counsel on productization/licensing; not a tactical data engineering skill.
https://claude.ai/code/session_012WtZMm5NJHqkYoRqA9fHMN

claude · 2026-05-12T15:27:10Z

Code Review — PR 620: chief-data-officer-advisor (v2.5.2)

Overall: Well-scoped, high-quality addition. Deterministic logic, stdlib-only, embedded smoke-test samples, explicit kill criteria, and decision-driven framing (not generic survey). Count updates are consistent across all 17 changed files. Ready for merge after addressing the issues below.

Bugs / Logic Issues

1. Dead code in ai_training_data_audit.py (Rule 6)

Rule 1 catches origin == "scraped" unconditionally before any rule below it can run, making the inner check inside Rule 6 unreachable:

```python
if data_class == "anonymous-aggregate":
    if origin == "scraped":
        pass          # dead code; scraped never reaches here
    return ("GO", ...)
```

The comment "Already handled above" is accurate, but the block is misleading — a reader might wonder why scraped+anonymous-aggregate yields GO. Remove the dead block entirely or collapse to just the return.
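A minimal sketch of the collapsed form, assuming Rule 1 returns NO-GO for scraped origins (the review only says it "catches" them). `audit_source()`, the reason strings, the `"model-training"` use case, and the MITIGATE fallback are hypothetical; Rules 2-5 are not shown in the review and are left out.

```python
def audit_source(origin: str, data_class: str, use_case: str) -> tuple[str, str]:
    # Rule 1: scraped origin is handled unconditionally (assumed NO-GO).
    if origin == "scraped":
        return ("NO-GO", "scraped origin")
    # Rules 2-5 of the real tool would sit here; they are not shown in the review.
    # Rule 6, collapsed: no scraped check is needed because Rule 1 already returned.
    if data_class == "anonymous-aggregate":
        return ("GO", "anonymous aggregate")
    # Hypothetical fallback for anything the omitted rules would have covered.
    return ("MITIGATE", "needs review")
```

With the inner branch gone, a reader no longer has to wonder whether scraped + anonymous-aggregate can yield GO: the ordering makes it impossible.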

2. Domain-count criterion missing from data mesh logic (data_product_strategy_picker.py)

The comment says: requires 25+ consumers across 4+ domains AND federated culture — but the condition only enforces consumers >= 25 and culture and stage in (...). The "4+ domains" requirement has no corresponding code check. Either (a) add a domain_count field to the profile schema and enforce it, or (b) remove the domain mention from the comment so the documented contract matches the implementation.
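A sketch of option (a): add a domain count to the profile so the mesh check enforces all three documented conditions. The field names `consumers`, `domain_count`, `federated_culture`, and `stage`, and the contents of `MESH_STAGES`, are hypothetical, since the review elides the actual stage tuple as `(...)`.

```python
MESH_MIN_CONSUMERS = 25
MESH_MIN_DOMAINS = 4
# Hypothetical stage values; the real tuple is elided in the review.
MESH_STAGES = {"series-c", "late-stage"}


def mesh_eligible(profile: dict) -> bool:
    # All documented conditions must hold: 25+ consumers, 4+ domains,
    # a federated culture, and a qualifying company stage.
    return (
        profile.get("consumers", 0) >= MESH_MIN_CONSUMERS
        and profile.get("domain_count", 0) >= MESH_MIN_DOMAINS
        and bool(profile.get("federated_culture", False))
        and profile.get("stage") in MESH_STAGES
    )
```

Pulling the thresholds into named constants also keeps the comment and the code from drifting apart again.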

Style / Quality

3. import textwrap inside _wrap (both scripts)

Both scripts import textwrap inside the _wrap helper on every call. Since textwrap is stdlib, there is no cost reason to defer — move it to the top-level imports.

4. _wrap defined after its callers (both scripts)

render_text calls _wrap, but _wrap is defined several functions later in the file. Python resolves this at call time so it works, but convention is to define helpers before their callers. Consistent pattern in both scripts — worth fixing together.

5. Emoji in stdout output (ai_training_data_audit.py)

The text renderer uses colored-circle emoji as verdict markers. These render fine in most modern terminals but will produce garbled output in older CI/CD log viewers and Windows cmd. A plain-text fallback ([NO-GO] / [MITIGATE] / [GO]) would make the tool more portable. Low priority, but the tool is meant to run in diverse environments.
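One way to add the fallback is to probe whether stdout's encoding can represent the emoji and switch to bracketed labels when it cannot. The red/yellow/green mapping in `EMOJI_MARKERS` is an assumption (the review only says "colored-circle emoji"), and `verdict_marker()` is a hypothetical helper.

```python
import sys

# Assumed mapping; the tool's actual emoji are not shown in the review.
EMOJI_MARKERS = {"NO-GO": "\U0001F534", "MITIGATE": "\U0001F7E1", "GO": "\U0001F7E2"}


def can_encode(text: str, encoding) -> bool:
    # A missing encoding (e.g. stdout replaced by a plain object) is treated as ASCII.
    try:
        text.encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def verdict_marker(verdict: str, ascii_only=None) -> str:
    if ascii_only is None:
        # Auto-detect: fall back when stdout cannot represent the emoji.
        ascii_only = not can_encode("".join(EMOJI_MARKERS.values()), sys.stdout.encoding)
    return f"[{verdict}]" if ascii_only else EMOJI_MARKERS[verdict]
```

Keeping the explicit `ascii_only` override lets a CI job force plain text regardless of what the terminal reports.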

Documentation / Counts

Counts are consistent across all 17 files:

Skills: 264 -> 265
cs-* agents: 29 -> 30 (c-level-agents plugin: 9 -> 10; total with cs-ceo/cs-cto: 11 -> 12)
/cs:* commands: 17 -> 18
Python tools: 361 -> 364
References: 490 -> 494
c-level-skills plugin: v2.5.1 -> v2.5.2
c-level-agents plugin: v1.1.0 -> v1.2.0

Known follow-up acknowledged: cs-general-counsel-advisor voice spec still missing from persona-voices.md — explicitly deferred and documented. Acceptable scope decision.

Security & Safety

No issues found:

File I/O uses open(..., encoding="utf-8") with proper IOError/JSONDecodeError handling
No shell execution, eval, exec, or network calls
Input enum validation before processing
Legal disclaimer ("NOT legal advice") present in module docstring and SKILL.md
Regulated data (PHI/PCI) routes consistently to MITIGATE or NO-GO; no accidental GO path
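The I/O and validation pattern the review credits can be sketched as follows. Only `"scraped"` and `"anonymous-aggregate"` appear in this PR; every other enum value, the `use_case` field name, `SourceError`, and both function names are hypothetical.

```python
import json

# Hypothetical enum values; only "scraped" and "anonymous-aggregate" appear in the PR.
VALID_ORIGINS = {"first-party", "licensed", "scraped"}
VALID_CLASSES = {"anonymous-aggregate", "personal", "regulated"}
VALID_USE_CASES = {"analytics", "model-training"}


class SourceError(Exception):
    """Raised when the input file cannot be read, parsed, or validated."""


def validate_source(source: dict) -> list[str]:
    # Check every enum field before any decision logic runs.
    errors = []
    for field, allowed in (("origin", VALID_ORIGINS),
                           ("data_class", VALID_CLASSES),
                           ("use_case", VALID_USE_CASES)):
        value = source.get(field)
        if value not in allowed:
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    return errors


def load_sources(path: str) -> list[dict]:
    try:
        with open(path, encoding="utf-8") as fh:
            sources = json.load(fh)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        raise SourceError(f"cannot read {path}: {exc}") from exc
    except json.JSONDecodeError as exc:
        raise SourceError(f"invalid JSON in {path}: {exc}") from exc
    if not isinstance(sources, list):
        raise SourceError("expected a JSON list of sources")
    for i, source in enumerate(sources):
        if not isinstance(source, dict):
            raise SourceError(f"source #{i} is not an object")
        errors = validate_source(source)
        if errors:
            raise SourceError(f"source #{i}: " + "; ".join(errors))
    return sources
```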

Skill Quality

The decision-framing (four named decisions vs. topic survey) is the right approach for a strategic skill, and mirrors the pattern established by general-counsel-advisor. The GDPR Art. 6, EU AI Act Art. 10, and EDPB Opinion 28/2024 citations are current and accurate. The M&A multiplier math in data_asset_valuator.py (1.0x-1.7x ARR, with carve-out and anonymization penalties) is a realistic range for B2B SaaS data assets at Series B+. The hard scope boundary ("does not duplicate engineering data skills") is well-enforced in the agent definition and SKILL.md.

Blockers before merge:

Fix dead code branch in ai_training_data_audit.py Rule 6
Align data mesh domain-count comment with actual code in data_product_strategy_picker.py

Non-blocking (follow-up PRs):

Move import textwrap to top-level imports in both scripts
Define _wrap before its callers
Optional: ASCII fallback for emoji verdict markers

alirezarezvani marked this pull request as ready for review May 12, 2026 18:25

alirezarezvani merged commit 7b46850 into dev May 12, 2026
8 checks passed

alirezarezvani deleted the feature/chief-data-officer-advisor branch May 12, 2026 18:25


feat(chief-data-officer-advisor): decision-driven CDO skill (v2.5.2)#620

alirezarezvani merged 1 commit into dev from feature/chief-data-officer-advisor
