feat(chief-data-officer-advisor): decision-driven CDO skill (v2.5.2) #620

Merged: alirezarezvani merged 1 commit into dev from feature/chief-data-officer-advisor on May 12, 2026

@alirezarezvani
Owner

Summary

Opinionated, decision-driven Chief Data Officer skill — refuses to be a generic data-governance survey. Answers exactly 4 specific decisions every B2B SaaS founder is asking by Series A/B:

  1. Can we train our model on this data? → ai_training_data_audit.py
  2. Warehouse, lakehouse, or mesh, and build vs buy? → data_product_strategy_picker.py
  3. What is our customer data worth? (M&A, productization) → data_asset_valuator.py
  4. What data role do we hire next? → data_team_org_evolution.md reference

This is the third "gstack-can't-touch" plugin in the founder-mode lineup (after c-level-agents v2.5.0 and general-counsel-advisor v2.5.1). gstack covers software-shipping personas only — modern data strategy + AI training rights + M&A data diligence are nowhere in its surface.

Built Under Explicit Karpathy-Coder Discipline

This was the first PR in this repo built under explicit karpathy-coder guidance. Self-audit results:

  • complexity_checker.py on the 3 new Python tools: 0 findings
  • diff_surgeon.py on the staged diff: 0 findings (surgical changes confirmed)

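How the principles shaped the PR:

  • Principle 1: assumptions surfaced upfront, before any code
  • Principle 2: each tool and reference covers one decision; generic-survey scope rejected
  • Principle 3: surgical changes only; scope creep (the cs-gc voice spec) caught and reverted before commit
  • Principle 4: verifiable success criteria locked before code; all 3 tools smoke-tested with embedded samples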

What's New (17 files, +2471 / -19 lines)

New skill — c-level-advisor/skills/chief-data-officer-advisor/

SKILL.md — 4 workflows (training go/no-go, architecture, asset valuation, org roadmap), CDO-specific keywords, hard rule against duplicating engineering data skills.

3 stdlib Python tools with deterministic logic (not pattern-match prose):

| Tool | What it does | Embedded sample result |
| --- | --- | --- |
| ai_training_data_audit.py | 3-dimension matrix (origin × class × use case) → GO/MITIGATE/NO-GO per source with GDPR Art. 6 + EU AI Act + US state citations | 7 sources → 2 NO-GO / 2 MITIGATE / 3 GO |
| data_product_strategy_picker.py | Architecture (warehouse/lakehouse/mesh) + 6-layer build-vs-buy + 12-month sequencing | Series A SaaS, 8 consumers, 4.5TB → LAKEHOUSE |
| data_asset_valuator.py | Strategic value 0-10, moat strength, M&A multiplier (with carve-out + anonymization penalties), 3 ranked productization paths | B2B sales engagement, 380 customers, 47 carve-outs → 8.2/10 STRONG moat, 1.33-1.61x multiplier, recommends benchmark report first |

4 references answering one decision each:

  • ai_training_data_rights.md — Training rights matrix + GDPR Art. 6 lawful basis decision tree + EU AI Act high-risk triggers + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
  • data_product_strategy.md — Architecture kill criteria + 6-layer build-vs-buy decision tree + sequencing pattern + 5 anti-patterns
  • customer_data_as_asset.md — 5-component valuation framework + 3 productization paths (benchmark / embedding / license) + 10-item M&A diligence checklist + quarterly contractual constraint audit
  • data_team_org_evolution.md — 5-stage role map (seed → late-stage) + centralize-vs-embed-vs-federated triggers + 6 anti-patterns

New agent — cs-cdo-advisor

Decision-driven realist. Voice: "What decision does this data drive?" Hard rule: never recommend tooling before naming the consumer.

New slash command — /cs:cdo-review

6-question forcing interrogation matching the /cs:cfo-review, /cs:gc-review pattern. Routes to /cs:gc-review, /cs:ciso-review, /cs:cfo-review, /cs:chro-review for cross-functional concerns.

Updates

  • c-level-advisor/c-level-agents/references/persona-voices.md — added cs-cdo-advisor voice spec
  • c-level-advisor/.claude-plugin/plugin.json — v2.5.1 → v2.5.2 (30 skills, 10 cs-* agents)
  • c-level-advisor/c-level-agents/.claude-plugin/plugin.json — v1.1.0 → v1.2.0 (10 agents, 18 commands)
  • marketplace.json — both c-level entries bumped; new CDO keywords
  • c-level-advisor/CLAUDE.md — CDO row added to roles table; agents/counts updated
  • Root CLAUDE.md — 264 → 265 skills, 29 → 30 cs-* agents, 361 → 364 Python tools, 490 → 494 references, 50 → 51 commands
  • CHANGELOG.md — v2.5.2 entry with karpathy-discipline rationale

Counts Δ

|  | v2.5.1 | v2.5.2 | Δ |
| --- | --- | --- | --- |
| c-level skills | 29 | 30 | +1 |
| cs-* agents (c-level-agents plugin) | 9 | 10 | +1 |
| /cs:* slash commands | 17 | 18 | +1 |
| c-level Python tools | 27 | 30 | +3 |
| c-level references | 57 | 61 | +4 |
| Total repo skills | 264 | 265 | +1 |
| Total cs-* agents | 29 | 30 | +1 |

Verifiable Success Criteria (all met)

| # | Criterion | Verified by |
| --- | --- | --- |
| 1 | ai_training_data_audit.py returns GO/MITIGATE/NO-GO per source with specific risk + remediation | 7 sources → 2/2/3 verdict split with GDPR + AI Act citations |
| 2 | data_product_strategy_picker.py outputs specific role + sequencing recommendations | Series A → LAKEHOUSE, 6 layers, 4 quarters |
| 3 | Each of 4 references covers one specific decision | Headers per reference describe a decision, not a topic |
| 4 | SKILL.md keywords are CDO-specific, not data-generic | "AI training rights", "consent provenance", "centralize-vs-embed trigger" |
| 5 | cs-cdo-advisor voice differentiates from cs-cto / cs-ciso / cs-cpo | Voice spec added |
| 6 | All 3 tools: stdlib-only, --help, JSON output, exit 0 on sample | Smoke test confirmed |
| 7 | karpathy complexity_checker: 0 findings | Run before commit |
| 8 | karpathy diff_surgeon: 0 findings | Run on staged diff before commit |

Test plan

  • JSON validates (plugin.json × 2, marketplace.json)
  • All 3 Python tools run with --help and --output json
  • Embedded samples produce expected outputs
  • karpathy-coder/complexity_checker: 0 findings
  • karpathy-coder/diff_surgeon: 0 findings
  • CI: Lint, Tests, Docs, Security passes
  • CI: claude-review passes
  • CI: VirusTotal passes (stdlib-only, no new deps)
  • CI: Detect changed skills passes

Known follow-up (out of scope this PR per karpathy principle #3)

  • The cs-general-counsel-advisor voice spec is missing from persona-voices.md: the agent shipped in v2.5.1, but its voice spec was never added to the voice reference. I caught myself adding it during this PR and reverted; it belongs in a separate small PR alongside any other voice-reference cleanup.

Phase 2 remaining (deferred to v2.5.3+)

After this PR, 4 more C-roles remain from the original Phase 2 plan:

  • Chief AI Officer (CAIO)
  • Chief Customer Officer (CCO-customer)
  • VP of Engineering (execution layer below CTO)
  • Chief Communications Officer (CCO-comms)

Plus the broken-paths cleanup in pre-existing cs-ceo-advisor.md / cs-cto-advisor.md.

Disclaimer

The chief-data-officer-advisor skill surfaces strategic decisions but is not legal advice for AI training, not a replacement for outside counsel for productization/licensing, and not a tactical data engineering skill. For tactical data engineering, see engineering/database-designer/, engineering/observability-designer/, engineering/data-quality-auditor/, engineering/sql-database-assistant/, engineering/rag-architect/, engineering/llm-cost-optimizer/.

https://claude.ai/code/session_012WtZMm5NJHqkYoRqA9fHMN


Generated by Claude Code

Opinionated CDO skill covering 4 specific decisions, not a generic data
governance survey:

  1. Can we train our model on this data?  (training rights matrix)
  2. Warehouse / lakehouse / mesh + build-vs-buy?  (data product strategy)
  3. What is our customer data worth?  (B2B customer-data-as-asset)
  4. What data role do we hire next?  (data team org evolution)

Built under explicit karpathy-coder discipline:
- Assumptions surfaced upfront before code (principle 1)
- Each tool/reference covers ONE decision; rejected generic-survey scope (#2)
- Surgical changes only; caught and reverted scope creep (cs-gc voice spec)
  before commit (#3)
- Verifiable success criteria locked before code; all 3 tools smoke-tested
  with embedded samples (#4)
- karpathy-coder/complexity_checker.py: 0 findings on 3 new tools
- karpathy-coder/diff_surgeon.py: 0 findings on staged diff

3 stdlib Python tools with deterministic logic (not pattern-match prose):

- ai_training_data_audit.py — 3-dimension matrix (origin x class x use case)
  with GDPR Art. 6 + EU AI Act + US state citations. Embedded sample tests
  7 sources spanning all 3 verdicts (2 NO-GO / 2 MITIGATE / 3 GO).
- data_product_strategy_picker.py — Picks warehouse/lakehouse/mesh from
  profile, returns 6-layer build-vs-buy + 12-month sequencing. Series A
  sample (8 consumers, 4.5TB, 1 ML model) -> LAKEHOUSE.
- data_asset_valuator.py — Strategic value 0-10 from 4 components
  (exclusivity, freshness, cohort, history), moat strength, M&A multiplier
  (1.0x-1.7x ARR with carve-out penalties), 3 ranked productization paths.
  Sample (B2B sales engagement, 380 customers, 47 carve-outs) -> 8.2/10
  STRONG moat, 1.33-1.61x multiplier, recommends benchmark report first.

4 references, each answering ONE decision:

- ai_training_data_rights.md — Training rights matrix + GDPR decision tree
  + EU AI Act + US state patchwork (CCPA/CPRA, NYC LL 144, IL BIPA, WA MHMD)
- data_product_strategy.md — Architecture kill criteria + 6-layer
  build-vs-buy + sequencing pattern + anti-patterns
- customer_data_as_asset.md — Valuation framework + 3 productization paths
  + 10-item M&A diligence checklist + contractual constraint audit
- data_team_org_evolution.md — 5-stage role map + centralize-vs-embed
  trigger + 6 anti-patterns (e.g., "hiring data scientist as first hire")

cs-cdo-advisor agent (c-level-agents/agents/cs-cdo-advisor.md):
- Decision-driven realist voice
- Hard rule: does not duplicate engineering data skills (database-designer,
  observability-designer, rag-architect, llm-cost-optimizer)
- Refuses to recommend tooling before naming the consumer

/cs:cdo-review slash command:
- 6-question forcing interrogation matching /cs:cfo-review pattern
- Routes to /cs:gc-review, /cs:ciso-review, /cs:cfo-review, /cs:chro-review

cs-cdo-advisor voice spec added to persona-voices.md.

Known follow-up (out of scope this PR): cs-general-counsel-advisor voice
spec is missing from persona-voices.md (gap from v2.5.1); separate small PR.

Updates:
- c-level plugin.json: v2.5.1 -> v2.5.2 (30 skills, 10 cs-* agents)
- c-level-agents plugin.json: v1.1.0 -> v1.2.0 (10 agents, 18 commands)
- marketplace.json: both c-level entries; new CDO keywords (chief-data-officer,
  cdo, ai-training-data, data-product-strategy, data-as-asset)
- c-level CLAUDE.md: CDO row added; agent + count tables updated
- Root CLAUDE.md: 264 -> 265 skills, 29 -> 30 cs-* agents, 361 -> 364 tools,
  490 -> 494 references, 50 -> 51 commands; v2.5.2 highlight added
- CHANGELOG.md: v2.5.2 entry with karpathy-discipline rationale

Disclaimer in every output: not legal advice; not a replacement for outside
counsel on productization/licensing; not a tactical data engineering skill.

https://claude.ai/code/session_012WtZMm5NJHqkYoRqA9fHMN
@claude

claude Bot commented May 12, 2026

Code Review — PR 620: chief-data-officer-advisor (v2.5.2)

Overall: Well-scoped, high-quality addition. Deterministic logic, stdlib-only, embedded smoke-test samples, explicit kill criteria, and decision-driven framing (not generic survey). Counting updates are consistent across all 17 changed files. Ready for merge after addressing the issues below.


Bugs / Logic Issues

1. Dead code in ai_training_data_audit.py (Rule 6)

Rule 1 catches origin == "scraped" unconditionally before any rule below it can run, making the inner check inside Rule 6 unreachable:

if data_class == "anonymous-aggregate":
    if origin == "scraped":
        pass          # dead code; scraped never reaches here
    return ("GO", ...)

The comment "Already handled above" is accurate, but the block is misleading — a reader might wonder why scraped+anonymous-aggregate yields GO. Remove the dead block entirely or collapse to just the return.
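For illustration, a collapsed version of the rule might look like the sketch below. The function name `audit_source`, the reason strings, the NO-GO verdict for Rule 1, and the final fallback are assumptions made for this example; only the rule order (scraped handled first, anonymous-aggregate returning GO) comes from the review above.

```python
from typing import Tuple


def audit_source(origin: str, data_class: str) -> Tuple[str, str]:
    """Return (verdict, reason) for one source. Hypothetical, simplified rule chain."""
    # Rule 1: scraped data is rejected before any later rule can run.
    # (The NO-GO verdict here is an assumption for this sketch.)
    if origin == "scraped":
        return ("NO-GO", "scraped origin")

    # Rule 6, collapsed: no inner origin check is needed, because a scraped
    # source has already returned above and can never reach this branch.
    if data_class == "anonymous-aggregate":
        return ("GO", "anonymous aggregate from a non-scraped origin")

    # Placeholder for the rules this sketch does not model.
    return ("MITIGATE", "no modelled rule matched")
```

With the inner block gone, the only path to GO for anonymous-aggregate data is visibly a non-scraped one, which removes the question a reader would otherwise ask.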

2. Domain-count criterion missing from data mesh logic (data_product_strategy_picker.py)

The comment says: requires 25+ consumers across 4+ domains AND federated culture — but the condition only enforces consumers >= 25 and culture and stage in (...). The "4+ domains" requirement has no corresponding code check. Either (a) add a domain_count field to the profile schema and enforce it, or (b) remove the domain mention from the comment so the documented contract matches the implementation.
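Option (a) could be sketched roughly as follows. The function name `qualifies_for_mesh` and the profile keys `consumers`, `domain_count`, `federated_culture`, and `stage` are hypothetical, and the set of eligible stages is passed in as a parameter because the review elides the actual values.

```python
from typing import Collection, Mapping


def qualifies_for_mesh(profile: Mapping, mesh_stages: Collection[str]) -> bool:
    """True only when every documented mesh criterion holds (illustrative sketch)."""
    return (
        profile.get("consumers", 0) >= 25
        # The check the comment promises but the current code lacks.
        # A profile with no domain_count is treated as not qualifying.
        and profile.get("domain_count", 0) >= 4
        and bool(profile.get("federated_culture", False))
        and profile.get("stage") in mesh_stages
    )
```

Defaulting a missing `domain_count` to 0 keeps older profiles on the warehouse or lakehouse path instead of raising, which may or may not be the behaviour the author wants.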


Style / Quality

3. import textwrap inside _wrap (both scripts)

Both scripts import textwrap inside the _wrap helper on every call. Since textwrap is stdlib, there is no cost reason to defer — move it to the top-level imports.

4. _wrap defined after its callers (both scripts)

render_text calls _wrap, but _wrap is defined several functions later in the file. Python resolves this at call time so it works, but convention is to define helpers before their callers. Consistent pattern in both scripts — worth fixing together.

5. Emoji in stdout output (ai_training_data_audit.py)

The text renderer uses colored-circle emoji as verdict markers. These render fine in most modern terminals but will produce garbled output in older CI/CD log viewers and Windows cmd. A plain-text fallback ([NO-GO] / [MITIGATE] / [GO]) would make the tool more portable. Low priority, but the tool is meant to run in diverse environments.
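One possible shape for the fallback, assuming the markers are red, yellow, and green circles (the review only says "colored-circle emoji") and using a hypothetical helper named `verdict_marker`:

```python
import sys
from typing import Optional

# Assumed emoji mapping: red, yellow, and green circles.
EMOJI_MARKERS = {
    "NO-GO": "\U0001F534",
    "MITIGATE": "\U0001F7E1",
    "GO": "\U0001F7E2",
}


def verdict_marker(verdict: str, encoding: Optional[str] = None) -> str:
    """Return the emoji marker if the output encoding can represent it, else [VERDICT]."""
    target = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    emoji = EMOJI_MARKERS[verdict]
    try:
        emoji.encode(target)
    except (UnicodeEncodeError, LookupError):
        # Encoding cannot represent the emoji, or the encoding name is unknown.
        return f"[{verdict}]"
    return emoji
```

An encoding check only catches streams that cannot encode the character at all; a log viewer that accepts UTF-8 but lacks the glyph would still show a placeholder, so an explicit opt-out may be worth considering as well.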


Documentation / Counts

Counts are consistent across all 17 files:

  • Skills: 264 -> 265
  • cs-* agents: 29 -> 30 (c-level-agents plugin: 9 -> 10; total with cs-ceo/cs-cto: 11 -> 12)
  • /cs:* commands: 17 -> 18
  • Python tools: 361 -> 364
  • References: 490 -> 494
  • c-level-skills plugin: v2.5.1 -> v2.5.2
  • c-level-agents plugin: v1.1.0 -> v1.2.0

Known follow-up acknowledged: cs-general-counsel-advisor voice spec still missing from persona-voices.md — explicitly deferred and documented. Acceptable scope decision.


Security & Safety

No issues found:

  • File I/O uses open(..., encoding="utf-8") with proper IOError/JSONDecodeError handling
  • No shell execution, eval, exec, or network calls
  • Input enum validation before processing
  • Legal disclaimer ("NOT legal advice") present in module docstring and SKILL.md
  • Regulated data (PHI/PCI) routes consistently to MITIGATE or NO-GO; no accidental GO path
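For readers unfamiliar with the pattern, the first and third bullets correspond roughly to the sketch below. The function name `load_sources`, the `origin` key, and the allowed values are hypothetical; the actual tools may structure this differently.

```python
import json

# Hypothetical enum values, for illustration only.
ALLOWED_ORIGINS = frozenset({"first-party", "licensed", "scraped"})


def load_sources(path: str) -> list:
    """Load a JSON list of sources and validate enum fields before any processing."""
    try:
        with open(path, encoding="utf-8") as handle:
            sources = json.load(handle)
    except (IOError, json.JSONDecodeError) as exc:
        raise ValueError(f"cannot load {path}: {exc}") from exc

    if not isinstance(sources, list):
        raise ValueError("expected a JSON list of sources")
    for source in sources:
        if not isinstance(source, dict):
            raise ValueError(f"expected an object per source, got {source!r}")
        # Reject unknown enum values up front rather than deep in the rule chain.
        if source.get("origin") not in ALLOWED_ORIGINS:
            raise ValueError(f"unknown origin: {source.get('origin')!r}")
    return sources
```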

Skill Quality

The decision-framing (four named decisions vs. topic survey) is the right approach for a strategic skill, and mirrors the pattern established by general-counsel-advisor. The GDPR Art. 6, EU AI Act Art. 10, and EDPB Opinion 28/2024 citations are current and accurate. The M&A multiplier math in data_asset_valuator.py (1.0x-1.7x ARR, with carve-out and anonymization penalties) is a realistic range for B2B SaaS data assets at Series B+. The hard scope boundary ("does not duplicate engineering data skills") is well-enforced in the agent definition and SKILL.md.


Blockers before merge:

  • Fix dead code branch in ai_training_data_audit.py Rule 6
  • Align data mesh domain-count comment with actual code in data_product_strategy_picker.py

Non-blocking (follow-up PRs):

  • Move import textwrap to top-level imports in both scripts
  • Define _wrap before its callers
  • Optional: ASCII fallback for emoji verdict markers

@alirezarezvani alirezarezvani marked this pull request as ready for review May 12, 2026 18:25
@alirezarezvani alirezarezvani merged commit 7b46850 into dev May 12, 2026
8 checks passed
@alirezarezvani alirezarezvani deleted the feature/chief-data-officer-advisor branch May 12, 2026 18:25