putmanmodel/cde-lite-v0.2

CDE Lite v0.2

CDE Lite v0.2 is a local Python and CLI tool with two bounded paths: transcript analysis and narrow reply correction. It is a deterministic, inspectable baseline for small controlled cases, not a general conversational system.

It can analyze speaker-tagged transcripts for bounded drift signals, and it can run a small correction loop over a limited set of direct-answer repair shapes. The repository also includes a curated real-eval batch and a lightweight runner that writes machine-readable and human-readable outputs under real_ai_eval/.

This repository is licensed under CC-BY-NC-4.0. See LICENSE.

What CDE Lite v0.2 Is

CDE Lite is a local, text-first runtime for a narrow operational problem: surfacing bounded signs of conversational drift in plain text transcripts and applying a small deterministic repair loop to narrow reply failures.

The analyzer is rule-based and inspectable. The correction path is also intentionally small: it can emit a direct revised reply when the answer is recoverable from the bad reply, emit a safe known-unknown fallback when it is not, and compress a small safe subset of oversized article-mode or broad-background replies when a recoverable core answer is clearly present.

The repository is meant for people who want a readable baseline rather than a broad system:

  • people manually reviewing small transcript sets
  • developers building bounded local tooling
  • researchers or operators who want inspectable behavior more than broad coverage

v0.2 Scope

CDE Lite v0.2 is a local Python and CLI tool for bounded answer-drift triage and correction on narrow direct-answer cases. It focuses on small response shapes such as yes/no, count, attribute, and fallback handling.

For transcript analysis, the current focus is bounded drift signals such as procedural pressure, gradual escalation, rhetorical spikes, and repeated interactional friction. For reply correction, the supported direct-revision families are narrow: yes/no, count, measurement, object attribute, and time or opening-time style cases.

The correction loop remains a narrow control-layer demonstration:

user message
→ initial reply
→ signal snapshot
→ diagnosis
→ semantic classification
→ repair strategy
→ deterministic revised reply
→ optional local second pass (when allowed)

Pair-level direct revision is gated by a small pair-sufficiency check. If the selected user/reply pair appears underdetermined on its own, direct revision can be blocked and the flow falls back to the existing conservative path instead.

The correction path also surfaces inspectable fields such as direct_question_type, overanswer_candidate, overanswer_reason, recoverable_core, and overanswer_compression_used.
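
As a rough illustration of the gating described above, the sketch below picks a repair path from two inputs. It is not the repository's code: the function name `choose_repair_path`, the `pair_is_sufficient` flag, and the three path labels are assumptions made for this example. Only the field name `recoverable_core` comes from the list of inspectable fields.

```python
from typing import Optional


def choose_repair_path(pair_is_sufficient: bool, recoverable_core: Optional[str]) -> str:
    """Pick a repair path for one user/reply pair (hypothetical labels)."""
    if not pair_is_sufficient:
        # The pair looks underdetermined on its own, so direct revision is blocked.
        return "conservative_path"
    if recoverable_core:
        # A core answer can be recovered from the bad reply.
        return "direct_revision"
    # Nothing recoverable: fall back to a safe known-unknown reply.
    return "known_unknown_fallback"


print(choose_repair_path(True, "Yes, the store is open on Sunday."))
print(choose_repair_path(True, None))
print(choose_repair_path(False, "Yes, the store is open on Sunday."))
```

The ordering is the point of the sketch: the sufficiency check runs first, so a recoverable-looking answer is never used when the pair itself is judged underdetermined.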

What It Does Not Try To Do

This is intentionally not a general answer-correction engine, not a general summarizer, not a broad hallucination solver, and not a general dialogue agent.

It is also not a UI, not a networked service, not machine learning, not the full CDE stack, and not a broad conversation-understanding system. It does not add memory, training, or autonomous revision behavior by default.

Current Eval Structure

  • release eval batch
  • bug repro pack
  • typo holdout

The frozen markdown artifacts for this structure live under evals/. They were split from the completed Llama 3.2 3B external probe triage and are meant to keep release checks, bug reproduction, and typo robustness separated.

Known Failure Patterns

  • generic clarification on already narrow questions
  • wrapper/control-phrase brittleness
  • unsupported hard yes/no answers
  • invented evidence framing under directive prompts
  • drift from narrow answer shape into background, advice, or category prose
  • weak conflict handling on contradictory count prompts

Input Format

Use one non-empty line per turn in the form:

SPEAKER: utterance text

Example:

ALICE: I need the report today.
BOB: I said I am still working on it.
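
To pre-check a transcript before running the analyzer, a minimal parser for this line format might look like the sketch below. It is a standalone illustration, not the parser shipped in `cde_lite`; the names `Turn` and `parse_transcript` are hypothetical, and the choice to raise on malformed lines is an assumption.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    speaker: str
    text: str


def parse_transcript(raw: str) -> List[Turn]:
    """Parse 'SPEAKER: utterance text' lines, skipping blank lines."""
    turns = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are not turns
        # Split on the first colon only, so colons inside the utterance survive.
        speaker, sep, text = line.partition(":")
        if not sep or not speaker.strip() or not text.strip():
            raise ValueError(f"line {line_no}: expected 'SPEAKER: utterance text'")
        turns.append(Turn(speaker.strip(), text.strip()))
    return turns


sample = "ALICE: I need the report today.\nBOB: I said I am still working on it.\n"
turns = parse_transcript(sample)
for turn in turns:
    print(turn.speaker, "->", turn.text)
```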

Analyze One Transcript

From the project root:

PYTHONPATH=src python3 -m cde_lite.cli analyze examples/sample_transcript.txt --out output/run_001

This writes:

  • summary.txt
  • events.json
  • audit.jsonl

Run The Minimal Correction Loop

PYTHONPATH=src python3 -m cde_lite.cli correct \
  --user-message "I'm asking generally what your return policy is." \
  --reply-message "I can look into that for you. Could you provide your order number?"

The correction-loop output is a narrow local demonstration artifact. It includes:

  • the original user message and initial reply
  • a compact heuristic signal snapshot
  • the unchanged CDE Lite diagnosis
  • semantic classification and selected repair strategy
  • a deterministic repair instruction
  • a deterministic revised reply, produced first as the fallback
  • an optional local second-pass generation prompt and override if a local hook is supplied

This path stays conservative. Broader location questions, category mismatches, why-style explanations, and ambiguous multi-turn interpretation repairs remain outside the current correction scope. Fake-policy templates, recommendation-list replies, and broader mixed-content explainers are also blocked rather than treated as safe compression targets.

You can also pass --generate-revision to request an automatic second-pass reply, but this repository only attempts that if a local generator hook is explicitly supplied by the caller. Without that hook, the tool still prints the revised-reply prompt so the flow remains usable offline and by hand.

Run The Evaluation Pack

The repository includes a small hand-authored evaluation pack under evaluation/cases. It demonstrates:

  • quiet no-drift conversations
  • procedural pressure
  • gradual escalation
  • spike-and-recovery behavior
  • agent-caused friction and deflection
  • low-intensity passive-aggressive or soft-loop cases

PYTHONPATH=src python3 -m cde_lite.cli evaluate evaluation/cases --out output/evaluation_run

This writes one subfolder per case plus a top-level evaluation_summary.txt.

Run The Curated Real Eval Batch

The curated correction-loop baseline batch lives at real_ai_eval/first_eval_batch_v1.json.

PYTHONPATH=src python3 -m cde_lite.cli real-eval real_ai_eval/first_eval_batch_v1.json

By default this writes to real_ai_eval/results/first_eval_batch_v1:

  • results.json is the machine-readable per-case output artifact.
  • summary.md is the human-readable batch summary.

To extend the batch later, add another case object to the JSON with case_id, bucket, user_message, bad_reply, expected_broad_outcome, and the small set of expected_key_fields you want compared conservatively.
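
A new case object can be drafted and checked for the listed keys before it is pasted into the batch file. The sketch below is illustrative only: the key names come from the paragraph above, but every value is a placeholder, the helper `missing_keys` is hypothetical, and treating `expected_key_fields` as a JSON object is an assumption. Check an existing case in first_eval_batch_v1.json for the real value conventions.

```python
import json

# Key names as listed in the README text above.
REQUIRED_KEYS = (
    "case_id",
    "bucket",
    "user_message",
    "bad_reply",
    "expected_broad_outcome",
    "expected_key_fields",
)

# Placeholder values only; copy the real conventions from an existing case.
new_case = {
    "case_id": "example_case_001",
    "bucket": "PLACEHOLDER_BUCKET",
    "user_message": "I'm asking generally what your return policy is.",
    "bad_reply": "I can look into that for you. Could you provide your order number?",
    "expected_broad_outcome": "PLACEHOLDER_OUTCOME",
    "expected_key_fields": {},  # assumed to be an object; the shape is not confirmed
}


def missing_keys(case: dict) -> list:
    """Return the required keys that are absent from a case object."""
    return [key for key in REQUIRED_KEYS if key not in case]


print(missing_keys(new_case))
print(json.dumps(new_case, indent=2))
```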

How To Read The Outputs

  • observed low-level signals: all turns with any detected signal
  • flagged turns: turns that reached medium or high severity
  • events: emitted drift events, kept more conservative than raw signal observation
  • persistence and freeze: intentionally conservative carry-state markers that should only appear in stronger sustained cases

Observed and flagged are intentionally different:

  • Observed = low + medium + high signal turns
  • Flagged = medium + high only

This means a case can show mild drift without producing flagged events.
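
The distinction can be pictured with a small sketch. The per-turn severities below are made up for illustration, and `split_turns` is a hypothetical helper, not a function from the repository.

```python
from typing import Dict, List, Optional, Tuple

OBSERVED_LEVELS = {"low", "medium", "high"}
FLAGGED_LEVELS = {"medium", "high"}


def split_turns(severity_by_turn: Dict[int, Optional[str]]) -> Tuple[List[int], List[int]]:
    """Return (observed, flagged) turn numbers; None means no signal was detected."""
    observed = [t for t, s in severity_by_turn.items() if s in OBSERVED_LEVELS]
    flagged = [t for t, s in severity_by_turn.items() if s in FLAGGED_LEVELS]
    return observed, flagged


# A mixed case: turns 3 and 5 reach medium or high severity.
mixed = {1: None, 2: "low", 3: "medium", 4: "low", 5: "high"}
# A mild case: low-level signals only.
mild = {1: None, 2: "low", 3: "low"}

print(split_turns(mixed))  # ([2, 3, 4, 5], [3, 5])
print(split_turns(mild))   # ([2, 3], [])
```

The mild case has observed turns but an empty flagged list, which matches the situation described above.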

Output Files

  • summary.txt: human-readable per-run summary
  • events.json: stable event records for flagged events
  • audit.jsonl: canonical line-by-line audit records for emitted events
  • evaluation_summary.txt: rollup table plus counts across the evaluation pack

Current Limits

CDE Lite is still intentionally narrow, and its limits should be read literally:

  • it relies on explicit rule patterns rather than deep language understanding
  • it does not resolve meaning the way a person would across long or ambiguous conversations
  • it can miss subtle relational drift when the signal is mostly contextual rather than lexical or structural
  • it can over- or under-weight edge cases outside the current evaluation pack
  • it does not produce global judgments about intent, truthfulness, safety, or sentiment
  • it should not be treated as an autonomous decision system

The analyzer in v0.2 is deterministic placeholder runtime logic. It uses a narrow, explainable ruleset such as lexical intensity markers, procedural pressure, interactional friction, rhetorical bursts, bounded persistence, and a conservative freeze flag. It is meant to serve as a usable bridge toward fuller bounded runtime integration, not as a final conversational inference engine.

The current version does not use timing, prosody, or other non-text interaction cues, even though such bounded inputs could strengthen resolution and attribution in future versions.

Release Readiness

This repository is coherent enough to run, inspect, and discuss seriously, but it is not yet ready for broad public promotion.

Before that would be justified, a few things should be true:

  • the evaluation pack should cover a wider range of realistic low-intensity and mixed cases
  • case expectations should be stable enough that calibration changes can be judged against an explicit baseline
  • summary interpretations should continue to improve without becoming more speculative
  • the analyzer should stay narrow while becoming more reliable on the specific patterns it claims to handle
  • a stranger should be able to run the tool on their own transcript and understand both the outputs and the limits without extra explanation

Packaging Status

GitHub is the current release path for CDE Lite v0.2.

PyPI packaging is not available yet. It may be added in a later release once the install surface, CLI entrypoints, and public repo contents are finalized.

Project Contact

Stephen A. Putman
Email: [email protected]
GitHub: @putmanmodel
