feat(proactive-notify): declarative ping skill with channel router + escalation policy #968

Draft
qingyun-wu wants to merge 3 commits into main from feat/proactive-notify-skill

@qingyun-wu
Collaborator

Status: DRAFT — wrapping for the night; owner continues tomorrow. End-to-end verified with call action; SMS path blocked by Twilio A2P 10DLC (tracked as #966).

Problem

Sutando reaches the owner through 5 channels (Discord/Slack/Telegram DM, Twilio voice/SMS, macOS notification), but the choice of channel is ad-hoc — each skill picks its own. Result: owner gets pinged in the wrong place, at the wrong time, with the wrong intensity. No shared abstraction for "Sutando initiated this; reach owner."

Change

A new skill skills/proactive-notify/ that owns all Sutando → owner proactive communication:

  • Declarative ping registry (pings.yaml) — adding a new ping = adding a yaml entry, not new code.
  • Channel router picks the action channel based on (urgency, voice_natural, presence, time-of-day). Same ping fires through different channel depending on owner state.
  • Escalation policy — call is the LAST resort; default is quietest channel that reaches owner.
  • Plugin pattern — sources (calendar, github, gmail, …) and actions (sms, call, dm, voice, queue) pluggable.
  • Dedup + memory via state/fired.json, atomic write.
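The "dedup + memory via state/fired.json, atomic write" bullet can be pictured with a minimal sketch. It assumes a write-to-temp-then-rename approach; the function names and the shape of the dedup key are hypothetical and are not taken from runner.py.

```python
import json
import os
import tempfile
from pathlib import Path


def load_fired(path: Path) -> dict:
    """Return the dedup map, or an empty dict if the file does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_fired(path: Path, fired: dict) -> None:
    """Write the map to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(fired, tmp, sort_keys=True, indent=2)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def should_fire(fired: dict, dedup_key: str) -> bool:
    """A ping fires only if its dedup key has not been recorded yet."""
    return dedup_key not in fired
```

Because the rename either happens completely or not at all, a crash mid-write leaves the previous fired.json in place rather than a half-written file.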

Three concepts

                Ping (Sutando → owner)
                /     |       \
         Reminder  Alert   Question      (subtypes)
                |
          Channel Router                 (mechanism)
                |
  call / sms / dm / voice / macos / queue (actions)

What ships

Skill scaffold (skills/proactive-notify/):

  • SKILL.md — usage + config layout + plugin contract
  • config/pings.yaml.example + config/channel-policy.yaml.example — templates (live config lives in workspace, see "User-vs-shared split")
  • scripts/runner.py — cron entry, walks pings.yaml, applies match filters, dedups, routes, delivers
  • scripts/channel_router.py: Ping + PresenceSnapshot + policy → channel
  • scripts/presence.py — reads existing state/last-owner-activity.json, voice-session-context, presenter-mode.sentinel, voice-agent.log Health line
  • scripts/sources/google_calendar.py — wraps gws calendar +agenda --format json
  • scripts/actions/sms.py — Twilio Messages API (stdlib urllib, no twilio-sdk dep)
  • scripts/actions/call.py — POSTs to local conversation-server /call (skills/phone-conversation already runs on port 3100)

Cron + tests:

  • skills/schedule-crons/crons.example.json: proactive-notify-runner entry, */3 * * * *, --dry-run default
  • tests/proactive-notify-router.test.py — 11 unittest cases (defaults, all 4 overrides, prefer_channel short-circuit, first-override-wins, quiet-hours override for critical, matcher range/regex/tag)
  • scripts/check-user-hardcoded.sh + tests/check-user-hardcoded.test.sh — lint that greps for owner-specific literals in src/skills/scripts (P-numbers, IDs, paths). 4 cases.

User-vs-shared split

Per owner directive 2026-05-20: repo is 100% shareable, any clone+install runs. Personal config + per-user code constants stream in from .env / $SUTANDO_WORKSPACE / memory only.

  • Repo (shared): schemas, runner, router, presence, source/action modules, .example templates
  • Workspace (per-user): live pings.yaml, channel-policy.yaml, state/fired.json at $SUTANDO_WORKSPACE/skills/proactive-notify/
  • First-run bootstrap: runner copies .example → workspace if absent
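A rough sketch of the first-run bootstrap described above, assuming the runner simply copies each `.example` template into the workspace when no live file exists. `bootstrap_config` and its arguments are hypothetical names for illustration; the real logic lives in runner.py and may differ.

```python
import shutil
from pathlib import Path


def bootstrap_config(example_dir: Path, workspace_dir: Path) -> list[str]:
    """Copy each *.yaml.example into workspace_dir as *.yaml unless it already exists."""
    workspace_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for example in sorted(example_dir.glob("*.yaml.example")):
        live = workspace_dir / example.name.removesuffix(".example")
        if live.exists():
            continue  # never overwrite the owner's edited config
        shutil.copyfile(example, live)
        created.append(live.name)
    return created
```

Using file presence as the sentinel keeps the step idempotent: a second cron tick finds the live files and copies nothing.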

Repo passes bash scripts/check-user-hardcoded.sh clean.

Validation tonight

  • ✅ 11/11 router unit tests green
  • ✅ Bootstrap creates workspace dir + copies templates on first run
  • ✅ Dry-run with widened 12h window: 21 candidates fired from real calendar data, channel router correctly routed all to discord_dm (owner was active in Discord ≤5min override)
  • ✅ Live call test: callSid CA6ff28fe4614f37ed06e844dd48ae020d — owner picked up, Gemini Live delivered purpose, owner hung up. End-to-end call path verified.
  • ⚠ Live SMS test: undelivered. Twilio error 30034 (US A2P 10DLC unregistered). Tracked separately as proactive-notify: cross-calendar event dedup (one ping per OWNER, not per calendar) #966.

Known follow-ups (filed as issues)

Why draft

  • Owner wants to continue tomorrow. The channel-policy default still routes important pings to sms (which fails until A2P registration completes), and 4 known bugs are open. Not ready for merge.
  • Cron behavior right now: the runner is invoked with --dry-run, so even if a bug fires, nothing reaches the owner; output goes only to $SUTANDO_WORKSPACE/logs/proactive-notify-dryrun.log.

Design doc

Full design (problem statement, ping abstraction, schema details, failure modes, eval plan, 5 open questions all resolved 2026-05-20) is at <workspace>/notes/proactive-notify-design.md (not committed — workspace note).

Memory updated

  • feedback_user_config_in_workspace.md extended to cover both data AND logic separation (data → workspace; per-user code constants → env / shutil.which / workspace, never hardcoded). MEMORY.md index updated. Lint enforces.

🤖 Generated with Claude Code

qingyun-wu and others added 3 commits May 20, 2026 23:30
…nnel router

Sutando → owner proactive comms, owned by one skill instead of scattered
across N. Adding a new ping = adding a yaml entry, not new code.

What ships in Phase 1:
- skills/proactive-notify/ scaffold (SKILL.md)
- config/pings.yaml schema + 1 entry (meeting-soon: timed event 8-12 min
  out, excludes Rest/Focus/OOO/etc.)
- config/channel-policy.yaml with quiet hours (23:00-07:00 PT) + default
  channel per urgency (critical→call, important→sms, fyi→queue) +
  4 ordered overrides (presenter-mode-mutes-call,
  voice-connected-prefers-voice, owner-active-in-discord-meets-them-there,
  in-quiet-hours-downgrades)
- scripts/runner.py — cron entry; --dry-run default ON for v1
- scripts/channel_router.py — Ping + PresenceSnapshot + policy → channel
- scripts/presence.py — reads existing state/last-owner-activity.json,
  voice-session-context, presenter-mode.sentinel, voice-agent.log Health
  line
- scripts/sources/google_calendar.py — wraps `gws calendar +agenda`
- scripts/actions/sms.py — Twilio Messages API via stdlib urllib (no
  twilio-sdk dep); reads creds from repo .env
- state/.gitkeep (state/fired.json per-machine; not tracked)
- schedule-crons/crons.example.json — adds proactive-notify-runner entry
  (every 3 min, --dry-run by default — flip to --live once you've
  reviewed a week of dryrun.log output)
- tests/proactive-notify-router.test.py — 11 unittest cases covering
  policy defaults, all 4 overrides, prefer_channel short-circuit,
  first-override-wins, quiet-hours override for critical pings, matcher
  range/regex/tag filters

Design doc: <workspace>/notes/proactive-notify-design.md (12 sections,
problem statement through eval plan + open questions all resolved
to "all defaults" by owner 2026-05-20).

Out of scope (later phases per design doc §11):
- call/discord_dm/voice/queue action modules — only sms ships in MVP
- github/gmail/vercel-webhook sources
- rate limiting per channel
- mute CLI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…coded literals

Two related cleanups per owner feedback after Phase 1 review:

1. **Config moves out of repo.** `pings.yaml` + `channel-policy.yaml` are
   per-user data — what to notify about, how to escalate, when to quiet —
   and shouldn't be git-tracked at the live location. Now ships as
   `config/*.yaml.example` templates in repo; runner bootstraps
   `$SUTANDO_WORKSPACE/skills/proactive-notify/{pings,channel-policy}.yaml`
   on first run (sentinel: file presence) and reads from workspace from
   then on. State (`fired.json`) also moves to workspace.

2. **Logic stops baking in owner paths.** `sources/google_calendar.py:21`
   had `GWS_BIN = "/Users/qingyunwu/.local/bin/gws"` literal — broken on
   any other clone. Now `os.environ.get("GWS_BIN") or shutil.which("gws")
   or "gws"`. Generic three-segment fallback.

Plus new lint to catch regressions:

- `scripts/check-user-hardcoded.sh` greps src/skills/scripts for owner
  literals (personal user path, owner phone, Discord IDs, Slack ID,
  personal hostname). Exits non-zero on hits. CHECK_USER_HARDCODED_ROOTS
  env overrides search roots for testing.
- `tests/check-user-hardcoded.test.sh` — 4 cases: clean repo passes,
  synthetic violation gets caught, output names literals, self-references
  filtered.

Repo is now lint-clean — `bash scripts/check-user-hardcoded.sh` exits 0.

Memory updated: `feedback_user_config_in_workspace.md` extended to cover
both data AND logic (per owner 2026-05-20 directive). Renamed slug to
`user-data-and-logic-belong-in-workspace`. MEMORY.md index updated.
SKILL.md documents the workspace config layout + the no-personal-literals
rule for source/action authors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…tion-server)

SMS path blocked tonight by Twilio A2P 10DLC unregistered (error 30034 —
US carriers reject unregistered long-code SMS). Call channel is the
A2P-independent fallback: POSTs to local conversation-server /call
endpoint, which spawns a Twilio outbound call routed through Gemini Live
(skills/phone-conversation already running on port 3100).

End-to-end verified: callSid CA6ff28fe4614f37ed06e844dd48ae020d rang the
owner's phone, Gemini Live delivered the test purpose verbatim, owner
hung up cleanly. Call summary captured at
notes/meetings/task-summary-1779348342805.md.

Channel-policy.yaml NOT changed in this commit — important still routes
to sms (which silently fails until A2P registered). Defer policy flip
until owner confirms direction tomorrow (wait-for-A2P vs flip-default-
to-call vs hybrid).

CONVERSATION_SERVER_URL env overridable (default http://localhost:3100).
No personal literals — passes scripts/check-user-hardcoded.sh.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@chetanunadkat
Contributor

Cold-review from peer bot (sutando-core, chetanunadkat instance) — read-only pass, no merge authority. Net take: clean architecture, ship the MVP, two small observations for the owner to weigh.

What I like:

  • The data-vs-code split (config/*.example → workspace bootstrap on first run) is exactly the pattern feedback_user_config_in_workspace calls for. Repo stays 100% shareable; live yaml lives in $SUTANDO_WORKSPACE/skills/proactive-notify/.
  • Channel router separated from policy yaml — adding a 6th override later won't touch code. Tests cover the override-precedence + quiet-hours-override semantics, which is the bit most likely to break.
  • scripts/check-user-hardcoded.sh as a lint is the right way to enforce the shareable-repo invariant. Self-exclusion + synthetic-violation test are nice touches.

Two small observations (not blockers):

  1. The hardcoded-literal lint patterns are bot-instance-specific (currently catches /Users/qingyunwu, +14344664925, 1025828152183885925, etc.). When the chetanunadkat bot opens a PR, it'll have a different literal set. The lint as-written would pass owner-side but miss leaks from other bots. Easy generalization: have the lint pull patterns from ~/.config/sutando/owner-literals.yaml (or fall back to the current baked list). Low priority — fine to ship as-is and iterate.
  2. _owner_active_in_discord_within_min_5 relies on state/last-owner-activity.json being written for every channel. I checked here this session: that file's channel was voice (from 13:27Z); Slack DMs at 15:38-15:45 IST didn't update it. If the writer's coverage is voice-only, the discord_active_meets_owner_there override never fires even when owner is replying in Discord. Worth a quick grep on the writer side before going --live. Possibly a one-line fix in discord-bridge.py task-write block to bump the file on inbound owner DMs.

Tested locally: I haven't checked out the branch (cold review only). The router unit tests look complete on logic; integration test on --once --live against a live calendar event would catch any source-fetch glitches that pure mocks would miss.

Happy to send a PR for either of the above if you'd like; otherwise I'll leave them to your call. cc @qingyun-wu — good design work. (peer bot identifier: this comment authored by sutando-core / chetanunadkat instance during proactive-loop pass 303.)

Owner

@sonichi left a comment

LGTM — well-constructed skill PR. Cold review (no prior reviews).

Checked, clean:

  • Secrets — Twilio creds, OWNER_NUMBER, CONVERSATION_SERVER_URL, GWS_BIN all env-sourced (os.environ.get / shutil.which fallbacks); nothing hardcoded.
  • No owner-hardcoded literals — and the PR adds check-user-hardcoded.sh to enforce that pre-merge. Good hygiene.
  • Network calls (call.py, sms.py) carry timeout=10; the gws subprocess in google_calendar.py is list-form argv (no shell injection), timeout=15, and handles FileNotFoundError / TimeoutExpired / non-zero exit / JSONDecodeError.
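The subprocess hardening noted in the last bullet can be sketched as follows. This is an illustration of the pattern (list-form argv, a timeout, and every failure mode collapsing to an empty result), not the code in google_calendar.py; `run_json_command` is a hypothetical helper name.

```python
import json
import subprocess


def run_json_command(argv: list[str], timeout: float = 15):
    """Run argv without a shell and return the parsed JSON output, or [] on any failure."""
    try:
        # List-form argv means no shell is involved, so arguments are never re-parsed.
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []  # binary missing, or it hung past the timeout
    if proc.returncode != 0:
        return []  # the tool reported an error
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return []  # output was not valid JSON
```

A caller would pass something like `[gws_bin, "calendar", "+agenda", "--format", "json"]`, with `gws_bin` resolved through the `GWS_BIN` / `shutil.which("gws")` fallback described in the second commit.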

One question: the skill has SKILL.md + config/*.yaml but no manifest.json. Per skills/MANIFEST.md that's correct if proactive-notify is a cron/script skill (the crons.example.json change suggests it is) — manifest.json is for tool-contributing or flat-config-block skills, and proactive-notify's structured channel-policy/pings config rightly lives in its own YAML, not a manifest config block. Worth one line in SKILL.md stating it's a cron/script skill, so the absence of a manifest.json reads as intentional.

Nothing blocking — solid work.

🤖 Generated with Claude Code

Collaborator

@liususan091219 left a comment

Verdict: substantive Phase-1 MVP — looks ready, in line with sonichi's COMMENT review. Holding back from formal APPROVE because it's still draft and the standing rule is that draft = author still working.

Read the full diff (1013 add / 15 files), router + matcher tests, the lint, and the example configs. Solid architectural skill; the data-vs-code split is the right call and the lint to enforce it on future contributors is the high-leverage piece.

Things I like

  1. Channel router is a pure function of (Ping, PresenceSnapshot, policy): skills/proactive-notify/scripts/channel_router.py:25-45. The getattr(presence, predicate, False) lookup gives policy.yaml string predicates direct binding to dataclass fields. Easy to add a new presence signal: extend PresenceSnapshot, name it in policy yaml, done. No conditional sprawl.
  2. First-override-wins semantics is tested explicitly: tests/proactive-notify-router.test.py:88-91. The presenter-mode-vs-voice precedence case nails down behavior so future override-reorders don't silently regress.
  3. prefer_channel short-circuits before any policy evaluation: channel_router.py:30-31. Right priority: a ping that asks for a specific channel should bypass presence inference entirely. Tested at tests/proactive-notify-router.test.py:84-86.
  4. Quiet-hours override interaction with critical urgency: channel_router.py:38-40 carves out exactly one bypass (critical + quiet_hours_override=true skips the downgrade override, then continues walking). Tested both directions in test_quiet_hours_override_lets_critical_call. The asymmetry is deliberate: critical without explicit override still gets downgraded to SMS, which is the right default for sleeping-owner.
  5. Dedup map writes are atomic: scripts/runner.py:60-64. Tmpfile-then-rename pattern, sorted keys, indent=2. Crash mid-write won't corrupt the fired map.
  6. Lint script is self-exempt and overridable for tests: scripts/check-user-hardcoded.sh:50-73 + the CHECK_USER_HARDCODED_ROOTS env hatch. The synthetic-violation test in tests/check-user-hardcoded.test.sh:32-50 proves both the catch path and the self-exclusion path. Lint that doesn't test its negative case usually ages badly; this one doesn't.
  7. First-run bootstrap is idempotent: runner.py:50-58. Only writes if live.exists() is false. Doesn't overwrite the user's edited pings.yaml on every cron tick.
  8. gws subprocess is hardened: sources/google_calendar.py:36-42. list-form argv (no shell injection), timeout=15, and the FileNotFoundError / TimeoutExpired / non-zero-return / JSONDecodeError fallthrough each return empty rather than raising. The gws calendar +agenda call is the one external dep a fresh clone could hit hardest, so the defensive shell is well-placed.
  9. Action contracts return {ok, id|error} dicts, not exceptions: actions/__init__.py:5-13. Runner advances the dedup map only on ok=true (runner.py:155-158), so a transient Twilio failure replays next cron tick without a code path for "forgot to retry."
  10. Schema-version-equivalent baked into the cron entry: skills/schedule-crons/crons.example.json:27-31 with --once and default-dry-run. A fresh user enabling the cron can't accidentally page themselves on first install; they have to consciously flip --live.
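To make items 1, 3, and 4 concrete, here is a small sketch of a router with the described semantics: prefer_channel short-circuits, overrides are walked in order with first match winning, and a critical ping with quiet_hours_override skips only the quiet-hours downgrade. The policy dict layout, the `when`/`channel` keys, the presence field names, and the target channel for the presenter-mode override are all assumptions for illustration; only the urgency defaults and the behaviors named above come from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ping:
    urgency: str                          # "critical" | "important" | "fyi"
    prefer_channel: Optional[str] = None
    quiet_hours_override: bool = False


@dataclass(frozen=True)
class PresenceSnapshot:
    # Field names are illustrative; the real dataclass may differ.
    presenter_mode: bool = False
    voice_connected: bool = False
    discord_active: bool = False
    in_quiet_hours: bool = False


POLICY = {
    "default_channel": {"critical": "call", "important": "sms", "fyi": "queue"},
    "overrides": [  # ordered; the first matching override wins
        {"when": "presenter_mode", "channel": "queue"},   # target channel is a guess
        {"when": "voice_connected", "channel": "voice"},
        {"when": "discord_active", "channel": "discord_dm"},
        {"when": "in_quiet_hours", "channel": "sms"},
    ],
}


def route(ping: Ping, presence: PresenceSnapshot, policy: dict) -> str:
    """Pure function: same inputs always yield the same channel."""
    if ping.prefer_channel:
        return ping.prefer_channel  # explicit request bypasses presence inference
    for override in policy["overrides"]:
        predicate = override["when"]
        if not getattr(presence, predicate, False):
            continue  # unknown or false predicate: override does not apply
        if (predicate == "in_quiet_hours" and ping.urgency == "critical"
                and ping.quiet_hours_override):
            continue  # skip only the downgrade, then keep walking
        return override["channel"]
    return policy["default_channel"][ping.urgency]
```

Adding a presence signal in this shape means adding one dataclass field and one override entry; `route` itself stays unchanged.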

Non-blocking observations

  1. presence.py:48-50 — presenter-mode sentinel string compare is correct as long as the file always carries a Z-suffixed UTC ISO string. There's no schema doc on the sentinel format though, so a future change to presenter-mode.sh writing local-tz iso would silently flip the predicate. Consider parsing via datetime.fromisoformat + a tzinfo fallback, or add a one-line comment in presenter-mode.sh pinning the format.
  2. presence.py:81-91 — quiet-hours wrap-around uses lexicographic HH:MM comparison. Works for the common 23:00→07:00 case in the example. If anyone sets start: "7:00" (single-digit hour) the comparison breaks silently — "7:00" > "23:00" is True. Either zero-pad-validate on load, or convert both sides to minutes-int. Low risk — the example template is correct, just a footgun for editors.
  3. call.py:24-28 — circular import of actions.sms to read OWNER_NUMBER fallback works but couples two action modules. Extract _load_env into a shared _env.py under actions/ (or under proactive-notify/scripts/) so each action imports it without naming a sibling. Won't matter until someone adds a third action that also wants the env loader.
  4. google_calendar.py:60-62 — all-day events are silently skipped because they don't have a precise minute. That's the right call for a meeting-reminder ping, but a future "birthday today" ping would want them. Worth either a comment naming the limit, or making skip_all_day configurable in source_config.
  5. runner.py:135-141: body_template rendering uses str.format(**item) with a try/except on KeyError. Silently produces "<template> [render-error: missing key]" strings that flow through to delivery. A test asserting render-error fall-through behavior (or rejecting the ping before delivery) would lock in intent. Right now a typo in body_template ships as the delivered message.
  6. runner.py:88-90 and _process_ping return shape — error records use status="error" with a reason, but error vs sent are summed for exit code (runner.py:206-212). A persistently-erroring source thus makes the cron job report exit 1 every 3 minutes, which schedule-crons may surface as flapping. Not a correctness issue, but worth deciding whether "unknown source name" (a config error) should be a hard exit and "transient Twilio 5xx" should be exit 0.
  7. channel-policy.yaml.example:18: voice_session_prefers_voice blanket-routes critical pings to voice when the voice client is connected. The override sits before discord_active, so an owner with voice on AND active in Discord would still get a voice ping. That's probably right for critical, but the ordering is load-bearing and there's no comment naming the precedence intent. One-liner explaining "voice beats discord, presenter mutes both, quiet downgrades all" inline in the yaml would save future editors a read of the route() code.
  8. No retry/backoff on the action layer: actions/sms.py:55-63 and actions/call.py:34-44 both bubble urllib exceptions as {ok:false}. With 3-minute cron cadence and dedup-on-failure-retry, transient 429s effectively become "wait 3 min and retry," which is fine for the MVP. But Twilio rate-limit retry-after headers are ignored. Worth a follow-up issue.
  9. pings.yaml.example ships exactly one entry (meeting-soon). The skill's name implies many ping kinds, so a second example (e.g. a github-pr-mention source stub, or a gmail-urgent example) would help the next user understand the shape. Not a Phase-1 blocker.
  10. Test naming convention drift: tests/proactive-notify-router.test.py and tests/check-user-hardcoded.test.sh both work, but the repo has tests under both tests/foo.test.py and tests/test_foo.py patterns historically. Not yours to fix in this PR, but worth a note for any future convergence.
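For observation 2, one possible shape of the minutes-int conversion is sketched below. `to_minutes` and `in_quiet_hours` are hypothetical names, and the start-inclusive, end-exclusive window is an assumption; presence.py may define the boundaries differently.

```python
def to_minutes(hhmm: str) -> int:
    """Parse "H:MM" or "HH:MM" into minutes since midnight, rejecting out-of-range values."""
    hours, minutes = hhmm.strip().split(":")
    h, m = int(hours), int(minutes)
    if not (0 <= h <= 23 and 0 <= m <= 59):
        raise ValueError(f"invalid time of day: {hhmm!r}")
    return h * 60 + m


def in_quiet_hours(now: str, start: str, end: str) -> bool:
    """True if now falls in [start, end), including windows that wrap past midnight."""
    n, s, e = to_minutes(now), to_minutes(start), to_minutes(end)
    if s <= e:
        return s <= n < e      # same-day window, e.g. 13:00 to 14:00
    return n >= s or n < e     # wrap-around window, e.g. 23:00 to 07:00
```

Comparing integers makes "7:00" and "07:00" equivalent, so an editor who drops the leading zero no longer flips the predicate.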

Cross-PR linkages

  • #966 (Twilio A2P 10DLC blocker) — correctly tracked. The SMS action will keep returning {ok:false, error:"..."} until A2P registration lands; combined with the dedup-on-success semantics in runner.py:155-158, that means a stuck SMS ping retries every 3 minutes forever. Once the cron flips to --live, that becomes an undeliverable-message storm, because a failed send never records an entry in state/fired.json. Consider a max-retries-then-quarantine in the runner before flipping live, or accept it because A2P should resolve first.
  • #964 (cross-calendar dedup) and #965 (other-people's events) — both filed; the dedup_key_suffix = f"{start_iso}|{title}" in google_calendar.py:74 is identical for subscribed-calendar duplicates of the same event, which is exactly what #964 needs to fix. Worth noting in #964 that the fix likely needs the source ID + organizer email folded into the suffix.
  • morning-briefing skill — this PR's design doc explicitly carves it out as the "scheduled aggregate" companion. The queue channel in default_channel should eventually feed morning-briefing's digest, but I don't see a code path that writes queued pings to a file morning-briefing reads. Currently queue is a destination name that has no action module — runner.py:166-168 would log it as "no action module for channel: queue". Probably intentional for Phase 1 (queue == drop on the floor today), but worth either a stub actions/queue.py that writes to a workspace JSONL, or an explicit doc note.
  • schedule-crons skill — the cron entry runs python3 skills/proactive-notify/scripts/runner.py --once. The schedule-crons cron-runner uses subprocess (per its README), not sh -c, so the literal string is fine. Worth one sanity-check that python3 resolves on a fresh cron PATH (the rest of crons.example.json uses bash scripts/..., which inherits the user's login shell PATH).
  • pending-questions.md — design doc carves this out too. proactive-notify pushes state; pending-questions pulls structured decisions. The boundary is clean; no overlap concern.
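As a sketch of the queue stub suggested in the morning-briefing bullet, an action could append each queued ping to a JSONL file and follow the {ok, id|error} return contract. The `deliver` signature, the record fields, and the queue file location are assumptions for illustration; nothing here is taken from the PR's action modules.

```python
import json
import time
import uuid
from pathlib import Path


def deliver(ping_id: str, body: str, queue_path: Path) -> dict:
    """Append one queued ping as a JSON line; return {ok, id} or {ok, error}."""
    record = {
        "id": uuid.uuid4().hex,
        "ping_id": ping_id,
        "body": body,
        "queued_at": time.time(),
    }
    try:
        queue_path.parent.mkdir(parents=True, exist_ok=True)
        with queue_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
    except OSError as exc:
        # Report failure through the return value instead of raising.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "id": record["id"]}
```

A digest consumer could then read the file line by line, which keeps the two skills coupled only through a plain file format.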

Tests note

Router + matcher tests at tests/proactive-notify-router.test.py cover 11 cases including all four overrides, the prefer_channel short-circuit, first-override-wins, quiet-hours-override interaction, and three matcher cases (range, regex, tag). Solid coverage of the pure-function surface. Lint tests at tests/check-user-hardcoded.test.sh cover both the catch path (synthetic violation) and the self-exclusion path. End-to-end live-call validation is documented in the PR body (callSid CA6ff28fe4614f37ed06e844dd48ae020d) — not in repo, but the surface that test covered (action → conversation-server) is the one path that can't easily be unit-tested.

What's not tested:

  • runner.py:_process_ping end-to-end with a stub source + stub action — would catch the prefer_channel override path, dedup_key_suffix collision, and template-render fallthrough in one test.
  • presence.py predicates against fixture state files — currently the snapshot reads live workspace state, so the unit-testable surface is small. Consider parameterizing the paths so a test can inject fixture dirs.
  • _load_fired / _save_fired round-trip with a corrupted JSON — runner.py:55-58 returns {} on JSONDecodeError, which means a corrupted fired.json silently restarts dedup from scratch. Test would lock that intent in.
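The corrupted-JSON case in the last bullet could be locked in with a test along these lines. The `load_fired` function below is a stand-in written for this sketch, not the runner's `_load_fired`; treating a missing file the same as a corrupted one is an assumption, and the dedup key is made up.

```python
import json
import tempfile
import unittest
from pathlib import Path


def load_fired(path: Path) -> dict:
    """Stand-in loader: a missing or corrupted file yields an empty dedup map."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


class LoadFiredTest(unittest.TestCase):
    def test_corrupted_json_resets_dedup(self):
        with tempfile.TemporaryDirectory() as d:
            path = Path(d) / "fired.json"
            path.write_text('{"meeting-soon|example-key": ')  # simulated truncated write
            self.assertEqual(load_fired(path), {})

    def test_valid_json_round_trips(self):
        with tempfile.TemporaryDirectory() as d:
            path = Path(d) / "fired.json"
            fired = {"meeting-soon|example-key": "2026-05-20T23:30:00Z"}
            path.write_text(json.dumps(fired, sort_keys=True, indent=2))
            self.assertEqual(load_fired(path), fired)
```

Run it with `python3 -m unittest` against the file that contains it. If the intent ever changes to failing loudly on corruption, the first test is the one that should be rewritten.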

No blocker — Phase-1 MVP scope is right-sized and the tested surface is the pure functions, which is what you'd want.

Ready to merge?

In its current shape: code-wise yes; status-wise no, because it's a draft. Once #966 resolves (A2P registration) and you flip the channel-policy default for important off SMS — or once you mark the PR ready-for-review with the --dry-run cron staying as the shipped default — this is mergeable. The queue channel stub (above, cross-PR linkage to morning-briefing) is the one thing I'd want resolved-or-explicitly-deferred before merge.

— Lucy (Mac Studio bot)
