An agentic AI system that learns from scientist interaction to inspect, analyze, and classify high-dimensional time series data — with persistent memory that improves across sessions.
A video demonstration of the system can be found here: https://www.youtube.com/watch?v=k1L-xqT8Owo
A deployable url available to try out the system after approval: https://yue-crew-projects.org/
Scientists across neuroscience, physiology, and sensor engineering routinely work with high-dimensional, high-sample-rate time series — multi-channel neural recordings at 1-2 kHz, wearable sensor streams, industrial vibration data. Raw samples exceed both human inspection capacity and LLM context windows, yet the core tasks (artifact rejection, event marking, signal classification) all demand domain expertise applied at scale. No universal thresholds exist — what counts as "muscle artifact" or "clean signal" varies by hardware, brain region, patient population, and analytical goals.
This system bridges the gap by converting raw time series into intermediate representations (spectrograms, PSD plots, statistical features) that both humans and LLMs can reason about, then using interactive human-AI collaboration to formalize expert knowledge into interpretable detection rules. Persistent memory ensures that learned parameters, signal patterns, and processing procedures carry across sessions, creating a system that improves with every interaction.
The system runs on three specialized agents that coordinate to plan, execute, and learn:
Orchestrator Agent — The decision-maker. Powered by Claude Sonnet with extended thinking (4,000-token reasoning budget), the Orchestrator receives each user message, reasons about what needs to happen, and decides whether to call a tool directly, delegate to the Task agent, or respond. It maintains the full conversation context, manages the tool-use loop (up to 25 iterations for complex requests), and decides when to recall from memory or save new knowledge. Extended thinking gives it a private scratchpad to plan multi-step analyses before acting — for example, deciding which signals to sample, what statistics to compute, and how to present results, all before making the first tool call.
Task Agent — The workhorse. When the Orchestrator encounters multi-step work — exploring a dataset, running a statistical analysis across dozens of files, generating and testing code — it delegates to the Task agent. The Task agent gets a fresh context with its own tool-use loop (up to 15 iterations) and access to bash, read, write, and vision. It works autonomously: loading files, computing features, writing results, and returning a final summary. Because it runs independently, the Orchestrator can continue reasoning about the bigger picture while the Task agent handles the details. The Orchestrator injects all necessary context (file paths, column names, data formats, prior findings) into the task description so the Task agent can work without access to the conversation history.
Think Agent — The knowledge keeper. A single-pass reasoning agent (no tools) with a 3,000-token thinking budget, the Think agent handles three critical functions:
- Memory extraction — Every 5 user messages, the Think agent reviews recent conversation turns and updates
project_memory.mdwith new findings, pruning stale facts and consolidating to stay under 5,000 tokens. - Context compaction — When the conversation exceeds 40,000 tokens, the Think agent summarizes the oldest 20 turns into a ~200-word paragraph that preserves key findings, metrics, and decisions. This keeps the context window fresh without losing information.
- Session reflection — When a session ends, the Think agent reviews the full conversation and promotes reusable procedures, user preferences, and cross-project insights to global memory.
| Tool | What it does |
|---|---|
bash |
Execute Python scripts, shell commands, or install packages via PowerShell (Windows) or shell (Unix). Supports timeouts and environment variable injection. |
read |
Read text files with optional offset/limit paging. Extracts text from PDFs (via PyMuPDF) and loads MATLAB .mat files (v4 through v7.3 via scipy and h5py), returning structure and shape information. |
write |
Create or update files inside the project directory. Path-restricted — refuses to write outside the project boundary. |
vision |
Send one or more images (up to 5) to Gemini Flash for structured analysis. Supports single-image description and multi-image comparison with named references. The system injects project memory into the vision prompt so the model can reference known patterns and parameters. |
plot |
Generate charts displayed in the browser. Two modes: Plotly (interactive) — code produces a figure dict printed as JSON, rendered as a zoomable/pannable chart with hover tooltips. Matplotlib (static) — code creates figures normally; a preamble auto-applies dark theme and an epilogue captures all open figures as base64 PNG, displayed inline with a lightbox. |
recall |
Search long-term memory across both project and global layers. Three-phase process: (1) retrieve matching memories via keyword + semantic search, (2) emit results to the UI, (3) synthesize findings through the Think agent into a coherent summary that distinguishes project-specific vs. cross-project knowledge. |
remember |
Explicitly save knowledge to long-term memory. Stores content with descriptive tags for future retrieval. Supports project scope (dataset-specific) or global scope (cross-project). In hybrid mode, writes to both file and EverMemOS simultaneously. |
task |
Delegate multi-step work to an autonomous Task agent. The orchestrator packs all necessary context into the description — the Task agent cannot see the conversation. Useful for exploration, statistical analysis, code generation, and evaluation. |
ask |
Pause execution and ask the user a clarifying question via a browser modal. The agent waits (up to 10 minutes) for the user's response before continuing. Used only when genuinely blocked. |
todo |
Create or update a structured task list for the current session. Each item has an ID, description, and status (pending → running → done/failed). Evidence from tool output is mandatory for marking items done or failed — no reasoning-only completions. |
Pre-installed scientific stack: NumPy, SciPy, pandas, scikit-learn, MNE, antropy, ruptures, h5py, mat73, matplotlib, plotly, statsmodels, scikit-image.
The browser-based UI connects via WebSocket for real-time streaming — no polling, no page refreshes.
- Chat panel — Messages stream in token-by-token as the agent generates them. Markdown rendered with sanitized HTML (XSS-safe via DOMPurify).
- Activity panel — A sidebar showing live tool calls and results as they happen. Color-coded labels distinguish orchestrator tools (blue), sub-agent calls (purple), and results (green). Memory recalls and saves appear with their content.
- Interactive plots — Plotly charts open as modals with full zoom, pan, and hover. Matplotlib figures display inline with a lightbox for detailed inspection.
- Ask modal — When the agent needs clarification, a question appears in the chat with an input field. The agent pauses until you respond.
- Todo tracker — The task list appears at the top of the activity panel, updating in real-time as the agent works through steps.
- Status indicator — A colored dot shows connection state (green = connected, amber = agent is thinking).
Without persistent memory, every AI session starts from scratch. Memory transforms the system from a stateless tool into a self-improving research assistant that remembers how to load your data, which parameters work best, what artifacts look like, and what procedures you prefer.
What gets remembered: processing parameters (spectrogram settings, filter configs, with rationale and example code), signal patterns (artifact signatures, pathological features, with example signal IDs and anatomy), procedures (analysis workflows, labeling protocols), and domain knowledge (terminology, detection rules, dataset metadata).
| Layer | Scope | File Storage | EverMemOS Storage |
|---|---|---|---|
| Session | Current conversation | sessions/session_{id}.json — every turn, tool call, result, and todo list. Saved after each turn. |
Raw chat turns logged via store_chat_turn() for passive fact extraction. |
| Project | Across sessions, one project | project_memory.md — structured markdown updated by the Think agent every 5 messages. Injected into the orchestrator's system prompt each turn. |
Memories stored with project-scoped group_id (hash of project path). Classified as episodic_memory (findings, parameters, patterns) or profile (user preferences). Keyword-indexed for semantic retrieval. |
| Global | Across all projects | projects/_global/global_memory.md — populated by end-of-session reflection. |
Stored under adt_global group_id. Contains reusable procedures, user preferences, and cross-project insights. |
Project and global memory are both injected into the orchestrator's context at the start of every turn — the agent has full knowledge from prior sessions without needing to recall explicitly.
User teaches something
|
v
Orchestrator calls `remember` ──> File: written to project_memory.md
| EverMemOS: extracted into keyword-indexed
| episodic_memory with tags for retrieval
v
Every 5 messages ────────────────> Think agent reviews recent turns,
| merges new findings into project_memory.md,
| prunes stale facts (≤500 words/section)
v
Passive extraction ──────────────> EverMemOS receives raw chat turns,
| auto-extracts facts the agent didn't
| explicitly save (hybrid backend)
v
Session ends ────────────────────> Think agent reflects on full session.
| Promotes to global memory:
| • reusable procedures → global file + EverMemOS
| • user preferences → global file + EverMemOS
| • cross-project insights → global file + EverMemOS
v
Next session starts ─────────────> project_memory.md + global_memory.md
| loaded into system prompt.
v `recall` tool queries EverMemOS for
Agent recalls as needed semantic search when deeper retrieval needed.
EverMemOS is an open-source persistent memory service for AI agents. It adds semantic understanding and intelligent retrieval on top of the file-based memory layer.
How it works:
-
Store. When a memory is saved (
POST /memories), EverMemOS processes the text, extracts discrete facts, and indexes them with keywords and timestamps. "Muscle artifact shows broadband power above 80 Hz in temporal channels" becomes a searchable entry tagged withmuscle artifact,broadband,80 Hz,temporal. -
Retrieve. The
recalltool sends natural language queries to/memories/search. EverMemOS finds conceptually related memories — not just exact keyword matches. Asking "how should I detect artifacts?" returns memories about thresholds, frequency characteristics, and anatomy-specific patterns. Results include full metadata (content, memory type, keywords, timestamp, group_id) for the Think agent to synthesize. -
Scope. Each project gets a deterministic
group_idso memories never leak between projects. Global memory uses a specialadt_globalgroup_id. -
Session init. At session start, conversation metadata is registered — project name, participant roles (AI Data Technician / Researcher), and tags — so EverMemOS maintains context about who said what.
Memory types: episodic_memory (findings, summaries, observations) and profile (plot preferences, channel selections, communication style).
Deployment: Run locally via Docker (docker run -p 1995:1995 ghcr.io/nicholasgasior/evermemos:latest) or use the managed cloud service at api.evermind.ai. The system handles API differences automatically.
| Backend | Writes to | Reads from | Best for |
|---|---|---|---|
| File | project_memory.md |
Full-text (entire document) | Simple setup, git-trackable |
| EverMemOS | EverMemOS API | Semantic search | Intelligent retrieval at scale |
| Hybrid (default) | Both simultaneously | EverMemOS with file fallback | Durability + semantic search |
The hybrid backend writes to the file first (synchronous, reliable), then to EverMemOS (best-effort). If EverMemOS is unreachable, the system falls back to file-based memory — it never crashes or loses data due to a backend outage.
The memory gate controls update frequency — rather than updating on every message, it waits until enough new information accumulates (configurable via MEMORY_UPDATE_INTERVAL), then sends recent turns to the Think agent to merge findings, prune stale facts, and keep each section under 500 words. The todo system requires evidence from actual tool results to mark items done — no reasoning-only completions — ensuring memory entries are grounded in real observations.
The following screenshots are from a real analysis session on the MayoData1000 multicenter iEEG dataset (1,000 intracranial EEG signals, 5 kHz, 3 seconds each, SOZ-labeled). Dataset link: https://springernature.figshare.com/articles/dataset/Dataset_Mayo/11734575?backTo=%2Fcollections%2FMulticenter_intracranial_EEG_dataset_for_classification_of_graphoelements_and_artifactual_signals%2F4681208&file=21359865
The scientist asks for a time series, spectrogram, and PSD of a hippocampal signal. The system loads the .mat file, computes all three views, and renders them in the browser.
The scientist points out artifact examples. The system analyzes them visually (via vision model) and statistically, documenting the distinguishing features of each artifact type.
After the scientist approves a set of spectrogram parameters or artifact definitions, the system saves them to persistent memory — including parameter values, rationale, and example code.
In a new session, the scientist asks the system to plot spectrograms. The system recalls the previously optimized parameters from memory and applies them without re-learning.
The system can also synthesize everything it knows about a signal pattern — combining findings from past sessions, the original paper, and domain knowledge into a comprehensive summary.
User (Browser)
| WebSocket (real-time streaming)
v
+----------------+
| FastAPI Web | interface/web.py
| Server | Session & project management
+-------+--------+
|
v
+----------------+ +---------------+
| Orchestrator |---->| Task Agent | Up to 15 iterations
| (Claude, | | (bash, vision | autonomous tool-use
| ext. thinking)| | read, write) |
| | +---------------+
| Plans & |
| delegates |----> Tools (bash, read, plot, vision, ask, todo)
| |
| |----> Think Agent (reasoning, memory updates, compaction)
+-------+--------+
|
v
+----------------------------------------------+
| Memory Backend |
| File (markdown) | EverMemOS (semantic search)|
| |
| Session JSON - project_memory - global_memory|
+----------------------------------------------+
- Pixi (conda-based package manager)
- An Anthropic API key (Claude)
- A Google API key (Gemini Flash, for vision)
- (Optional) Docker — for running EverMemOS locally
# Clone and install (all OS)
git clone [email protected]:yuansui123/AI-Data-Technician.git
cd AI-Data-Technician
pixi installLinux / macOS (POSIX shell)
cp .env.template .env
# Edit .env: ANTHROPIC_API_KEY=sk-ant-... GOOGLE_API_KEY=AIza...
# Launch
pixi run web
# Open http://localhost:8000Windows (PowerShell)
Copy-Item .env.template .env
# Edit .env: ANTHROPIC_API_KEY=sk-ant-... GOOGLE_API_KEY=AIza...
# Launch
pixi run web
# Open http://localhost:8000python main.py [OPTIONS]
--project NAME Project name (default: "default")
--start Initialize a new project directory
--port N Web UI port (default: 8000)
--debug Log all LLM I/O to logs/
# Local (Docker)
docker run -p 1995:1995 ghcr.io/nicholasgasior/evermemos:latest
# Set MEMORY_BACKEND = "evermemos_local" in config.py
# Cloud
# Add EVERMEM_API_KEY=your-key to .env
# Set MEMORY_BACKEND = "evermemos_cloud" in config.py
# Hybrid (both file + EverMemOS)
# Set MEMORY_BACKEND = "hybrid" in config.pyAll settings in config.py:
| Setting | Default | Description |
|---|---|---|
ORCHESTRATOR_MODEL |
claude-sonnet-4-6 |
Main orchestrator model |
THINK_MODEL |
claude-sonnet-4-6 |
Reasoning & memory model |
VISION_MODEL |
gemini-2.5-flash |
Image analysis model |
MEMORY_BACKEND |
"hybrid" |
"file", "evermemos_local", "evermemos_cloud", or "hybrid" |
ORCHESTRATOR_THINKING |
4000 |
Extended thinking token budget |
TASK_MAX_ITER |
15 |
Max tool-use iterations per task |
AUTO_COMPACT_THRESHOLD |
40000 |
Token count triggering compaction |
MEMORY_UPDATE_INTERVAL |
5 |
User messages between memory updates |
SANDBOX_BACKEND |
"local" |
Shell backend: "local" (default) or "docker" |
SANDBOX_COMMAND_TIMEOUT_SECONDS |
120 |
Default per-command timeout for sandbox sessions |
SANDBOX_DOCKER_IMAGE |
"python:3.11-slim" |
Container image used when SANDBOX_BACKEND="docker" |
SANDBOX_CPUS |
"1" |
Docker CPU limit (--cpus) |
SANDBOX_MEMORY |
"2g" |
Docker memory limit (--memory) |
SANDBOX_PIDS_LIMIT |
256 |
Docker process limit (--pids-limit) |
- Orchestrator uses one sandbox session per user turn.
- Task agent uses a separate sandbox session per task invocation.
- In Docker mode, each session is one
docker runcontainer with manydocker execcalls. - Network is disabled in v1 (
--network none) with no model/user override path. - Local backend remains the default for backward compatibility.
export SANDBOX_BACKEND=docker
python scripts/smoke_docker_sandbox.pyOptional integration test:
RUN_DOCKER_SANDBOX_TESTS=1 python -m unittest tests.test_sandbox_docker_integrationIf you see Cannot connect to the Docker daemon at unix:///var/run/docker.sock:
sudo systemctl status docker
sudo systemctl start docker
sudo systemctl enable docker
docker infoIf docker info only works with sudo, add your user to the docker group:
sudo usermod -aG docker $USER
newgrp docker
docker info- Build or pull a pinned sandbox image and set
SANDBOX_DOCKER_IMAGE. - Ensure the app host user can run
docker run,docker exec, anddocker rm. - Keep
SANDBOX_BACKEND=localduring initial rollout, then switch todocker. - Verify logs include
sandbox_session_started,sandbox_exec, andsandbox_session_closedper turn/task.
MIT




