Analyze your VS Code Copilot chat history to identify effective prompting patterns, measure AI engagement, and track improvement over time.
- Extracts Copilot chat history directly from VS Code's workspace storage
- Detects prompting patterns (challenges, redirects, approvals, directives)
- Measures AI engagement (explicit acknowledgments, analytical responses)
- Correlates with git history to measure before/after effects of milestone changes
- Reports session-level and turn-level metrics
I am super curious about what I do that gets the best and worst results from my AI coding assistants, so I wanted to analyze my AI interactions, with the meta-recursive help of AI itself. Most prompting advice is generic and prescriptive. This tool analyzes your actual conversations and coaches you on what is and isn't working (!) specifically for you.
In my case, analyzing 51 hours of chat history from a PostgreSQL extension project, pg_num2int_direct_comp, revealed:
- Trimming a bloated instruction file by 66% increased session depth by 500%
  - I was using GitHub Spec Kit and trimmed my constitution.md from 660 lines (generated with AI) down to 220 lines (trimmed by AI).
  - I suspected my constitution was polluting the model with too much context, and I was right (found via the --milestone option described below)
- More context didn't correlate with better responses for analytical work; better framing did
- Deep thinking followed by Socratic "Shouldn't we...?" challenges got 100% AI engagement (I didn't know I was using a Socratic method until I saw the analysis)
- However, adding context like code selections and file references did help for straightforward tasks like "add a test for X".
# Clone or download
git clone https://github.com/yourusername/copilot-chat-analyzer.git
cd copilot-chat-analyzer
# No dependencies beyond Python 3.10+ standard library
python3 analyze_copilot_chats.py --help
VS Code stores Copilot chat sessions as JSON files in workspace storage.
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/Code/User/workspaceStorage/<hash>/chatSessions/ |
| Linux | ~/.config/Code/User/workspaceStorage/<hash>/chatSessions/ |
| Windows | %APPDATA%/Code/User/workspaceStorage/<hash>/chatSessions/ |
# macOS (on Linux, search ~/.config/Code/User/workspaceStorage/ instead)
find ~/Library/Application\ Support/Code/User/workspaceStorage/ \
-name "workspace.json" -exec grep -l "your-project-name" {} \;
# This returns something like:
# .../workspaceStorage/7d348763944a9089554430bce3b98996/workspace.json
The hash (7d348763944a9089554430bce3b98996) is your workspace ID. My project name was "pg_num2int_direct_comp", so I searched for that.
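
The same lookup can be sketched in Python for any of the three platforms in the table above. This is a minimal sketch, not part of analyze_copilot_chats.py: the function names `default_storage_root` and `find_workspace_hashes` are hypothetical, and it assumes that a plain substring search of each workspace.json is enough, exactly as the `grep -l` command does.

```python
import os
import sys
from pathlib import Path


def default_storage_root() -> Path:
    """Return the workspaceStorage directory for this OS (paths from the table above)."""
    if sys.platform == "darwin":
        return Path.home() / "Library/Application Support/Code/User/workspaceStorage"
    if sys.platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "Code/User/workspaceStorage"
    return Path.home() / ".config/Code/User/workspaceStorage"


def find_workspace_hashes(storage_root: Path, project_name: str) -> list[str]:
    """Return the <hash> directory names whose workspace.json mentions project_name."""
    matches = []
    for workspace_file in sorted(storage_root.glob("*/workspace.json")):
        # Plain substring search, mirroring `grep -l "your-project-name"`.
        text = workspace_file.read_text(encoding="utf-8", errors="ignore")
        if project_name in text:
            matches.append(workspace_file.parent.name)
    return matches
```

Calling `find_workspace_hashes(default_storage_root(), "your-project-name")` should return the candidate workspace IDs; more than one result means the project was opened from several locations.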
# Analyze all sessions in the workspace hash directory found above
python3 analyze_copilot_chats.py \
~/Library/Application\ Support/Code/User/workspaceStorage/<hash>/chatSessions \
~/your/git/repo
# Output: Markdown report to stdout
# Analyze a specific session (partial UUID match)
python3 analyze_copilot_chats.py \
/path/to/chatSessions \
/path/to/repo \
--session 85e291cc
The session can be specified by a partial ID match, e.g. "85e291cc" for UUID "85e291cc-1234-5678-90ab-cdef12345678". I spot-checked a sample of my largest, mid-sized, and smallest sessions this way.
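
To illustrate what a partial ID match could look like, here is a minimal sketch. It assumes each session is stored as `<uuid>.json` inside chatSessions and that "partial" means a case-insensitive prefix of the UUID; the function name `match_sessions` is hypothetical and the script's actual matching rule may differ.

```python
from pathlib import Path


def match_sessions(chat_dir: Path, partial_id: str) -> list[Path]:
    """Return session files whose name starts with partial_id (case-insensitive)."""
    wanted = partial_id.lower()
    return sorted(
        path
        for path in chat_dir.glob("*.json")
        if path.stem.lower().startswith(wanted)
    )
```

A short prefix can match several sessions, so a caller would typically report an ambiguity instead of silently picking the first result.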
The --milestone flag enables a before/after comparison around a git commit, selected by matching its commit message. This helps you measure the impact of workflow changes.
# Compare before/after a specific commit message pattern
python3 analyze_copilot_chats.py \
/path/to/chatSessions \
/path/to/repo \
--milestone "Refactor constitution"
This lets you see how your prompting patterns and AI engagement changed after a specific change, e.g. trimming your instruction file. For me, this showed a 500% increase in session depth after trimming my constitution file.
Pick a commit that represents a meaningful change to your AI workflow, such as:
| Change Type | Example Commit Message | What You'll Learn |
|---|---|---|
| Instruction file changes | "Trim constitution.md" | Did leaner instructions improve session depth? |
| New workflow adoption | "Add copilot-instructions.md" | Did explicit instructions change AI behavior? |
| Tool configuration | "Enable Copilot agent mode" | Did new capabilities change your patterns? |
| Project phase transitions | "Complete v1.0 release" | How did your prompting differ during crunch? |
# List recent commits to markdown/config files
git log --oneline --since="30 days ago" -- "*.md" "*.json"
# Search for workflow-related commits
git log --oneline --grep="copilot\|instruction\|config" -i
Sessions are classified as "before" or "after" based on their start time relative to the milestone commit timestamp. The report compares:
- Turns per session: Are conversations going deeper?
- Duration: Are you sustaining longer dialogues?
- Technical depth: Is the AI engaging with domain concepts?
- Challenges/Acks: Is your steering more effective?
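
The before/after classification described above can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the session records are hypothetical dicts with a timezone-aware `start` and a `turns` count, a session starting exactly at the milestone is counted as "after", and the helper names are made up for this example.

```python
import subprocess
from datetime import datetime, timezone
from statistics import mean


def milestone_timestamp(repo: str, pattern: str) -> datetime:
    """Return the commit time of the newest commit whose message matches pattern."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "-i", f"--grep={pattern}", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not out:
        raise ValueError(f"no commit message matches {pattern!r}")
    return datetime.fromtimestamp(int(out), tz=timezone.utc)


def split_by_milestone(sessions: list[dict], milestone: datetime):
    """Split sessions into (before, after) by start time; ties count as 'after'."""
    before = [s for s in sessions if s["start"] < milestone]
    after = [s for s in sessions if s["start"] >= milestone]
    return before, after


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0
```

With per-group averages from `mean(s["turns"] for s in group)`, `percent_change` gives the kind of figure shown in the "Change" column of the report.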
# Export to JSON for further analysis
python3 analyze_copilot_chats.py \
/path/to/chatSessions \
/path/to/repo \
--json analysis.json
The script detects these patterns in your prompts:
| Pattern | Example | What It Means |
|---|---|---|
| Challenge | "Shouldn't it return false?" | You're questioning AI's approach |
| Redirect | "No, it should do X instead" | You're correcting course |
| Approval | "Ok, let's do it" | You're accepting a suggestion |
| Directive | "Please add a test for X" | You're giving instructions |
| Query | "How does X work?" | You're asking for information |
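
Heuristic detection of the prompt patterns in the table above might look like the sketch below. The regular expressions are hypothetical stand-ins chosen to match the table's examples; they are not the patterns used by analyze_copilot_chats.py.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
PROMPT_PATTERNS = [
    ("challenge", re.compile(r"^(shouldn'?t|wouldn'?t|isn'?t|doesn'?t|why not)\b", re.I)),
    ("redirect", re.compile(r"^no\b|\binstead\b", re.I)),
    ("approval", re.compile(r"^(ok|okay|yes|sounds good|lgtm)\b", re.I)),
    ("directive", re.compile(r"^(please\s+)?(add|fix|update|remove|create|implement|write)\b", re.I)),
    ("query", re.compile(r"^(how|what|why|where|when|which)\b", re.I)),
]


def classify_prompt(prompt: str) -> str:
    """Return the first matching pattern label, or 'other' if nothing matches."""
    text = prompt.strip()
    for label, pattern in PROMPT_PATTERNS:
        if pattern.search(text):
            return label
    return "other"
```

Ordering matters in a first-match design: "why not" is tested as a challenge before the generic "why" query, which is one reason heuristics like this can misclassify.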
The script detects these patterns in the AI's responses:
| Pattern | Example | What It Means |
|---|---|---|
| Explicit Ack | "You're right!" | AI validated your point |
| Analytical | "Let me think about this..." | AI engaged with your challenge |
| Self-Correction | "I missed that..." | AI caught its own error |
The report automatically identifies your best and worst prompting moments.
The best prompts are scored by:
- Context richness: instruction file present, code selection, file references
- Steering effectiveness: challenges ("shouldn't it...") and redirects ("no, instead...")
- AI acknowledgment: whether the AI explicitly validated your point in the next turn
- Technical depth: domain-specific terminology in your prompt
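
As an illustration of the technical depth criterion, the sketch below counts distinct domain keywords that appear in a prompt as whole words. How the script actually computes and weights technical depth is not stated here, so treat the function `technical_depth` and its counting rule as assumptions.

```python
import re


def technical_depth(prompt: str, keywords: list[str]) -> int:
    """Count how many distinct keywords appear in the prompt as whole words."""
    text = prompt.lower()
    unique_keywords = {kw.lower() for kw in keywords}
    return sum(
        1
        for kw in unique_keywords
        if re.search(rf"\b{re.escape(kw)}\b", text)
    )
```

For example, `technical_depth("Add a useState hook to the React component", ["react", "component", "useState", "useEffect"])` counts three of the four keywords.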
For the worst moments, the report identifies substantive prompts that scored low and offers specific suggestions:
- "Add code selection" if you discussed code without selecting it
- "Reference file with #file" when no explicit file context was provided
- "Reference spec artifact (RQ-, T00)" to anchor to documented decisions
This helps you spot patterns in when you're prompting effectively vs. when you're leaving context on the table.
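
The suggestion rules above can be sketched as simple checks on each turn. Everything here is hypothetical: the `Turn` record, the regexes, and the assumed spec ID formats (such as RQ-012 or T001) are illustrations, not the script's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    prompt: str
    has_code_selection: bool = False  # assumed to come from the session JSON


FILE_REF = re.compile(r"#file\b", re.I)
SPEC_REF = re.compile(r"\b(RQ-\d+|T\d{3})\b")  # assumed ID formats
CODE_TALK = re.compile(r"\b(function|method|class|line|code)\b", re.I)


def suggest(turn: Turn) -> list[str]:
    """Return improvement suggestions for a low-scoring prompt."""
    tips = []
    if CODE_TALK.search(turn.prompt) and not turn.has_code_selection:
        tips.append("Add code selection")
    if not FILE_REF.search(turn.prompt):
        tips.append("Reference file with #file")
    if not SPEC_REF.search(turn.prompt):
        tips.append("Reference spec artifact (RQ-, T00)")
    return tips
```

A prompt such as "Per RQ-012, update #file:compare.c" sent with a code selection would produce no suggestions under these rules.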
Sample report output, truncated for brevity (as seen in a Markdown viewer):
- Sessions: 22
- Substantive turns: 203
- Total time: 3079 minutes (51.3 hours)
| Pattern | Count | % of Turns |
|---|---|---|
| Challenges | 23 | 11.3% |
| Redirects | 2 | 1.0% |
| AI Explicit Acks | 39 | 19.2% |

| Metric | Before (7) | After (13) | Change |
|---|---|---|---|
| Turns/session | 2.4 | 14.4 | +500% |
| Duration (min) | 10.0 | 224.0 | +2140% |
| Technical depth | 0.4 | 0.8 | +86% |

... etc
The default technical depth keywords are PostgreSQL-focused. Edit TECHNICAL_DEPTH_KEYWORDS in the script for your domain:
TECHNICAL_DEPTH_KEYWORDS = [
# Add your domain terms
"kubernetes", "deployment", "pod", "service",
"react", "component", "useState", "useEffect",
# etc.
]
- Only analyzes Copilot Chat panel sessions (not inline completions)
- Pattern detection is heuristic and may miss some patterns or produce false positives
- Git correlation requires commits with searchable messages
- Tested with Copilot Chat extension v0.36.0
Comments, issues, and PRs are welcome. I'm particularly interested in:
- Expanding the pattern definitions for other communication styles
- Technical depth keywords for other domains
- Support for other AI coding assistants (Cursor, Cody, etc.)
- Better visualization of results
- Sharing what you learn from your own analyses
MIT License - see LICENSE file for details.
Developed while building pg_num2int_direct_comp, a PostgreSQL extension for exact numeric-to-integer comparisons.
Dave Sharpe developed this tool in an AI-assisted development session with Claude (Anthropic). Claude contributed to designing the analysis methodology, identifying prompting patterns, and writing the initial codebase. The meta-recursive nature of using AI to analyze AI conversations proved both insightful and enjoyable.