Track and analyze content patterns across YouTube channels over time.
This tool helps you monitor what different groups of YouTube channels are talking about. You define "clusters" of channels (e.g., left-leaning news, right-leaning news, tech channels), and the tool:
- Fetches recent videos from those channels
- Analyzes the transcripts to extract themes, sentiment, and framing
- Tracks how these narratives evolve over time
- Compares what different clusters are focusing on
The tracker continuously monitors YouTube channels. Here's what's currently in the database:
Current Database Status (as of Jan 9, 2026):
- 2,703 total videos across all clusters
- Libs: 872 videos (from channels like Pod Save America, The Young Turks, David Pakman Show)
- Right: 908 videos (from channels like Fox News, Ben Shapiro Show, Daily Wire)
- Mainstream: 923 videos (from channels like CNN, Reuters, MSNBC, ABC News)
Latest Daily Analysis:
The most recent report includes a combined topic word cloud across all clusters, per-cluster breakdowns for the Libs, Right, and Mainstream perspectives, and a view-distribution chart for the Liberal channels.
More daily reports are available in the data/reports/ directory, with word clouds and view statistics updated regularly.
- Automated data collection from YouTube channels (quota-optimized)
- Daily reporting system - automated daily word clouds and view statistics
- Retroactive transcript search - download complete historical datasets for any channel
- AI-powered analysis using Google's Gemini to extract themes, sentiment, and framing
- Temporal tracking - see how topics evolve over weeks, months, or years
- Cross-cluster comparison - identify consensus topics vs echo chambers
- 25+ visualizations including word clouds, heatmaps, and trend charts
- Multi-year analysis - can collect years of historical data efficiently
- Incremental processing - only processes new videos on subsequent runs (100x faster)
- Parallel processing - analyze multiple videos concurrently (3-5x faster)
- Caching - never re-fetch transcripts or re-analyze the same video
# Clone the repo
git clone https://github.com/YOUR_USERNAME/vibes-tracker.git
cd vibes-tracker
# Set up virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

Create a .env file in the project root:
YOUTUBE_API_KEY=your_youtube_api_key
GEMINI_API_KEY=your_gemini_api_key
Get your API keys:
- YouTube: Google Cloud Console → Enable YouTube Data API v3
- Gemini: Google AI Studio
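Inside the pipeline these keys are read from the environment; a minimal sketch of that step, assuming the project loads .env via python-dotenv (a common convention, not confirmed here):

```python
# Minimal sketch (assumption): load API keys from .env with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into the process environment

YOUTUBE_API_KEY = os.environ["YOUTUBE_API_KEY"]
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
```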
Edit config/clusters.json to define your channel groups:
{
"tech": ["@mkbhd", "@linustechtips", "@verge"],
"news": ["@cnn", "@foxnews", "@bbcnews"],
"finance": ["@bloomberg", "@cnbc", "@wallstreetjournal"]
}

# Full pipeline: fetch data → analyze → visualize
python src/main.py pipeline
# Or run stages individually
python src/main.py ingest # Fetch video metadata
python src/main.py analyze # Run AI analysis
python src/main.py visualize   # Generate plots

Check figures/ for your visualizations and data/analyzed_data.csv for the results.
# Fetch video metadata from YouTube
python src/main.py ingest
# Analyze transcripts with Gemini
python src/main.py analyze
# Generate all visualizations
python src/main.py visualize
# Run temporal trend analysis
python src/main.py temporal --days-back 30
# Compare clusters
python src/main.py compare
# Run complete pipeline
python src/main.py pipeline

Generate a daily report with word clouds and view statistics for videos published on a specific date:
# Generate report for today's videos
python src/daily_report.py
# Generate report for a specific date
python src/daily_report.py --date 2025-01-10

The report includes:
- Word clouds for top content across all clusters (filtered by top 67% of views)
- Cluster-specific word clouds with custom color schemes
- View distribution charts showing individual videos as stacked segments
- All outputs saved to data/reports/YYYY-MM-DD/
Download complete historical datasets for any YouTube channel:
# Download all available transcripts from 2020-2023
python scripts/retroactive_search.py --channel @joerogan --start-year 2020 --end-year 2023
# Limit downloads per run (useful for rate limiting)
python scripts/retroactive_search.py --channel @joerogan --start-year 2020 --end-year 2023 --max-per-run 50
# Use specific date ranges
python scripts/retroactive_search.py --channel @joerogan --start-date 2020-06-01 --end-date 2021-12-31

Features:
- Smart Resume: Checks for existing transcripts and only downloads missing ones
- Resilient: Saves each transcript immediately - no data loss if interrupted
- Progress Tracking: Maintains progress in JSON file, fully resumable
- Organized Storage: Saves transcripts to data/{channel}/ with clear naming
See docs/RETROACTIVE_SEARCH.md for detailed documentation.
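The smart-resume behavior can be pictured roughly like this; a simplified sketch with a hypothetical one-file-per-video layout, not the script's actual code:

```python
# Simplified sketch of "smart resume": only fetch transcripts that aren't on disk yet.
# The per-channel directory and one-JSON-per-video layout are assumptions for illustration.
from pathlib import Path

def missing_video_ids(channel: str, candidate_ids: list[str], data_dir: str = "data") -> list[str]:
    """Return video IDs that don't yet have a saved transcript file."""
    channel_dir = Path(data_dir) / channel.lstrip("@")
    existing = {p.stem for p in channel_dir.glob("*.json")} if channel_dir.exists() else set()
    return [vid for vid in candidate_ids if vid not in existing]
```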
After the first run, use incremental mode to only process new videos:
# Only fetch and analyze new videos since last run
python src/main.py pipeline --incremental
# This is ~100x faster than re-processing everything

# Use more parallel workers for faster analysis (default: 10)
python src/main.py analyze --workers 20
# Force full refresh (re-process everything)
python src/main.py ingest --full-refresh
python src/main.py analyze --full-refresh

Collect years of data efficiently:
# Collect 3 years of monthly data (takes ~1 day, uses 4,320 API units)
python src/main.py collect-historical \
--start-year 2022 \
--end-year 2024 \
--frequency monthly
# Then run temporal analysis
python src/main.py temporal --days-back 1095   # 3 years

See docs/MULTI_YEAR_ANALYSIS_GUIDE.md for details.
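As a back-of-the-envelope check, the 4,320-unit estimate is consistent with the per-channel costs listed in the API quota notes below, assuming roughly 60 configured channels:

```python
# Back-of-the-envelope quota check (assumes ~60 channels at ~2 units per cached channel fetch).
channels = 60
units_per_channel = 2                              # cached channel IDs; first-time lookups cost ~100
units_per_snapshot = channels * units_per_channel  # ~120 units per collection period
total_units = 36 * units_per_snapshot              # 36 monthly snapshots for 2022-2024 -> ~4,320 units
periods_per_day = 10_000 // units_per_snapshot     # ~83 periods within YouTube's daily quota
print(total_units, periods_per_day)
```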
- data/cluster_data.csv - Raw video metadata
- data/analyzed_data.csv - Enriched with AI analysis
- data/historical/YYYY-MM-DD/ - Dated snapshots for temporal analysis
- data/cache/ - Cached transcripts and analysis (avoids redundant API calls)
- logs/ - Detailed logs of all runs
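The enriched CSV is a plain table, so it can be inspected directly with pandas; the column names here are illustrative assumptions, not a guaranteed schema:

```python
# Peek at the enriched dataset with pandas ('cluster' and 'sentiment' column names are assumptions).
import pandas as pd

df = pd.read_csv("data/analyzed_data.csv")
print(df.columns.tolist())
print(df.groupby("cluster")["sentiment"].value_counts())
```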
Word Clouds:
- Title word clouds (per cluster + combined)
- Theme word clouds (per cluster + combined)
Sentiment Analysis:
- Sentiment distribution by cluster
- Framing distribution (favorable/critical/neutral/alarmist)
Temporal Trends:
- Theme prevalence over time
- Sentiment evolution
- Emerging vs declining topics
Cross-Cluster Comparison:
- Similarity heatmap
- Consensus vs echo chamber topics
- Theme distribution comparison
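The word clouds are built with the standard wordcloud package; a minimal sketch of how a title cloud might be generated with the configured size and custom stopwords (illustrative, not the project's exact plotting code):

```python
# Illustrative title word cloud (not src/visualizations/word_clouds.py; 'title' column is assumed).
import pandas as pd
from wordcloud import WordCloud, STOPWORDS

df = pd.read_csv("data/analyzed_data.csv")
text = " ".join(df["title"].dropna())
stopwords = STOPWORDS | {"video", "podcast", "episode"}  # custom stopwords from pipeline_config.yaml

wc = WordCloud(width=1200, height=800, stopwords=stopwords, background_color="white").generate(text)
wc.to_file("figures/title_wordcloud_example.png")
```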
For each video, the AI extracts:
- Core Themes - 3-5 main topics discussed
- Theme Categories - Political, Social, Economic, Cultural, International, Tech, Other
- Sentiment - Positive, Neutral, Negative, Mixed
- Framing - favorable, critical, neutral, alarmist
- Named Entities - Key people, organizations, events mentioned
- Summary - One-sentence takeaway
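Put together, each analyzed video ends up as a structured record along these lines (an invented example for illustration, not real pipeline output):

```python
# Invented example of one analyzed record; field names mirror the list above, values are made up.
example_analysis = {
    "core_themes": ["debt ceiling negotiations", "federal budget", "party leadership"],
    "theme_categories": ["Political", "Economic"],
    "sentiment": "Negative",
    "framing": "critical",
    "named_entities": ["Congress", "White House"],
    "summary": "The hosts argue the latest budget standoff is really a fight over party leadership.",
}
```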
Edit config/pipeline_config.yaml to customize:
ingest:
  videos_per_channel: 30        # How many recent videos to fetch

analysis:
  model: "gemini-1.5-flash"
  enable_caching: true

visualization:
  wordcloud_width: 1200
  wordcloud_height: 800
  custom_stopwords: ["video", "podcast", "episode"]
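Because the settings are plain YAML, loading them is a one-liner with PyYAML; a sketch of what a config-loader utility would roughly do (the actual helper in src/utils/config_loader.py may differ):

```python
# Rough equivalent of a config-loader utility (function name is an assumption).
import yaml

def load_pipeline_config(path: str = "config/pipeline_config.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_pipeline_config()
print(config["ingest"]["videos_per_channel"])  # 30 in the example above
```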
The pipeline has 3 main stages:
- Ingest (src/ingest.py) - Fetches video metadata from YouTube
- Analyze (src/analyze.py) - Extracts themes/sentiment using Gemini
- Visualize (src/visualize.py) - Generates plots and charts
Additional modules:
- Temporal Analysis (src/temporal_analysis.py) - Track trends over time
- Cross-Cluster Analysis (src/cross_cluster_analysis.py) - Compare clusters
- CLI (src/main.py) - Unified command-line interface
All configuration is in config/, utilities in src/utils/, and visualizations in src/visualizations/.
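The unified CLI suggests an argparse-style subcommand layout; a minimal sketch of that pattern (the real src/main.py may be organized differently):

```python
# Sketch of a subcommand-style CLI like the one src/main.py exposes (structure is an assumption).
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vibes-tracker")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("ingest").add_argument("--full-refresh", action="store_true")
    analyze = sub.add_parser("analyze")
    analyze.add_argument("--workers", type=int, default=10)
    analyze.add_argument("--full-refresh", action="store_true")
    sub.add_parser("visualize")
    sub.add_parser("temporal").add_argument("--days-back", type=int, default=30)
    sub.add_parser("compare")
    sub.add_parser("pipeline").add_argument("--incremental", action="store_true")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)  # a real entry point would dispatch to the ingest/analyze/visualize modules here
```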
First run (1,000 videos):
- Data collection: ~2 minutes (YouTube API)
- Analysis: ~10 minutes with 10 parallel workers (Gemini API)
- Visualization: ~1 minute
Incremental runs (50 new videos):
- Data collection: ~10 seconds
- Analysis: ~30 seconds with caching
- Visualization: ~10 seconds
Caching benefits:
- Transcripts: Never re-fetch the same video (YouTube API savings)
- Analysis: Never re-analyze the same video (Gemini API savings)
- Expected speedup: 10-50x on repeated runs
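Conceptually, the cache is keyed by video ID, so a transcript fetch or analysis happens at most once per video; a simplified sketch of that idea with a hypothetical file layout (not the actual cache_manager implementation):

```python
# Simplified video-keyed cache: check disk before calling any API (file layout is illustrative).
import json
from pathlib import Path

CACHE_DIR = Path("data/cache")

def cached_or_compute(kind: str, video_id: str, compute):
    """Return a cached result for (kind, video_id), computing and storing it on a miss."""
    path = CACHE_DIR / kind / f"{video_id}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute(video_id)  # e.g. fetch a transcript or run the Gemini analysis
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Deleting data/cache/ therefore simply forces everything to be re-fetched and re-analyzed on the next run.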
Typical YouTube Data API usage:
- First time fetching a channel: ~100 units (need to resolve channel ID)
- Subsequent fetches: ~2 units per channel (cached channel ID)
- 60 channels daily: ~120 units
- You can fetch ~83 time periods per day
Gemini API usage:
- 1 API call per video analyzed
- Caching prevents re-analyzing the same video
- Incremental mode only analyzes new videos
- Free tier: 1,500 requests/day (check current limits)
vibes-tracker/
├── config/
│ ├── clusters.json # Your channel definitions
│ ├── pipeline_config.yaml # Pipeline settings
│ └── prompts.yaml # AI prompt templates
├── src/
│ ├── main.py # CLI entry point
│ ├── daily_report.py # Daily report generation
│ ├── ingest.py # Data collection
│ ├── analyze.py # AI analysis
│ ├── visualize.py # Visualization orchestration
│ ├── temporal_analysis.py # Temporal tracking
│ ├── cross_cluster_analysis.py # Cluster comparison
│ ├── utils/ # Utilities
│ │ ├── config_loader.py
│ │ ├── logger.py
│ │ ├── cache_manager.py
│ │ └── metadata_manager.py
│ └── visualizations/ # Plotting modules
│ ├── word_clouds.py
│ ├── temporal_plots.py
│ ├── cluster_comparison.py
│ └── sentiment_plots.py
├── scripts/
│ ├── collect_historical_data.py # Multi-year collection
│ └── retroactive_search.py # Historical transcript downloads
├── data/ # Generated data (gitignored)
├── figures/ # Generated plots (gitignored)
├── logs/ # Log files (gitignored)
└── docs/ # Documentation
Set up a cron job for automated daily updates:
# Run at 2am daily - collect new videos and generate daily report
0 2 * * * cd /path/to/vibes-tracker && source .venv/bin/activate && python src/main.py pipeline --incremental && python src/daily_report.py

This workflow:
- Fetches new videos from all configured channels
- Analyzes only new content (incremental mode)
- Generates daily word clouds and view statistics
- Saves reports to data/reports/YYYY-MM-DD/
Collect and analyze historical data:
# 1. Collect 3 years of data
python src/main.py collect-historical \
--start-year 2022 --end-year 2024 --frequency monthly
# 2. Run temporal analysis
python src/main.py temporal --days-back 1095
# 3. Generate all visualizations
python src/main.py visualize
# 4. Compare clusters
python src/main.py compare

Study a specific time period:
# Collect data around a specific event (e.g., election)
python scripts/collect_historical_data.py \
--start-date 2024-10-01 \
--end-date 2024-12-01
# Analyze the data
python src/main.py analyze
# Generate visualizations
python src/main.py visualize

Build a complete dataset for a specific channel:
# Download all available transcripts for a channel
python scripts/retroactive_search.py \
--channel @YourTargetChannel \
--start-year 2020 \
--end-year 2024 \
--max-per-run 100
# The script is fully resumable - run it multiple times if needed
# It will skip already-downloaded transcripts and continue where it left off

No transcripts available:
- Many channels disable transcripts or auto-captions
- The tool skips videos without transcripts
- This is expected behavior
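Skipping such videos usually comes down to catching the transcript library's exceptions; a sketch using youtube-transcript-api's classic interface (newer versions of the library expose a slightly different API, and the project's actual handling may differ):

```python
# Sketch: return None instead of failing when a video has no transcript.
from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptsDisabled,
    NoTranscriptFound,
)

def fetch_transcript_or_none(video_id: str):
    try:
        return YouTubeTranscriptApi.get_transcript(video_id)  # classic (pre-1.0) interface
    except (TranscriptsDisabled, NoTranscriptFound):
        return None  # expected for channels without captions; the video is simply skipped
```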
API quota exceeded:
- YouTube: 10,000 units/day limit (resets at midnight PT)
- Use incremental mode to minimize API usage
- Spread historical collection across multiple days if needed
Slow analysis:
- Increase parallel workers: --workers 20
- Use caching to avoid re-analyzing
- Consider analyzing a sample instead of all videos
Cache getting large:
- Cache files are stored in data/cache/
- Safe to delete if you want to re-analyze everything
- Each transcript is ~1-10 KB, each analysis is ~1-2 KB
- Multi-Year Analysis Guide - How to collect and analyze years of data
- Retroactive Search Guide - Download complete historical datasets for any channel
- Getting Started Guide - Step-by-step setup and usage instructions
- Technical Guide - Architecture, performance tuning, and advanced usage
- Implementation Summary - Technical details of all features
- Phase 2 Test Report - Temporal analysis capabilities
- Phase 3 Test Report - Performance improvements
This is a research tool built for personal use. Feel free to fork and adapt for your own needs.
MIT License - see LICENSE file for details
Built with:
- YouTube Data API v3
- Google Gemini API
- youtube-transcript-api
- Standard Python data science stack (pandas, matplotlib, seaborn, wordcloud)