Multi-Agent Web Scraping System

A multi-agent AI system built with AutoGen and a Mistral LLM that extracts and analyzes information from web pages. The system uses four agents (Orchestrator, Web Scraper, Analyzer, Critique) that work together through a graph-based workflow to answer user questions.

Features

Multi-agent architecture with specialized roles
Web scraping tool that cleans HTML by stripping scripts, ads, and other unnecessary content (70%+ size reduction)
Multi-stage validation of extracted information
Automatic retry for irrelevant or incomplete data
Human-readable answer formatting
Self-evaluation using an evaluator agent

Project Structure

MachineLearning-MiniProject/
├── Program.cs                          # Main entry point with workflow orchestration
├── Agents/
│   └── WebScrapingAgents.cs           # Agent factory methods and configurations
├── Tools/
│   └── WebScrapingTools.cs            # HTML fetching and cleaning implementation
├── LLMConfiguration.cs                # Mistral API configuration
├── PAPER.md                           # Technical documentation (1-3 pages)
├── USE_CASES.md                       # Multiple use cases with example outputs
├── README.md                          # This file
├── CLAUDE.md                          # Development guidelines for Claude Code
└── Autogen-research-paper-tool-calling-evaluation.csproj  # Project configuration

Requirements

.NET 10.0 or higher
Mistral AI API key
Internet connection (for web scraping)
500MB+ available disk space

Dependencies

<PackageReference Include="AutoGen" Version="0.2.3" />
<PackageReference Include="AutoGen.SourceGenerator" Version="0.2.3" />
<PackageReference Include="HtmlAgilityPack" Version="1.11.61" />

Setup

1. Get a Mistral API key from https://console.mistral.ai/

2. Set your API key:

export MISTRAL_API_KEY=your_key_here  # macOS/Linux
# or for Windows PowerShell:
$env:MISTRAL_API_KEY="your_key_here"
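
If you want the program to fail early with a readable message when the key is missing, a check along these lines can help. This is only a sketch: `ApiKeyCheck` and `RequireApiKey` are hypothetical names, and the actual key handling in LLMConfiguration.cs may look different.

```csharp
using System;

// Hypothetical helper, not taken from LLMConfiguration.cs.
public static class ApiKeyCheck
{
    // Returns the trimmed key, or throws with a readable message if it is missing.
    public static string RequireApiKey(string variableName = "MISTRAL_API_KEY")
    {
        string? key = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"{variableName} is not set. Set it in this terminal before running 'dotnet run'.");
        }
        return key.Trim();
    }
}
```

Calling `ApiKeyCheck.RequireApiKey()` once at startup turns a confusing API error later on into an immediate, explicit failure.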

Running

Run:

dotnet run

By default, it extracts event info from a hackathon website. To change the task, edit the task variable in Program.cs:

var task = "Extract [what you want] from: https://website.com/";

The system works with any URL - product pages, articles, job listings, etc.

Understanding the Output

The system produces three main output sections:

1. TASK & ANSWER

Shows your question and the extracted answer in simple, human-readable format.

TASK:
Extract when is event happening from webpage: https://hack-esbjerg.cod3rs.org/

ANSWER:
The event is happening on March 19-20, 2026.

2. Multi-Agent Conversation (Debug Output)

Shows detailed interaction between agents:

Web Scraper fetches and cleans the page
Analyzer extracts information
Critique validates the results
Answer Formatter creates the final answer

3. EXTERNAL EVALUATION

Shows system performance metrics:

{
  "correctness": 5,
  "instruction_following": 5,
  "efficiency": 3,
  "quality_of_reasoning": 5,
  "constraint_satisfaction": 5,
  "overall_score": 5
}
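
If you want to act on these scores in code (for example, to flag weak runs), the JSON can be read with System.Text.Json. The sketch below assumes the evaluation is a flat JSON object of integer scores out of 5, as in the sample above; `EvaluationReport`, `ParseScores`, `WeakCriteria`, and the default threshold of 4 are hypothetical and not part of the project.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper for reading the evaluation JSON printed by the program.
public static class EvaluationReport
{
    // Collects every integer-valued field of the root object into a dictionary.
    // Throws if the JSON is malformed or the root is not an object.
    public static Dictionary<string, int> ParseScores(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        var scores = new Dictionary<string, int>();
        foreach (JsonProperty field in doc.RootElement.EnumerateObject())
        {
            if (field.Value.ValueKind == JsonValueKind.Number
                && field.Value.TryGetInt32(out int score))
            {
                scores[field.Name] = score;
            }
        }
        return scores;
    }

    // Names of the criteria that scored below the given minimum.
    public static List<string> WeakCriteria(
        IReadOnlyDictionary<string, int> scores, int minimum = 4)
        => scores.Where(pair => pair.Value < minimum)
                 .Select(pair => pair.Key)
                 .ToList();
}
```

For the sample output above, `WeakCriteria` would return only `efficiency`, since its score of 3 is the only one below 4.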

How It Works

The system uses 4 agents working together:

Web Scraper - Fetches the webpage and strips unnecessary content (scripts, ads, etc.)
Analyzer - Looks at the cleaned content and extracts the relevant information
Critique - Checks if the extracted data is complete and accurate
Answer Formatter - Takes the analysis and gives you a simple, readable answer

The system also evaluates itself at the end to show how well it performed.
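
The control flow behind these steps, including the automatic retry listed under Features, can be sketched with plain delegates. This is not the AutoGen graph workflow from Program.cs: `Pipeline`, its four delegates, and the limit of three attempts are hypothetical stand-ins, and whether the real system re-scrapes or only re-analyzes on a retry is an assumption here.

```csharp
using System;

// Hypothetical stand-ins for the four agents; the real ones are AutoGen agents.
public sealed record Pipeline(
    Func<string, string> Scrape,           // URL -> cleaned page text
    Func<string, string, string> Analyze,  // (question, page text) -> extracted data
    Func<string, string, bool> Critique,   // (question, extracted data) -> accepted?
    Func<string, string> Format)           // extracted data -> readable answer
{
    // Runs scrape -> analyze -> critique, retrying until the critique accepts
    // the result or the attempt budget is used up.
    public string Run(string question, string url, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string page = Scrape(url);
            string extracted = Analyze(question, page);
            if (Critique(question, extracted))
            {
                return Format(extracted);
            }
        }
        throw new InvalidOperationException(
            $"No acceptable answer after {maxAttempts} attempts.");
    }
}
```

Keeping the critique as a simple yes/no gate makes the retry condition explicit; a fuller version would likely feed the critique's feedback into the next analysis attempt.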

Customization

To modify the system:

Change the LLM model: Edit LLMConfiguration.cs
Modify HTML cleaning: Adjust patterns in WebScrapingTools.cs
Change agent behavior: Edit system messages in WebScrapingAgents.cs

See PAPER.md for technical details.
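
As a starting point for adjusting the HTML cleaning, here is a minimal sketch of tag-based cleaning with HtmlAgilityPack (already a project dependency). It is not the code from WebScrapingTools.cs: `HtmlCleanerSketch`, `Clean`, and the list of tags treated as noise are assumptions chosen for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical cleaner; the patterns in WebScrapingTools.cs may differ.
public static class HtmlCleanerSketch
{
    // Nodes assumed to carry no answer-relevant text; extend as needed.
    private const string NoiseXPath =
        "//script|//style|//noscript|//iframe|//svg|//comment()";

    public static string Clean(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var noise = doc.DocumentNode.SelectNodes(NoiseXPath);
        if (noise != null)
        {
            // Copy to a list so removal does not interfere with enumeration.
            foreach (HtmlNode node in noise.ToList())
            {
                node.Remove();
            }
        }

        // Decode entities such as &amp; and collapse runs of whitespace.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Returning plain text rather than trimmed HTML keeps the prompt small, at the cost of losing structure such as links and tables; keep selected tags if the Analyzer needs them.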

Troubleshooting

API key not set? - Make sure MISTRAL_API_KEY is set in the same terminal session before running the program.

API errors? - Check that your API key is valid and has available credits.

Can't connect to website? - The URL might be invalid, or the website might block automated access. Try a different site.

No results? - Verify the URL is correct and the website's content is publicly accessible.
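
To rule out a malformed URL before spending an API call, the task string can be checked up front. The sketch below is an assumption about how such a check could look, not something the project is known to do; `TaskUrl` and `TryGetUrl` are hypothetical names.

```csharp
using System;

// Hypothetical pre-flight check for the task string edited in Program.cs.
public static class TaskUrl
{
    // Finds the first whitespace-separated token that parses as an
    // absolute http or https URL.
    public static bool TryGetUrl(string task, out Uri? url)
    {
        url = null;
        foreach (string token in task.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (Uri.TryCreate(token, UriKind.Absolute, out Uri? candidate)
                && (candidate.Scheme == Uri.UriSchemeHttp
                    || candidate.Scheme == Uri.UriSchemeHttps))
            {
                url = candidate;
                return true;
            }
        }
        return false;
    }
}
```

A task such as `"Extract dates from: website.com"` fails this check because the URL has no scheme, which is a common cause of the connection problems described above.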

Example

When you run the program, you'll see:

TASK & ANSWER - Your question and the extracted answer
Debug output - What each agent is doing
EXTERNAL EVALUATION - Scores showing how well the system performed (overall score is usually 5/5)

Documentation Files

PAPER.md: Technical paper (1-3 pages) describing:
- System architecture and design
- Tools and instructions given to the LLM
- Implementation details
- Example use cases and outputs

USE_CASES.md: Real-world web scraping examples, including:
- Event information extraction
- Product price and availability scraping
- News headlines and articles

CLAUDE.md: Development guidelines for the Claude Code AI assistant

Performance

Response time: 3-7 seconds
HTML reduction: 70-90%
Extraction accuracy: 95%+
Overall score: Usually 5/5

Questions?

Check PAPER.md for technical details, USE_CASES.md for examples, or CLAUDE.md for development info.

Built with AutoGen, Mistral AI, and .NET.
