mmEASV/Autogen-shopping-assistant

Multi-Agent Web Scraping System

A multi-agent AI system built with AutoGen and a Mistral LLM that extracts and analyzes information from web pages. The system uses four agents (Orchestrator, Web Scraper, Analyzer, Critique) that work together through a graph-based workflow to answer user questions.

Features

  • Multi-agent architecture with specialized roles
  • Web scraping tool that cleans HTML (typically reducing page size by 70% or more)
  • Multi-stage validation of extracted information
  • Automatic retry for irrelevant or incomplete data
  • Human-readable answer formatting
  • Self-evaluation using an evaluator agent

Project Structure

MachineLearning-MiniProject/
├── Program.cs                          # Main entry point with workflow orchestration
├── Agents/
│   └── WebScrapingAgents.cs           # Agent factory methods and configurations
├── Tools/
│   └── WebScrapingTools.cs            # HTML fetching and cleaning implementation
├── LLMConfiguration.cs                # Mistral API configuration
├── PAPER.md                           # Technical documentation (1-3 pages)
├── USE_CASES.md                       # Multiple use cases with example outputs
├── README.md                          # This file
├── CLAUDE.md                          # Development guidelines for Claude Code
└── Autogen-research-paper-tool-calling-evaluation.csproj  # Project configuration

Requirements

  • .NET 10.0 or higher
  • Mistral AI API key
  • Internet connection (for web scraping)
  • 500MB+ available disk space

Dependencies

<PackageReference Include="AutoGen" Version="0.2.3" />
<PackageReference Include="AutoGen.SourceGenerator" Version="0.2.3" />
<PackageReference Include="HtmlAgilityPack" Version="1.11.61" />

Setup

  1. Get a Mistral API key from https://console.mistral.ai/

  2. Set your API key:

export MISTRAL_API_KEY=your_key_here  # macOS/Linux
# or for Windows PowerShell:
$env:MISTRAL_API_KEY="your_key_here"
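
A fail-fast check makes a missing key obvious at startup rather than at the first API call. The following is a minimal sketch of such a check; `ApiKeyCheck` and `RequireApiKey` are hypothetical names used for illustration, and the actual code in LLMConfiguration.cs may read the variable differently.

```csharp
using System;

// Hypothetical helper, not taken from LLMConfiguration.cs.
static class ApiKeyCheck
{
    // Reads the key from the environment and throws a descriptive error if it is missing.
    public static string RequireApiKey(string variableName = "MISTRAL_API_KEY")
    {
        var key = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"{variableName} is not set. Set it in the same shell session before running 'dotnet run'.");
        }

        // Trim stray whitespace that can sneak in when the key is pasted.
        return key.Trim();
    }
}
```

Environment variables set with `export` or `$env:` last only for the current shell session, so the key has to be set again in each new terminal.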

Running

Run:

dotnet run

By default, it extracts event info from a hackathon website. To change the task, edit the task variable in Program.cs:

var task = "Extract [what you want] from: https://website.com/";

The system accepts any publicly accessible URL, such as product pages, articles, or job listings. Sites that block automated access may not return usable content.

Understanding the Output

The system produces three main output sections:

1. TASK & ANSWER

Shows your question and the extracted answer in a simple, human-readable format.

TASK:
Extract when is event happening from webpage: https://hack-esbjerg.cod3rs.org/

ANSWER:
The event is happening on March 19-20, 2026.

2. Multi-Agent Conversation (Debug Output)

Shows detailed interaction between agents:

  • Web Scraper fetches and cleans the page
  • Analyzer extracts information
  • Critique validates the results
  • Answer Formatter creates the final answer

3. EXTERNAL EVALUATION

Shows system performance metrics:

{
  "correctness": 5,
  "instruction_following": 5,
  "efficiency": 3,
  "quality_of_reasoning": 5,
  "constraint_satisfaction": 5,
  "overall_score": 5
}
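
If you want to act on the evaluation programmatically, the JSON above can be read with `System.Text.Json`. The sketch below assumes the evaluator's output has exactly the flat shape shown (metric name mapped to an integer score); `EvaluationReport` and `MetricsBelow` are hypothetical names, not part of the project.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for inspecting the evaluation JSON; not part of the repository.
static class EvaluationReport
{
    // Returns the names of all numeric metrics that score below the given threshold.
    public static List<string> MetricsBelow(string json, int threshold)
    {
        var low = new List<string>();
        using var doc = JsonDocument.Parse(json);

        foreach (var property in doc.RootElement.EnumerateObject())
        {
            // Skip anything that is not a number, in case the evaluator adds comments or text fields.
            if (property.Value.ValueKind == JsonValueKind.Number &&
                property.Value.GetInt32() < threshold)
            {
                low.Add(property.Name);
            }
        }

        return low;
    }
}
```

For the sample output above, calling `EvaluationReport.MetricsBelow(json, 4)` would flag only `efficiency`, the one metric scored 3.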

How It Works

The system uses 4 agents working together:

  1. Web Scraper - Fetches the webpage and strips unnecessary content (scripts, ads, etc.)
  2. Analyzer - Looks at the cleaned content and extracts the relevant information
  3. Critique - Checks if the extracted data is complete and accurate
  4. Answer Formatter - Takes the analysis and gives you a simple, readable answer

The system also evaluates itself at the end to show how well it performed.
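
The control flow of these four steps, including the automatic retry when the Critique rejects a result, can be sketched with plain delegates. This is an illustration under stated assumptions only: the project itself wires its agents through an AutoGen graph-based workflow, which is not modeled here, and `Pipeline`, `Run`, and `maxAttempts` are hypothetical names.

```csharp
using System;

// Illustrative stand-in for the agent workflow; each delegate plays the role of one agent.
static class Pipeline
{
    public static string Run(
        string task,
        Func<string, string> scrape,            // Web Scraper: task -> cleaned page content
        Func<string, string, string> analyze,   // Analyzer: (task, content) -> extracted information
        Func<string, string, bool> critique,    // Critique: (task, analysis) -> approved?
        Func<string, string> format,            // Answer Formatter: analysis -> readable answer
        int maxAttempts = 3)                    // assumed retry limit, not taken from the project
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var content = scrape(task);
            var analysis = analyze(task, content);

            // Only an approved analysis reaches the formatter; otherwise the loop retries.
            if (critique(task, analysis))
            {
                return format(analysis);
            }
        }

        throw new InvalidOperationException(
            $"No approved answer after {maxAttempts} attempts.");
    }
}
```

Bounding the number of attempts keeps a page with no relevant content from looping indefinitely and consuming API credits.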

Customization

To modify the system:

  • Change the LLM model: Edit LLMConfiguration.cs
  • Modify HTML cleaning: Adjust patterns in WebScrapingTools.cs
  • Change agent behavior: Edit system messages in WebScrapingAgents.cs

See PAPER.md for technical details.
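
As a starting point for adjusting the HTML cleaning, here is a minimal sketch of tag-based cleaning with HtmlAgilityPack. It is not the code from WebScrapingTools.cs: the `HtmlCleaner` class, the `Clean` method, and the list of tags to drop are assumptions chosen for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical cleaner; the real patterns in WebScrapingTools.cs may differ.
static class HtmlCleaner
{
    // Node types assumed to carry no readable content, expressed as XPath queries.
    private static readonly string[] NoiseQueries =
    {
        "//script", "//style", "//noscript", "//iframe", "//comment()"
    };

    public static string Clean(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var query in NoiseQueries)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes(query);
            if (nodes == null) continue;

            // Copy to a list first so removal does not disturb the collection being iterated.
            foreach (var node in nodes.ToList())
            {
                node.Remove();
            }
        }

        // Decode entities such as &amp; and collapse runs of whitespace into single spaces.
        var text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Adding or removing entries in `NoiseQueries` is the simplest way to experiment with how aggressively a page is reduced before it is sent to the LLM.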

Troubleshooting

API Key not set? - Make sure MISTRAL_API_KEY is exported before running the program.

API errors? - Check that your API key is valid and has available credits.

Can't connect to website? - The URL might be invalid, or the website might block automated access. Try a different site.

No results? - Verify the URL is correct and the website's content is publicly accessible.

Example

When you run the program, you'll see:

  1. TASK & ANSWER - Your question and the extracted answer
  2. Debug output - What each agent is doing
  3. EXTERNAL EVALUATION - Scores showing how well the system performed (usually 5/5)

Documentation Files

  • PAPER.md: Comprehensive technical paper (1-3 pages) describing:

    • System architecture and design
    • Tools and instructions given to the LLM
    • Implementation details
    • Example use cases and outputs
  • USE_CASES.md: Real-world web scraping examples including:

    • Event information extraction
    • Product price and availability scraping
    • News headlines and articles
  • CLAUDE.md: Development guidelines for Claude Code AI assistant

Performance

  • Response time: 3-7 seconds
  • HTML reduction: 70-90%
  • Extraction accuracy: 95%+
  • Overall score: Usually 5/5

Questions?

Check PAPER.md for technical details, USE_CASES.md for examples, or CLAUDE.md for development info.


Built with AutoGen, Mistral AI, and .NET.
