🕸️ ExtractGraph

A powerful, configurable text extraction system for building knowledge graphs

Built on top of langextract with advanced configuration management and visualization capabilities

✨ Features

🎛️ Highly Configurable Extraction

Strategy-based extraction with YAML configuration files
Multi-dimensional granularity control (breadth, depth, confidence, context scope)
Dynamic prompt generation using Jinja2 templates
Few-shot learning with customizable examples

🎨 Interactive Visualization

Interactive HTML graph visualization with pyvis integration
Pre-import data quality checks before Neo4j insertion
Multi-strategy comparison views for analysis
Customizable styling for different node types

🗄️ Database Integration

Neo4j-ready data formatting with automatic Cypher generation
Flexible schema support for various entity types
Relationship mapping with configurable properties
Batch import capabilities with MERGE statements

🔧 Developer-Friendly

Type-safe configuration with Pydantic models
Comprehensive logging and debugging support
Extensible architecture for custom strategies
Rich examples and documentation

🚀 Quick Start

Prerequisites

Python 3.11+
OpenAI-compatible API (OpenAI, Azure OpenAI, local models, etc.)

Installation

# Clone the repository
git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph

# Install with uv (recommended)
uv sync

# Or use pip
pip install -e .

Basic Setup

Configure your API credentials:

cp .env.example .env
# Edit .env with your API settings
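To confirm which settings your code will actually see, here is a minimal, standard-library-only `.env` reader. This is a sketch, not the project's loader. The key names `OPENAI_API_KEY` and `OPENAI_BASE_URL` are hypothetical placeholders, so use whatever keys `.env.example` defines. For real use, the `python-dotenv` package (`load_dotenv()`) handles more edge cases.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Values already present in os.environ are left untouched.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    load_env_file()
    # Hypothetical key names: check .env.example for the real ones.
    for name in ("OPENAI_API_KEY", "OPENAI_BASE_URL"):
        print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")
```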

Run a quick extraction:

from src.core.extractor import extractor

text = "Alice is a data scientist at TechCorp. She feels excited about the new AI project."

# Extract with default strategy
result = extractor.extract_for_neo4j(text)
print(f"Found {len(result['neo4j_data']['nodes'])} nodes and {len(result['neo4j_data']['relationships'])} relationships")

Visualize the results:

from src.core.visual_nodes import visual_nodes

# Generate interactive visualization
html_path = visual_nodes.visualize_text_extraction(
    text=text,
    strategy="literary",
    save_path="output/demo.html"
)
# Open the HTML file in your browser!

🏗️ Architecture

ExtractGraph/
├── 🎛️ Strategy Layer      # YAML-based extraction strategies
├── 🔧 Configuration Layer # Dynamic prompt generation with Jinja2  
├── 📊 Granularity Layer   # Multi-dimensional extraction control
├── 🎨 Visualization Layer # Interactive node preview and analysis
└── 🗄️ Database Layer      # Neo4j integration with Cypher generation

Core Components

| Component | Description | Key Features |
|---|---|---|
| ConfigurableExtractor | Main extraction engine | Strategy management, dynamic prompting |
| VisualNodes | Visualization engine | Interactive graphs, comparison views |
| StrategyManager | Configuration management | YAML loading, custom strategy creation |
| CypherGenerator | Database integration | CREATE/MERGE statement generation |

📖 Documentation

🎯 Extraction Strategies

Create custom extraction strategies with YAML configuration:

# strategies/scientific_papers.yaml
name: "scientific_papers"
description: "Extract entities from scientific literature"
entities:
  - "researcher"
  - "institution" 
  - "concept"
  - "method"
relations:
  - "affiliated_with"
  - "researches"  
  - "cites"
granularity:
  breadth: "comprehensive"
  depth: "inferential"
  confidence: "high"
  context_scope: "document"
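The project validates configuration with Pydantic models. The sketch below shows the same idea without dependencies: it checks an already-parsed strategy mapping (for example, what `yaml.safe_load` returns for the file above) using a standard-library dataclass. It is not the project's schema. It only enforces the shape visible in this README: a name, lists of entity and relation strings, and the four granularity keys. The real models may allow more keys or restrict the values.

```python
from dataclasses import dataclass, field
from typing import Any

# The four granularity dimensions named in this README; the real schema may allow more.
GRANULARITY_KEYS = {"breadth", "depth", "confidence", "context_scope"}


def _str_list(data: dict[str, Any], key: str) -> list[str]:
    """Return data[key] if it is a list of strings (default: empty list)."""
    items = data.get(key, [])
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError(f"'{key}' must be a list of strings")
    return items


@dataclass(frozen=True)
class Strategy:
    name: str
    entities: list[str]
    relations: list[str] = field(default_factory=list)
    description: str = ""
    granularity: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, data: dict[str, Any]) -> "Strategy":
        """Validate a parsed YAML/JSON mapping and build a Strategy from it."""
        name = data.get("name")
        if not isinstance(name, str) or not name:
            raise ValueError("a strategy needs a non-empty 'name'")
        entities = _str_list(data, "entities")
        if not entities:
            raise ValueError("'entities' must not be empty")
        granularity = data.get("granularity", {})
        if not isinstance(granularity, dict):
            raise ValueError("'granularity' must be a mapping")
        unknown = set(granularity) - GRANULARITY_KEYS
        if unknown:
            raise ValueError(f"unknown granularity keys: {sorted(unknown)}")
        return cls(
            name=name,
            entities=entities,
            relations=_str_list(data, "relations"),
            description=str(data.get("description", "")),
            granularity={k: str(v) for k, v in granularity.items()},
        )
```

With PyYAML installed, you would pass `yaml.safe_load(f)` for an open strategy file. A typo such as `widht:` under `granularity` then fails immediately instead of being silently ignored.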

🎨 Visualization Options

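from src.core.visual_nodes import VisualNodes, visual_nodes

# Basic visualization (strategy name as defined in the YAML file above)
visual_nodes.visualize_text_extraction(text, strategy="scientific_papers")

# Custom styling: a separately configured instance
custom_visual = VisualNodes(
    width="1200px",
    height="800px",
    bgcolor="#f8f9fa"
)

# Comparison analysis (result1, result2: the 'neo4j_data' dicts of two extractions)
visual_nodes.create_comparison_view(
    data_list=[result1, result2],
    titles=["Strategy A", "Strategy B"]
)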

🗄️ Neo4j Integration

# Generate Cypher MERGE statements
result = extractor.extract_for_neo4j_merge(text, strategy="business")

# Get MERGE statements
nodes_cypher = result['merge_statements']['nodes']
relationships_cypher = result['merge_statements']['relationships']

# Execute in Neo4j, nodes first so relationships can match them.
# session.run() executes one statement per call; if these are lists, loop:
# with driver.session() as session:
#     for stmt in [*nodes_cypher, *relationships_cypher]:
#         session.run(stmt)
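The following is a hedged sketch of MERGE-based import: it turns node and relationship dicts into parameterized Cypher. The input shape (`id`, `label`, and `properties` on nodes; `source`, `target`, and `type` on relationships) is an assumption made for illustration. It is not necessarily the format that `extract_for_neo4j` returns, and these helpers are not `CypherGenerator`'s API. Cypher does not accept labels and relationship types as parameters, so the sketch checks them against a plain-identifier pattern before interpolating them, and passes all property values as parameters.

```python
import re
from typing import Any

# Labels and relationship types are interpolated into the query text, so only
# allow plain identifiers; everything else is passed as a query parameter.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(value: str) -> str:
    if not _IDENT.match(value):
        raise ValueError(f"unsafe label or type: {value!r}")
    return value


def node_merge(node: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """MERGE a node on its id, then set the remaining properties."""
    label = _check_ident(node["label"])
    query = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
    return query, {"id": node["id"], "props": node.get("properties", {})}


def relationship_merge(rel: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """MATCH both endpoints by id and MERGE the relationship between them."""
    rel_type = _check_ident(rel["type"])
    query = (
        "MATCH (a {id: $source}), (b {id: $target}) "
        f"MERGE (a)-[r:{rel_type}]->(b) SET r += $props"
    )
    return query, {
        "source": rel["source"],
        "target": rel["target"],
        "props": rel.get("properties", {}),
    }


def run_all(session: Any, nodes: list[dict], rels: list[dict]) -> int:
    """Run node statements before relationship statements; return the count."""
    statements = [node_merge(n) for n in nodes] + [relationship_merge(r) for r in rels]
    for query, params in statements:
        session.run(query, params)
    return len(statements)
```

With the official `neo4j` Python driver, `session` would be the object from `driver.session()`, used as a context manager. MERGE makes re-imports idempotent. An unlabeled `MATCH (a {id: ...})` cannot use a label index, though, so on large graphs you would add the endpoint labels to the pattern.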

🎯 Examples

📝 Text Analysis Pipeline

from src.core.extractor import extractor
from src.core.visual_nodes import visual_nodes

# Multi-step analysis pipeline
text = """
Dr. Sarah Chen, a machine learning researcher at Stanford University, 
published groundbreaking work on neural networks. She collaborates 
with Dr. Mike Johnson from MIT on deep learning applications.
"""

# 1. Extract with academic strategy
result = extractor.extract_for_neo4j(text, strategy="academic")

# 2. Visualize for quality check
visual_nodes.visualize_neo4j_data(
    result['neo4j_data'], 
    title="Academic Knowledge Graph"
)

# 3. Generate database import
cypher_statements = result['cypher_statements']
print("Ready for Neo4j import!")

🔍 Strategy Comparison

# Compare different extraction approaches
strategies = ["literary", "business", "academic"]
results = []

for strategy in strategies:
    result = extractor.extract_for_neo4j(text, strategy=strategy)
    results.append(result['neo4j_data'])

# Generate comparison visualization
visual_nodes.create_comparison_view(
    data_list=results,
    titles=[f"Strategy: {s}" for s in strategies],
    save_dir="analysis/strategy_comparison"
)
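Side-by-side views are easier to interpret with a number attached. The sketch below scores each pair of strategies by the Jaccard overlap of their extracted node names. It assumes that each `neo4j_data` dict holds a `nodes` list whose items carry a `name` (or `id`) field. That is a guess at the format, so adjust `node_keys` to match the real one.

```python
from itertools import combinations
from typing import Any


def node_keys(neo4j_data: dict[str, Any]) -> set[str]:
    """Collect a normalized key per node (assumed 'name' or 'id' field)."""
    keys = set()
    for node in neo4j_data.get("nodes", []):
        key = node.get("name") or node.get("id")
        if key:
            keys.add(str(key).strip().lower())
    return keys


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(results: dict[str, dict[str, Any]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity of node sets for every pair of strategies."""
    keyed = {name: node_keys(data) for name, data in results.items()}
    return {
        (s1, s2): jaccard(keyed[s1], keyed[s2])
        for s1, s2 in combinations(keyed, 2)
    }
```

Applied to the loop above, this would be `pairwise_overlap(dict(zip(strategies, results)))`. A low score flags strategy pairs whose graphs are worth inspecting in the comparison view.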

🎛️ Custom Configuration

# Create custom extraction with granular control
result = extractor.extract(
    text=text,
    entities=["person", "organization", "project"],
    relations=["works_at", "collaborates_with"],
    breadth="comprehensive",
    depth="inferential", 
    confidence="medium",
    context_scope="document"
)
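The project renders prompts with Jinja2 templates. Setting Jinja2's syntax aside, the idea of turning granularity settings into prompt text can be sketched with the standard library's `string.Template`: each setting selects an instruction sentence, and those sentences are spliced into the prompt. The instruction wording and the `INSTRUCTIONS` mapping below are hypothetical and written only for illustration. They are not the project's templates.

```python
from string import Template

# Hypothetical instruction text per granularity setting (illustration only).
INSTRUCTIONS = {
    "breadth": {"comprehensive": "Extract every entity of the listed types."},
    "depth": {"inferential": "Include relations that are strongly implied, not only stated."},
    "confidence": {
        "medium": "Keep extractions you are reasonably sure of.",
        "high": "Keep only extractions directly supported by the text.",
    },
    "context_scope": {"document": "Resolve references across the whole document."},
}

PROMPT = Template(
    "Entity types: $entities\n"
    "Relation types: $relations\n"
    "$rules\n"
    "Text:\n$text"
)


def build_prompt(text: str, entities: list[str], relations: list[str], **granularity: str) -> str:
    """Render a prompt; an unknown dimension or value raises KeyError early."""
    rules = [INSTRUCTIONS[dim][value] for dim, value in granularity.items()]
    return PROMPT.substitute(
        entities=", ".join(entities),
        relations=", ".join(relations),
        rules="\n".join(f"- {rule}" for rule in rules),
        text=text.strip(),
    )
```

Failing with a `KeyError` on an unrecognized value (for example `depth="shallow"`) surfaces typos before any model call is made.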

🎨 Visualization Gallery

Basic Text Extraction

Interactive visualization of entities and relationships extracted from text.

Literary Strategy Analysis

Specialized extraction for literary texts with character and emotion analysis.

Business Data Visualization

Structured data ready for Neo4j import with organizational relationships.

Strategy Comparison

Compare different extraction strategies side by side (panels: Literary Strategy, Business Strategy).

✨ Interactive Features

All visualizations support:

🖱️ Drag nodes - Adjust graph layout
🔍 Hover details - View detailed information
🎯 Click highlighting - Highlight connected nodes
📐 Zoom & pan - Explore large graphs
⚙️ Physics layout - Auto-optimize positioning

🔗 Try It Yourself

Experience the full interactive features:

git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph
uv sync
python -m code_examples.visual_nodes_demo
# Open the generated HTML files in your browser!

🛠️ Advanced Usage

Custom Strategy Development

Create strategy file:

# strategies/my_domain.yaml  
name: "my_domain"
entities: ["entity1", "entity2"]
relations: ["relation1"]
# ... configuration

Load and use:

strategy = strategy_manager.load_strategy("my_domain")
result = extractor.extract(text, strategy="my_domain")
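Both snippets above suggest a convention in which a strategy's `name` matches its file stem (`my_domain` → `strategies/my_domain.yaml`). This README implies that convention but does not state it outright. Assuming it holds, a small lookup helper can fail fast and list the available strategies when a name is misspelled. `find_strategy_file` and `available_strategies` are hypothetical helpers, not part of `StrategyManager`.

```python
from __future__ import annotations

from pathlib import Path


def available_strategies(directory: str | Path = "strategies") -> list[str]:
    """Strategy names, taken from *.yaml / *.yml file stems, sorted."""
    root = Path(directory)
    return sorted({p.stem for pattern in ("*.yaml", "*.yml") for p in root.glob(pattern)})


def find_strategy_file(name: str, directory: str | Path = "strategies") -> Path:
    """Return the file for `name`, or raise with the names that do exist."""
    root = Path(directory)
    for suffix in (".yaml", ".yml"):
        candidate = root / f"{name}{suffix}"
        if candidate.is_file():
            return candidate
    known = ", ".join(available_strategies(root)) or "none"
    raise FileNotFoundError(f"no strategy {name!r} in {root}; available: {known}")
```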

Visualization Customization

# Custom node styles
visual_nodes.node_styles.update({
    'RESEARCHER': {
        'color': '#e74c3c',
        'shape': 'star', 
        'size': 30
    }
})

# Custom relationship styles  
visual_nodes.edge_styles.update({
    'COLLABORATES_WITH': {
        'color': '#3498db',
        'width': 4
    }
})
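Note that `dict.update` replaces a whole entry. Updating `'RESEARCHER'` with only a `color` would drop any `shape` or `size` set earlier. To change one attribute and keep the rest, you can use a small per-key merge such as the hypothetical `merge_styles` below. The `DEFAULT_STYLE` values are illustrative and may differ from the real defaults in `VisualNodes`.

```python
from copy import deepcopy
from typing import Any

# Illustrative fallback; not necessarily VisualNodes' actual defaults.
DEFAULT_STYLE = {"color": "#95a5a6", "shape": "dot", "size": 20}


def merge_styles(base: dict[str, dict[str, Any]],
                 overrides: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return a new style table where each override changes only the keys it names."""
    merged = deepcopy(base)
    for node_type, attrs in overrides.items():
        merged.setdefault(node_type, {}).update(attrs)
    return merged


def style_for(styles: dict[str, dict[str, Any]], node_type: str) -> dict[str, Any]:
    """Look up a node type's style, filling missing attributes from DEFAULT_STYLE."""
    return {**DEFAULT_STYLE, **styles.get(node_type.upper(), {})}
```

Assuming the attribute can be reassigned, usage would look like `visual_nodes.node_styles = merge_styles(visual_nodes.node_styles, {'RESEARCHER': {'color': '#000000'}})`. Because the input is deep-copied, the original table is never mutated.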

📊 Performance & Scalability

Efficient processing: Optimized for large documents with configurable chunking
Memory management: Streaming processing for large datasets
Parallel extraction: Multi-strategy concurrent processing
Caching: Built-in result caching for repeated analyses
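The chunking and caching points above are stated without detail. As a rough picture of what these two techniques usually involve, and not the project's implementation, the sketch below does two things. First, it splits text into overlapping character windows, preferring to cut at a space. Second, it memoizes per-chunk results with `functools.lru_cache`. `extract_chunk` is a hypothetical stand-in for a model call that simply returns capitalized words. The chunk size and overlap are arbitrary example values.

```python
from functools import lru_cache


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of at most `size` chars.

    A window ends at the last space inside it when there is one, so words
    are not cut in half; the next window starts `overlap` chars before that end.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            cut = text.rfind(" ", start + overlap + 1, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # always > previous start, so the loop progresses
    return chunks


@lru_cache(maxsize=256)
def extract_chunk(chunk: str, strategy: str) -> tuple[str, ...]:
    """Hypothetical stand-in for a model call; returns capitalized words.

    Results are tuples so cached values cannot be mutated by callers.
    """
    return tuple(w.strip(".,") for w in chunk.split() if w[:1].isupper())


def extract_document(text: str, strategy: str) -> list[str]:
    """Extract per chunk (cached), then de-duplicate keeping first-seen order."""
    seen: dict[str, None] = {}
    for chunk in chunk_text(text):
        for item in extract_chunk(chunk, strategy):
            seen.setdefault(item, None)
    return list(seen)
```

The overlap means an entity that straddles a boundary appears whole in at least one window. De-duplicating afterwards keeps it from being counted twice.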

🤝 Contributing

We welcome contributions! Here's how to get started:

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes and add tests
Run the test suite: python -m pytest
Submit a pull request

Development Setup

# Clone and setup development environment
git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph
uv sync --dev

# Run tests
python -m pytest tests/

# Run examples
python -m code_examples.configurable_extraction_demo
python -m code_examples.visual_nodes_demo

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

langextract - The powerful extraction engine that powers this project
pyvis - Interactive network visualization
Neo4j - Graph database platform
Pydantic - Data validation and settings management

📞 Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

⭐ Star this repository if you find it helpful!

Made with ❤️ by Haoyue
