
🕸️ ExtractGraph

English | 中文

Python Version License: MIT Code Style: Black Built with Love

A powerful, configurable text extraction system for building knowledge graphs

Built on top of langextract with advanced configuration management and visualization capabilities

🚀 Quick Start • 📖 Documentation • 🎯 Examples • 🤝 Contributing


✨ Features

🎛️ Highly Configurable Extraction

  • Strategy-based extraction with YAML configuration files
  • Multi-dimensional granularity control (breadth, depth, confidence, context scope)
  • Dynamic prompt generation using Jinja2 templates
  • Few-shot learning with customizable examples
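
As a rough illustration of how a strategy's entity types, relation types, and granularity settings might be turned into a prompt, here is a dependency-free sketch. The README states that ExtractGraph renders prompts with Jinja2; this sketch swaps in the standard library's `string.Template` instead. The template wording, the `build_prompt` helper, and the default values are all hypothetical.

```python
from string import Template

# Hypothetical prompt wording. ExtractGraph itself uses Jinja2 templates;
# string.Template stands in here so the sketch needs no third-party packages.
PROMPT_TEMPLATE = Template(
    "Extract entities of these types: $entities.\n"
    "Extract relations of these types: $relations.\n"
    "Breadth: $breadth. Depth: $depth. "
    "Confidence: $confidence. Context scope: $context_scope."
)

# Assumed fallbacks for granularity settings that a strategy leaves out.
DEFAULT_GRANULARITY = {
    "breadth": "comprehensive",
    "depth": "inferential",
    "confidence": "medium",
    "context_scope": "document",
}


def build_prompt(strategy: dict) -> str:
    """Render a prompt from a strategy dictionary (illustrative helper)."""
    granularity = {**DEFAULT_GRANULARITY, **strategy.get("granularity", {})}
    return PROMPT_TEMPLATE.substitute(
        entities=", ".join(strategy["entities"]),
        relations=", ".join(strategy["relations"]),
        **granularity,
    )


strategy = {
    "entities": ["researcher", "institution"],
    "relations": ["affiliated_with"],
    "granularity": {"confidence": "high"},
}
print(build_prompt(strategy))
```

Keeping the template separate from the strategy data means a new domain only needs a new strategy dictionary, not a new prompt.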

🎨 Interactive Visualization

  • Real-time node visualization with pyvis integration
  • Pre-import data quality checks before Neo4j insertion
  • Multi-strategy comparison views for analysis
  • Customizable styling for different node types

🗄️ Database Integration

  • Neo4j-ready data formatting with automatic Cypher generation
  • Flexible schema support for various entity types
  • Relationship mapping with configurable properties
  • Batch import capabilities with MERGE statements
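
To make the idea of MERGE-based import concrete, the following sketch builds parameterized Cypher statements from plain dictionaries. It is not the project's `CypherGenerator`; the dictionary keys (`id`, `label`, `properties`, `source`, `target`, `type`) and both helper functions are assumptions chosen for illustration.

```python
import re

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being interpolated into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return name


def node_merge(node: dict) -> tuple[str, dict]:
    """Return a (query, parameters) pair that upserts one node by its id."""
    label = _check_identifier(node["label"])
    query = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
    return query, {"id": node["id"], "props": node.get("properties", {})}


def relationship_merge(rel: dict) -> tuple[str, dict]:
    """Return a (query, parameters) pair that upserts one relationship."""
    rel_type = _check_identifier(rel["type"])
    query = (
        "MATCH (a {id: $source}), (b {id: $target}) "
        f"MERGE (a)-[r:{rel_type}]->(b) SET r += $props"
    )
    params = {
        "source": rel["source"],
        "target": rel["target"],
        "props": rel.get("properties", {}),
    }
    return query, params


query, params = node_merge(
    {"id": "alice", "label": "Person", "properties": {"name": "Alice"}}
)
print(query, params)
```

Because MERGE matches on the `id` property, running the same import twice should update existing nodes rather than duplicate them.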

🔧 Developer-Friendly

  • Type-safe configuration with Pydantic models
  • Comprehensive logging and debugging support
  • Extensible architecture for custom strategies
  • Rich examples and documentation

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • OpenAI-compatible API (OpenAI, Azure OpenAI, local models, etc.)

Installation

# Clone the repository
git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph

# Install with uv (recommended)
uv sync

# Or use pip
pip install -e .

Basic Setup

  1. Configure your API credentials:
cp .env.example .env
# Edit .env with your API settings
  2. Run a quick extraction:
from src.core.extractor import extractor

text = "Alice is a data scientist at TechCorp. She feels excited about the new AI project."

# Extract with default strategy
result = extractor.extract_for_neo4j(text)
print(f"Found {len(result['neo4j_data']['nodes'])} nodes and {len(result['neo4j_data']['relationships'])} relationships")
  3. Visualize the results:
from src.core.visual_nodes import visual_nodes

# Generate interactive visualization
html_path = visual_nodes.visualize_text_extraction(
    text=text,
    strategy="literary",
    save_path="output/demo.html"
)
# Open the HTML file in your browser!

🏗️ Architecture

ExtractGraph/
├── 🎛️ Strategy Layer      # YAML-based extraction strategies
├── 🔧 Configuration Layer # Dynamic prompt generation with Jinja2
├── 📊 Granularity Layer   # Multi-dimensional extraction control
├── 🎨 Visualization Layer # Interactive node preview and analysis
└── 🗄️ Database Layer      # Neo4j integration with Cypher generation

Core Components

| Component | Description | Key Features |
|---|---|---|
| ConfigurableExtractor | Main extraction engine | Strategy management, dynamic prompting |
| VisualNodes | Visualization engine | Interactive graphs, comparison views |
| StrategyManager | Configuration management | YAML loading, custom strategy creation |
| CypherGenerator | Database integration | CREATE/MERGE statement generation |

📖 Documentation

🎯 Extraction Strategies

Create custom extraction strategies with YAML configuration:

# strategies/custom_strategy.yaml
name: "scientific_papers"
description: "Extract entities from scientific literature"
entities:
  - "researcher"
  - "institution" 
  - "concept"
  - "method"
relations:
  - "affiliated_with"
  - "researches"  
  - "cites"
granularity:
  breadth: "comprehensive"
  depth: "inferential"
  confidence: "high"
  context_scope: "document"
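
Before handing a strategy like the one above to the extractor, it can help to check its shape. The README says the project validates configuration with Pydantic models; the sketch below uses a standard-library dataclass as a stand-in. The `Strategy` class, `strategy_from_dict`, and the validation rules are hypothetical, and the input is assumed to be the dictionary that a YAML parser such as `yaml.safe_load` would return.

```python
from dataclasses import dataclass, field

# The four granularity dimensions named in this README.
GRANULARITY_KEYS = {"breadth", "depth", "confidence", "context_scope"}


@dataclass
class Strategy:
    """Illustrative stand-in for a validated strategy configuration."""

    name: str
    entities: list[str]
    relations: list[str]
    description: str = ""
    granularity: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("strategy needs a non-empty name")
        if not self.entities:
            raise ValueError("strategy needs at least one entity type")
        unknown = set(self.granularity) - GRANULARITY_KEYS
        if unknown:
            raise ValueError(f"unknown granularity keys: {sorted(unknown)}")


def strategy_from_dict(raw: dict) -> Strategy:
    """Build a Strategy from parsed YAML; unexpected top-level keys raise TypeError."""
    return Strategy(**raw)


raw = {
    "name": "scientific_papers",
    "description": "Extract entities from scientific literature",
    "entities": ["researcher", "institution", "concept", "method"],
    "relations": ["affiliated_with", "researches", "cites"],
    "granularity": {"breadth": "comprehensive", "confidence": "high"},
}
print(strategy_from_dict(raw))
```

Failing early on a misspelled key such as `context-scope` is cheaper than discovering it after a round of API calls.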

🎨 Visualization Options

# Basic visualization
visual_nodes.visualize_text_extraction(text, strategy="scientific")

# Custom styling
custom_visual = VisualNodes(
    width="1200px", 
    height="800px",
    bgcolor="#f8f9fa"
)

# Comparison analysis  
visual_nodes.create_comparison_view(
    data_list=[result1, result2],
    titles=["Strategy A", "Strategy B"]
)

🗄️ Neo4j Integration

# Generate Cypher statements
result = extractor.extract_for_neo4j_merge(text, strategy="business")

# Get MERGE statements
nodes_cypher = result['merge_statements']['nodes']
relationships_cypher = result['merge_statements']['relationships']

# Execute in Neo4j
# driver.session().run(nodes_cypher)
# driver.session().run(relationships_cypher)
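
The commented-out calls above leave open whether the generated Cypher is a single string or a list of statements. A Neo4j session runs one statement per `run` call, so the sketch below normalizes either form and executes the statements one at a time. `as_statement_list`, `run_statements`, and `RecordingSession` are hypothetical helpers, and the shape of `merge_statements` is an assumption.

```python
from typing import Iterable, Union


def as_statement_list(statements: Union[str, Iterable[str]]) -> list[str]:
    """Normalize a string or an iterable of strings into clean statements.

    Splitting on ';' is naive: it would break a statement whose string
    literal contains a semicolon.
    """
    if isinstance(statements, str):
        statements = statements.split(";")
    return [s.strip() for s in statements if s.strip()]


def run_statements(session, statements) -> int:
    """Run statements one at a time and return how many were executed.

    `session` is any object with a run(query) method, for example a
    session obtained from the official neo4j Python driver.
    """
    prepared = as_statement_list(statements)
    for statement in prepared:
        session.run(statement)
    return len(prepared)


class RecordingSession:
    """Stand-in session that records queries instead of contacting a database."""

    def __init__(self) -> None:
        self.queries: list[str] = []

    def run(self, query: str) -> None:
        self.queries.append(query)


session = RecordingSession()
count = run_statements(
    session,
    "MERGE (a:Person {name: 'Alice'}); MERGE (b:Company {name: 'TechCorp'});",
)
print(count, session.queries)
```

With the real driver, the same function would receive a session opened from `GraphDatabase.driver(uri, auth=(user, password))`. Node statements should run before relationship statements so that both endpoints exist when relationships are merged.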

🎯 Examples

📝 Text Analysis Pipeline

from src.core.extractor import extractor
from src.core.visual_nodes import visual_nodes

# Multi-step analysis pipeline
text = """
Dr. Sarah Chen, a machine learning researcher at Stanford University, 
published groundbreaking work on neural networks. She collaborates 
with Dr. Mike Johnson from MIT on deep learning applications.
"""

# 1. Extract with academic strategy
result = extractor.extract_for_neo4j(text, strategy="academic")

# 2. Visualize for quality check
visual_nodes.visualize_neo4j_data(
    result['neo4j_data'], 
    title="Academic Knowledge Graph"
)

# 3. Generate database import
cypher_statements = result['cypher_statements']
print("Ready for Neo4j import!")
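
Alongside the visual check in step 2, a programmatic check can catch structural problems before import. The sketch below looks for duplicate node ids and relationships that point at missing nodes. It assumes each node carries an `id` and each relationship carries `source`, `target`, and `type`; those field names and the `find_graph_issues` helper are illustrative and not taken from the project.

```python
def find_graph_issues(neo4j_data: dict) -> list[str]:
    """Return human-readable problems found in a nodes/relationships payload."""
    issues: list[str] = []
    node_ids: set = set()

    for node in neo4j_data.get("nodes", []):
        node_id = node.get("id")
        if node_id is None:
            issues.append("node without an id")
        elif node_id in node_ids:
            issues.append(f"duplicate node id: {node_id}")
        else:
            node_ids.add(node_id)

    for rel in neo4j_data.get("relationships", []):
        for endpoint in ("source", "target"):
            if rel.get(endpoint) not in node_ids:
                rel_type = rel.get("type", "relationship")
                issues.append(
                    f"{rel_type} has unknown {endpoint}: {rel.get(endpoint)}"
                )
    return issues


sample = {
    "nodes": [{"id": "sarah_chen"}, {"id": "stanford"}],
    "relationships": [
        {"source": "sarah_chen", "target": "stanford", "type": "AFFILIATED_WITH"},
        {"source": "sarah_chen", "target": "mit", "type": "COLLABORATES_WITH"},
    ],
}
print(find_graph_issues(sample))
```

An empty list means the payload passed these particular checks; it does not say anything about whether the extracted facts are correct.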

🔍 Strategy Comparison

# Compare different extraction approaches
strategies = ["literary", "business", "academic"]
results = []

for strategy in strategies:
    result = extractor.extract_for_neo4j(text, strategy=strategy)
    results.append(result['neo4j_data'])

# Generate comparison visualization
visual_nodes.create_comparison_view(
    data_list=results,
    titles=[f"Strategy: {s}" for s in strategies],
    save_dir="analysis/strategy_comparison"
)
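
The comparison view is visual; a small numeric summary can complement it. This sketch reports which node ids every strategy found and which are unique to one strategy. It assumes each result's `nodes` entries carry an `id` field, and `compare_node_sets` is a hypothetical helper rather than part of the project.

```python
def compare_node_sets(results: dict[str, dict]) -> dict:
    """Summarize shared and strategy-specific node ids.

    `results` maps a strategy name to its nodes/relationships payload.
    """
    ids_by_strategy = {
        name: {node["id"] for node in data.get("nodes", [])}
        for name, data in results.items()
    }
    shared = (
        set.intersection(*ids_by_strategy.values()) if ids_by_strategy else set()
    )
    unique = {}
    for name, ids in ids_by_strategy.items():
        # Union of the node ids found by every other strategy.
        others = set().union(
            *(v for k, v in ids_by_strategy.items() if k != name)
        )
        unique[name] = sorted(ids - others)
    return {"shared": sorted(shared), "unique": unique}


summary = compare_node_sets(
    {
        "literary": {"nodes": [{"id": "alice"}, {"id": "excitement"}]},
        "business": {"nodes": [{"id": "alice"}, {"id": "techcorp"}]},
    }
)
print(summary)
```

A large `unique` set for one strategy suggests it captures domain-specific entities that the others miss.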

🎛️ Custom Configuration

# Create custom extraction with granular control
result = extractor.extract(
    text=text,
    entities=["person", "organization", "project"],
    relations=["works_at", "collaborates_with"],
    breadth="comprehensive",
    depth="inferential", 
    confidence="medium",
    context_scope="document"
)

🎨 Visualization Gallery

Basic Text Extraction

Interactive visualization of entities and relationships extracted from text.

[Image: Basic Text Extraction]

Literary Strategy Analysis

Specialized extraction for literary texts with character and emotion analysis.

[Image: Literary Strategy]

Business Data Visualization

Structured data ready for Neo4j import with organizational relationships.

[Image: Business Data]

Strategy Comparison

Compare different extraction strategies side by side.

[Images: Literary Strategy | Business Strategy]

✨ Interactive Features

All visualizations support:

  • 🖱️ Drag nodes - Adjust graph layout
  • 🔍 Hover details - View detailed information
  • 🎯 Click highlighting - Highlight connected nodes
  • 📏 Zoom & pan - Explore large graphs
  • ⚙️ Physics layout - Auto-optimize positioning

🔗 Try It Yourself

Experience the full interactive features:

git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph
uv sync
python -m code_examples.visual_nodes_demo
# Open the generated HTML files in your browser!

🛠️ Advanced Usage

Custom Strategy Development

  1. Create strategy file:
# strategies/my_domain.yaml  
name: "my_domain"
entities: ["entity1", "entity2"]
relations: ["relation1"]
# ... configuration
  2. Load and use:
strategy = strategy_manager.load_strategy("my_domain")
result = extractor.extract(text, strategy="my_domain")

Visualization Customization

# Custom node styles
visual_nodes.node_styles.update({
    'RESEARCHER': {
        'color': '#e74c3c',
        'shape': 'star', 
        'size': 30
    }
})

# Custom relationship styles  
visual_nodes.edge_styles.update({
    'COLLABORATES_WITH': {
        'color': '#3498db',
        'width': 4
    }
})

📊 Performance & Scalability

  • Efficient processing: Optimized for large documents with configurable chunking
  • Memory management: Streaming processing for large datasets
  • Parallel extraction: Multi-strategy concurrent processing
  • Caching: Built-in result caching for repeated analyses
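
The README does not show how chunking or caching is configured, so the following is only a sketch of the two ideas under stated assumptions: fixed-size character chunks with overlap, and an in-memory cache keyed by a hash of the strategy name and text. `chunk_text`, `ExtractionCache`, and their parameters are hypothetical names, not the project's API.

```python
import hashlib
from typing import Callable


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars that overlap by `overlap`."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    if not text:
        return []
    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


class ExtractionCache:
    """In-memory cache so identical (strategy, text) pairs are computed once."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    @staticmethod
    def key(text: str, strategy: str) -> str:
        payload = f"{strategy}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(
        self, text: str, strategy: str, compute: Callable[[str, str], object]
    ) -> object:
        cache_key = self.key(text, strategy)
        if cache_key not in self._store:
            self._store[cache_key] = compute(text, strategy)
        return self._store[cache_key]


print(chunk_text("abcdefghij", max_chars=4, overlap=1))
```

The overlap gives an entity mention that straddles a chunk boundary a chance to appear whole in at least one chunk, at the cost of extracting some text twice.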

🀝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run the test suite: python -m pytest
  5. Submit a pull request

Development Setup

# Clone and setup development environment
git clone https://github.com/Adoubf/ExtractGraph.git
cd ExtractGraph
uv sync --dev

# Run tests
python -m pytest tests/

# Run examples
python -m code_examples.configurable_extraction_demo
python -m code_examples.visual_nodes_demo

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • langextract - The powerful extraction engine that powers this project
  • pyvis - Interactive network visualization
  • Neo4j - Graph database platform
  • Pydantic - Data validation and settings management

📞 Support


⭐ Star this repository if you find it helpful!

Made with ❤️ by Haoyue
