Thanks to visit codestin.com
Credit goes to github.com

Skip to content

h1Anonymoush1/Docify

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Docify - AI-Powered Document Analysis

Docify is a web application that allows users to analyze any website using AI-powered content extraction and visualization. Users can input a URL and instructions, and the system will scrape the content, analyze it with Google's Gemini AI, and present the results in interactive charts and summaries.

🎯 Key Features

  • πŸ”’ Raw Content Preservation: Saves exact browserless HTML without dangerous cleaning
  • πŸ€– AI-Generated Titles: Smart 2-4 word titles using Gemini AI
  • πŸ“ Readable Summaries: Human-friendly summaries up to 200 characters
  • πŸ”— Format Compatibility: Same JSON blocks format as original analyzer
  • 🎯 8-Step Linear Process: Clear, reliable processing pipeline
  • πŸ›‘οΈ Error Recovery: Graceful failure handling with status updates
  • 🌐 Universal Web Scraping: Extract content from any website
  • πŸ“Š Interactive Visualizations: Automatic generation of Mermaid diagrams and charts
  • πŸ“± Responsive Design: Works on all devices with adaptive grid layouts

πŸ”„ How It Works

graph TD
    A[User Submits URL] --> B[Create Document Record]
    B --> C[Trigger Unified Function]
    C --> D[Extract Document Data]
    D --> E[Validate Environment]
    E --> F[Raw Browserless Scraping]
    F --> G[Save Raw Content]
    G --> H[Generate AI Title]
    H --> I[Generate Analysis]
    I --> J[Create Compatible Blocks]
    J --> K[Final Save & Complete]
    K --> L[Display Results]

    style A fill:#14b8a6,color:#ffffff
    style K fill:#14b8a6,color:#ffffff
    style L fill:#14b8a6,color:#ffffff
Loading

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User Input    β”‚    β”‚   Unified Function   β”‚    β”‚   Results View  β”‚
β”‚   (Frontend)    │───▢│   (Appwrite)         │───▢│   (Frontend)    β”‚
β”‚                 β”‚    β”‚   8-Step Process     β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                       β”‚
         β–Ό                       β–Ό                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Document      β”‚    β”‚   Gemini AI     β”‚    β”‚   Browserless   β”‚
β”‚   Creation      β”‚    β”‚   (Analysis)    β”‚    β”‚   (Scraping)    β”‚
β”‚   (Database)    β”‚    β”‚                 β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ Processing Flow

  1. User submits URL β†’ Document record created
  2. Unified function triggered β†’ 8-step processing begins
  3. Raw content scraped β†’ Exact HTML preserved
  4. AI analysis performed β†’ Gemini generates insights
  5. Results formatted β†’ Compatible with existing frontend
  6. Document updated β†’ Ready for display

πŸ› οΈ Technical Stack

  • Frontend: SvelteKit with TypeScript
  • Backend: Appwrite Functions (Python)
  • Database: Appwrite Database (consolidated schema)
  • AI: Google Gemini 2.5 Pro
  • Scraping: Browserless.io + Requests
  • Hosting: Vercel (frontend) + Appwrite Cloud (backend)

πŸ“‹ Prerequisites

  • Node.js 18+
  • npm or yarn
  • Python 3.9+ (for function development)
  • Appwrite account and project
  • Google Gemini API key
  • Browserless.io API key (optional, enhances scraping)

πŸ› οΈ Setup Instructions

1. Appwrite Project Setup

  1. Create a new project on Appwrite Cloud
  2. Note your Project ID and API Endpoint
  3. Enable the following services:
    • Databases
    • Functions
    • Storage (optional)

2. Database Configuration

Create a single consolidated collection in your Appwrite database:

Documents Collection (Consolidated)

{
  "name": "documents_table",
  "permissions": ["create", "read", "update"],
  "attributes": [
    {"key": "user_id", "type": "string", "size": 36, "required": true},
    {"key": "title", "type": "string", "size": 255, "required": false},
    {"key": "url", "type": "string", "required": true},
    {"key": "instructions", "type": "string", "size": 1000, "required": true},
    {"key": "status", "type": "enum", "elements": ["pending", "scraping", "analyzing", "completed", "failed"], "required": true},
    {"key": "public", "type": "boolean", "default": false},
    {"key": "scraped_content", "type": "string", "size": 99999, "required": false},
    {"key": "analysis_summary", "type": "string", "size": 2000, "required": false},
    {"key": "analysis_blocks", "type": "string", "size": 99999, "required": false},
    {"key": "gemini_tools_used", "type": "string", "size": 1000, "required": false},
    {"key": "research_context", "type": "string", "size": 5000, "required": false},
    {"key": "$createdAt", "type": "datetime", "required": true},
    {"key": "$updatedAt", "type": "datetime", "required": true}
  ]
}

Key Changes:

  • Single Collection: All data consolidated into one table
  • AI-Generated Titles: title field now contains AI-generated 2-4 word titles
  • Raw Content: scraped_content stores exact browserless HTML
  • Compatible Format: analysis_blocks maintains same JSON structure as original analyzer
  • Enhanced Fields: Added gemini_tools_used and research_context for tracking

3. Environment Variables

Create environment files for both frontend and backend:

Frontend (.env.local in docify-website/)

# Appwrite Configuration
NEXT_PUBLIC_APPWRITE_ENDPOINT=https://your-region.cloud.appwrite.io/v1
NEXT_PUBLIC_APPWRITE_PROJECT_ID=your-project-id
NEXT_PUBLIC_APPWRITE_DATABASE_ID=your-database-id
NEXT_PUBLIC_APPWRITE_DOCUMENTS_COLLECTION_ID=documents_table

# OAuth Configuration (if using social login)
NEXT_PUBLIC_APPWRITE_OAUTH_SUCCESS_URL=http://localhost:5173/auth/success
NEXT_PUBLIC_APPWRITE_OAUTH_FAILURE_URL=http://localhost:5173/auth/error

Backend Function Environment Variables

Set these in your Appwrite function configuration:

# Required
GEMINI_API_KEY=your-gemini-api-key
DATABASE_ID=your-database-id
DOCUMENTS_COLLECTION_ID=documents_table

# Optional (enhances scraping)
BROWSERLESS_API_KEY=your-browserless-api-key

4. Deploy Unified Function

Deploy the unified orchestrator function:

# Install Appwrite CLI
npm install -g appwrite-cli

# Login to Appwrite
appwrite login

# Navigate to function directory
cd functions/docify-unified-orchestrator

# Deploy the unified function
appwrite functions create-deployment \
  --function-id docify-unified-orchestrator \
  --activate true \
  --code .

Function Details:

  • Name: Docify Unified Orchestrator v3.0
  • Runtime: Python 3.9
  • Trigger: Database events on document creation
  • Timeout: 500 seconds (for 8-step process)
  • Memory: 1024MB

Note: The unified function replaces the previous separate scraper and analyzer functions.

5. Frontend Setup

cd docify-website
npm install
npm run dev

🎯 Usage

Creating a Document

  1. Navigate to the main page of your application
  2. Enter a URL you want to analyze
  3. Provide analysis instructions (e.g., "Create a visual overview of the API endpoints")
  4. Click "Create Document"

8-Step Processing Flow

The unified function executes 8 sequential steps:

  1. πŸ“‹ Extract Document Data - Parse request and validate inputs
  2. πŸ“ Validate Environment - Check API keys and configuration
  3. 🌐 Raw Browserless Scraping - Scrape content without modification
  4. πŸ’Ύ Save Raw Content - Store exact HTML in database
  5. 🏷️ Generate AI Title - Create 2-4 word intelligent titles
  6. πŸ“ˆ Generate Analysis - Produce comprehensive AI analysis
  7. 🧩 Create Compatible Blocks - Format blocks for frontend
  8. βœ… Final Save & Complete - Update database and mark complete

Analysis Results

The system will:

  1. Preserve raw HTML content without dangerous cleaning
  2. Generate AI-powered 2-4 word titles
  3. Analyze content using Google Gemini AI
  4. Create multiple content blocks in compatible JSON format:
    • Summary of the document (≀200 chars)
    • Mermaid diagrams and flowcharts
    • Code examples with syntax highlighting
    • Key points and highlights
    • API references and guides
    • Troubleshooting and best practices

Content Block Types

  • Summary: High-level overview
  • Mermaid: Visual diagrams and flowcharts
  • Code: Code examples with syntax highlighting
  • Key Points: Important highlights and takeaways
  • API Reference: API documentation
  • Guide: Step-by-step instructions
  • Architecture: System/component diagrams
  • Best Practices: Recommendations
  • Troubleshooting: Common issues and solutions

πŸ”§ Configuration

Unified Function Environment Variables

Required Variables

  • GEMINI_API_KEY: Your Google Gemini API key
  • DATABASE_ID: Your Appwrite database ID
  • DOCUMENTS_COLLECTION_ID: Documents table ID (documents_table)

Optional Variables

  • BROWSERLESS_API_KEY: Browserless.io API key for enhanced scraping

Status Tracking

The function updates document status through 5 stages:

  • pending β†’ Document created, waiting for processing
  • scraping β†’ Currently scraping content from URL
  • analyzing β†’ Scraping complete, analyzing with Gemini
  • completed β†’ Analysis complete, ready for display
  • failed β†’ Processing failed (can be retried)

Customizing the AI Analysis

Edit the analysis prompt in functions/docify-unified-orchestrator/src/main.py to customize how Gemini analyzes documents. The prompt includes instructions for generating compatible JSON blocks.

πŸ“Š API Endpoints

POST /functions/docify-unified-orchestrator/executions

Triggers the unified document processing pipeline.

Request Body:

{
  "documentId": "document-id",
  "url": "https://example.com",
  "instructions": "Analyze this documentation and create visual diagrams"
}

Response:

{
  "success": true,
  "executionId": "execution-id",
  "message": "Unified processing started - 8 steps will be executed"
}

Database Event Triggers

The unified function is automatically triggered when:

  • Document Creation: databases.docify_db.collections.documents_table.documents.*.create
  • Status Updates: Automatic progression through processing stages

πŸ› Troubleshooting

Common Issues

  1. Function Deployment Fails: Ensure Python 3.9+ runtime is selected and all dependencies are installed.

  2. Gemini API Errors: Check your GEMINI_API_KEY and ensure you have API quota remaining.

  3. Browserless Scraping Fails: Some websites block scraping. Try without BROWSERLESS_API_KEY or use different URLs.

  4. Database Connection Issues: Verify your Appwrite database configuration and collection permissions.

  5. Function Timeouts: The 8-step process may take time. The default 500s timeout should handle most documents.

Debug Mode

Monitor function logs through the Appwrite Console:

appwrite functions logs --function-id docify-unified-orchestrator

Check document status in your database to see processing progress through the 5 stages: pending β†’ scraping β†’ analyzing β†’ completed/failed.

πŸ”’ Security

  • OAuth authentication with Google and GitHub
  • User-based data isolation in database
  • API keys stored securely as environment variables
  • Raw content preservation maintains original security context
  • Function execution limited to authorized users only

πŸš€ Deployment

Production Deployment

  1. Appwrite Setup:

    • Create production project on Appwrite Cloud
    • Set up database with consolidated schema
    • Configure OAuth providers (Google, GitHub)
  2. Function Deployment:

    cd functions/docify-unified-orchestrator
    appwrite functions create-deployment --function-id docify-unified-orchestrator --activate true --code .
  3. Frontend Deployment:

    cd docify-website
    npm run build
    npm run preview  # or deploy to Vercel/Netlify
  4. Environment Configuration:

    • Set production API keys
    • Configure production database
    • Set up monitoring and alerts

Scaling Considerations

  • Function Limits: 500s timeout, 1024MB memory for complex analyses
  • Gemini API: Monitor usage and costs
  • Database: Consolidated schema reduces query complexity
  • Browserless: Optional enhancement for difficult sites
  • Storage: Raw content preservation requires adequate storage

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

For support and questions:

  • Check the troubleshooting section above
  • Review the Appwrite documentation
  • Monitor function logs: appwrite functions logs --function-id docify-unified-orchestrator
  • Check document status in database for processing progress

πŸ“ˆ Key Improvements (v3.0)

πŸ”„ Unified Architecture

  • Single Function: Replaced separate scraper + analyzer with unified orchestrator
  • 8-Step Process: Clear, linear processing pipeline
  • Raw Content: Preserves exact HTML without dangerous cleaning
  • AI Titles: Smart 2-4 word titles using Gemini

πŸ€– Enhanced AI

  • Google Gemini: Latest AI model with advanced capabilities
  • Compatible Format: Same JSON blocks as original analyzer
  • Error Recovery: Graceful failure handling with status updates
  • Simple Tools: Clean tracking of AI tool usage

πŸ—„οΈ Database Optimization

  • Consolidated Schema: Single table for all document data
  • Removed Fields: Cleaned up unused attributes (13/17 used)
  • Enhanced Fields: Added tracking for tools and research context

Built with ❀️ using Appwrite, SvelteKit, Google Gemini, and Python