Docify is a web application that allows users to analyze any website using AI-powered content extraction and visualization. Users can input a URL and instructions, and the system will scrape the content, analyze it with Google's Gemini AI, and present the results in interactive charts and summaries.
- Raw Content Preservation: Saves exact browserless HTML without dangerous cleaning
- AI-Generated Titles: Smart 2-4 word titles using Gemini AI
- Readable Summaries: Human-friendly summaries up to 200 characters
- Format Compatibility: Same JSON blocks format as original analyzer
- 8-Step Linear Process: Clear, reliable processing pipeline
- Error Recovery: Graceful failure handling with status updates
- Universal Web Scraping: Extract content from any website
- Interactive Visualizations: Automatic generation of Mermaid diagrams and charts
- Responsive Design: Works on all devices with adaptive grid layouts
graph TD
A[User Submits URL] --> B[Create Document Record]
B --> C[Trigger Unified Function]
C --> D[Extract Document Data]
D --> E[Validate Environment]
E --> F[Raw Browserless Scraping]
F --> G[Save Raw Content]
G --> H[Generate AI Title]
H --> I[Generate Analysis]
I --> J[Create Compatible Blocks]
J --> K[Final Save & Complete]
K --> L[Display Results]
style A fill:#14b8a6,color:#ffffff
style K fill:#14b8a6,color:#ffffff
style L fill:#14b8a6,color:#ffffff
+-----------------+     +---------------------+     +-----------------+
|   User Input    |     |  Unified Function   |     |  Results View   |
|   (Frontend)    |---->|     (Appwrite)      |---->|   (Frontend)    |
|                 |     |   8-Step Process    |     |                 |
+-----------------+     +---------------------+     +-----------------+
         |                         |                         |
         v                         v                         v
+-----------------+       +-----------------+       +-----------------+
|    Document     |       |    Gemini AI    |       |   Browserless   |
|    Creation     |       |   (Analysis)    |       |   (Scraping)    |
|   (Database)    |       |                 |       |                 |
+-----------------+       +-----------------+       +-----------------+
- User submits URL → Document record created
- Unified function triggered → 8-step processing begins
- Raw content scraped → Exact HTML preserved
- AI analysis performed → Gemini generates insights
- Results formatted → Compatible with existing frontend
- Document updated → Ready for display
- Frontend: SvelteKit with TypeScript
- Backend: Appwrite Functions (Python)
- Database: Appwrite Database (consolidated schema)
- AI: Google Gemini 2.5 Pro
- Scraping: Browserless.io + Requests
- Hosting: Vercel (frontend) + Appwrite Cloud (backend)
- Node.js 18+
- npm or yarn
- Python 3.9+ (for function development)
- Appwrite account and project
- Google Gemini API key
- Browserless.io API key (optional, enhances scraping)
- Create a new project on Appwrite Cloud
- Note your Project ID and API Endpoint
- Enable the following services:
  - Databases
  - Functions
  - Storage (optional)
Create a single consolidated collection in your Appwrite database:
{
"name": "documents_table",
"permissions": ["create", "read", "update"],
"attributes": [
{"key": "user_id", "type": "string", "size": 36, "required": true},
{"key": "title", "type": "string", "size": 255, "required": false},
{"key": "url", "type": "string", "required": true},
{"key": "instructions", "type": "string", "size": 1000, "required": true},
{"key": "status", "type": "enum", "elements": ["pending", "scraping", "analyzing", "completed", "failed"], "required": true},
{"key": "public", "type": "boolean", "default": false},
{"key": "scraped_content", "type": "string", "size": 99999, "required": false},
{"key": "analysis_summary", "type": "string", "size": 2000, "required": false},
{"key": "analysis_blocks", "type": "string", "size": 99999, "required": false},
{"key": "gemini_tools_used", "type": "string", "size": 1000, "required": false},
{"key": "research_context", "type": "string", "size": 5000, "required": false},
{"key": "$createdAt", "type": "datetime", "required": true},
{"key": "$updatedAt", "type": "datetime", "required": true}
]
}
Key Changes:
- Single Collection: All data consolidated into one table
- AI-Generated Titles: The `title` field now contains AI-generated 2-4 word titles
- Raw Content: `scraped_content` stores exact browserless HTML
- Compatible Format: `analysis_blocks` maintains same JSON structure as original analyzer
- Enhanced Fields: Added `gemini_tools_used` and `research_context` for tracking
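As a rough illustration of how the size limits and required fields in the schema above could be checked before a record is written, here is a minimal Python sketch. The helper name `validate_document` and the error message wording are hypothetical and not part of the project; only the field names, sizes, and status values come from the schema.

```python
from typing import Dict, List

# Size limits and required fields copied from the documents_table schema above.
SIZE_LIMITS = {
    "user_id": 36,
    "title": 255,
    "instructions": 1000,
    "scraped_content": 99999,
    "analysis_summary": 2000,
    "analysis_blocks": 99999,
    "gemini_tools_used": 1000,
    "research_context": 5000,
}
REQUIRED_FIELDS = ("user_id", "url", "instructions", "status")
STATUSES = ("pending", "scraping", "analyzing", "completed", "failed")


def validate_document(doc: Dict) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    errors = []
    for key in REQUIRED_FIELDS:
        if not doc.get(key):
            errors.append(f"missing required field: {key}")
    for key, limit in SIZE_LIMITS.items():
        value = doc.get(key)
        if isinstance(value, str) and len(value) > limit:
            errors.append(f"{key} exceeds {limit} characters")
    status = doc.get("status")
    if status and status not in STATUSES:
        errors.append(f"invalid status: {status}")
    return errors
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch: it lets a caller report every issue at once.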
Create environment files for both frontend and backend:
# Appwrite Configuration
NEXT_PUBLIC_APPWRITE_ENDPOINT=https://your-region.cloud.appwrite.io/v1
NEXT_PUBLIC_APPWRITE_PROJECT_ID=your-project-id
NEXT_PUBLIC_APPWRITE_DATABASE_ID=your-database-id
NEXT_PUBLIC_APPWRITE_DOCUMENTS_COLLECTION_ID=documents_table
# OAuth Configuration (if using social login)
NEXT_PUBLIC_APPWRITE_OAUTH_SUCCESS_URL=http://localhost:5173/auth/success
NEXT_PUBLIC_APPWRITE_OAUTH_FAILURE_URL=http://localhost:5173/auth/error
Set these in your Appwrite function configuration:
# Required
GEMINI_API_KEY=your-gemini-api-key
DATABASE_ID=your-database-id
DOCUMENTS_COLLECTION_ID=documents_table
# Optional (enhances scraping)
BROWSERLESS_API_KEY=your-browserless-api-key
Deploy the unified orchestrator function:
# Install Appwrite CLI
npm install -g appwrite-cli
# Login to Appwrite
appwrite login
# Navigate to function directory
cd functions/docify-unified-orchestrator
# Deploy the unified function
appwrite functions create-deployment \
--function-id docify-unified-orchestrator \
--activate true \
--code .
Function Details:
- Name: Docify Unified Orchestrator v3.0
- Runtime: Python 3.9
- Trigger: Database events on document creation
- Timeout: 500 seconds (for 8-step process)
- Memory: 1024MB
Note: The unified function replaces the previous separate scraper and analyzer functions.
cd docify-website
npm install
npm run dev
- Navigate to the main page of your application
- Enter a URL you want to analyze
- Provide analysis instructions (e.g., "Create a visual overview of the API endpoints")
- Click "Create Document"
The unified function executes 8 sequential steps:
- Extract Document Data - Parse request and validate inputs
- Validate Environment - Check API keys and configuration
- Raw Browserless Scraping - Scrape content without modification
- Save Raw Content - Store exact HTML in database
- Generate AI Title - Create 2-4 word intelligent titles
- Generate Analysis - Produce comprehensive AI analysis
- Create Compatible Blocks - Format blocks for frontend
- Final Save & Complete - Update database and mark complete
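The linear, fail-fast shape of the steps above can be sketched as follows. This is a simplified illustration, not the code in `main.py`: the `run_pipeline` function, the `Step` type, and the shared `context` dict are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step type: each step reads and mutates a shared context dict.
Step = Callable[[Dict], None]


def run_pipeline(steps: List[Tuple[str, Step]], context: Dict) -> Dict:
    """Run named steps in order; stop at the first failure and record it."""
    context["status"] = "pending"
    for name, step in steps:
        try:
            step(context)
        except Exception as exc:
            # Graceful failure: mark the document failed and keep the reason.
            context["status"] = "failed"
            context["error"] = f"{name}: {exc}"
            return context
    context["status"] = "completed"
    return context
```

In a real run the eight steps would be passed in order as `(name, function)` pairs, and the intermediate `scraping` and `analyzing` statuses would be set by the relevant steps themselves.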
The system will:
- Preserve raw HTML content without dangerous cleaning
- Generate AI-powered 2-4 word titles
- Analyze content using Google Gemini AI
- Create multiple content blocks in compatible JSON format:
  - Summary of the document (≤200 chars)
  - Mermaid diagrams and flowcharts
  - Code examples with syntax highlighting
  - Key points and highlights
  - API references and guides
  - Troubleshooting and best practices
- Summary: High-level overview
- Mermaid: Visual diagrams and flowcharts
- Code: Code examples with syntax highlighting
- Key Points: Important highlights and takeaways
- API Reference: API documentation
- Guide: Step-by-step instructions
- Architecture: System/component diagrams
- Best Practices: Recommendations
- Troubleshooting: Common issues and solutions
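To make the summary length limit and the 2-4 word title rule concrete, here is a small Python sketch. The block shape (`{"type": ..., "content": ...}`) is an assumption made for illustration, since the actual JSON structure used by the original analyzer is not shown here; the helper names are hypothetical as well.

```python
import json

MAX_SUMMARY_CHARS = 200  # limit stated for summaries


def clamp_summary(text: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Collapse whitespace and truncate to at most `limit` characters."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def is_valid_title(title: str) -> bool:
    """Check the 2-4 word rule for AI-generated titles."""
    return 2 <= len(title.split()) <= 4


def build_blocks(summary: str, mermaid_source: str) -> str:
    """Serialize blocks to a JSON string (assumed block shape, see above)."""
    blocks = [
        {"type": "summary", "content": clamp_summary(summary)},
        {"type": "mermaid", "content": mermaid_source},
    ]
    return json.dumps(blocks)
```

Serializing to a string matches the schema, where `analysis_blocks` is a string attribute rather than a nested object.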
- `GEMINI_API_KEY`: Your Google Gemini API key
- `DATABASE_ID`: Your Appwrite database ID
- `DOCUMENTS_COLLECTION_ID`: Documents table ID (`documents_table`)
- `BROWSERLESS_API_KEY`: Browserless.io API key for enhanced scraping (optional)
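A check along the lines of the "Validate Environment" step might look like the sketch below. The function name `missing_env_vars` is hypothetical; only the variable names come from the configuration above.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("GEMINI_API_KEY", "DATABASE_ID", "DOCUMENTS_COLLECTION_ID")
OPTIONAL_VARS = ("BROWSERLESS_API_KEY",)  # listed for reference, never reported as missing


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Passing a plain dict instead of reading `os.environ` directly keeps the check easy to test locally.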
The function updates document status through 5 stages:
- `pending`: Document created, waiting for processing
- `scraping`: Currently scraping content from URL
- `analyzing`: Scraping complete, analyzing with Gemini
- `completed`: Analysis complete, ready for display
- `failed`: Processing failed (can be retried)
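The five stages can be modeled as a small state machine. The sketch below is one possible reading: the status values come from the schema, but the transition table is an assumption, in particular that a retry moves a `failed` document back to `pending`.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    SCRAPING = "scraping"
    ANALYZING = "analyzing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: each stage moves forward or fails; a retry re-queues.
ALLOWED = {
    Status.PENDING: {Status.SCRAPING, Status.FAILED},
    Status.SCRAPING: {Status.ANALYZING, Status.FAILED},
    Status.ANALYZING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: {Status.PENDING},
}


def can_transition(current: str, new: str) -> bool:
    """True if moving from `current` to `new` is allowed; unknown values raise ValueError."""
    return Status(new) in ALLOWED[Status(current)]
```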
Edit the analysis prompt in functions/docify-unified-orchestrator/src/main.py to customize how Gemini analyzes documents. The prompt includes instructions for generating compatible JSON blocks.
Triggers the unified document processing pipeline.
Request Body:
{
"documentId": "document-id",
"url": "https://example.com",
"instructions": "Analyze this documentation and create visual diagrams"
}
Response:
{
"success": true,
"executionId": "execution-id",
"message": "Unified processing started - 8 steps will be executed"
}
The unified function is automatically triggered when:
- Document Creation: `databases.docify_db.collections.documents_table.documents.*.create`
- Status Updates: Automatic progression through processing stages
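For the request body shown earlier, input validation in the first pipeline step could be sketched like this. The function name `parse_request` and the error messages are hypothetical; only the three field names come from the request example.

```python
import json
from urllib.parse import urlparse

REQUEST_FIELDS = ("documentId", "url", "instructions")


def parse_request(raw_body: str) -> dict:
    """Parse the JSON body and validate the three expected fields."""
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    missing = [
        key for key in REQUEST_FIELDS
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    parsed = urlparse(data["url"])
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported URL: {data['url']}")
    return {key: data[key] for key in REQUEST_FIELDS}
```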
- Function Deployment Fails: Ensure Python 3.9+ runtime is selected and all dependencies are installed.
- Gemini API Errors: Check your `GEMINI_API_KEY` and ensure you have API quota remaining.
- Browserless Scraping Fails: Some websites block scraping. Try without `BROWSERLESS_API_KEY` or use different URLs.
- Database Connection Issues: Verify your Appwrite database configuration and collection permissions.
- Function Timeouts: The 8-step process may take time. The configured 500s timeout should handle most documents.
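Because the Browserless key is optional, the scraping step needs some way to choose between Browserless and a plain HTTP request. The sketch below shows one possible selection strategy. Falling back to the plain fetcher when Browserless raises is an assumption, not confirmed behavior, and both fetchers are passed in as hypothetical callables so the example makes no claims about either service's API.

```python
from typing import Callable, Mapping


def scrape(
    url: str,
    env: Mapping[str, str],
    browserless_fetch: Callable[[str, str], str],
    plain_fetch: Callable[[str], str],
) -> str:
    """Use Browserless when a key is configured; otherwise use the plain fetcher."""
    api_key = env.get("BROWSERLESS_API_KEY", "")
    if api_key:
        try:
            return browserless_fetch(url, api_key)
        except Exception:
            # Assumed behavior: fall through to the plain fetcher on any error.
            pass
    return plain_fetch(url)
```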
Monitor function logs through the Appwrite Console:
appwrite functions logs --function-id docify-unified-orchestrator
Check document status in your database to see processing progress through the 5 stages: pending → scraping → analyzing → completed/failed.
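Checking status by hand can be replaced by a small polling loop. The sketch below is illustrative only: `fetch_status` is a hypothetical callable that returns the document's current `status` string (for example by reading the document from the database), and the 500-second default mirrors the function timeout.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 500.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the document reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still '{status}' after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.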
- OAuth authentication with Google and GitHub
- User-based data isolation in database
- API keys stored securely as environment variables
- Raw content preservation maintains original security context
- Function execution limited to authorized users only
- Appwrite Setup:
  - Create production project on Appwrite Cloud
  - Set up database with consolidated schema
  - Configure OAuth providers (Google, GitHub)
- Function Deployment:
  cd functions/docify-unified-orchestrator
  appwrite functions create-deployment --function-id docify-unified-orchestrator --activate true --code .
- Frontend Deployment:
  cd docify-website
  npm run build
  npm run preview # or deploy to Vercel/Netlify
- Environment Configuration:
  - Set production API keys
  - Configure production database
  - Set up monitoring and alerts
- Function Limits: 500s timeout, 1024MB memory for complex analyses
- Gemini API: Monitor usage and costs
- Database: Consolidated schema reduces query complexity
- Browserless: Optional enhancement for difficult sites
- Storage: Raw content preservation requires adequate storage
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Check the troubleshooting section above
- Review the Appwrite documentation
- Monitor function logs: `appwrite functions logs --function-id docify-unified-orchestrator`
- Check document status in database for processing progress
- Single Function: Replaced separate scraper + analyzer with unified orchestrator
- 8-Step Process: Clear, linear processing pipeline
- Raw Content: Preserves exact HTML without dangerous cleaning
- AI Titles: Smart 2-4 word titles using Gemini
- Google Gemini: Latest AI model with advanced capabilities
- Compatible Format: Same JSON blocks as original analyzer
- Error Recovery: Graceful failure handling with status updates
- Simple Tools: Clean tracking of AI tool usage
- Consolidated Schema: Single table for all document data
- Removed Fields: Cleaned up unused attributes (13/17 used)
- Enhanced Fields: Added tracking for tools and research context
Built with ❤️ using Appwrite, SvelteKit, Google Gemini, and Python