A full-stack web application designed to detect, visualize, and manage duplicates within your Appwrite projects' databases and storage buckets using AI-powered algorithms and perceptual hashing, featuring a gamified user experience.
Live App Link - https://appwrite-ai-duplicates-detector-aadd.appwrite.network/
Live Demo Link - https://youtu.be/VjmDfk_CCQ8
Managing data effectively often involves dealing with duplicate entries or files, which can consume storage space and complicate data processing. AADD provides an intelligent solution specifically tailored for Appwrite users.
Connect your Appwrite projects securely, let the AI scan for textual and file duplicates (images, videos, audio, documents, etc.), and manage them through an intuitive interface. Enhance your data hygiene with features like scheduled scans and track your progress with the engaging "AI Garden" gamification system.
Utilizes Sentence Transformers (all-MiniLM-L6-v2) to find similar documents within Appwrite Database collections based on semantic meaning.
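The semantic comparison described above comes down to comparing embedding vectors. Below is a minimal sketch of that step, not AADD's actual implementation. It assumes each document has already been turned into a vector (for example with `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`), and the function names and the 0.9 threshold are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.9):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    embeddings: dict mapping a document ID to its embedding vector.
    threshold: assumed cut-off, not a value taken from AADD.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    # Most similar pairs first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy 3-dimensional vectors standing in for real model output
toy_embeddings = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
similar = find_similar_pairs(toy_embeddings, threshold=0.9)
```

Comparing every pair is quadratic in the number of documents, which is fine for a sketch but may need batching or an index for large collections.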
Detects duplicates and near-duplicates for various file types in Appwrite Storage:
| File Type | Detection Method |
|---|---|
| Images | Perceptual hashing (pHash, aHash, dHash) and color histograms via ImageHash |
| Videos | Feature extraction (ORB descriptors) from sampled frames using OpenCV |
| Audio | Mel-frequency cepstral coefficients (MFCC) analysis via Librosa |
| PDFs | Text extraction and embedding comparison using PyPDF2 and Sentence Transformers |
| Documents (.doc, .docx, .txt) | Text embedding comparison |
| Tables (.csv, .xlsx) | Content extraction and embedding comparison using pandas |
| Presentations (.pptx) | Text and image content extraction and hashing using python-pptx |
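To illustrate the perceptual hashing row in the table, here is a simplified average hash (aHash) in plain Python. It is a sketch of the idea only: it assumes the image has already been downscaled to a small grayscale grid, whereas the ImageHash library handles resizing and conversion itself (for example via `imagehash.average_hash(image)`). The function names below are hypothetical.

```python
def average_hash(pixels):
    """Build an aHash from a 2D grid of grayscale values (0-255).

    Each bit is 1 when the pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def hash_similarity(hash_a, hash_b, n_bits):
    """Similarity in [0, 1]; 1.0 means the hashes are identical."""
    return 1 - hamming_distance(hash_a, hash_b) / n_bits


# 4x4 toy "images": a brightened copy keeps the same hash, a different layout does not
original = [[10, 10, 200, 200] for _ in range(4)]
brighter = [[30, 30, 220, 220] for _ in range(4)]
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

h_original = average_hash(original)
h_brighter = average_hash(brighter)
h_different = average_hash(different)
```

Because the hash depends on brightness relative to the mean rather than on exact pixel values, uniformly brightened or lightly recompressed copies tend to hash alike, which is what makes near-duplicate detection possible.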
Exact Duplicates: Falls back to MD5 hashing for unsupported file types or when a type-specific method fails.
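The MD5 fallback can be sketched as grouping files by content digest. This is an illustrative example using only the standard library; the function name and the in-memory `files` mapping are assumptions, not AADD's actual code.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file IDs whose contents are byte-for-byte identical.

    files: dict mapping a file ID to its raw bytes.
    Returns a list of groups, each containing two or more file IDs.
    """
    groups = defaultdict(list)
    for file_id, content in files.items():
        digest = hashlib.md5(content).hexdigest()
        groups[digest].append(file_id)
    # Only digests shared by more than one file indicate duplicates
    return [ids for ids in groups.values() if len(ids) > 1]


sample_files = {
    "a.bin": b"hello",
    "b.bin": b"hello",
    "c.bin": b"world",
}
duplicate_groups = group_exact_duplicates(sample_files)
```

MD5 is used here only as a fast content fingerprint, not for security; it catches exact copies but misses files that differ by even one byte.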
- Multi-Project Connectivity: Securely connect and manage multiple Appwrite projects from a single dashboard
- Secure Credential Handling: User-provided Appwrite API keys are encrypted using Fernet symmetric encryption before being stored
- User Authentication: Full auth flow including signup, login, email verification, and profile management powered by Appwrite Auth
- Profile Management: Update name, email, and upload profile picture
- Intuitive Dashboard: View connected projects and initiate scans
- Detailed Results Page: Filtering, sorting, and similarity scores
- Data Visualizations: Circle Packing charts to understand duplicate distribution (via @nivo)
- Bulk Operations: Select and delete multiple duplicates at once
- Source Deletion: Option to delete duplicates directly from your Appwrite project
- Activity Logging: Tracks user actions like project connections, scans, and deletions
- Scheduled Reminders: Configure automated scans (hourly, daily, weekly, etc.)
- Email Notifications: Receive alerts upon scan completion
- Project-Specific Scheduling: Set different schedules for different projects/services
- Visual Health Representation: See your data health through dynamic plant visualizations
- SVG Animations: Engaging, animated garden that reflects your scanning and cleaning activity
- AI Gardener Chat: Powered by Google Gemini API for tips and encouragement
- Progress Tracking: Monitor your data hygiene improvements over time
- Responsive Design: Works seamlessly on desktop, tablet, and mobile
- Smooth Animations: Built with Framer Motion for delightful interactions
- Modern Components: Utilizing shadcn/ui and Tailwind CSS
- Dark Mode Optimized: Beautiful dark theme for comfortable viewing
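The scheduling features listed above (hourly, daily, or weekly scans, set per project and service) amount to checking whether an interval has elapsed since the last scan. The following is a minimal sketch under that assumption; the `INTERVALS` table and function names are hypothetical, and AADD may support more frequencies than the three shown.

```python
from datetime import datetime, timedelta

# Assumed mapping from frequency label to interval
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_scan_time(last_scan, frequency):
    """Return when the next scan should run for the given frequency."""
    if frequency not in INTERVALS:
        raise ValueError(f"Unsupported frequency: {frequency!r}")
    return last_scan + INTERVALS[frequency]


def is_scan_due(last_scan, frequency, now):
    """True when the configured interval has fully elapsed."""
    return now >= next_scan_time(last_scan, frequency)


# Schedules keyed by (project ID, service), so each pair can differ
schedules = {
    ("project-a", "storage"): "daily",
    ("project-a", "database"): "weekly",
}
last_scan = datetime(2024, 1, 1, 12, 0)
now = datetime(2024, 1, 2, 12, 0)
due = {key: is_scan_due(last_scan, freq, now) for key, freq in schedules.items()}
```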
Additional Tools:
- State Management: React Context API (useAuth)
Duplicate Detection Libraries:
| Category | Libraries |
|---|---|
| Text | sentence-transformers |
| Images | Pillow, imagehash |
| Video | opencv-python |
| Audio | librosa |
| PDFs | PyPDF2 |
| Tables | pandas, openpyxl |
| Presentations | python-pptx |

- Email Notifications: smtplib (Standard Library)
Services Used:
- Authentication
- Database
- Storage
Create an account or log in using your email and password. Verify your email if it's your first time signing up.
Navigate to the "Connect Project" page and enter:
- Project ID
- API Endpoint
- API Key for the Appwrite project you want to scan
View your connected projects with quick stats, manage automated scan reminders, and access recent activities.
Note: Emails may land in your spam folder, so check there as well.
From the Dashboard, click "Duplicates" on a project card to navigate to the project overview page.
On the project overview page (/duplicates/<projectId>), select:
- Storage: Scan all buckets
- Database ID: Input a database ID to scan
- Optionally load collections and scan specific collections or entire database
Click a "Scan" button to navigate to the results page (/duplicates/<projectId>/<service>). The scan will automatically trigger and display:
- Loading state during scan
- Detected duplicates with similarity scores
- Visual representations of duplicate distribution
Filter & Sort:
- Use the search bar to find specific duplicates
- Sort by similarity, date, or file size
Bulk Operations:
- Select duplicates using checkboxes
- Use "Select All" / "Deselect All" for bulk actions
Delete Duplicates:
- Click "Delete Selected"
- Choose deletion mode:
  - Delete from source: Removes actual files/documents from your Appwrite project
  - Remove from list: Only removes from AADD tracking
- Confirm the deletion
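The difference between the two deletion modes can be sketched as follows. This is an illustration only: `DeletionMode`, `delete_duplicates`, and the `delete_from_source` callback are hypothetical names, and the sketch assumes that deleting from source also removes the entry from AADD's list.

```python
from enum import Enum


class DeletionMode(Enum):
    DELETE_FROM_SOURCE = "source"  # also removes the file/document in Appwrite
    REMOVE_FROM_LIST = "list"      # only stops tracking it in AADD


def delete_duplicates(selected_ids, tracked, mode, delete_from_source):
    """Apply the chosen deletion mode to the selected duplicates.

    tracked: dict mapping a duplicate ID to its tracked record.
    delete_from_source: hypothetical callable that deletes the underlying
        item from the user's Appwrite project.
    Returns the IDs that were actually removed from tracking.
    """
    removed = []
    for dup_id in selected_ids:
        record = tracked.get(dup_id)
        if record is None:
            continue  # unknown ID, nothing to do
        if mode is DeletionMode.DELETE_FROM_SOURCE:
            delete_from_source(record)
        del tracked[dup_id]
        removed.append(dup_id)
    return removed


source_calls = []  # stands in for real deletions in this demo
tracked = {
    "d1": {"file_id": "f1"},
    "d2": {"file_id": "f2"},
    "d3": {"file_id": "f3"},
}
removed_from_list = delete_duplicates(
    ["d1"], tracked, DeletionMode.REMOVE_FROM_LIST, source_calls.append
)
removed_from_source = delete_duplicates(
    ["d2", "missing"], tracked, DeletionMode.DELETE_FROM_SOURCE, source_calls.append
)
```

Only the "Delete from source" path touches the user's Appwrite data, which is why the interface asks for confirmation before proceeding.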
Visit the "AI Garden" page to:
- View your data health visualization
- Check cleaning statistics
- Chat with the AI Gardener for motivation and tips
Manage your account:
- Update name and email
- Upload profile picture
- Delete account (with comprehensive cleanup)
Review a detailed log of all your actions within the AADD application, including:
- Project connections
- Scan operations
- Deletion activities
- Configuration changes
    ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
    │   Next.js   │────────▶│    Flask    │────────▶│  Appwrite   │
    │  Frontend   │   API   │   Backend   │   SDK   │   Service   │
    └─────────────┘         └─────────────┘         └─────────────┘
                                   │
                                   ▼
                            ┌─────────────┐
                            │    AI/ML    │
                            │  Libraries  │
                            └─────────────┘
- User Authentication: Handled by Appwrite Auth
- Project Connection: API keys encrypted and stored in Appwrite Database
- Scan Request: Frontend triggers scan via Flask API
- Duplicate Detection: Backend fetches data from user's Appwrite project and runs AI algorithms
- Results Storage: Duplicates stored in AADD's Appwrite database
- Visualization: Results fetched and displayed with interactive charts
- Encryption at Rest: API keys encrypted using Fernet
- Secure Communication: HTTPS for all API calls
- Token-based Auth: JWT tokens for session management
- Input Validation: All user inputs sanitized
- Rate Limiting: Protection against abuse
- Appwrite Team for the amazing backend platform
- Google Gemini for AI capabilities
- Open Source Community for the incredible libraries used in this project
Made with ❤️ by Devika Harshey
⭐ Star on GitHub if you find this project useful!