This repository contains a structured database of listings from Personal Care Suppliers.
This repository includes automated GitHub Actions that monitor and facilitate communication between AI agents working on the project. Messages between agents are automatically detected and processed.
See AUTOMATION.md for details on how the automated message monitoring works.
The repository organizes listings by category path in a hierarchical directory structure:
├── source_pages.json           # Master list of source pages to track
├── listings/                   # Supplier listing JSON files
│   ├── Raw_Materials/
│   │   └── Actives/
│   │       └── 1828_*.json
│   ├── Business_Services/
│   │   ├── Auditing/
│   │   │   └── 1790_*.json
│   │   └── Contract_Manufacturing/
│   │       └── 1790_*.json
│   ├── Equipment/
│   │   └── Tanks/
│   │       └── 1801_*.json
│   └── Labels__Sleeves/
│       └── Stretch_Sleeve/
│           └── 1800_*.json
├── scripts/                    # Automation tools
│   ├── validation/             # Data quality and validation tools
│   ├── scraper/                # Web scraping tools (planned)
│   ├── import/                 # Batch import utilities (planned)
│   └── reporting/              # Analytics and reporting (planned)
├── tests/                      # Test suite
│   ├── fixtures/               # Sample data for testing
│   └── test_validation.py      # Validation tool tests
└── docs/                       # Documentation
    ├── DATA_QUALITY.md         # Data quality standards
    ├── CONTRIBUTING.md         # Contribution guidelines
    ├── SCRAPER_GUIDE.md        # Scraper documentation (planned)
    └── API_DESIGN.md           # Future API design
The source_pages.json file contains a comprehensive list of 313 source pages from personalcaresuppliers.com that should be tracked. This includes:
- Informational pages: Homepage, guides (CUI, Help), and media kit
- Category listing pages: 309 category-specific listing pages across all major product and service categories
This file serves as a reference for scraping, crawling, or monitoring activities.
Each listing is stored as a JSON file following schema version 1.0:
{
  "schema_version": "1.0",
  "category_id": 1828,
  "listing_id": "company_name",
  "category_path": "Raw_Materials/Actives",
  "url": "https://personalcaresuppliers.com/Listing/Index/Raw_Materials/Actives/1828/",
  "company_name": "Example Company Inc.",
  "status": "active",
  "date_added": "2025-11-03",
  "date_updated": "2025-11-03",
  "tags": ["organic", "sustainable"],
  "metadata": {
    "last_validated": "2025-11-03",
    "validation_method": "manual",
    "data_source": "manual_entry"
  }
}
- schema_version: Version of the schema being used (currently "1.0")
- category_id: The numeric identifier for the category
- listing_id: The unique identifier for the listing
- category_path: The hierarchical path of the category (using underscores for spaces)
- url: The full URL to the listing on personalcaresuppliers.com
- status: Status of the listing (e.g., "active", "inactive", "pending", "archived")
- date_added: Date when the listing was added to this database (YYYY-MM-DD format)
- date_updated: Date when the listing was last updated (YYYY-MM-DD format)
- company_name: Name of the company/supplier
- address, country, phone, email, website: Contact information
- specializations: Array of specializations or product categories
- certifications: Array of certifications held
- product_highlights: Key products or product lines
- tags: Array of classification tags for filtering (e.g., "oat-specialist", "organic", "major-player")
- metadata: Tracking information (last_validated, validation_method, data_source, quality_score)
- notes: Additional notes about the listing
See docs/DATA_QUALITY.md for complete field documentation.
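As a rough illustration of how the fields above might be checked, here is a minimal Python sketch. It is not the repository's validator (that lives in scripts/validation/validate_listings.py and may behave differently). The function name `check_listing` and the `ASSUMED_REQUIRED` tuple are hypothetical, and which fields count as required is an assumption made for this example; docs/DATA_QUALITY.md is the authority.

```python
import re
from datetime import date

# Assumption: these fields are treated as required for this sketch only.
# The authoritative list is in docs/DATA_QUALITY.md.
ASSUMED_REQUIRED = (
    "schema_version", "category_id", "listing_id",
    "category_path", "url", "status", "date_added",
)

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_iso_date(value) -> bool:
    """Return True only for real calendar dates written as YYYY-MM-DD."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_listing(listing: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in ASSUMED_REQUIRED:
        if field not in listing:
            problems.append(f"missing field: {field}")
    for field in ("date_added", "date_updated"):
        if field in listing and not _is_iso_date(listing[field]):
            problems.append(f"{field} is not a YYYY-MM-DD date: {listing[field]!r}")
    path = listing.get("category_path")
    if isinstance(path, str) and " " in path:
        problems.append("category_path contains spaces; use underscores")
    return problems
```

Calling `check_listing` on a parsed listing returns the problems it found, so a caller can print them or fail a build when the list is non-empty.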
Listing files are named using the pattern: {category_id}_{listing_id}.json
For example: 1828_1102292.json
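The naming pattern can be expressed in a few lines of Python. This is a sketch only: the helper names (`listing_filename`, `parse_listing_filename`, `listing_path`) are hypothetical, and the regular expression assumes that category IDs are always numeric, as in the examples shown in this README.

```python
import re
from pathlib import Path

# Assumption: category_id is all digits; listing_id is everything after the
# first underscore, up to the ".json" extension.
LISTING_FILENAME = re.compile(r"^(?P<category_id>\d+)_(?P<listing_id>.+)\.json$")


def listing_filename(category_id: int, listing_id: str) -> str:
    """Build a file name of the form {category_id}_{listing_id}.json."""
    return f"{category_id}_{listing_id}.json"


def parse_listing_filename(name: str):
    """Split a listing file name back into (category_id, listing_id)."""
    match = LISTING_FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a listing file name: {name!r}")
    return int(match["category_id"]), match["listing_id"]


def listing_path(root: str, category_path: str, category_id: int, listing_id: str) -> Path:
    """Join the listings root, the category path, and the file name."""
    return Path(root) / category_path / listing_filename(category_id, listing_id)
```

For instance, `listing_path("listings", "Raw_Materials/Actives", 1828, "1102292")` points at `listings/Raw_Materials/Actives/1828_1102292.json`.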
The database currently contains 15 listings across the following categories:
- Raw Materials → Actives (8 listings) - Category ID: 1828
- Business Services → Contract Manufacturing (4 listings) - Category ID: 1790
- Business Services → Auditing (1 listing) - Category ID: 1790
- Equipment → Tanks (1 listing) - Category ID: 1801
- Labels & Sleeves → Stretch Sleeve (1 listing) - Category ID: 1800
See LISTINGS_INDEX.md for a complete index.
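Per-category counts like those above could be recomputed from the directory tree with a short Python sketch. The function name `count_listings` is hypothetical, and the sketch assumes every `.json` file under `listings/` is a listing; it is not how LISTINGS_INDEX.md is necessarily generated.

```python
from collections import Counter
from pathlib import Path


def count_listings(root: str = "listings") -> Counter:
    """Count .json files per category path (e.g. "Raw_Materials/Actives")."""
    counts = Counter()
    for path in Path(root).rglob("*.json"):
        # The category path is the file's directory relative to the root.
        category = path.parent.relative_to(root).as_posix()
        counts[category] += 1
    return counts


if __name__ == "__main__":
    for category, total in sorted(count_listings().items()):
        print(f"{category}: {total}")
```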
Validate all listings against schema and quality standards:
python3 scripts/validation/validate_listings.py
See scripts/README.md for all available tools.
Run the test suite:
python3 tests/test_validation.py
See tests/README.md for test documentation.
- docs/DATA_QUALITY.md - Data quality standards and validation rules
- docs/CONTRIBUTING.md - Guidelines for adding new listings
- docs/SCRAPER_GUIDE.md - Web scraping documentation (coming Week 2)
- docs/API_DESIGN.md - Future API specifications
See docs/CONTRIBUTING.md for detailed guidelines.
Quick steps:
- Create the appropriate category directory structure under listings/ if it doesn't exist
- Create a new JSON file following the naming convention: {category_id}_{listing_id}.json
- Use schema version 1.0 and include all required fields
- Add enhanced fields for better quality (especially for strategic suppliers)
- Validate your listing: python3 scripts/validation/validate_listings.py
- Submit a pull request with your additions
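The directory and file creation steps above can be sketched in Python as follows. The helper `write_listing` is hypothetical and not part of the repository's scripts; it assumes the listing dictionary already carries `category_path`, `category_id`, and `listing_id`, and it refuses to overwrite an existing file.

```python
import json
from pathlib import Path


def write_listing(listing: dict, root: str = "listings") -> Path:
    """Write a listing to {root}/{category_path}/{category_id}_{listing_id}.json."""
    target_dir = Path(root) / listing["category_path"]
    # Create the category directory structure if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{listing['category_id']}_{listing['listing_id']}.json"
    if target.exists():
        raise FileExistsError(f"listing already exists: {target}")
    target.write_text(json.dumps(listing, indent=2) + "\n", encoding="utf-8")
    return target
```

After writing the file, running the validation command from the steps above is still needed before opening a pull request.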
Example listing:
{
  "schema_version": "1.0",
  "category_id": 1828,
  "listing_id": "example_company",
  "category_path": "Raw_Materials/Actives",
  "url": "https://personalcaresuppliers.com/Listing/Index/Raw_Materials/Actives/1828/",
  "company_name": "Example Company",
  "status": "active",
  "date_added": "2025-11-03",
  "date_updated": "2025-11-03",
  "metadata": {
    "last_validated": "2025-11-03",
    "validation_method": "manual",
    "data_source": "manual_entry"
  }
}
See LICENSE file for details.