hyperholmes/pcsdbx

Personal Care Suppliers Database (pcsdbx)

This repository contains a structured database of listings from Personal Care Suppliers.

🤖 Automated Agent Communication

This repository includes automated GitHub Actions that monitor and facilitate communication between AI agents working on the project. Messages between agents are automatically detected and processed.

👉 See AUTOMATION.md for details on how the automated message monitoring works.

Repository Structure

The repository organizes listings by category path in a hierarchical directory structure:

├── source_pages.json          # Master list of source pages to track
├── listings/                  # Supplier listing JSON files
│   ├── Raw_Materials/
│   │   └── Actives/
│   │       └── 1828_*.json
│   ├── Business_Services/
│   │   ├── Auditing/
│   │   │   └── 1790_*.json
│   │   └── Contract_Manufacturing/
│   │       └── 1790_*.json
│   ├── Equipment/
│   │   └── Tanks/
│   │       └── 1801_*.json
│   └── Labels__Sleeves/
│       └── Stretch_Sleeve/
│           └── 1800_*.json
├── scripts/                   # Automation tools
│   ├── validation/            # Data quality and validation tools
│   ├── scraper/               # Web scraping tools (planned)
│   ├── import/                # Batch import utilities (planned)
│   └── reporting/             # Analytics and reporting (planned)
├── tests/                     # Test suite
│   ├── fixtures/              # Sample data for testing
│   └── test_validation.py     # Validation tool tests
└── docs/                      # Documentation
    ├── DATA_QUALITY.md        # Data quality standards
    ├── CONTRIBUTING.md        # Contribution guidelines
    ├── SCRAPER_GUIDE.md       # Scraper documentation (planned)
    └── API_DESIGN.md          # Future API design

Source Pages

The source_pages.json file contains a comprehensive list of 313 source pages from personalcaresuppliers.com that should be tracked. This includes:

  • Informational pages: Homepage, guides (CUI, Help), and media kit
  • Category listing pages: 309 category-specific listing pages across all major product and service categories

This file serves as a reference for scraping, crawling, or monitoring activities.

Data Format

Each listing is stored as a JSON file following schema version 1.0:

{
  "schema_version": "1.0",
  "category_id": 1828,
  "listing_id": "company_name",
  "category_path": "Raw_Materials/Actives",
  "url": "https://personalcaresuppliers.com/Listing/Index/Raw_Materials/Actives/1828/",
  "company_name": "Example Company Inc.",
  "status": "active",
  "date_added": "2025-11-03",
  "date_updated": "2025-11-03",
  "tags": ["organic", "sustainable"],
  "metadata": {
    "last_validated": "2025-11-03",
    "validation_method": "manual",
    "data_source": "manual_entry"
  }
}

Required Fields

  • schema_version: Version of the schema being used (currently "1.0")
  • category_id: The numeric identifier for the category
  • listing_id: The unique identifier for the listing
  • category_path: The hierarchical path of the category (using underscores for spaces)
  • url: The full URL to the listing on personalcaresuppliers.com
  • status: Status of the listing (e.g., "active", "inactive", "pending", "archived")
  • date_added: Date when the listing was added to this database (YYYY-MM-DD format)

Optional Fields

  • date_updated: Date when last updated (YYYY-MM-DD)
  • company_name: Name of the company/supplier
  • address, country, phone, email, website: Contact information
  • specializations: Array of specializations or product categories
  • certifications: Array of certifications held
  • product_highlights: Key products or product lines
  • tags: Array of classification tags for filtering (e.g., "oat-specialist", "organic", "major-player")
  • metadata: Tracking information (last_validated, validation_method, data_source, quality_score)
  • notes: Additional notes about the listing

See docs/DATA_QUALITY.md for complete field documentation.
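
As a rough illustration of the required-field rules above, the following sketch checks a single listing dictionary. It is not the repository's scripts/validation/validate_listings.py; the function name check_listing, the helper _is_iso_date, and the message wording are hypothetical, and the status set simply reuses the example values listed above, which may not be exhaustive.

```python
import re
from datetime import date

# Field names and status values are taken from the lists above.
# Everything else (function names, messages) is an illustrative assumption.
REQUIRED_FIELDS = (
    "schema_version", "category_id", "listing_id",
    "category_path", "url", "status", "date_added",
)
KNOWN_STATUSES = {"active", "inactive", "pending", "archived"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def _is_iso_date(value):
    """Return True if value is a string naming a real YYYY-MM-DD date."""
    if not isinstance(value, str) or not DATE_PATTERN.match(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_listing(listing):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in listing
    ]
    if "schema_version" in listing and listing["schema_version"] != "1.0":
        problems.append("schema_version is not \"1.0\"")
    if "status" in listing and listing["status"] not in KNOWN_STATUSES:
        problems.append(f"unrecognized status: {listing['status']!r}")
    for field in ("date_added", "date_updated"):
        if field in listing and not _is_iso_date(listing[field]):
            problems.append(f"{field} is not a YYYY-MM-DD date")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a file at once.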

File Naming Convention

Listing files are named using the pattern: {category_id}_{listing_id}.json

For example: 1828_1102292.json
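
The naming pattern can be expressed as a pair of small helpers. This is a minimal sketch, assuming the file lives under listings/ in the directory named by category_path (as the repository structure suggests); the function names are hypothetical and not part of the repository's scripts.

```python
from pathlib import PurePosixPath


def listing_filename(category_id, listing_id):
    """Build a file name following {category_id}_{listing_id}.json."""
    return f"{category_id}_{listing_id}.json"


def listing_path(listing, root="listings"):
    """Build the assumed repository-relative path for a listing dict."""
    name = listing_filename(listing["category_id"], listing["listing_id"])
    return PurePosixPath(root) / listing["category_path"] / name


def parse_listing_filename(filename):
    """Split a file name back into (category_id, listing_id).

    Splits at the first underscore, so listing IDs that themselves
    contain underscores are preserved.
    """
    stem = PurePosixPath(filename).stem
    category_id, _, listing_id = stem.partition("_")
    if not category_id.isdigit() or not listing_id:
        raise ValueError(f"not a listing file name: {filename!r}")
    return int(category_id), listing_id
```

For instance, listing_filename(1828, "1102292") yields the example name shown above.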

Current Listings

The database currently contains 15 listings across the following categories:

  1. Raw Materials → Actives (8 listings) - Category ID: 1828
  2. Business Services → Contract Manufacturing (4 listings) - Category ID: 1790
  3. Business Services → Auditing (1 listing) - Category ID: 1790
  4. Equipment → Tanks (1 listing) - Category ID: 1801
  5. Labels & Sleeves → Stretch Sleeve (1 listing) - Category ID: 1800

See LISTINGS_INDEX.md for a complete index.
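
Per-category counts like those above can be recomputed from the directory tree. The sketch below is one possible approach, assuming every .json file under listings/ is a listing; count_listings is a hypothetical helper, not an existing script in this repository.

```python
from collections import Counter
from pathlib import Path


def count_listings(root="listings"):
    """Count *.json files per category directory, keyed by path relative to root."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.json"):
        # The parent directory relative to root corresponds to category_path.
        counts[path.parent.relative_to(root).as_posix()] += 1
    return counts


if __name__ == "__main__":
    for category, total in sorted(count_listings().items()):
        print(f"{category}: {total}")
```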

Tools & Automation

Validation Tools

Validate all listings against schema and quality standards:

python3 scripts/validation/validate_listings.py

See scripts/README.md for all available tools.

Running Tests

python3 tests/test_validation.py

See tests/README.md for test documentation.

Documentation

Contributing

Adding New Listings

See docs/CONTRIBUTING.md for detailed guidelines.

Quick steps:

  1. Create the appropriate category directory structure under listings/ if it doesn't exist
  2. Create a new JSON file following the naming convention: {category_id}_{listing_id}.json
  3. Use schema version 1.0 and include all required fields
  4. Add optional fields where the information is known (for example company_name, certifications, or tags) to improve data quality, especially for strategic suppliers
  5. Validate your listing: python3 scripts/validation/validate_listings.py
  6. Submit a pull request with your additions

Example listing:

{
  "schema_version": "1.0",
  "category_id": 1828,
  "listing_id": "example_company",
  "category_path": "Raw_Materials/Actives",
  "url": "https://personalcaresuppliers.com/Listing/Index/Raw_Materials/Actives/1828/",
  "company_name": "Example Company",
  "status": "active",
  "date_added": "2025-11-03",
  "date_updated": "2025-11-03",
  "metadata": {
    "last_validated": "2025-11-03",
    "validation_method": "manual",
    "data_source": "manual_entry"
  }
}
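
Steps 1 and 2 of the quick steps can be scripted. The following is a minimal sketch under the conventions described in this README; write_listing is a hypothetical helper, and refusing to overwrite an existing file is a design assumption rather than a documented project rule.

```python
import json
from pathlib import Path


def write_listing(listing, root="listings"):
    """Write a listing dict to <root>/<category_path>/<category_id>_<listing_id>.json."""
    target_dir = Path(root) / listing["category_path"]
    # Step 1: create the category directory structure if it does not exist.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Step 2: name the file {category_id}_{listing_id}.json.
    target = target_dir / f"{listing['category_id']}_{listing['listing_id']}.json"
    if target.exists():
        # Assumption: avoid silently replacing an existing listing.
        raise FileExistsError(f"listing already exists: {target}")
    with target.open("w", encoding="utf-8") as handle:
        json.dump(listing, handle, indent=2, ensure_ascii=False)
        handle.write("\n")
    return target
```

After writing the file, run python3 scripts/validation/validate_listings.py as described in step 5 before opening a pull request.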

License

See LICENSE file for details.
