Thanks to visit codestin.com
Credit goes to github.com

Skip to content

allanbchancery/altimages

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Matalan Website Accessibility & Widget Analysis Toolkit

A comprehensive Python-based toolkit for analyzing e-commerce website accessibility, widget functionality, and generating detailed reports with interactive dashboards.

🎯 Project Overview

This toolkit provides a complete solution for analyzing the Matalan e-commerce website (and adaptable for other sites) with focus on:

  • Accessibility Compliance: Image alt tags, ARIA attributes, skip links, form labels
  • Widget Detection: Product galleries, navigation, search, recommendations, filters
  • Performance Analysis: Page structure, loading times, technical health
  • Strategic Insights: Industry comparisons, recommendations, monitoring strategies

πŸ“Š Key Features

βœ… Multi-Phase Analysis Approach

  • Phase 1: Site discovery and URL categorization
  • Phase 2: Representative sampling and detailed analysis
  • Phase 3: Comprehensive reporting and strategic recommendations

🎨 Interactive Dashboards

  • Real-time HTML dashboards with responsive design
  • Executive summary views
  • Detailed technical reports
  • Visual accessibility scoring

πŸ” Comprehensive Analysis

  • Image Accessibility: Alt tag presence, length validation, decorative image handling
  • Widget Functionality: Detection across homepage, category, product, and static pages
  • Technical Health: Success rates, error handling, performance metrics
  • Strategic Recommendations: Immediate actions, short-term improvements, long-term strategy

πŸš€ Quick Start

Prerequisites

# Required Python packages
pip install selenium webdriver-manager pandas requests beautifulsoup4 urllib3

Basic Usage

  1. Run Single Page Analysis (Product Page):
python matalan_analyzer_final.py
  1. Run Complete Site Discovery:
python matalan_site_discovery.py
  1. Run Sampling Analysis:
python matalan_phase2_sampling.py
  1. Generate Comprehensive Report:
python matalan_phase3_comprehensive_report.py
  1. Quick Product Page Analysis:
python matalan_pdp_analyzer.py

πŸ“ Project Structure

altimages/
β”œβ”€β”€ README.md                                    # This file
β”œβ”€β”€ requirements.txt                             # Python dependencies
β”‚
β”œβ”€β”€ Core Analyzers/
β”‚   β”œβ”€β”€ matalan_analyzer_final.py               # Single page accessibility analyzer
β”‚   β”œβ”€β”€ matalan_site_discovery.py               # Phase 1: Site structure discovery
β”‚   β”œβ”€β”€ matalan_phase2_sampling.py              # Phase 2: Representative sampling
β”‚   β”œβ”€β”€ matalan_phase3_comprehensive_report.py  # Phase 3: Strategic reporting
β”‚   └── matalan_pdp_analyzer.py                 # Product page specific analyzer
β”‚
β”œβ”€β”€ Alternative Approaches/
β”‚   β”œβ”€β”€ matalan_site_discovery_alternative.py   # Alternative discovery method
β”‚   β”œβ”€β”€ matalan_quick_discovery.py              # Quick pattern testing
β”‚   └── matalan_analyzer_debug.py               # Debug version with logging
β”‚
β”œβ”€β”€ Generated Reports/
β”‚   β”œβ”€β”€ matalan_quick_discovery.json            # Phase 1 discovery results
β”‚   β”œβ”€β”€ matalan_phase2_report.json              # Phase 2 analysis results
β”‚   β”œβ”€β”€ matalan_phase3_comprehensive_report.json # Phase 3 strategic insights
β”‚   └── matalan_pdp_analysis.json               # Product page analysis
β”‚
β”œβ”€β”€ Interactive Dashboards/
β”‚   β”œβ”€β”€ matalan_analysis_dashboard.html         # Main analysis dashboard
β”‚   β”œβ”€β”€ matalan_enhanced_dashboard.html         # Enhanced visualization
β”‚   β”œβ”€β”€ matalan_executive_dashboard.html        # Executive summary view
β”‚   β”œβ”€β”€ matalan_pdp_interactive_dashboard.html  # Product page dashboard
β”‚   β”œβ”€β”€ matalan_phase1_discovery_dashboard.html # Discovery phase dashboard
β”‚   └── matalan_sitewide_analysis_dashboard.html # Site-wide analysis view
β”‚
└── Configuration/
    └── workspace.code-workspace                 # VSCode workspace settings

πŸ”§ Core Components

1. Single Page Analyzer (matalan_analyzer_final.py)

Purpose: Analyze individual pages for accessibility and widget functionality

Key Features:

  • Selenium-based web scraping with headless Chrome
  • Image alt tag analysis with status classification
  • Widget detection using multiple CSS selectors and XPath queries
  • Interactive HTML dashboard generation
  • Accessibility scoring and recommendations

Usage:

from matalan_analyzer_final import analyze_matalan_product_page

url = "https://www.matalan.co.uk/clothing/brown-leaf-print-tunic-midaxi-dress/15824638.html"
images, widgets = analyze_matalan_product_page(url)

2. Site Discovery (matalan_site_discovery.py)

Purpose: Phase 1 - Discover site structure and categorize URLs

Key Features:

  • Robots.txt compliance checking
  • Sitemap.xml parsing with recursive sitemap index support
  • URL categorization (product, category, brand, static, utility pages)
  • Representative sampling strategy
  • Respectful crawling with configurable delays

Output: matalan_discovery_report.json

3. Sampling Analysis (matalan_phase2_sampling.py)

Purpose: Phase 2 - Analyze representative pages across site

Key Features:

  • Multi-page accessibility analysis
  • Widget detection by page type
  • Page structure analysis
  • ARIA attribute assessment
  • Aggregated accessibility scoring

Output: matalan_phase2_report.json

4. Comprehensive Reporting (matalan_phase3_comprehensive_report.py)

Purpose: Phase 3 - Generate strategic insights and recommendations

Key Features:

  • Cross-phase data integration
  • Industry benchmarking
  • Strategic recommendations (immediate, short-term, long-term)
  • Performance insights
  • Competitive analysis

Output: matalan_phase3_comprehensive_report.json

πŸ“Š Analysis Capabilities

Accessibility Analysis

  • Image Alt Tags: Presence, length validation, decorative handling
  • ARIA Attributes: aria-label, aria-labelledby, aria-describedby usage
  • Skip Links: Navigation accessibility for keyboard users
  • Form Accessibility: Label-to-input ratios and proper associations
  • Heading Structure: H1-H6 hierarchy analysis

Widget Detection

  • Navigation: Primary and secondary navigation elements
  • Search: Search input fields and functionality
  • Product Gallery: Image carousels and zoom features
  • Filters: Category and product filtering widgets
  • Recommendations: "You may also like" sections
  • Shopping Cart: Add to cart/bag functionality
  • Pagination: Page navigation controls

Performance Metrics

  • Analysis Speed: Pages per second processing
  • Server Impact: Respectful crawling with delays
  • Success Rates: Error handling and retry logic
  • Scalability: Methodology for 100+ page analysis

🎨 Dashboard Features

Interactive HTML Dashboards

  • Responsive Design: Works on desktop, tablet, and mobile
  • Real-time Data: Live accessibility scoring and metrics
  • Visual Indicators: Color-coded status indicators
  • Detailed Tables: Sortable and filterable data views
  • Progress Bars: Visual accessibility score representation

Dashboard Types

  1. Executive Dashboard: High-level summary for stakeholders
  2. Technical Dashboard: Detailed analysis for developers
  3. Product Page Dashboard: Specific product page insights
  4. Site-wide Dashboard: Comprehensive site analysis

πŸ“ˆ Sample Results

Accessibility Scores

  • Overall Site Score: 100% (Perfect compliance)
  • Image Accessibility: 92 images analyzed, 100% compliant
  • Widget Coverage: 8 widget types detected across all page types
  • Technical Health: 100% success rate, 0% error rate

Industry Comparison

  • Retail Average: 73.2%
  • E-commerce Average: 68.5%
  • Matalan Score: 100%
  • Industry Ranking: Top 1% (Accessibility Leader)

πŸ› οΈ Configuration Options

Crawl Settings

# Adjust crawl delay (default: 2 seconds)
self.crawl_delay = 2

# User agent configuration
'User-Agent': 'Mozilla/5.0 (compatible; AccessibilityAnalyzer/1.0)'

# Timeout settings
timeout=15  # 15 second request timeout

Analysis Parameters

# Alt text length validation
alt_length_threshold = 125  # characters

# Widget detection selectors
price_selectors = [
    '[data-testid*="price"]',
    '.price',
    '[class*="price"]',
    'span[class*="Price"]'
]

πŸ” Advanced Usage

Custom URL Analysis

# Analyze custom URL
from matalan_analyzer_final import analyze_matalan_product_page

custom_url = "https://your-ecommerce-site.com/product/123"
images, widgets = analyze_matalan_product_page(custom_url)

Batch Analysis

# Analyze multiple URLs
urls = [
    "https://www.matalan.co.uk/",
    "https://www.matalan.co.uk/womens.list",
    "https://www.matalan.co.uk/mens.list"
]

for url in urls:
    result = analyze_page_accessibility(url, determine_page_type(url))
    print(f"Analyzed: {url} - Score: {result['accessibility_score']}")

Custom Widget Detection

# Add custom widget selectors
custom_selectors = {
    'wishlist': ['[data-testid="wishlist"]', '.wishlist-btn'],
    'reviews': ['.reviews-section', '[data-component="reviews"]'],
    'size_guide': ['.size-guide', '[data-modal="size-guide"]']
}

πŸ“‹ Requirements

System Requirements

  • Python 3.7+
  • Chrome/Chromium browser (for Selenium)
  • 4GB+ RAM (for large site analysis)
  • Internet connection

Python Dependencies

selenium>=4.0.0
webdriver-manager>=3.8.0
pandas>=1.3.0
requests>=2.25.0
beautifulsoup4>=4.9.0
urllib3>=1.26.0
lxml>=4.6.0

Installation

# Clone repository
git clone <repository-url>
cd altimages

# Install dependencies
pip install -r requirements.txt

# Run initial analysis
python matalan_analyzer_final.py

πŸš€ Getting Started Guide

1. Quick Single Page Analysis

# Analyze a single product page
python matalan_analyzer_final.py

# View results
open matalan_analysis_dashboard.html

2. Complete Site Analysis

# Phase 1: Discovery
python matalan_site_discovery.py

# Phase 2: Sampling
python matalan_phase2_sampling.py

# Phase 3: Comprehensive Report
python matalan_phase3_comprehensive_report.py

# View executive dashboard
open matalan_executive_dashboard.html

3. Custom Analysis

# Quick discovery for different patterns
python matalan_quick_discovery.py

# Product page specific analysis
python matalan_pdp_analyzer.py

πŸ“Š Output Files

JSON Reports

  • Discovery Report: Site structure and URL categorization
  • Phase 2 Report: Detailed accessibility and widget analysis
  • Phase 3 Report: Strategic insights and recommendations
  • PDP Analysis: Product page specific findings

HTML Dashboards

  • Interactive Visualizations: Real-time data exploration
  • Executive Summaries: High-level insights for stakeholders
  • Technical Details: In-depth analysis for developers
  • Responsive Design: Mobile-friendly viewing

🀝 Contributing

Development Setup

# Fork the repository
git clone <your-fork-url>
cd altimages

# Create development branch
git checkout -b feature/your-feature-name

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

Code Standards

  • PEP 8: Python code formatting
  • Type Hints: Use type annotations where applicable
  • Documentation: Comprehensive docstrings for all functions
  • Error Handling: Robust exception handling with logging

Pull Request Process

  1. Create feature branch from main
  2. Implement changes with tests
  3. Update documentation
  4. Submit pull request with detailed description

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

Common Issues

Issue: Chrome driver not found

# Solution: Update webdriver-manager
pip install --upgrade webdriver-manager

Issue: Timeout errors

# Solution: Increase timeout in configuration
timeout=30  # Increase from default 15 seconds

Issue: Memory usage high

# Solution: Use headless mode and limit concurrent analysis
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')

Getting Help

  • Issues: Create GitHub issue with detailed description
  • Questions: Use GitHub Discussions
  • Documentation: Check inline code documentation

πŸ”„ Version History

v1.0.0 (Current)

  • Complete accessibility analysis toolkit
  • Multi-phase analysis approach
  • Interactive HTML dashboards
  • Strategic recommendations engine
  • Industry benchmarking capabilities

Roadmap

  • v1.1.0: Multi-site support and comparison
  • v1.2.0: API integration for continuous monitoring
  • v1.3.0: Machine learning-based widget detection
  • v2.0.0: Real-time monitoring dashboard

πŸ“ž Contact

For questions, suggestions, or collaboration opportunities:

  • Project Repository: [GitHub Repository URL]
  • Issues: [GitHub Issues URL]
  • Discussions: [GitHub Discussions URL]

Built with ❀️ for better web accessibility

This toolkit helps make the web more accessible for everyone by providing comprehensive analysis and actionable insights for e-commerce websites.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published