
Interactive dashboard analyzing NYC affordable housing violations to identify enforcement gaps and hold repeat offenders accountable. Data-driven tool for tenant advocacy and policy analysis.


NYC Housing Violations Dashboard

Python License: MIT Code style: black

Holding landlords accountable through data transparency

📌 Current Status (November 2025)

This project demonstrates end-to-end data pipeline development, statistical analysis methodology, and system architecture for housing violations analysis. The core ETL pipeline, exploratory analysis, REST API, and frontend foundation are complete; the interactive visualization dashboard is still in development.

What's Complete:

  • ✅ Data acquisition from NYC Open Data API (10,000+ violations)
  • ✅ Data cleaning and transformation pipeline using pandas
  • ✅ PostgreSQL database with PostGIS for geospatial queries
  • ✅ Exploratory data analysis with statistical insights
  • ✅ Advanced SQL queries (window functions, CTEs, geospatial operations)
  • ✅ FastAPI REST API with documented endpoints
  • ✅ React + TypeScript frontend foundation with search functionality

In Development:

  • 🚧 Interactive data visualization dashboard
  • 🚧 Real-time violation trend analysis
  • 🚧 Geospatial mapping interface

An interactive data analysis platform that exposes patterns of housing code violations across New York City, identifies repeat offenders, and reveals enforcement gaps in affordable housing protection. Built to empower tenant advocacy and inform policy decisions.


🎯 Project Overview

The Problem

In New York City, thousands of tenants live in buildings with serious housing code violations: lack of heat, broken plumbing, pest infestations, and more. While data on these violations is publicly available through NYC Open Data, it remains fragmented and difficult to interpret, making it challenging for tenants, advocates, and policymakers to identify patterns of landlord negligence and enforcement failures.

The Solution

This project transforms raw housing violation data into actionable insights by:

  • Identifying repeat offenders: Tracking landlords and buildings with persistent violation patterns
  • Revealing enforcement gaps: Analyzing complaint response times and inspection rates across neighborhoods
  • Geospatial analysis: Mapping violation hotspots to identify areas of concentrated housing injustice
  • Temporal trend detection: Uncovering seasonal patterns and long-term trends in housing conditions
  • Predictive risk modeling: Forecasting which buildings are most likely to accumulate future violations

Social Impact

This tool serves as a public accountability mechanism, enabling:

  • Tenants to research buildings before renting and document patterns of neglect
  • Tenant advocates to identify priority cases and systemic issues
  • Journalists to investigate landlord practices and enforcement failures
  • Policymakers to target interventions and allocate enforcement resources
  • Legal advocates to build cases against negligent property owners

📊 Key Features

Data Analysis

  • Multi-source integration: Combines HPD violations, complaints, building ownership, and demographic data
  • Geospatial clustering: Hotspot analysis using Getis-Ord Gi* statistics
  • Time series analysis: Seasonal decomposition and trend detection
  • Network analysis: Linking corporate landlords across multiple properties
  • Statistical testing: Identifies significant disparities in enforcement by neighborhood
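Linking landlords across LLCs is essentially a connected-components problem: properties that share any owner identifier belong to the same network. A minimal sketch with a union-find structure, assuming each property record already carries a set of normalized owner keys (for example a registered owner name or business address); the input shape and sample IDs are hypothetical, not the project's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def find(self, x: str) -> str:
        # Register unseen nodes lazily as their own singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def link_owner_networks(properties: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group property IDs connected through any chain of shared owner keys.

    `properties` maps a property ID to its (hypothetical) normalized owner keys.
    """
    uf = UnionFind()
    first_seen: Dict[str, str] = {}  # owner key -> first property that used it
    for prop_id, keys in properties.items():
        uf.find(prop_id)
        for key in keys:
            if key in first_seen:
                uf.union(prop_id, first_seen[key])
            else:
                first_seen[key] = prop_id
    groups = defaultdict(set)
    for prop_id in properties:
        groups[uf.find(prop_id)].add(prop_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

For example, two differently named LLCs registered at the same business address end up in one network, and any further property sharing either LLC name joins it too.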

Interactive Dashboard

  • Building search: Look up violation history by address
  • Interactive maps: Visualize violations across NYC with filtering options
  • Temporal visualizations: Track violation trends over time
  • Landlord rankings: Identify worst offenders by violation count and severity
  • Neighborhood comparisons: Analyze enforcement equity across communities

API

  • RESTful endpoints: Programmatic access to cleaned data and analysis results
  • Flexible filtering: Query by date range, violation type, borough, and more
  • Aggregated statistics: Pre-computed metrics for fast dashboard performance
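The flexible filtering described above can be thought of as a helper that an endpoint calls with its query parameters. A minimal sketch, written without FastAPI so it runs on its own; the `Violation` record, its field names, and `filter_violations` are hypothetical illustrations rather than the project's actual models.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Violation:
    # Hypothetical record shape for a cleaned violation row.
    violation_id: int
    borough: str
    violation_class: str  # HPD class code: "A", "B", "C" or "I"
    inspection_date: date


def filter_violations(
    records: Iterable[Violation],
    start: Optional[date] = None,
    end: Optional[date] = None,
    violation_class: Optional[str] = None,
    borough: Optional[str] = None,
) -> List[Violation]:
    """Apply optional filters; a None argument means 'do not filter on this'."""
    result = []
    for v in records:
        if start is not None and v.inspection_date < start:
            continue
        if end is not None and v.inspection_date > end:
            continue
        if violation_class is not None and v.violation_class != violation_class.upper():
            continue
        if borough is not None and v.borough.casefold() != borough.casefold():
            continue
        result.append(v)
    return result
```

In FastAPI, the same optional arguments could be declared as query parameters on a route function that delegates to a helper like this; with PostgreSQL behind the API, the filters would more likely be pushed down into a SQL `WHERE` clause.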

🛠️ Technology Stack

Data Pipeline & Analysis

  • Python 3.9+: Core data processing
  • pandas & NumPy: Data manipulation and numerical analysis
  • GeoPandas & Shapely: Geospatial analysis and mapping
  • scikit-learn: Machine learning models for risk prediction
  • statsmodels: Statistical testing and time series analysis
  • sodapy: NYC Open Data API integration

Backend

  • FastAPI: High-performance REST API
  • PostgreSQL + PostGIS: Geospatial database
  • SQLAlchemy: Database ORM

Frontend (In Development)

  • React + TypeScript: Interactive web application
  • Recharts/Plotly: Data visualization components
  • Leaflet/Mapbox: Interactive mapping

Development Tools

  • Jupyter: Exploratory analysis and documentation
  • pytest: Testing framework
  • black & isort: Code formatting
  • GitHub Actions: CI/CD (Coming Soon)

📁 Project Structure

nyc-housing-violations-dashboard/
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   ├── raw/                    # Raw data from NYC Open Data (not tracked)
│   ├── processed/              # Cleaned and transformed data
│   └── README.md               # Data documentation and sources
├── notebooks/
│   └── exploratory_analysis.ipynb  # Initial data exploration
├── src/
│   ├── data_pipeline/          # ETL pipeline
│   │   ├── fetch_data.py       # Download data from NYC Open Data
│   │   ├── clean_data.py       # Data cleaning and validation
│   │   └── load_data.py        # Load to PostgreSQL
│   ├── analysis/               # Analytical modules
│   │   ├── temporal_analysis.py    # Time series analysis
│   │   ├── geospatial_analysis.py  # Spatial clustering & hotspots
│   │   └── repeat_offenders.py     # Landlord tracking
│   └── api/                    # FastAPI application
│       ├── main.py             # API entry point
│       └── routes/             # API route definitions
├── frontend/                   # React dashboard (coming soon)
├── tests/                      # Unit and integration tests
├── docs/                       # Additional documentation
│   └── methodology.md          # Detailed analysis methodology
└── config/
    └── config.yaml             # Configuration settings

🚀 Getting Started

Prerequisites

  • Python 3.9 or higher
  • PostgreSQL 14+ with PostGIS extension (for geospatial features)
  • Git

Installation

  1. Clone the repository

    git clone https://github.com/snedmagdous/nyc-housing-violations-dashboard.git
    cd nyc-housing-violations-dashboard
  2. Create a virtual environment

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    # Create a .env file in the project root
    cp config/.env.example .env
    
    # Edit .env with your configuration
    # - PostgreSQL connection string
    # - NYC Open Data API token (optional, for higher rate limits)
  5. Initialize the database

    python src/data_pipeline/setup_db.py
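Once the `.env` file from step 4 exists, the code needs to read those settings at startup. A minimal sketch, assuming the hypothetical variable names `DATABASE_URL` and `NYC_OPEN_DATA_APP_TOKEN`; check `config/.env.example` for the names the project actually uses. A `.env` file is not loaded into `os.environ` automatically; a loader such as python-dotenv's `load_dotenv()` would do that before this function runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    app_token: Optional[str]  # optional: only raises Open Data rate limits


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables (hypothetical names)."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        # Fail fast: nothing in the pipeline works without a database.
        raise RuntimeError("DATABASE_URL is not set; add it to your .env file")
    # An empty token is treated the same as a missing one.
    return Settings(database_url=database_url,
                    app_token=env.get("NYC_OPEN_DATA_APP_TOKEN") or None)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.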

Quick Start

1. Fetch Data

python src/data_pipeline/fetch_data.py

Downloads the latest HPD violations data from the NYC Open Data API.
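Large Open Data pulls are usually fetched in pages. The project lists sodapy for this step; the sketch below uses only the standard library to show the underlying request shape with Socrata's SoQL parameters (`$where`, `$order`, `$limit`, `$offset`). The dataset ID `wvxf-dwi5`, the column names `inspectiondate` and `violationid`, and the page size are assumptions to verify against the NYC Open Data catalog.

```python
from typing import Iterator
from urllib.parse import urlencode

# Assumed endpoint for the HPD Housing Maintenance Code Violations dataset;
# confirm the dataset ID on NYC Open Data before relying on it.
BASE_URL = "https://data.cityofnewyork.us/resource/wvxf-dwi5.json"


def page_urls(total_rows: int, page_size: int = 50_000,
              since: str = "2018-01-01") -> Iterator[str]:
    """Yield one SODA request URL per page of results.

    `inspectiondate` and `violationid` are assumed column names. Ordering by a
    stable column keeps $offset paging consistent between requests.
    """
    for offset in range(0, total_rows, page_size):
        params = {
            "$where": f"inspectiondate >= '{since}'",
            "$order": "violationid",
            "$limit": page_size,
            "$offset": offset,
        }
        yield f"{BASE_URL}?{urlencode(params)}"
```

Each URL can be requested with any HTTP client; sending an app token in the `X-App-Token` header raises Socrata's rate limits.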

2. Clean and Process

python src/data_pipeline/clean_data.py

Cleans raw data, geocodes addresses, and prepares it for analysis.
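A minimal sketch of row-level cleaning, assuming raw rows arrive as dicts of strings keyed by assumed Open Data column names (`violationid`, `boro`, `class`, `inspectiondate`). The borough alias table is illustrative, and geocoding is not shown.

```python
from datetime import datetime

# Illustrative alias table: map spelling variants to one canonical borough name.
BOROUGH_ALIASES = {
    "MANHATTAN": "Manhattan", "NEW YORK": "Manhattan",
    "BRONX": "Bronx", "THE BRONX": "Bronx",
    "BROOKLYN": "Brooklyn", "KINGS": "Brooklyn",
    "QUEENS": "Queens",
    "STATEN ISLAND": "Staten Island", "RICHMOND": "Staten Island",
}
VALID_CLASSES = {"A", "B", "C", "I"}


def clean_rows(raw_rows):
    """Normalize, validate and de-duplicate raw violation rows.

    Rows with an unknown borough or class, or an unparseable date, are dropped,
    and only the first valid row per violation ID is kept.
    """
    seen_ids = set()
    cleaned = []
    for row in raw_rows:
        vid = str(row.get("violationid", "")).strip()
        if not vid or vid in seen_ids:
            continue
        borough = BOROUGH_ALIASES.get(str(row.get("boro", "")).strip().upper())
        vclass = str(row.get("class", "")).strip().upper()
        try:
            # Socrata timestamps look like "2024-01-05T00:00:00.000"; keep the first 19 chars.
            inspected = datetime.fromisoformat(row["inspectiondate"][:19]).date()
        except (KeyError, TypeError, ValueError):
            continue
        if borough is None or vclass not in VALID_CLASSES:
            continue
        seen_ids.add(vid)
        cleaned.append({"violation_id": vid, "borough": borough,
                        "violation_class": vclass, "inspection_date": inspected})
    return cleaned
```

With pandas, the same rules would typically become vectorized `str.upper()`, `map`, `to_datetime(errors="coerce")` and `drop_duplicates` steps.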

3. Run Analysis

jupyter notebook notebooks/exploratory_analysis.ipynb

Explore the data and see initial findings.

4. Start the API

uvicorn src.api.main:app --reload

Access API documentation at http://localhost:8000/docs


📈 Analysis Methodology

Data Sources

  1. HPD Housing Maintenance Code Violations

    • Source: NYC Open Data
    • Records: 1.5M+ violations (2018-present)
    • Contains: Violation type, severity class, dates, addresses, status
  2. HPD Complaints

    • Tenant-reported issues
    • Used to identify enforcement gaps
  3. PLUTO (Primary Land Use Tax Lot Output)

    • Building characteristics and ownership
    • Enables demographic analysis
  4. NYC Borough Boundaries & Census Tracts

    • For geospatial analysis and demographic overlays

Key Analyses

1. Temporal Pattern Detection

  • Seasonal decomposition: Identifies recurring patterns (e.g., heating violations spike in winter)
  • Trend analysis: Long-term changes in violation rates
  • Enforcement lag calculation: Time from complaint to inspection to resolution
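The enforcement-lag and seasonality ideas can be illustrated with the standard library alone. A minimal sketch, assuming each case is a dict with hypothetical keys `complaint`, `inspection` and `resolved`; medians are used because response times tend to be skewed by a few very slow cases. A full seasonal decomposition would more likely use statsmodels' `seasonal_decompose` on a regular monthly series.

```python
from collections import Counter
from datetime import date
from statistics import median
from typing import Dict, Iterable, List, Optional


def enforcement_lags(cases: List[Dict[str, Optional[date]]]) -> Dict[str, Optional[float]]:
    """Median days from complaint to inspection and from inspection to resolution.

    A stage that has not happened yet is None, and that case is skipped
    for the affected lag only.
    """
    to_inspection = [(c["inspection"] - c["complaint"]).days
                     for c in cases if c["complaint"] and c["inspection"]]
    to_resolution = [(c["resolved"] - c["inspection"]).days
                     for c in cases if c["inspection"] and c["resolved"]]
    return {
        "complaint_to_inspection": median(to_inspection) if to_inspection else None,
        "inspection_to_resolution": median(to_resolution) if to_resolution else None,
    }


def month_of_year_counts(dates: Iterable[date]) -> List[int]:
    """Count events per calendar month (index 0 = January), e.g. to spot a winter heating spike."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]
```

Skipping unresolved cases understates lags for backlogged neighborhoods, so a real analysis would also report how many cases are still open.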

2. Geospatial Clustering

  • Hotspot analysis: Getis-Ord Gi* statistic identifies areas with significantly high violation concentrations
  • Spatial autocorrelation: Moran's I test for neighborhood effects
  • Demographic overlay: Correlates violation patterns with income, race, and other census data
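To make the two spatial statistics concrete, here is a minimal pure-Python sketch using binary contiguity weights, where each area's neighbors are given as a set of indices. It follows the standard formulas for global Moran's I (self-weights excluded) and the Getis-Ord Gi* z-score (each area counted as its own neighbor); the neighbor structure and toy values are illustrative. A production analysis would more likely rely on a tested library such as PySAL, which also provides permutation-based inference.

```python
from math import sqrt
from typing import List, Sequence, Set


def morans_i(x: Sequence[float], neighbors: Sequence[Set[int]]) -> float:
    """Global Moran's I with binary weights (w_ij = 1 for neighbors, w_ii = 0)."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    w_total = sum(len(nb) for nb in neighbors)
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    return (n / w_total) * cross / sum(zi * zi for zi in z)


def getis_ord_gi_star(x: Sequence[float], neighbors: Sequence[Set[int]]) -> List[float]:
    """Gi* z-scores with binary weights; undefined if all values are equal."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(xi * xi for xi in x) / n - mean * mean)
    scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}  # the "star": include area i itself
        w_sum = len(window)  # binary weights, so sum(w) == sum(w ** 2)
        local = sum(x[j] for j in window)
        denom = s * sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local - mean * w_sum) / denom)
    return scores
```

Positive Moran's I indicates that similar values cluster; a Gi* z-score above about 1.96 marks a hot spot at roughly the 95% level, before any correction for testing many areas at once.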

3. Repeat Offender Identification

  • Ownership network analysis: Connects properties owned by the same entity across different LLCs
  • Violation rate normalization: Accounts for building size and age
  • Ranking algorithm: Weights by violation severity (Class A/B/C)
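A severity-weighted, size-normalized ranking can be sketched in a few lines. This assumes placeholder weights (A = 1, B = 2, C = 3) and normalization by residential unit count only; the project's actual weights are not specified here, and the age adjustment (for example, comparing buildings within construction-era bands) is omitted.

```python
from typing import Dict, List, Tuple

# Placeholder severity weights; the project's actual weights are not specified.
SEVERITY_WEIGHTS: Dict[str, float] = {"A": 1.0, "B": 2.0, "C": 3.0}


def rank_buildings(violations: List[Tuple[str, str]],
                   units: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank buildings by severity-weighted violations per residential unit.

    `violations` holds (building_id, class_code) pairs; classes without a
    weight are skipped. `units` maps building_id to its unit count.
    """
    totals: Dict[str, float] = {}
    for building, vclass in violations:
        weight = SEVERITY_WEIGHTS.get(vclass)
        if weight is not None:
            totals[building] = totals.get(building, 0.0) + weight
    # Guard against missing or zero unit counts by dividing by at least 1.
    scores = {b: total / max(units.get(b, 1), 1) for b, total in totals.items()}
    # Highest score first; ties broken by building ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing per unit keeps large buildings from dominating the ranking purely because they have more apartments to inspect.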

4. Enforcement Equity Analysis

  • Response time analysis: Compares complaint-to-inspection times across neighborhoods
  • Inspection rate disparities: Tests for statistical significance in enforcement patterns
  • Demographic correlation: Examines relationship between neighborhood demographics and enforcement activity
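One way to test whether a response-time disparity between two neighborhoods is statistically significant is a permutation test, which makes no normality assumption about skewed waiting times. A minimal sketch on the difference in medians; the function name and defaults are illustrative, and comparing many neighborhood pairs would additionally need a multiple-comparison correction.

```python
import random
from statistics import median
from typing import Optional, Sequence


def permutation_test_median(a: Sequence[float], b: Sequence[float],
                            n_perm: int = 5000, seed: Optional[int] = 0) -> float:
    """Two-sided permutation p-value for a difference in medians.

    Group labels are shuffled n_perm times; the p-value is the share of shuffles
    whose absolute median difference is at least the observed one, with a +1
    correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    k = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(median(pooled[:k]) - median(pooled[k:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

A fixed seed makes the p-value reproducible between runs, which matters when results feed into published comparisons.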

5. Predictive Risk Modeling (In Development)

  • Features: Building age, past violations, ownership type, neighborhood characteristics
  • Model: Random Forest classifier for binary prediction (high-risk vs. low-risk)
  • Output: Risk scores for proactive intervention targeting
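Before a classifier can be trained, each building needs a feature row and a binary label. A minimal sketch, assuming a hypothetical `Building` record, a hypothetical threshold of 5 violations in the following 12 months for "high risk", and only three of the listed features (neighborhood characteristics are omitted).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Building:
    building_id: str
    year_built: int
    llc_owned: bool          # hypothetical ownership-type flag
    past_violations: int     # violations issued before the cutoff date
    future_violations: int   # violations issued in the 12 months after the cutoff


def make_training_rows(buildings: List[Building], as_of_year: int,
                       high_risk_threshold: int = 5) -> Tuple[List[List[float]], List[int]]:
    """Build a feature matrix X and binary labels y (1 = high risk).

    Features describe only the period before the cutoff, to avoid leaking
    the outcome into the inputs. The threshold is a hypothetical choice.
    """
    X, y = [], []
    for b in buildings:
        X.append([
            float(as_of_year - b.year_built),  # building age
            float(b.past_violations),          # prior violation count
            1.0 if b.llc_owned else 0.0,       # ownership type as 0/1
        ])
        y.append(1 if b.future_violations >= high_risk_threshold else 0)
    return X, y
```

`X` and `y` have the shape scikit-learn estimators expect, so they could be passed directly to `RandomForestClassifier().fit(X, y)`.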

📊 Key Findings from Dataset Analysis

Analysis of 10,000 NYC housing violations across 9,249 buildings reveals:

  1. Geographic Concentration of Violations

    • Brooklyn accounts for 59% of all violations (5,904 cases)
    • Bronx represents 38% (3,779 cases)
    • Manhattan and Queens together account for only 3% of violations
    • Indicates geographic disparities in housing code enforcement and compliance
  2. Violation Severity Distribution

    • Class I (information orders, not a hazard level): 90.1% of violations (9,010 cases)
    • Class C (immediately hazardous): 4.5% (445 cases)
    • Class B (hazardous): 3.7% (367 cases)
    • Class A (non-hazardous): 1.8% (178 cases)
    • Because HPD uses Class I for information orders rather than as a severity rating, the violations explicitly rated hazardous (Classes B and C) make up 8.1% of this sample; the large Class I share warrants a closer look at which orders it contains
  3. Open Violations Indicate Ongoing Risk

    • 10% of violations (1,000 cases) remain open
    • Open violations represent unresolved safety hazards affecting tenant welfare
    • Demonstrates the need for targeted enforcement and follow-up inspections
  4. Data-Driven Policy Implications

    • High concentration in Brooklyn/Bronx suggests a need for focused intervention resources
    • Hazardous and immediately hazardous violations (Classes B and C) highlight urgent habitability concerns
    • Database enables identification of repeat offender buildings for proactive enforcement
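The severity percentages in these findings follow directly from the class counts. A small helper makes that arithmetic reproducible; it uses only the counts reported in this section, and the function name is illustrative.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts into percentages of their total."""
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


# Violation counts by HPD class, as reported in the findings above.
class_counts = {"I": 9_010, "C": 445, "B": 367, "A": 178}
for code, pct in shares(class_counts).items():
    print(f"Class {code}: {pct:.1f}% ({class_counts[code]:,} cases)")
```

The four counts sum to exactly 10,000, so each percentage is simply the count divided by 100.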

🤝 Contributing

This is currently a portfolio project, but suggestions and feedback are welcome! If you're interested in:

  • Extending the analysis
  • Improving the visualization
  • Adding new data sources
  • Deploying for public use

Please open an issue or reach out directly.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Maya Murry

  • Cornell University, B.Sc. Computer Science (May 2025)
  • Lead Full-Stack Developer at an AI Healthcare Startup
  • Focus: Data science for social justice and public service

Contact:

  • Email: [email protected]
  • Portfolio: mayamurry.com
  • LinkedIn: linkedin.com/in/maya-murry
  • GitHub: @snedmagdous


🙏 Acknowledgments

  • NYC Open Data: For making housing violations data publicly accessible
  • Tenant advocacy organizations: For inspiration and guidance on policy priorities
  • Open source community: For the excellent tools that made this analysis possible


🔍 Project Status

Current Phase: Data Pipeline Development

  • Project setup and structure
  • Requirements and dependencies defined
  • Data fetching from NYC Open Data
  • Data cleaning and preprocessing
  • Exploratory data analysis
  • Geospatial analysis implementation
  • API development
  • Frontend dashboard
  • Deployment

This project uses data-driven analysis to advance housing justice in New York City. Technology should serve the collective, dismantle systems of oppression, and empower those fighting for their rights.
