Holding landlords accountable through data transparency
This project demonstrates end-to-end data pipeline development, statistical analysis methodology, and system architecture for housing violations analysis. The core ETL pipeline and exploratory analysis are complete. The interactive dashboard and API are still in development.
What's Complete:
- ✅ Data acquisition from NYC Open Data API (10,000+ violations)
- ✅ Data cleaning and transformation pipeline using pandas
- ✅ PostgreSQL database with PostGIS for geospatial queries
- ✅ Exploratory data analysis with statistical insights
- ✅ Advanced SQL queries (window functions, CTEs, geospatial operations)
- ✅ FastAPI REST API with documented endpoints
- ✅ React + TypeScript frontend foundation with search functionality
In Development:
- 🚧 Interactive data visualization dashboard
- 🚧 Real-time violation trend analysis
- 🚧 Geospatial mapping interface
An interactive data analysis platform that exposes patterns of housing code violations across New York City, identifies repeat offenders, and reveals enforcement gaps in affordable housing protection. Built to empower tenant advocacy and inform policy decisions.
In New York City, thousands of tenants live in buildings with serious housing code violations: lack of heat, broken plumbing, pest infestations, and more. Data on these violations is publicly available through NYC Open Data, but it is fragmented and difficult to interpret, which makes it hard for tenants, advocates, and policymakers to identify patterns of landlord negligence and enforcement failures.
This project transforms raw housing violation data into actionable insights by:
- Identifying repeat offenders: Tracking landlords and buildings with persistent violation patterns
- Revealing enforcement gaps: Analyzing complaint response times and inspection rates across neighborhoods
- Geospatial analysis: Mapping violation hotspots to identify areas of concentrated housing injustice
- Temporal trend detection: Uncovering seasonal patterns and long-term trends in housing conditions
- Predictive risk modeling: Forecasting which buildings are most likely to accumulate future violations
This tool serves as a public accountability mechanism, enabling:
- Tenants to research buildings before renting and document patterns of neglect
- Tenant advocates to identify priority cases and systemic issues
- Journalists to investigate landlord practices and enforcement failures
- Policymakers to target interventions and allocate enforcement resources
- Legal advocates to build cases against negligent property owners
- Multi-source integration: Combines HPD violations, complaints, building ownership, and demographic data
- Geospatial clustering: Hotspot analysis using Getis-Ord Gi* statistics
- Time series analysis: Seasonal decomposition and trend detection
- Network analysis: Linking corporate landlords across multiple properties
- Statistical testing: Identifies significant disparities in enforcement by neighborhood
- Building search: Look up violation history by address
- Interactive maps: Visualize violations across NYC with filtering options
- Temporal visualizations: Track violation trends over time
- Landlord rankings: Identify worst offenders by violation count and severity
- Neighborhood comparisons: Analyze enforcement equity across communities
- RESTful endpoints: Programmatic access to cleaned data and analysis results
- Flexible filtering: Query by date range, violation type, borough, and more
- Aggregated statistics: Pre-computed metrics for fast dashboard performance
- Python 3.9+: Core data processing
- pandas & NumPy: Data manipulation and numerical analysis
- GeoPandas & Shapely: Geospatial analysis and mapping
- scikit-learn: Machine learning models for risk prediction
- statsmodels: Statistical testing and time series analysis
- sodapy: NYC Open Data API integration
- FastAPI: High-performance REST API
- PostgreSQL + PostGIS: Geospatial database
- SQLAlchemy: Database ORM
- React + TypeScript: Interactive web application
- Recharts/Plotly: Data visualization components
- Leaflet/Mapbox: Interactive mapping
- Jupyter: Exploratory analysis and documentation
- pytest: Testing framework
- black & isort: Code formatting
- GitHub Actions: CI/CD (Coming Soon)
nyc-housing-violations-dashboard/
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   ├── raw/                          # Raw data from NYC Open Data (not tracked)
│   ├── processed/                    # Cleaned and transformed data
│   └── README.md                     # Data documentation and sources
├── notebooks/
│   └── exploratory_analysis.ipynb    # Initial data exploration
├── src/
│   ├── data_pipeline/                # ETL pipeline
│   │   ├── fetch_data.py             # Download data from NYC Open Data
│   │   ├── clean_data.py             # Data cleaning and validation
│   │   └── load_data.py              # Load to PostgreSQL
│   ├── analysis/                     # Analytical modules
│   │   ├── temporal_analysis.py      # Time series analysis
│   │   ├── geospatial_analysis.py    # Spatial clustering & hotspots
│   │   └── repeat_offenders.py       # Landlord tracking
│   └── api/                          # FastAPI application
│       ├── main.py                   # API entry point
│       └── routes/                   # API route definitions
├── frontend/                         # React dashboard (coming soon)
├── tests/                            # Unit and integration tests
├── docs/                             # Additional documentation
│   └── methodology.md                # Detailed analysis methodology
└── config/
    └── config.yaml                   # Configuration settings
- Python 3.9 or higher
- PostgreSQL 14+ with PostGIS extension (for geospatial features)
- Git
- Clone the repository
    git clone https://github.com/snedmagdous/nyc-housing-violations-dashboard.git
    cd nyc-housing-violations-dashboard
- Create a virtual environment
    python -m venv venv
    # Windows
    venv\Scripts\activate
    # macOS/Linux
    source venv/bin/activate
- Install dependencies
    pip install -r requirements.txt
- Set up environment variables
    # Create a .env file in the project root
    cp config/.env.example .env
    # Edit .env with your configuration:
    # - PostgreSQL connection string
    # - NYC Open Data API token (optional, for higher rate limits)
- Initialize the database
    python src/data_pipeline/setup_db.py
    python src/data_pipeline/fetch_data.py
Downloads the latest HPD violations data from the NYC Open Data API.
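A fetch step of this kind could be sketched with sodapy roughly as follows. This is an illustration, not the contents of fetch_data.py: the dataset ID `wvxf-dwi5`, the column names `inspectiondate` and `boro`, and the `NYC_OPEN_DATA_TOKEN` environment variable are assumptions that should be checked against the dataset page on NYC Open Data and the project's own .env file.

```python
import os
from typing import Optional


def build_query(start_date: str, borough: Optional[str] = None, limit: int = 10_000) -> dict:
    """Build SoQL keyword arguments for a Socrata `get` call.

    The column names used here (inspectiondate, boro) are assumed.
    """
    clauses = [f"inspectiondate >= '{start_date}T00:00:00'"]
    if borough:
        clauses.append(f"boro = '{borough.upper()}'")
    return {
        "where": " AND ".join(clauses),
        "order": "inspectiondate DESC",
        "limit": limit,
    }


def fetch_violations(dataset_id: str = "wvxf-dwi5", **query) -> list:
    """Download rows as a list of dicts. Needs network access and sodapy."""
    from sodapy import Socrata  # imported lazily so build_query works without it

    token = os.environ.get("NYC_OPEN_DATA_TOKEN")  # hypothetical variable name
    client = Socrata("data.cityofnewyork.us", token)
    try:
        return client.get(dataset_id, **query)
    finally:
        client.close()


if __name__ == "__main__":
    params = build_query("2024-01-01", borough="brooklyn", limit=5)
    print(params["where"])
```

Keeping the query construction separate from the network call makes the filter logic easy to unit test without hitting the API.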
    python src/data_pipeline/clean_data.py
Cleans raw data, geocodes addresses, and prepares it for analysis.
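The cleaning stage might look something like the pandas sketch below. It covers type normalization, validation, and de-duplication only; geocoding is left out. The column names (`ViolationID`, `InspectionDate`, `Boro`, `Class`) and the function name `clean_violations` are assumptions for illustration and may not match clean_data.py.

```python
import pandas as pd


def clean_violations(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize types, drop unusable rows, and remove duplicate violations."""
    df = raw.copy()
    df.columns = [c.strip().lower() for c in df.columns]
    # Unparseable dates become NaT instead of raising.
    df["inspectiondate"] = pd.to_datetime(df["inspectiondate"], errors="coerce")
    df["boro"] = df["boro"].str.strip().str.upper()
    df["class"] = df["class"].str.strip().str.upper()
    # Keep only rows with a usable ID, a valid date, and a known class.
    df = df.dropna(subset=["violationid", "inspectiondate"])
    df = df[df["class"].isin(["A", "B", "C", "I"])]
    return df.drop_duplicates(subset="violationid").reset_index(drop=True)


if __name__ == "__main__":
    sample = pd.DataFrame({
        "ViolationID": ["1", "1", "2", "3", None],
        "InspectionDate": ["2024-01-05", "2024-01-05", "not a date", "2024-02-10", "2024-03-01"],
        "Boro": [" brooklyn", " brooklyn", "BRONX", "bronx ", "QUEENS"],
        "Class": ["c", "c", "B", "a", "A"],
    })
    print(clean_violations(sample))
```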
    jupyter notebook notebooks/exploratory_analysis.ipynb
Explore the data and see initial findings.
    uvicorn src.api.main:app --reload
Access the API documentation at http://localhost:8000/docs
- HPD Housing Maintenance Code Violations
  - Source: NYC Open Data
  - Records: 1.5M+ violations (2018-present)
  - Contains: Violation type, severity class, dates, addresses, status
- HPD Complaints
  - Tenant-reported issues
  - Used to identify enforcement gaps
- PLUTO (Primary Land Use Tax Lot Output)
  - Building characteristics and ownership
  - Enables demographic analysis
- NYC Borough Boundaries & Census Tracts
  - For geospatial analysis and demographic overlays
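Combining these sources requires a shared key. NYC tax lots are identified by a 10-digit BBL (1-digit borough code, 5-digit block, 4-digit lot), so one plausible approach is to build a BBL from each violation record and look it up in PLUTO. The sketch below assumes violation records carry `boro`, `block`, and `lot` fields; the helper names and the sample PLUTO row are hypothetical.

```python
BOROUGH_CODES = {
    "MANHATTAN": 1,
    "BRONX": 2,
    "BROOKLYN": 3,
    "QUEENS": 4,
    "STATEN ISLAND": 5,
}


def make_bbl(borough: str, block: int, lot: int) -> str:
    """Return the 10-digit BBL: borough code, zero-padded block, zero-padded lot."""
    code = BOROUGH_CODES[borough.strip().upper()]
    if not (0 < block <= 99_999 and 0 < lot <= 9_999):
        raise ValueError(f"block/lot out of range: {block}/{lot}")
    return f"{code}{block:05d}{lot:04d}"


def attach_building_info(violations, pluto_by_bbl):
    """Left-join violations to PLUTO rows; unmatched records get building=None."""
    joined = []
    for v in violations:
        bbl = make_bbl(v["boro"], int(v["block"]), int(v["lot"]))
        joined.append({**v, "bbl": bbl, "building": pluto_by_bbl.get(bbl)})
    return joined


if __name__ == "__main__":
    pluto = {"3012340056": {"yearbuilt": 1931, "unitsres": 24}}  # made-up row
    rows = [{"boro": "Brooklyn", "block": "1234", "lot": "56", "class": "C"}]
    print(attach_building_info(rows, pluto))
```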
- Seasonal decomposition: Identifies recurring patterns (e.g., heating violations spike in winter)
- Trend analysis: Long-term changes in violation rates
- Enforcement lag calculation: Time from complaint to inspection to resolution
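As a simplified stand-in for full seasonal decomposition (which statsmodels' `seasonal_decompose` would handle), a monthly seasonal index gives a quick read on recurring patterns. The sketch below compares each calendar month's count with the count expected if violations were spread evenly over the year. It ignores month length, long-term trend, and partial years, and the function name is hypothetical.

```python
from collections import Counter
from datetime import date


def seasonal_index(dates):
    """Map month (1-12) to its count divided by the uniform expectation.

    A value of 2.0 means the month has twice as many violations as an
    even spread across the year would produce.
    """
    by_month = Counter(d.month for d in dates)
    total = sum(by_month.values())
    if total == 0:
        raise ValueError("no dates supplied")
    return {m: by_month.get(m, 0) * 12 / total for m in range(1, 13)}


if __name__ == "__main__":
    # Made-up inspection dates, weighted toward January.
    sample = [date(2023, 1, d) for d in (3, 9, 17)] + [date(2023, 7, 1)]
    index = seasonal_index(sample)
    print({m: round(v, 2) for m, v in index.items() if v > 0})
```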
- Hotspot analysis: Getis-Ord Gi* statistic identifies areas with significantly high violation concentrations
- Spatial autocorrelation: Moran's I test for neighborhood effects
- Demographic overlay: Correlates violation patterns with income, race, and other census data
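To make the spatial autocorrelation step concrete, here is a minimal global Moran's I with binary (neighbor / not neighbor) weights in plain Python. It computes the statistic only; judging significance needs permutation or analytical inference, which a library such as PySAL's esda provides. The adjacency structure and values are toy inputs, not project data.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary spatial weights.

    values: one observation per area (e.g. violations per census tract).
    neighbors: dict mapping an area index to the indices of adjacent areas.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(d * d for d in z)
    if denom == 0:
        raise ValueError("all values are identical; Moran's I is undefined")
    cross = 0.0
    weight_sum = 0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            cross += z[i] * z[j]  # w_ij = 1 for every listed neighbor pair
            weight_sum += 1
    if weight_sum == 0:
        raise ValueError("no neighbor pairs supplied")
    return (n / weight_sum) * (cross / denom)


if __name__ == "__main__":
    # Four areas in a row; each borders the next.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(morans_i([1, 1, 0, 0], chain))  # clustered values: positive
    print(morans_i([1, 0, 1, 0], chain))  # alternating values: negative
```

Positive values indicate that similar violation levels cluster together; negative values indicate that neighbors tend to differ.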
- Ownership network analysis: Connects properties owned by the same entity across different LLCs
- Violation rate normalization: Accounts for building size and age
- Ranking algorithm: Weights by violation severity (Class A/B/C)
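A severity-weighted, size-normalized ranking could be sketched as below. The weights (A=1, B=2, C=3) are placeholders chosen for illustration, not the project's actual values, and the record fields `building_id` and `class` are assumed names. Normalizing by residential units keeps large buildings from ranking worst purely because of their size.

```python
from collections import defaultdict

# Hypothetical weights; any class not listed (e.g. Class I) contributes 0.
SEVERITY_WEIGHTS = {"A": 1.0, "B": 2.0, "C": 3.0}


def rank_buildings(violations, units_by_building):
    """Return (building_id, score) pairs sorted from worst to best.

    score = sum of severity weights / number of residential units.
    Buildings with unknown or zero unit counts are skipped.
    """
    totals = defaultdict(float)
    for v in violations:
        totals[v["building_id"]] += SEVERITY_WEIGHTS.get(v["class"], 0.0)
    scores = {}
    for building_id, total in totals.items():
        units = units_by_building.get(building_id, 0)
        if units > 0:
            scores[building_id] = total / units
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = [
        {"building_id": "b1", "class": "C"},
        {"building_id": "b1", "class": "C"},
        {"building_id": "b2", "class": "A"},
        {"building_id": "b2", "class": "B"},
        {"building_id": "b2", "class": "C"},
    ]
    print(rank_buildings(sample, {"b1": 2, "b2": 60}))
```

In this toy input both buildings accumulate the same raw weight, but the 2-unit building ranks far above the 60-unit one once size is accounted for.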
- Response time analysis: Compares complaint-to-inspection times across neighborhoods
- Inspection rate disparities: Tests for statistical significance in enforcement patterns
- Demographic correlation: Examines relationship between neighborhood demographics and enforcement activity
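One way to test whether two neighborhoods differ in complaint-to-inspection time is a permutation test, which makes no normality assumption about response times. This is a sketch of that general technique, not necessarily the test the project uses; the function name and the response times (in days) are made up.

```python
import random


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in mean response time.

    Returns (observed absolute difference in means, estimated p-value).
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign neighborhood labels
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    neighborhood_a = [3, 4, 5, 4, 3, 5, 4, 4]         # days to inspection
    neighborhood_b = [10, 12, 11, 13, 12, 11, 10, 12]
    diff, p = permutation_test(neighborhood_a, neighborhood_b)
    print(f"difference in means: {diff:.2f} days, p = {p:.4f}")
```

When many neighborhoods are compared pairwise, the resulting p-values would need a multiple-comparison correction before being reported as disparities.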
- Features: Building age, past violations, ownership type, neighborhood characteristics
- Model: Random Forest classifier for binary prediction (high-risk vs. low-risk)
- Output: Risk scores for proactive intervention targeting
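The modeling step could be prototyped with scikit-learn along these lines. Everything below runs on synthetic data generated inside the script: the three feature columns are stand-ins for the features listed above, the label rule is invented, and the hyperparameters are arbitrary starting points rather than tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins for building features (not real data).
building_age = rng.integers(5, 120, size=n)
past_violations = rng.poisson(3, size=n)
is_llc_owned = rng.integers(0, 2, size=n)
X = np.column_stack([building_age, past_violations, is_llc_owned])

# Invented label rule: risk rises with age and violation history.
logit = 0.02 * (building_age - 60) + 0.5 * (past_violations - 3)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

# Probability of the high-risk class serves as the risk score.
risk_scores = model.predict_proba(X_test)[:, 1]
print("held-out accuracy:", model.score(X_test, y_test))
print("first five risk scores:", np.round(risk_scores[:5], 2))
```

With real data, the train/test split should respect time (train on earlier violations, evaluate on later ones) so the model is not scored on information from the future.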
Analysis of 10,000 NYC housing violations across 9,249 buildings reveals:
- Geographic Concentration of Violations
  - Brooklyn accounts for 59% of all violations (5,904 cases)
  - The Bronx represents 38% (3,779 cases)
  - Manhattan and Queens together account for only 3% of violations
  - Indicates geographic disparities in housing code enforcement and compliance
- Violation Severity Distribution
  - Class I (information orders under HPD's classification, not a hazard level): 90.1% of violations (9,010 cases)
  - Class C (immediately hazardous): 4.5% (445 cases)
  - Class B (hazardous): 3.7% (367 cases)
  - Class A (non-hazardous): 1.8% (178 cases)
  - Class I dominates this sample, so it should be analyzed separately from the A/B/C hazard classes; among the 990 hazard-classed violations, Class C is the largest group (445 cases)
- Open Violations Indicate Ongoing Risk
  - 10% of violations (1,000 cases) remain open
  - Open violations represent unresolved safety hazards affecting tenant welfare
  - Demonstrates the need for targeted enforcement and follow-up inspections
- Data-Driven Policy Implications
  - High concentration in Brooklyn and the Bronx suggests a need for focused intervention resources
  - Class C is the most common hazard class in the sample, which highlights urgent habitability concerns
  - Database enables identification of repeat offender buildings for proactive enforcement
This is currently a portfolio project, but suggestions and feedback are welcome! If you're interested in:
- Extending the analysis
- Improving the visualization
- Adding new data sources
- Deploying for public use
Please open an issue or reach out directly.
This project is licensed under the MIT License - see the LICENSE file for details.
Maya Murry
- Cornell University, B.Sc. Computer Science (May 2025)
- Lead Full-Stack Developer at an AI Healthcare Startup
- Focus: Data science for social justice and public service
Contact:
- Portfolio: mayamurry.com
- LinkedIn: linkedin.com/in/maya-murry
- GitHub: @snedmagdous
- NYC Open Data: For making housing violations data publicly accessible
- Tenant advocacy organizations: For inspiration and guidance on policy priorities
- Open source community: For the excellent tools that made this analysis possible
- NYC Housing Preservation & Development
- NYC Open Data Portal
- Right to Counsel NYC
- Housing Justice for All
Current Phase: Data Pipeline Development
- Project setup and structure
- Requirements and dependencies defined
- Data fetching from NYC Open Data
- Data cleaning and preprocessing
- Exploratory data analysis
- Geospatial analysis implementation
- API development
- Frontend dashboard
- Deployment
This project uses data-driven analysis to advance housing justice in New York City. Technology should serve the collective, dismantle systems of oppression, and empower those fighting for their rights.