EUBUCCO

Code repository for creating the EUBUCCO database - European Building stock Characteristics in a Common and Open database for 322+ million individual buildings.

Website: eubucco.com - Interactive map explorer and data download interface
Docs: docs.eubucco.com - Documentation of data schema, data access, and data usage
Zenodo: 10.5281/zenodo.6524780 - Archived data dumps with DOI

About

EUBUCCO is a scientific database of individual building footprints for 322+ million buildings across the 27 European Union countries, Norway, Switzerland, and the UK. It combines 55 open datasets from government registries (62.2%), OpenStreetMap (17.4%), and Microsoft building footprints (20.4%), all of which have been collected, harmonized, and validated.

EUBUCCO provides a single, centralized basis for high-resolution urban sustainability studies at continental, comparative, or local scale, and supports a variety of use cases such as energy system analysis and natural hazard risk assessment.

The database provides high-granularity information on building type, height, number of floors, and construction year. To maximize utility, EUBUCCO distinguishes between Ground Truth (from the original source data), Merged (taken from other building footprint datasets during conflation), and ML Estimated (inferred with machine learning) attributes.

Attribute	Ground Truth	Merged	ML Estimated	Total Coverage
Type (res/non-res)	38.1%	7.4%	54.5%	100.0%
Subtype	17.3%	4.2%	78.5%	100.0%
Height	43.2%	0.1%	56.7%	100.0%
Floors	16.6%	3.4%	79.9%	100.0%
Construction Year	15.6%	0.3%	0.0%	15.9%

See the EUBUCCO docs (docs.eubucco.com) for details.
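One plausible reading of the table is a per-attribute priority order: a building counts towards the first provenance category that supplies a value, and Total Coverage falls short of 100% only where no category does (as for construction year, which has no ML estimate). The sketch below assumes that order and uses hypothetical field names (`ground_truth`, `merged`, `ml_estimated`); the actual schema and resolution rules are described in the EUBUCCO docs.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed priority order and hypothetical field names, for illustration only;
# the real EUBUCCO schema and rules are documented at docs.eubucco.com.
SOURCE_PRIORITY = ("ground_truth", "merged", "ml_estimated")


@dataclass
class AttributeCandidates:
    """Candidate values for one attribute of one building, by provenance."""
    ground_truth: Optional[float] = None
    merged: Optional[float] = None
    ml_estimated: Optional[float] = None


def resolve(c: AttributeCandidates) -> tuple[Optional[float], Optional[str]]:
    """Return the first available value in priority order, plus its source label."""
    for source in SOURCE_PRIORITY:
        value = getattr(c, source)
        if value is not None:
            return value, source
    return None, None


def coverage_by_source(buildings: list[AttributeCandidates]) -> dict[str, float]:
    """Percentage of buildings whose resolved value comes from each source."""
    counts = {source: 0 for source in SOURCE_PRIORITY}
    for c in buildings:
        _, source = resolve(c)
        if source is not None:
            counts[source] += 1
    n = max(len(buildings), 1)  # avoid division by zero for an empty list
    shares = {source: 100.0 * k / n for source, k in counts.items()}
    # The total stays below 100% when some buildings have no value at all.
    shares["total"] = sum(shares.values())
    return shares
```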

Scientific Data Descriptor

This work is associated with a Data Descriptor paper published in the journal Scientific Data. The manuscript provides extensive documentation on the database content and methodology.

Citation

If you use EUBUCCO in your research, please cite:

Milojevic-Dupont, N. and Wagner, F. et al. EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings. Sci Data 10, 147 (2023). https://doi.org/10.1038/s41597-023-02040-2

BibTeX:

@article{eubucco_2023,
	title        = {EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings},
	author       = {Milojevic-Dupont, Nikola and Wagner, Felix and Nachtigall, Florian and Hu, Jiawei and Br{\"u}ser, Geza Boi and Zumwald, Marius and Biljecki, Filip and Heeren, Niko and Kaack, Lynn H. and Pichler, Peter-Paul and Creutzig, Felix},
	year         = 2023,
	journal      = {Scientific Data},
	volume       = 10,
	number       = 1,
	pages        = 147,
	doi          = {10.1038/s41597-023-02040-2}
}

Processing Pipeline

The EUBUCCO data release is created through the following sequential processing steps:

1. Data Downloading (`0-downloading`)

Downloading raw building data from various sources:

Governmental datasets: Country- and region-specific open data (50+ datasets)
OpenStreetMap: Building footprints via Geofabrik downloads
Microsoft: Global building footprints
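For the OpenStreetMap source, a download step can be sketched as below. This is not the repository's script: it assumes Geofabrik's `<region>-latest.osm.pbf` naming and only illustrates the idempotent pattern (skip files that already exist, write to a temporary `.part` file first) that makes re-running a large download job cheap. Governmental and Microsoft sources use their own URLs, which are not covered here.

```python
import urllib.request
from pathlib import Path

# Sketch only, not the repository's download script.
# Geofabrik serves regional OSM extracts as <region>-latest.osm.pbf,
# e.g. https://download.geofabrik.de/europe/germany-latest.osm.pbf
GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_url(region: str) -> str:
    """Build the extract URL for a Geofabrik region path such as 'europe/germany'."""
    return f"{GEOFABRIK_BASE}/{region.strip('/')}-latest.osm.pbf"


def download(url: str, out_dir: Path) -> Path:
    """Download url into out_dir, skipping files that already exist.

    Writes to a temporary '.part' file first so that an interrupted
    download is never mistaken for a complete one on the next run.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        partial = target.parent / (target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target
```

For example, `download(geofabrik_url("europe/monaco"), Path("data/osm"))` fetches the Monaco extract once and returns the existing file on later runs.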

2. Parsing (`1-parsing`)

Parsing heterogeneous input formats into a common structure:

Supports multiple formats: .gml, .xml, .shp, .dxf, .pbf
Extracts building footprints and attributes
Creates standardized geometry and attribute files
Performs duplicate removal and validation
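Duplicate removal can mean several things. A minimal sketch of the simplest case, dropping exact geometric duplicates that differ only in start vertex, winding order, or a repeated closing vertex, is shown below. It assumes footprints as plain coordinate lists and a rounding precision of 6 decimals (both assumptions); the repository's own criterion may differ.

```python
from typing import Iterable

# Assumed representation: a footprint is a list of (x, y) vertices.
Ring = list[tuple[float, float]]


def canonical_key(ring: Ring, ndigits: int = 6) -> tuple:
    """Key that is identical for geometrically identical footprints.

    Rounds coordinates, drops the repeated closing vertex, and takes the
    smallest rotation over both traversal directions, so start vertex and
    winding order do not matter.
    """
    pts = [(round(x, ndigits), round(y, ndigits)) for x, y in ring]
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]
    if not pts:
        return ()
    rotations = [
        tuple(seq[i:] + seq[:i])
        for seq in (pts, pts[::-1])
        for i in range(len(seq))
    ]
    return min(rotations)


def drop_duplicates(footprints: Iterable[Ring]) -> list[Ring]:
    """Keep the first occurrence of each geometrically identical footprint."""
    seen: set[tuple] = set()
    unique = []
    for ring in footprints:
        key = canonical_key(ring)
        if key not in seen:
            seen.add(key)
            unique.append(ring)
    return unique
```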

3. Database Setup (`2-db-set-up`)

Organizing parsed data into a regionally partitioned dataset:

Creates consistent administrative boundaries (NUTS/LAU levels)
Organizes data by country/region/city hierarchy
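Partitioning by country/region/city can be sketched as assigning each footprint to the city-level boundary that contains a representative point, then deriving a folder path from that boundary's parent units. The boundary names and the in-memory dictionary below are hypothetical; at EUBUCCO scale one would typically use a spatial index or a GeoPandas spatial join instead of this linear scan.

```python
from typing import Optional

# Hypothetical representation: rings are lists of (x, y) vertices.
Ring = list[tuple[float, float]]


def representative_point(ring: Ring) -> tuple[float, float]:
    """Mean of the distinct vertices: a cheap stand-in for a true centroid
    (it can fall outside strongly concave footprints)."""
    pts = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Ray casting: count crossings of a horizontal ray from (x, y) towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def partition_path(footprint: Ring, cities: dict[tuple[str, str, str], Ring]) -> Optional[str]:
    """Return 'country/region/city' for the city boundary containing the footprint."""
    x, y = representative_point(footprint)
    for (country, region, city), boundary in cities.items():
        if point_in_polygon(x, y, boundary):
            return f"{country}/{region}/{city}"
    return None
```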

4. Attribute Cleaning (`3-attrib-cleaning`)

Cleaning and harmonizing building attributes across different sources:

Height: Standardization and unit conversion
Type: Mapping to harmonized building type categories
Construction year: Age calculation and validation
Removes duplicates and non-building structures
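The three cleaning operations can be sketched as small, independent functions. The unit table, the plausibility bounds (2 to 1000 m for height, years from 1000 up to a reference year), and the type mapping are all illustrative assumptions; the real category mapping is kept in `metadata/building-type-categories-v1.csv`.

```python
from typing import Optional

# Assumed unit factors and plausibility bounds, for illustration only.
UNIT_TO_M = {"m": 1.0, "ft": 0.3048}
MIN_HEIGHT_M, MAX_HEIGHT_M = 2.0, 1000.0

# Hypothetical source-label -> harmonized-category mapping; the real one
# is kept in metadata/building-type-categories-v1.csv.
TYPE_MAP = {
    "detached house": "residential",
    "apartments": "residential",
    "office": "non-residential",
    "retail": "non-residential",
}


def clean_height(value, unit: str = "m") -> Optional[float]:
    """Convert a height to metres; return None if missing or implausible."""
    if value is None:
        return None
    if unit not in UNIT_TO_M:
        raise ValueError(f"unknown height unit: {unit!r}")
    height = float(value) * UNIT_TO_M[unit]
    return height if MIN_HEIGHT_M <= height <= MAX_HEIGHT_M else None


def harmonize_type(raw: Optional[str]) -> Optional[str]:
    """Map a source-specific type label to a harmonized category (None if unmapped)."""
    if raw is None:
        return None
    return TYPE_MAP.get(raw.strip().lower())


def building_age(year, reference_year: int, earliest: int = 1000) -> Optional[int]:
    """Age in years, or None if the construction year is missing or implausible."""
    if year is None:
        return None
    year = int(year)
    if not earliest <= year <= reference_year:
        return None
    return reference_year - year
```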

5. Conflation (`4-conflation`)

Conflates datasets from multiple sources using ML-based matching. This step is implemented in the eubucco-conflation repository:

Spatial alignment: Geometric correction via rubbersheeting
Matching: ML-based building footprint matching (XGBoost model)
Attribute merging: Merging of attributes across multiple sources
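The actual conflation step matches footprints with an XGBoost classifier after rubbersheeting. As a much simpler stand-in, and explicitly not the eubucco-conflation method, the sketch below matches buildings by bounding-box intersection over union (IoU) with a greedy one-to-one assignment, then fills attributes missing in the primary record from its match. The 0.5 threshold and the `_source` key suffix are assumptions.

```python
# Simplified stand-in for ML-based matching; not the eubucco-conflation method.
BBox = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(primary: list[BBox], secondary: list[BBox], threshold: float = 0.5) -> dict[int, int]:
    """Greedy one-to-one matching: accept pairs in descending IoU order above threshold."""
    pairs = sorted(
        ((iou(p, s), i, j) for i, p in enumerate(primary) for j, s in enumerate(secondary)),
        reverse=True,
    )
    matches: dict[int, int] = {}
    used_secondary: set[int] = set()
    for score, i, j in pairs:
        if score < threshold:
            break  # all remaining pairs score even lower
        if i in matches or j in used_secondary:
            continue
        matches[i] = j
        used_secondary.add(j)
    return matches


def merge_attributes(primary: dict, secondary: dict) -> dict:
    """Fill attributes missing in the primary record from its match, marking them as merged."""
    merged = dict(primary)
    for key, value in secondary.items():
        if merged.get(key) is None and value is not None:
            merged[key] = value
            merged[f"{key}_source"] = "merged"
    return merged
```

Greedy assignment in descending IoU order keeps each building in at most one pair. The all-pairs loop is quadratic, so a real implementation would first query a spatial index for candidate pairs.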

6. Feature Engineering (`5-feature-engineering`)

Engineering features for building attribute prediction. This step is implemented in the eubucco-features repository.

Building attributes are predicted using models from the ufo-prediction repository.
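The features actually used are defined in eubucco-features. Purely as an illustration of the kind of input such models take, the sketch below computes three generic footprint shape descriptors: area (shoelace formula), perimeter, and isoperimetric compactness 4πA/P², which is 1 for a circle and about 0.785 for a square. It assumes simple polygons in a projected, metric coordinate system.

```python
import math

# Illustrative descriptors only; not the eubucco-features feature set.
Ring = list[tuple[float, float]]


def _open(ring: Ring) -> Ring:
    """Drop the repeated closing vertex, if present."""
    return ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring


def footprint_area(ring: Ring) -> float:
    """Shoelace formula; assumes a simple polygon in projected (metric) coordinates."""
    pts = _open(ring)
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0


def footprint_perimeter(ring: Ring) -> float:
    """Sum of edge lengths, including the edge back to the first vertex."""
    pts = _open(ring)
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))


def shape_features(ring: Ring) -> dict[str, float]:
    """Area, perimeter and isoperimetric compactness 4*pi*A/P^2 (1.0 for a circle)."""
    area = footprint_area(ring)
    perimeter = footprint_perimeter(ring)
    compactness = 4 * math.pi * area / perimeter**2 if perimeter > 0 else 0.0
    return {"area": area, "perimeter": perimeter, "compactness": compactness}
```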

7. Release Generation (`5-release`)

Creating the final release files:

Final data packaging and schema enforcement
Regional and city-level statistics calculation
Prediction quality metrics calculation
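Regional statistics and prediction quality metrics can be sketched with the standard library alone. The record layout (a `region` key per building) is a hypothetical simplification, and the metrics assume predictions are compared against buildings with held-out ground-truth values; MAE and RMSE are shown as common choices, not necessarily the ones used in the release.

```python
import math
from collections import defaultdict
from typing import Sequence

# Hypothetical record layout: each building is a dict with a "region" key.


def regional_stats(buildings: list[dict], attribute: str) -> dict[str, dict[str, float]]:
    """Per-region building count and share of buildings with a value for `attribute`."""
    counts: dict[str, int] = defaultdict(int)
    present: dict[str, int] = defaultdict(int)
    for b in buildings:
        counts[b["region"]] += 1
        if b.get(attribute) is not None:
            present[b["region"]] += 1
    return {
        region: {"n_buildings": n, f"{attribute}_coverage": present[region] / n}
        for region, n in counts.items()
    }


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```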

Pipeline Orchestration

The pipeline is orchestrated on HPC Slurm clusters using the slurm-pipeline orchestrator. Pipeline configurations are defined in YAML files in database/preprocessing/ and can be executed either through the orchestrator or via individual execution scripts.

Key Components:

Slurm Configurations: YAML files defining job parameters, resources, and dependencies
Execution Scripts: Shell scripts for submitting individual pipeline steps
Parameter Files: CSV/YAML files specifying input parameters for each dataset/region
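The core idea behind orchestrating sequential steps on Slurm, submitting each job only after its predecessors and chaining them with `--dependency=afterok`, can be sketched independently of slurm-pipeline, whose configuration format and API are not reproduced here. Step names, script paths, and resource defaults are hypothetical; the `runner` parameter lets the submission logic be tested without a cluster.

```python
import subprocess
from graphlib import TopologicalSorter
from typing import Callable, Mapping

# Hypothetical stand-in, not the slurm-pipeline API.


def sbatch_args(job_name: str, script: str, time: str, mem: str, after: list[str]) -> list[str]:
    """Build an sbatch command line; `after` holds job IDs this job must wait for."""
    args = ["sbatch", "--parsable", f"--job-name={job_name}", f"--time={time}", f"--mem={mem}"]
    if after:
        args.append("--dependency=afterok:" + ":".join(after))
    args.append(script)
    return args


def run_sbatch(args: list[str]) -> str:
    """Submit via sbatch and return the job ID (--parsable prints 'jobid[;cluster]')."""
    out = subprocess.run(args, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def submit_pipeline(
    graph: Mapping[str, set[str]],
    scripts: Mapping[str, str],
    runner: Callable[[list[str]], str] = run_sbatch,
    time: str = "02:00:00",
    mem: str = "16G",
) -> dict[str, str]:
    """Submit every step after its predecessors, chaining them with afterok dependencies."""
    job_ids: dict[str, str] = {}
    for step in TopologicalSorter(graph).static_order():
        after = [job_ids[dep] for dep in sorted(graph.get(step, ()))]
        job_ids[step] = runner(sbatch_args(step, scripts[step], time, mem, after))
    return job_ids
```

With `afterok`, a failed step prevents its dependents from starting instead of letting them run on incomplete inputs.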

Project Structure

eubucco/
├── database/                   # Pipeline specifications and execution scripts
│   ├── preprocessing/
│   │   ├── 0-downloading/      # Data download scripts
│   │   ├── 1-parsing/          # Parsing and format harmonization
│   │   ├── 2-db-set-up/        # Regional partitioning
│   │   ├── 3-attrib-cleaning/  # Attribute cleaning and harmonization
│   │   ├── 5-release/          # Release file generation
│   │   └── 6-upload/           # Upload scripts
│   └── slurm-config.yml        # Main Slurm pipeline configuration
│
├── eubucco/                    # Core processing logic
│   ├── preproc/                
│   │   ├── parsing.py          # Parse heterogeneous input formats
│   │   ├── db_set_up.py        # Database structure creation
│   │   ├── attribs.py          # Attribute cleaning functions
│   │   ├── merge.py            # Dataset merging logic
│   │   ├── create_release.py   # Release file generation
│   │   └── create_overview.py  # Overview statistics
│   └── utils/                  
│       ├── load.py             # Data loading utilities
│       ├── validation_funcs.py # Validation functions
│       └── concate.py          # Data concatenation utilities
│
├── metadata/                   # Metadata and mappings
│   ├── building-type-categories-v1.csv
│   └── source_dataset_mapping-v1.json
│
└── ufo-map/                    # Geospatial utils submodule

Related Repositories

This repository serves as the main EUBUCCO pipeline, but several specialized components are maintained in separate repositories:

eubucco-conflation - ML-based matching and merging of building datasets from multiple sources (governmental, OSM, Microsoft)
eubucco-features - Feature engineering for building attribute prediction
ufo-prediction - Building attribute prediction models
slurm-pipeline - Orchestrator for scheduling and managing HPC Slurm cluster jobs
eubucco.com - Web platform for exploring and accessing EUBUCCO data

Contact

Email: [email protected]
Website: https://eubucco.com
Issues: https://github.com/ai4up/eubucco/issues
