Simple XML parser to shove OpenStreetMap changeset metadata dump files into a postgres database

iandees/ChangesetMD

 
 

ChangesetMD

ChangesetMD is a simple XML parser written in Python that takes the weekly changeset metadata dump file from http://planet.openstreetmap.org/ and shoves the data into a simple postgres database so it can be queried.

WARNING: This is pretty much my first Python project ever beyond "hello world" so... you have been warned.

Setup

ChangesetMD works with Python 2.7.

The only software requirement besides postgres itself is the Python PostgreSQL adapter psycopg2. On Debian-based systems, install the python-psycopg2 package.

ChangesetMD expects a postgres database to be set up for it. Its tables can likely co-exist with others in an existing database if desired. Otherwise, as the postgres user, execute:

createdb changesets

It is easiest if your OS user has access to this database. I just created a user for myself and made it a superuser, which is probably not best practice. The plain command below creates an ordinary user; add -s to make it a superuser:

createuser <username>

Execution

The first time you run it, you will need to include the -c | --create option to create the two tables:

python changesetmd.py -d <database> -c

The -c option can be combined with the -f option (described below) to parse a file immediately after the tables are created.

To parse a dump file, use the -f | --file option. After the first run to create the tables, you can use -t | --truncate to clear out the tables and import a new file:

python changesetmd.py -d <database> -t -f /tmp/changeset-latest.osm
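To show how the documented options combine, here is a hypothetical argparse sketch. It models only -d, -c/--create, -t/--truncate and -f/--file as described in this README; the real changesetmd.py may define them differently, and its user/password/host options are left out because their names are not given here. The planned_steps helper is an illustrative assumption about ordering (create, then truncate, then parse), not the script's actual control flow.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        description="Load an OSM changeset metadata dump into PostgreSQL")
    parser.add_argument("-d", dest="database", help="target database name")
    parser.add_argument("-c", "--create", action="store_true",
                        help="create the osm_changeset and osm_changeset_tags tables")
    parser.add_argument("-t", "--truncate", action="store_true",
                        help="empty both tables before importing")
    parser.add_argument("-f", "--file", help="uncompressed changeset dump to parse")
    return parser


def planned_steps(args):
    """Illustrative order in which the requested actions would run."""
    steps = []
    if args.create:
        steps.append("create tables")
    if args.truncate:
        steps.append("truncate tables")
    if args.file:
        steps.append("parse " + args.file)
    return steps
```

For the command above, parse_args(["-d", "changesets", "-t", "-f", "/tmp/changeset-latest.osm"]) yields the steps "truncate tables" followed by "parse /tmp/changeset-latest.osm".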

Optional database user/password/host arguments can be used to connect to a postgres database as a different user or on another host.
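A minimal sketch of how such optional arguments might be turned into a psycopg2 connection, assuming the script passes them as keyword arguments to psycopg2.connect(), which accepts libpq parameters such as dbname, user, password and host. The connect_kwargs helper is hypothetical. Leaving an option out lets libpq fall back to its defaults (a local socket and the current OS user name), which is why giving your OS user access to the database is the easiest setup.

```python
def connect_kwargs(database, user=None, password=None, host=None):
    """Build keyword arguments for psycopg2.connect(), omitting unset options.

    Options left as None are not passed at all, so libpq applies its own
    defaults for them instead of receiving an explicit empty value.
    """
    kwargs = {"dbname": database}
    for name, value in (("user", user), ("password", password), ("host", host)):
        if value is not None:
            kwargs[name] = value
    return kwargs
```

Usage would look like psycopg2.connect(**connect_kwargs("changesets", host="db.example.org")), where the host name is only a placeholder.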

Notes

  • Does not currently support reading directly from .bz2 files. Unzip them first.
  • Prints a message every 10,000 records.
  • Takes about 4 hours to import the current dump on a decent home computer.
  • Would likely be faster to process the XML into two flat files and then use the postgres COPY command to do a bulk load.
  • Needs more indexes to make querying practical. I'm waiting on a first full load to experiment with indexes.
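The flat-file idea in the notes above could look roughly like this. It is a sketch, not part of ChangesetMD, and it assumes PostgreSQL's default COPY text format: tab-separated columns, \N for NULL, and backslash escapes for backslash, tab, newline and carriage return (tag values can contain any of these). The helper names copy_field and copy_line are hypothetical.

```python
# Characters that must be escaped in PostgreSQL's COPY text format.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def copy_field(value):
    """Render one column value for COPY text format (None becomes \\N)."""
    if value is None:
        return "\\N"
    text = "%s" % (value,)
    # Escape character by character so backslashes are never doubled twice.
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def copy_line(values):
    """Render one table row as a tab-separated, newline-terminated line."""
    return "\t".join(copy_field(v) for v in values) + "\n"
```

One file per table would be written with copy_line() in column order and then loaded with psql's \copy, or with psycopg2's cursor.copy_from(f, "osm_changeset"), whose default separator (tab) and NULL marker (\N) match this format.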

Table Structure

ChangesetMD populates two tables:

osm_changeset:

  • id: changeset ID
  • created_at/closed_at: creation/close time
  • num_changes: number of objects changed
  • min_lat/max_lat/min_lon/max_lon: the changeset's bounding box, in decimal degrees
  • user_name: OSM username
  • user_id: numeric OSM user ID

Note that all fields except id and created_at can be null.
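To make the column mapping and the null handling concrete, here is a minimal streaming-parser sketch. It is not the code in changesetmd.py. It assumes each <changeset> element in the dump carries attributes named id, created_at, closed_at, num_changes, min_lat, max_lat, min_lon, max_lon, user and uid, with <tag k="..." v="..."/> children. Missing attributes become None, and timestamps are left as the dump's ISO 8601 strings.

```python
import xml.etree.ElementTree as ET


def _opt(attrs, name, cast):
    """Return attrs[name] converted with cast, or None if the attribute is absent."""
    value = attrs.get(name)
    return cast(value) if value is not None else None


def iter_changesets(source):
    """Stream (row, tags) pairs from a changeset dump.

    source is a path or a binary file object. row uses the osm_changeset
    column names; tags is a list of (key, value) pairs for osm_changeset_tags.
    """
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first start event is the enclosing <osm> element
            continue
        if event != "end" or elem.tag != "changeset":
            continue
        a = elem.attrib
        row = {
            "id": int(a["id"]),
            "created_at": a["created_at"],
            "closed_at": a.get("closed_at"),
            "num_changes": _opt(a, "num_changes", int),
            "min_lat": _opt(a, "min_lat", float),
            "max_lat": _opt(a, "max_lat", float),
            "min_lon": _opt(a, "min_lon", float),
            "max_lon": _opt(a, "max_lon", float),
            "user_name": a.get("user"),
            "user_id": _opt(a, "uid", int),
        }
        tags = [(t.get("k"), t.get("v")) for t in elem.findall("tag")]
        yield row, tags
        root.clear()  # drop finished elements so memory use stays flat
```

Streaming with iterparse and clearing the root after each changeset keeps memory roughly constant, which matters for a dump that takes hours to import.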

Changeset tags are in their own table since there may be an arbitrary number of them.

osm_changeset_tags:

  • changeset_id: changeset ID, foreign key to osm_changeset
  • key: tag key
  • value: tag value
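The two tables could be declared roughly as follows. This is a hypothetical DDL sketch: the column names come from the lists above, but the types, constraints and the two indexes are assumptions, not the statements changesetmd.py actually runs. numeric(10,7) leaves room for longitudes up to 180 with seven decimal places.

```python
# Hypothetical schema for the two tables described above; types, constraints
# and indexes are assumptions, only the column names come from this README.
SCHEMA_SQL = """
CREATE TABLE osm_changeset (
    id          bigint PRIMARY KEY,
    created_at  timestamp NOT NULL,   -- never null, per the note above
    closed_at   timestamp,
    num_changes integer,
    min_lat     numeric(10,7),        -- decimal degrees
    max_lat     numeric(10,7),
    min_lon     numeric(10,7),
    max_lon     numeric(10,7),
    user_name   text,
    user_id     bigint
);

CREATE TABLE osm_changeset_tags (
    changeset_id bigint NOT NULL REFERENCES osm_changeset (id),
    key          text NOT NULL,
    value        text
);

-- Suggested indexes for the join and the key filter in the example query.
CREATE INDEX osm_changeset_tags_changeset_id_idx ON osm_changeset_tags (changeset_id);
CREATE INDEX osm_changeset_tags_key_idx ON osm_changeset_tags (key);
"""
```

With psycopg2, this could be run once with cursor.execute(SCHEMA_SQL) followed by connection.commit().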

Example query: count how many changesets have a created_by=* tag.

select count(*)
from osm_changeset
join osm_changeset_tags on changeset_id = id
where key = 'created_by';
