Tools to create a geolocation API similar to that offered by Google
- Takes list of place names from GeoNames
- Takes list of languages and prevalences by country from Wikipedia
- Parses and imports into MySQL DB
- Sets up API with Flask
- Responds to queries of text location strings with coordinates and country name
There are four tables: `iso`, `places`, `language` and `admin1`.

`places` lists all place names and maps them to coordinates and other info

| name | clean_name | lat | lon | country | population | elevation | admin_name | feature |
|---|---|---|---|---|---|---|---|---|
| Suhūl az̧ Z̧afrah | suhūl az̧ z̧afrah | 22.75 | 53.1667 | AE | 0 | 119 | 00 | 00 |

`admin1` lists all first-level administrative divisions, e.g. in the US these are states such as New York or Arizona

| code | name | ascii_name | pop | country | admin_code |
|---|---|---|---|---|---|
| AD.06 | Sant Julià de Loria | Sant Julia de Loria | 3039162 | AD | |

`iso` lists all countries and their ISO codes

| name | iso2 | iso3 |
|---|---|---|
| Afghanistan | AF | AFG |

`language` lists countries and their languages along with ISO codes

| language | country_name | iso2 | status | lang_iso | level |
|---|---|---|---|---|---|
| Brunei Malay | brunei | BN | regional | NULL | 2 |

`level` indicates the importance of a language in that country, e.g. 'Significant minority' is level 2 while 'Official' is level 1.
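
The four tables described above can be sketched as a small schema. This is only an illustration: SQLite stands in for MySQL so that it runs without a server, the column types are guesses, and `build_demo_db` and `lookup` are hypothetical helper names. Matching on `clean_name` with lowercased input is also an assumption about how queries are resolved.

```python
import sqlite3

# Assumption: SQLite replaces MySQL here; column names follow the tables
# above, but the column types are guesses.
SCHEMA = """
CREATE TABLE places (
    name TEXT, clean_name TEXT, lat REAL, lon REAL, country TEXT,
    population INTEGER, elevation INTEGER, admin_name TEXT, feature TEXT
);
CREATE TABLE admin1 (
    code TEXT, name TEXT, ascii_name TEXT, pop INTEGER,
    country TEXT, admin_code TEXT
);
CREATE TABLE iso (name TEXT, iso2 TEXT, iso3 TEXT);
CREATE TABLE language (
    language TEXT, country_name TEXT, iso2 TEXT,
    status TEXT, lang_iso TEXT, level INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding sample rows from this README."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO places VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("Mount Kpa", "mount kpa", 6.58333, -9.35, "LR", 0, 322, "11", "MT"),
    )
    conn.execute("INSERT INTO iso VALUES (?, ?, ?)", ("Afghanistan", "AF", "AFG"))
    conn.execute(
        "INSERT INTO language VALUES (?, ?, ?, ?, ?, ?)",
        ("Brunei Malay", "brunei", "BN", "regional", None, 2),
    )
    conn.commit()
    return conn


def lookup(conn, text):
    """Return (name, lat, lon, country) rows whose clean_name equals the
    stripped, lowercased query text."""
    return conn.execute(
        "SELECT name, lat, lon, country FROM places WHERE clean_name = ?",
        (text.strip().lower(),),
    ).fetchall()
```

For example, `lookup(build_demo_db(), "Mount Kpa")` returns the single sample row for Mount Kpa.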
Each place has an associated feature type, indicating whether it is a populated place, a geographical feature etc. The (partial) counts of the most common feature types are:

| feature | count | description |
|---|---|---|
| PPLA3 | 90397 | Seat of a 3rd order division
| PPLX | 91773 | Section of a populated place
| HMSD | 99105 | Homestead
| ADM3 | 108767 | 3rd level admin division
| RSTN | 116788 | Railroad station
| LCTY | 131307 | Locality (a minor area or place of unspecified or mixed character and indefinite boundaries)
| PPLA4 | 131855 | Seat of a 4th order division
| HTL | 133210 | Hotel
| LK | 161605 | Lake
| HLL | 173397 | Hill
| STMI | 194574 | Intermittent stream
| ADM4 | 206125 | 4th level admin division
| FRM | 218814 | Farm
| ISL | 220766 | Island
| MT | 503068 | Mountain
| STM | 593570 | Stream
| PPL | 5812629 | Populated place
- Make `name` the primary key in the `places` table; this speeds up queries based on `where` statements
- Eliminate all feature types except PPL, and eliminate any features with zero population
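
The two optimisations above might look roughly like this. It is a sketch under assumptions: SQLite stands in for MySQL, the table is cut down to three columns, and two of the rows are made up. A primary key requires unique values and place names often repeat, so the sketch uses a plain (non-unique) index on `name`, which gives the same kind of speed-up for `where name = ...` lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
conn.execute("CREATE TABLE places (name TEXT, population INTEGER, feature TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Mount Kpa", 0, "MT"),          # from the README example: not a PPL
        ("Exampleville", 1200, "PPL"),   # hypothetical row
        ("Ghost Hamlet", 0, "PPL"),      # hypothetical row with zero population
    ],
)

# Optimisation 1: index the column used in WHERE clauses.
conn.execute("CREATE INDEX idx_places_name ON places (name)")

# Optimisation 2: drop everything that is not a PPL, and every row
# whose population is zero.
conn.execute("DELETE FROM places WHERE feature != 'PPL' OR population = 0")
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT name FROM places ORDER BY name")]

# Ask SQLite how it would run a lookup by name; the plan should mention the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM places WHERE name = ?", ("Exampleville",)
).fetchall()
```

Note that the second optimisation trades coverage for speed: after pruning, a query such as Mount Kpa would no longer return a result.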

Set up the API with `python app.py`, which serves at http://127.0.0.1:5000/
Query DB for location with http://127.0.0.1:5000/loc=`location`
e.g. http://127.0.0.1:5000/loc=Mount%20Kpa
[{"name":"Mount Kpa","clean_name":"mount kpa","lat":6.58333,"lon":-9.35,"country":"LR","pop":0,"elevation":322,"admin_name":"11","feature":"MT"}]
Query DB for location with country hint with http://127.0.0.1:5000/loc=`location`&country=`country`
- Uses ISO 3166-1 alpha-2 (two-letter) codes for countries, as in the `iso2` column
Query DB for location with language hint with http://127.0.0.1:5000/loc=`location`&langs=`lang1,lang2...`
- Uses ISO-2 code for languages
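
A small client-side sketch of the query URLs described above. `build_query_url` and `fetch` are hypothetical helpers, not part of this project. Combining the `country` and `langs` hints in one request is an assumption, as the text only shows them separately.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # address served by `python app.py`


def build_query_url(location, country=None, langs=None):
    """Build a /loc= URL, percent-encoding the location string."""
    url = f"{BASE_URL}/loc={quote(location)}"
    if country:
        url += f"&country={quote(country)}"
    if langs:
        # Language hints are sent as a comma-separated list.
        url += "&langs=" + ",".join(quote(lang) for lang in langs)
    return url


def fetch(url):
    """Send the request and decode the JSON list returned by the API."""
    with urlopen(url) as response:
        return json.load(response)
```

With the server running, `fetch(build_query_url("Mount Kpa", country="LR"))` should return a list of matching records like the example response shown earlier.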
Query a large, messy string (e.g. an entire document) with http://127.0.0.1:5000/raw/loc=`rawString`, or narrow the search down to a single country with http://127.0.0.1:5000/raw/loc=`rawString`&country=`XX`
- Uses NLTK stopwords
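
One way stopwords could be used to pull candidate place names out of a messy string is sketched below. This is not necessarily how the `/raw/` endpoint works: the splitting strategy, the `candidate_phrases` helper and the `max_words` parameter are all assumptions, and the small hard-coded `STOPWORDS` set stands in for NLTK's English list so that the example runs without downloading any corpus.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "was", "is", "from", "at", "on"}


def candidate_phrases(raw, max_words=3):
    """Return candidate place-name phrases from raw text.

    The text is lowercased and cut at punctuation and stopwords; every
    run of remaining words is expanded into n-grams of up to max_words
    words, longest first.
    """
    candidates = []
    for chunk in re.split(r"[.,;:!?()\"\n]+", raw.lower()):
        run = []
        # The trailing None flushes the last run of each chunk.
        for word in chunk.split() + [None]:
            if word is None or word in STOPWORDS:
                for n in range(min(max_words, len(run)), 0, -1):
                    for i in range(len(run) - n + 1):
                        candidates.append(" ".join(run[i:i + n]))
                run = []
            else:
                run.append(word)
    return candidates
```

Each candidate could then be looked up against `clean_name`; for "The expedition climbed Mount Kpa, in Liberia." the candidates include `mount kpa` and `liberia`, alongside non-places such as `expedition` that simply fail to match.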
Error codes follow W3 guidelines; they need to be updated to the Heroku spec

The following values sometimes appear in the admin level 1 column:
- `00` or `0`: the entire country
- Values that do not appear in the `admin1` table: not a regular part of the country, e.g. the Tunb islands of UAE, where the feature code is ISL and the admin code is 11
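
The rules above can be written as a small helper. This is a sketch: `describe_admin_code` is a hypothetical function, and it assumes that keys in the `admin1` table have the form `country.admin_code`, as in the sample row `AD.06`.

```python
WHOLE_COUNTRY_CODES = {"00", "0"}


def describe_admin_code(country, admin_code, admin1_names):
    """Interpret an admin level 1 value from the places table.

    admin1_names maps 'CC.code' keys (assumed to match admin1.code)
    to division names.
    """
    if admin_code in WHOLE_COUNTRY_CODES:
        return "entire country"
    name = admin1_names.get(f"{country}.{admin_code}")
    if name is None:
        # Code is absent from admin1, e.g. admin code 11 for the Tunb islands.
        return "not a regular part of the country"
    return name
```

Called with a mapping built from the `admin1` table, `describe_admin_code("AE", "00", mapping)` gives `"entire country"`.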
Non-core Dependencies
- Add in country names explicitly!
- Add in clues e.g. likely country, region, timezone or language
- Add in fuzzy matching e.g. Al Raqqah/Al Raqah
- Automatically query Google API and update DB
- Add in admin level 2 as well as level 1
- Add in Google reverse geocoding for placing lat/lon coordinates
- Update error codes to Heroku spec
- Add sparse/verbose return option e.g. name and lat/lon