- Make a cool visualization for transit data that shows the network in interesting ways
- Show the bus network of the city of Aachen
- Make a very crude prototype that has the minimum of data to function
- Demonstrate the creative use of data and quickly spin up a project
- Have a dev diary that shows the design and technical process and justifies the decisions made
- Use some familiar tech
- Shaders/WebAssembly
- Use some unfamiliar tech
- Boost library
- It is not the goal to have a very robust program that can be extended to different cities -> for example, a lot of data is just baked in
No AI was used to write this program
- Selection: Left click
- Pan: Hold Mousewheel/Mouse button 3 or Mouse 1
- Zoom: Mousewheel
- Remaining features: UI panel on the right of the screen
There are 2 programs: one parses the .din files and makes the data available as flat arrays; the second uses those arrays to visualize the data.
- Input: .din files
- Output: .h files with a lot of data in them (in generated_data)
- Code is in transit_parsing
- Data is in transit_parsing/dino
- C++ program
- Libraries:
- fstream for file streaming
- boost/tokenizer for line parsing
- Program:
- Parse the data files
- We want to connect the flat table data to more hierarchical data -> for example, have an array of stops in every route
- Save the indices into hash maps, the key should be the internal number used in the .din files (for stops: the internal stop number)
- Take a file and then iterate through every line
- lines are just semicolon-separated values, which we iterate through with the boost/tokenizer library
- save the values of the cells that interest us
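The per-line parsing step above can be sketched as follows. The project uses boost/tokenizer for this; to keep the example free of dependencies, this sketch swaps in `std::getline` with `;` as the delimiter. The helper name `split_row` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one semicolon-separated row into its cells.
// Hypothetical helper: the project itself uses boost/tokenizer for this step,
// std::getline is swapped in here to stay within the standard library.
std::vector<std::string> split_row(const std::string& line) {
    std::vector<std::string> cells;
    std::istringstream stream(line);
    std::string cell;
    while (std::getline(stream, cell, ';')) {
        cells.push_back(cell);
    }
    return cells;
}
```

The caller then picks the cells it is interested in by column position, which it can read off the column description in the first line of each file.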
PROGRAM 2 visualizes the stations and routes. To connect data there, I simply want to save an index into an array: for example, a route has just an array of stop indices, and with them we can retrieve the stop structs out of the stop array. This program (PROGRAM 1) parses the .din files and then generates these arrays and index lists.
- I use hash maps to do this. Following the example of the route that has an array of stop indices, it works the following way:
- KEY: internal stop number
- VALUE: the index into the stop_array where that stop is located -> this means we have (on average) constant time access [O(1)] to the index when we have the internal stop number
- When we traverse the route.din file, we can simply look up (via the internal stop number) which index into the stop array corresponds to that stop.
- The other hierarchical data is connected in an analogous way
- Use fstream to write out .h files
- it should be arrays of the structs we need + the length of those arrays
- these structs have to correspond to the structs we define in PROGRAM 2
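A minimal sketch of the hash-map lookup and the header generation described above. All names (`Stop`, `build_stop_index`, `resolve_route`, `emit_index_array`) and the exact layout of the generated text are assumptions for illustration, not the project's actual code. The sketch returns the generated source as a string, which could then be streamed to an `std::ofstream`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

struct Stop { int internal_nr; std::string name; };  // hypothetical layout

// KEY: internal stop number, VALUE: index of that stop in the stop array.
std::unordered_map<int, std::uint32_t> build_stop_index(const std::vector<Stop>& stops) {
    std::unordered_map<int, std::uint32_t> index_of;
    for (std::uint32_t i = 0; i < stops.size(); ++i) {
        index_of[stops[i].internal_nr] = i;
    }
    return index_of;
}

// Translate the internal stop numbers of one route into stop-array indices.
// Stop numbers that are not in the map (e.g. filtered-out stops) are skipped.
std::vector<std::uint32_t> resolve_route(const std::vector<int>& stop_nrs,
                                         const std::unordered_map<int, std::uint32_t>& index_of) {
    std::vector<std::uint32_t> indices;
    for (int nr : stop_nrs) {
        auto it = index_of.find(nr);
        if (it != index_of.end()) indices.push_back(it->second);
    }
    return indices;
}

// Render a non-empty index list as C source: the array plus its length.
std::string emit_index_array(const std::string& name, const std::vector<std::uint32_t>& indices) {
    std::ostringstream out;
    out << "static const unsigned int " << name << "[] = {";
    for (std::size_t i = 0; i < indices.size(); ++i) {
        out << (i ? ", " : "") << indices[i];
    }
    out << "};\n";
    out << "static const unsigned int " << name << "_count = " << indices.size() << ";\n";
    return out.str();
}
```

Because the generated file only contains plain arrays and counts, the C side needs nothing more than an `#include` to use the data.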
- Input: .h files that PROGRAM 1 made (in generated_data)
- Output: interactive transit map
- C program with some shaders
- Code is flat in the project
- Libraries:
- sokol as a simple graphic library (https://github.com/floooh/sokol)
- microui for a simple UI (https://github.com/rxi/microui)
- Build as a single translation unit (thus the include.h file simply includes all the files we need)
- Build project is a simple build.bat file that can be executed from the command line
- GAME LIKE LOOP: input -> simulate/update -> render
- INDEX ARRAYS
- I use arrays of indices to connect data -> for example a route has an array of indices into the stop array
- indices vs pointers:
- performance: indices use less space (32 vs 64 bits) -> fewer cache misses, smaller memory footprint
- indices are human readable -> an index 3200003232 means there is some bad stuff going on, whereas that is not obvious from a pointer value
- indices are stable across runs, whereas pointers differ from run to run because of ASLR (Address Space Layout Randomization) -> stop "Bushof" will always be at index 0, while a pointer to that stop would be randomized by the OS
- Drawback: pointers are type-safe, plain indices are not -> using the wrong index into an array is a common bug; possible solutions are getter/setter functions or wrapping the index in a type (one type per array)
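The two mitigations named in the drawback above (a getter and one index type per array) can be sketched like this. The sketch is in C++ and all names are hypothetical; in the C visualization program the same idea would be a small `struct` per index type instead of a template.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Stop  { float x, y; };
struct Route { std::uint32_t first_stop; };

// One distinct index type per array: the tag makes StopIndex and RouteIndex
// incompatible types even though both wrap a 32-bit integer.
template <typename Tag>
struct TypedIndex { std::uint32_t value; };

using StopIndex  = TypedIndex<Stop>;
using RouteIndex = TypedIndex<Route>;

// Getter that accepts only a StopIndex and checks the bounds.
const Stop& get_stop(const std::vector<Stop>& stops, StopIndex i) {
    assert(i.value < stops.size());
    return stops[i.value];
}
```

With this, `get_stop(stops, RouteIndex{0})` is rejected by the compiler, while the index itself stays 32 bits wide.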
- source code for the visualization program
- /transit_parsing source code for the parser writer
- /util source code that is not specific for this project
- /generated_shaders bytecode of the shaders, so they can be directly imported
- /generated_code output of the parser program
- /bin executables are put here
- /pictures
- /fonts
- Optimization: having static data means PROGRAM 2 does not have to do any parsing at initialization
- PROGRAM 2 should work on the web as a WebAssembly build, and that is easier with a very simple C program
- I want to use C++ and some of the Boost libraries in PROGRAM 1, and I do not know how well those play with WebAssembly
The goal is to get a very simple prototype working, thus the code is not hardened in almost any way:
- No Tests
- Very few Asserts
- Just use global arrays for all the data
- Only works for the data sizes we operate on -> large changes would have to be made if I wanted to parse an arbitrary transit data set
At the beginning of the project I did not know anything about how transit data is published, and since I wanted this to be a short project, I chose the first thing that looked right. As a result I chose the DINO data published for the city of Aachen, which is a much less common format than GTFS, so should I want to use the project for other cities I would have to write another parser. This is one of the very first parsers I wrote, and I wrote it with an unfamiliar Boost library. Hence it is very brittle and breaks a lot.
- Should have written it more defensively with tests and validators for the data structures
- there is a lot of data available, and choosing what data to use is a much bigger task than expected -> should have been much more systematic about the criteria that decide what data to use
- Get a window up to draw stuff into
- Simple visualizer for routes (pic)
- Simple map (pic)
- text rendering for station names
- Where can I get transit data from?
- opendata.avv.de
- current_DINO/AVVDINO_2-1.zip seems to have all the information I need
- first line is a column description
- semicolon-separated lines of data
- data is massively duplicated (different data providers, duplicated data, with just one data point changing each row, etc.) -> deduplication/reorganization will be important
- C++ libraries to use:
- boost/filesystem for checking the files
- fstream for streaming in the data files line by line
- boost/tokenizer for parsing one line at a time
- ofstream to stream data out to separate files
- start with this, because it is a small amount of data and thus relatively easy to debug, if something goes wrong
- stream out the deduplicated data to stop_data.h, so the Visualization Program PROGRAM 2 can use it easily
- only use Data from version 4 (this means the data provider is ASE and it is the data from Start: 09.09.2025, End: 11.10.2025)
- filter out all the data, where the stop name does not begin with Aachen
- data retrieved:
- stop names
- geo data (longitude, latitude)
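The filtering and deduplication rules for the stop data can be sketched as below. The row layout (`StopRow` and its fields) is an assumption for illustration and does not mirror the actual column order of the .din files. Duplicates are detected by stop number with a hash set.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, already-tokenized row of the stop table.
struct StopRow { int version; int stop_nr; std::string name; double lon; double lat; };

// Keep only version 4 rows whose name begins with "Aachen", once per stop number.
std::vector<StopRow> filter_stops(const std::vector<StopRow>& rows) {
    const std::string prefix = "Aachen";
    std::unordered_set<int> seen;
    std::vector<StopRow> kept;
    for (const StopRow& row : rows) {
        if (row.version != 4) continue;                                 // other data sets
        if (row.name.compare(0, prefix.size(), prefix) != 0) continue;  // not in Aachen
        if (!seen.insert(row.stop_nr).second) continue;                 // duplicate row
        kept.push_back(row);
    }
    return kept;
}
```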
- Get the line names
- Connect it to the internal line number
- for example line number 100003 has the line name "3A"
- export the line names as an array of strings to the PROGRAM 2
- Get all the stops that a route visits
- get the line number, so we can connect that route to a line name such as 3A
- export an array of route structs to PROGRAM 2
- every route struct has
- corresponding line number
- variant string
- array of the stop indices in the order that a Bus using that route-variant would visit them
- There is an n-to-1 relation between routes and lines (from now on, routes are called route-variants)
- A route-variant is defined by many rows in the route.din table -> there is one row per stop that the route visits
- For every route-variant, consecutive rows correspond to consecutive stops that the route makes
- 200,000 rows in route.din -> there is duplication
- different data sets are in the same table with duplications -> only use VERSION 4 (first column)
- We have the transit network for Aachen now
- stops are the nodes
- route-variants are the edges
- Get the start times for all the trips on a route variant
- Save in the routes array, how many departures and when those happen
- departure times are saved in ascending order (seconds of the day)
- Traverse the route array and spawn a vehicle, if departure time is reached
- update vehicle position, by querying its route for the next stop, if it has reached a stop
- for now just linearly traverse between stops (no waiting time at stops, no taking into account the distance between stops)
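A minimal sketch of the spawn-and-traverse logic above, assuming every segment between two stops takes the same time (matching "no taking into account the distance between stops"). All names and the fixed `segment_seconds` parameter are hypothetical, and the route is assumed to have at least one stop.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };
struct Vehicle { std::size_t segment; float t; bool done; };  // t in [0, 1) along the segment

// Count the departures that are due at `now` and move the cursor past them.
// `departures` must be sorted ascending (seconds of the day).
std::size_t spawn_due(const std::vector<int>& departures, std::size_t& cursor, int now) {
    std::size_t spawned = 0;
    while (cursor < departures.size() && departures[cursor] <= now) {
        ++cursor;
        ++spawned;
    }
    return spawned;
}

// Advance a vehicle; when it reaches a stop it continues towards the next one.
void update_vehicle(Vehicle& v, std::size_t stop_count, float dt, float segment_seconds) {
    if (v.done) return;
    v.t += dt / segment_seconds;
    while (v.t >= 1.0f && !v.done) {
        v.t -= 1.0f;
        ++v.segment;
        if (v.segment + 1 >= stop_count) { v.done = true; v.t = 0.0f; }  // last stop reached
    }
}

// Linear interpolation between the current stop and the next stop.
Vec2 vehicle_position(const Vehicle& v, const std::vector<Vec2>& route_stops) {
    if (v.segment + 1 >= route_stops.size()) return route_stops.back();
    Vec2 a = route_stops[v.segment];
    Vec2 b = route_stops[v.segment + 1];
    return {a.x + (b.x - a.x) * v.t, a.y + (b.y - a.y) * v.t};
}
```

Because the departure times are sorted, the cursor only ever moves forward, so each tick touches just the departures that became due since the previous tick.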
- Toggles for showing stops, vehicles, routes
- Selection and hover for lines & stations
- transform geo data from longitude/latitude on a sphere to meters on a plane (gets rid of the stretched map)
- UI/UX overhaul
- pause/reset simulation
- checking options
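One simple way to do the longitude/latitude-to-meters transform mentioned above is an equirectangular approximation around a reference point, which is accurate enough at city scale. This is an assumption about the method, not necessarily the projection the project uses; `project`, the reference-point parameters and the mean Earth radius of 6,371,000 m are choices made for this sketch.

```cpp
#include <cmath>

struct Meters { double x, y; };

// Equirectangular approximation: scale longitude differences by cos(reference latitude)
// so that one unit in x and one unit in y both mean one meter.
Meters project(double lon_deg, double lat_deg, double ref_lon_deg, double ref_lat_deg) {
    const double earth_radius_m = 6371000.0;  // mean Earth radius (assumed constant)
    const double deg_to_rad = 3.14159265358979323846 / 180.0;
    double x = earth_radius_m * (lon_deg - ref_lon_deg) * deg_to_rad
               * std::cos(ref_lat_deg * deg_to_rad);
    double y = earth_radius_m * (lat_deg - ref_lat_deg) * deg_to_rad;
    return {x, y};
}
```

Without the cosine factor, a degree of longitude is drawn as wide as a degree of latitude, which is what stretches the map horizontally at Aachen's latitude.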
- Every Stop gets a Transit score:
- Every time a Bus/Vehicle visits that stop increase it
- Shader for Transit score
- Write a Fullscreen-Quad and compute for every pixel the transit score
- Transit score for a pixel is a sum of the transit score of all the stops, but weighted by how far away that stop is from that pixel
- Visualization: instead of a gradient, use contour lines and color each step uniformly for a nice effect
- Shader for Next Station Tiling
- use a Voronoi algorithm: for every pixel, compute which station is the nearest one and tile the background accordingly
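The per-pixel math of both shaders can be sketched on the CPU as follows. In the project this runs in a fragment shader; the C++ version here only illustrates the computation. The weighting function `1 / (1 + (d / falloff)^2)`, the `falloff` parameter and all names are assumptions, since the source only states that stops are weighted by distance. The stop list is assumed to be non-empty.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct StopSample { float x, y, score; };

// Transit score of one pixel: sum over all stops, weighted down with distance.
float pixel_score(float px, float py, const std::vector<StopSample>& stops, float falloff) {
    float sum = 0.0f;
    for (const StopSample& s : stops) {
        float dx = (px - s.x) / falloff;
        float dy = (py - s.y) / falloff;
        sum += s.score / (1.0f + dx * dx + dy * dy);  // assumed weighting function
    }
    return sum;
}

// Contour bands: quantize the score so each step gets one uniform color.
float contour_band(float score, float step) {
    return std::floor(score / step) * step;
}

// Voronoi tiling: index of the stop closest to the pixel.
std::size_t nearest_stop(float px, float py, const std::vector<StopSample>& stops) {
    std::size_t best = 0;
    float best_d2 = std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < stops.size(); ++i) {
        float dx = px - stops[i].x;
        float dy = py - stops[i].y;
        float d2 = dx * dx + dy * dy;  // squared distance is enough for comparison
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}
```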
- right now different routes between the same stops render on top of each other
- Goal: render them next to each other
- we need some additional network information to be able to do this:
- parse out an array of neighboring stops into every stop
- now we can traverse the graph defined by the stops very easily
- in the end this was a little heavy handed and functionality that was not really needed
- separate vehicle & stop behaviour into different files
- in PROGRAM 1, separate the file writing into a new file and separate functions
- delete a bunch of stuff that was not really used
- The vehicles should have constant speed now and wait at stops
- a mode for the transit score, that changes with the simulation
- every time a vehicle visits a stop increase its score
- every update tick decrease the transit score of every stop
- the transit score shader just works with this (have to fiddle with the numbers a little)
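A minimal sketch of the simulation-driven transit score described above. The source only says the score is decreased every tick; the multiplicative decay used here, as well as the names and the `gain` and `decay_per_tick` parameters, are assumptions and are exactly the kind of numbers that need fiddling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Called whenever a vehicle reaches a stop.
void on_vehicle_arrival(std::vector<float>& scores, std::size_t stop_index, float gain) {
    assert(stop_index < scores.size());
    scores[stop_index] += gain;
}

// Called once per update tick; decay_per_tick is expected to lie in (0, 1).
void decay_scores(std::vector<float>& scores, float decay_per_tick) {
    for (float& score : scores) {
        score *= decay_per_tick;
    }
}
```

With a multiplicative decay, frequently served stops settle at a higher score while rarely served stops fade towards zero, so the existing score shader can read the same array unchanged.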
- get the WebAssembly program working
- this requires different shader compilation to webgl
- a different compilation via Emscripten and just a bunch of little fiddling to get it working
- to build it, just uncomment the last line of build.bat and run it
- better colors for the program
- separate vehicle functionality out
- for a more cinematic effect, animate every vehicle as a shader
- a trail should follow the vehicle
- vehicle and trail should glow
- had a little bit of a bug because of shader alignment issues -> debugged via RenderDoc to get to the bottom of this
- right now the route-view is very useless
- use it as a time table for the route
- make a large table that shows all the arrival data
- until now I did not use realistic vehicle arrival data -> need to parse timing-pattern.din to get that data, which is connected with the departure times in trip.din via TIMING_GROUP_NR
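How the two tables could be joined is sketched below. Only the join key TIMING_GROUP_NR comes from the data; the in-memory layout is an assumption: each timing group is taken to be a list of travel times in seconds between consecutive stops, and a trip is taken to be a departure time plus a timing group number. The actual columns of timing-pattern.din and trip.din may be organized differently.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical layout: TIMING_GROUP_NR -> seconds needed for each segment of the route.
using TimingPatterns = std::unordered_map<int, std::vector<int>>;

// Arrival time (seconds of the day) at every stop of one trip.
// Returns an empty list if the timing group is unknown.
std::vector<int> arrival_times(int departure, int timing_group_nr,
                               const TimingPatterns& patterns) {
    std::vector<int> arrivals;
    auto it = patterns.find(timing_group_nr);
    if (it == patterns.end()) return arrivals;
    int time = departure;
    arrivals.push_back(time);  // first stop: the departure itself
    for (int segment_seconds : it->second) {
        time += segment_seconds;
        arrivals.push_back(time);
    }
    return arrivals;
}
```

One call per trip yields one column of the timetable, so the table for a route is just this function applied to all of its departures.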
- scene concept -> instead of having a bunch of options, make scenes that are collections of options:
- Map View shows the network layout
- Cinematic View shows the vehicles moving through the map and "producing" connectivity
- Route View shows an isolated view of a route and the corresponding time table