Usage of ./ciqth:
  -analyze
        analyze dataset after import (answer the example stakeholder questions)
  -skipdupes
        skip duplicate rows rather than erroring on them
To build this, you'll need Go (tested on 1.19) and SQLite3 installed, with CGO
enabled. I did not end up with the time to provide a Docker image or Nix
derivation to help here, sorry. A basic unit test for the isolated
string-to-struct row parser is provided, and the requested questions are
answered at runtime by passing -analyze; its output can serve as another,
crude, form of correctness check (by hand-counting the relevant rows from the
CSV, for now).
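
For reference, here's a rough sketch of the kind of string-to-struct parsing that unit test exercises. The struct fields, column order, and RFC 3339 timestamp format below are illustrative assumptions (based on the data shape described further down), not the actual types or names used in the code:

    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    // Row is a hypothetical representation of one CSV record: a timestamp,
    // a username, a transfer direction, and a payload size in bytes.
    type Row struct {
        Timestamp time.Time
        Username  string
        Direction string
        Bytes     int64
    }

    // parseRow converts a single CSV line into a Row. The column order and
    // RFC 3339 timestamp layout are assumptions for this sketch.
    func parseRow(line string) (Row, error) {
        fields := strings.Split(line, ",")
        if len(fields) != 4 {
            return Row{}, fmt.Errorf("expected 4 fields, got %d", len(fields))
        }
        ts, err := time.Parse(time.RFC3339, strings.TrimSpace(fields[0]))
        if err != nil {
            return Row{}, fmt.Errorf("bad timestamp: %w", err)
        }
        size, err := strconv.ParseInt(strings.TrimSpace(fields[3]), 10, 64)
        if err != nil {
            return Row{}, fmt.Errorf("bad payload size: %w", err)
        }
        return Row{
            Timestamp: ts,
            Username:  strings.TrimSpace(fields[1]),
            Direction: strings.TrimSpace(fields[2]),
            Bytes:     size,
        }, nil
    }

    func main() {
        // Example invocation with a made-up record.
        row, err := parseRow("2023-01-05T10:15:00Z,alice,upload,52428800")
        if err != nil {
            panic(err)
        }
        fmt.Printf("%+v\n", row)
    }

The real parser presumably deals with whatever quoting and edge cases the CSV actually contains; the point here is just the isolated, easily testable shape of the function.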
The code itself is commented extensively, largely with stream-of-consciousness thoughts as I went along. I won't claim this is the most idiomatic Go code in the world after several years away from the language (spending that time mostly in TypeScript, Rust, and Ruby+Sorbet), but in general I tried to avoid unnecessary allocations, as well as anything too unreadable or hard to follow.
For this solution I went with a SQLite database because it's a great general-purpose and extremely battle-tested embedded database. That said, this was a bit of a compromise driven by the questions that need answering on the output side, more so than because a relational database is the optimal store for data in this shape (it probably isn't). The data lends itself best to a time-series database, of which an excellent embedded example exists for Go: given a timestamp and a value (the size of the payload), plus attached labels for username and direction, all sorts of questions about data within a date/time range become straightforward to answer (in other words, "something like Prometheus but without the label cardinality problems" would be great here). However, in two out of three cases the questions at hand are not time-aligned, and TSDBs are generally not optimized for searching by labels first.
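
To make that concrete, here's a minimal sketch of the kind of schema and label-first, date-range query SQLite makes straightforward, assuming the common cgo-based mattn/go-sqlite3 driver (which is why CGO needs to be enabled for the build). The table name, column names, and the example question (total bytes uploaded per user within a window) are illustrative assumptions, not the exact schema the importer creates:

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "time"

        _ "github.com/mattn/go-sqlite3" // cgo-based SQLite driver
    )

    func main() {
        db, err := sql.Open("sqlite3", "transfers.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Hypothetical schema for illustration: one row per transfer event.
        _, err = db.Exec(`CREATE TABLE IF NOT EXISTS transfers (
            ts        TEXT NOT NULL,    -- RFC 3339 timestamp
            username  TEXT NOT NULL,
            direction TEXT NOT NULL,    -- e.g. "upload" or "download"
            bytes     INTEGER NOT NULL
        )`)
        if err != nil {
            log.Fatal(err)
        }

        // Example stakeholder-style question: total bytes uploaded per user
        // within a date range. Label-first filters like this are where a
        // relational store is more comfortable than most TSDBs.
        from := time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC).Format(time.RFC3339)
        to := time.Date(2023, 2, 1, 0, 0, 0, 0, time.UTC).Format(time.RFC3339)
        rows, err := db.Query(`
            SELECT username, SUM(bytes)
            FROM transfers
            WHERE direction = 'upload' AND ts >= ? AND ts < ?
            GROUP BY username
            ORDER BY SUM(bytes) DESC`, from, to)
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var user string
            var total int64
            if err := rows.Scan(&user, &total); err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%s: %d bytes\n", user, total)
        }
        if err := rows.Err(); err != nil {
            log.Fatal(err)
        }
    }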
In a real-world scenario, either a combination of a TSDB-based system (maybe that's Prometheus, maybe that's something else) and a relational DB could be appropriate, or perhaps I'd instead take the opportunity to evaluate TimescaleDB, which is essentially PostgreSQL extended with efficient time-based partitioning and indexing.