Thanks to visit codestin.com
Credit goes to github.com

Skip to content

klardotsh/ciqth

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Log Reader Solution

Usage of ./ciqth:
  -analyze
    	analyze dataset after import (answer the example stakeholder questions)
  -skipdupes
    	skip duplicate rows rather than erroring on them

To build this, you'll need Go (tested on 1.19) and SQLite3 installed (and CGO enabled). I did not end up with the time to provide a Docker image or Nix derivation to help here, sorry. A basic unit test for the isolated string-to-struct row parser is provided, and the requested questions are answered at runtime by passing -analyze, the output of which can serve as another, crude, form of correctness check (by hand-counting the relevant rows from the CSV, for now).

The code itself is commented extensively, largely with stream-of-consciousness thoughts as I went along. I won't claim this to be the most idiomatic Go code in the world after several years away from the language (spending that time mostly in TypeScript, Rust, and Ruby+Sorbet), but in general, I tried to avoid unnecessary allocations, and also anything that would be too horribly unreadable or hard to understand.

Data Storage Engines and thoughts thereon

For this solution I went with a SQLite database because it's a great general-purpose and extremely battle-tested embedded database. However, this was done as a bit of a compromise based on the questions needing answered on the output side, moreso than because a relational database is the most optimal data store for the data as formatted (it probably isn't). The data lends itself best to a time series database, of which an excellent embedded example exists for Go: given a time and a value (the size of the payload), and given attached labels of username and direction, all sorts of questions about data within a date/time range become straightforward to answer (in other words, "something like Prometheus but without the label cardinality problems" would be great here). However, the questions at hand are not time-aligned in 2/3 of cases, and TSDBs are not generally optimized for searching by labels first.

In a real-world scenario, either a combination of a TSDB-based thing (maybe that's Prometheus, maybe that's something else) and a relational DB could be appropriate, or perhaps I'd instead take the opportunity to evaluate TimescaleDB which is quite literally the fusion of PostgreSQL and efficient time-based indexing.

About

CIQ Take-Home Exercise

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages