This is a port of Andrej Karpathy’s llm.c project (the CPU version). I
  toyed around with it the day he released the initial version for a
  couple of hours, but only continued with it on a train / plane trip I
  had last weekend (<2024-04-20 Sat>). In particular we started from
  commit a22c22b. In particular this means the tokenizer is not there
  at the moment. I might add it one of these days.
Performance is ~comparable to the C version.
Note: the port was done in a bit of a hurry, so who knows what bugs lurk compared to the original! :)
- We have a very shallow abstraction of the raw pointer buffer
    interface from C in the form of a MView[T]type (which is just aptr UncheckedArray[T]in Nim lang + a few goodies, notably a{}accessor to do pointer arithmetic for a more ‘natural’ access to another buffer.
- We use Nim’s CT features to automatically assign the correct buffer
    views for the *Tensorfields based onfieldPairs, their order in theobjectand theparams/act_sizesinput.
- Generally less pointer handling.
- We use destructors for the GPT2andDataLoaderobjects so that we don’t have to free manually (and copying these is disallowed)
- Instead of relying on OpenMP’s collapseprimitive to fuse multiple nested loops, we use a custom Nim CT based loop fusion macro, see ./fuse_loops.nim. The issue is that because Nim convertsforloops intowhilestatements, it doesn’t play nice with nested loops for OpenMP. :) So I wrote a macro that wraps around for loops and performs the loop fusion manually (note that it only works forfor i in 0 ..< Xstyle loops (i.e. lower index 0 and using..<).
We have to compile with --exceptions:quirky, because otherwise the
  inserted Nim error checks break the OpenMP compilation. We could
  disable checks locally in the code, but for this here it’s fine.
See also: nim-lang/Nim#23311
The important compilation arguments are defined in a local nim.cfg
  and at the top of the train_gpt2.nim file (fast-math and OpenMP
  related).
Otherwise just compile with:
nim c -d:danger -d:openmp -d:lto --passC:"-march=native" train_gpt2.nimOtherwise follow the CPU instructions from the original repo to get started: https://github.com/karpathy/llm.c?tab=readme-ov-file#quick-start-cpu
Similar to my Nim port of his llama2.c, I had time to kill on a trip! And doing such ‘dumb’ ports is kind of meditative… lol