Experiments with deep matrix convolution, looking for ways to optimize convolution on Intel CPUs. The code uses Intel intrinsics, which give direct access to SIMD instructions for faster vector loads and vectorized matrix multiplication: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_popcnt&memperluas=4090
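
As a rough illustration of how intrinsics speed up convolution, here is a minimal sketch (not this repo's implementation) of a single-channel "valid" 2D convolution, computed as the cross-correlation that deep-learning frameworks call convolution. It vectorizes across four adjacent output pixels. It uses only baseline SSE intrinsics (`_mm_loadu_ps`, `_mm_set1_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`) so it builds on any x86-64 compiler without extra flags. The function names and the row-major layout are assumptions made for the example.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/* Scalar reference: "valid" 2D cross-correlation, no padding, stride 1.
 * in is in_h x in_w, k is k_h x k_w, out is (in_h-k_h+1) x (in_w-k_w+1),
 * all row-major (an assumption of this sketch). */
void conv2d_scalar(const float *in, int in_h, int in_w,
                   const float *k, int k_h, int k_w, float *out)
{
    const int out_h = in_h - k_h + 1, out_w = in_w - k_w + 1;
    assert(out_h > 0 && out_w > 0);
    for (int y = 0; y < out_h; y++)
        for (int x = 0; x < out_w; x++) {
            float acc = 0.0f;
            for (int ky = 0; ky < k_h; ky++)
                for (int kx = 0; kx < k_w; kx++)
                    acc += in[(y + ky) * in_w + (x + kx)] * k[ky * k_w + kx];
            out[y * out_w + x] = acc;
        }
}

/* SSE version: each iteration computes 4 adjacent output pixels at once. */
void conv2d_sse(const float *in, int in_h, int in_w,
                const float *k, int k_h, int k_w, float *out)
{
    const int out_h = in_h - k_h + 1, out_w = in_w - k_w + 1;
    assert(out_h > 0 && out_w > 0);
    for (int y = 0; y < out_h; y++) {
        int x = 0;
        for (; x + 4 <= out_w; x += 4) {
            __m128 acc = _mm_setzero_ps();
            for (int ky = 0; ky < k_h; ky++)
                for (int kx = 0; kx < k_w; kx++) {
                    /* Broadcast one kernel weight to all 4 lanes. */
                    __m128 wv = _mm_set1_ps(k[ky * k_w + kx]);
                    /* Unaligned load of 4 consecutive input pixels; the
                     * last one read is at most column in_w-1 of the row. */
                    __m128 iv = _mm_loadu_ps(&in[(y + ky) * in_w + x + kx]);
                    acc = _mm_add_ps(acc, _mm_mul_ps(iv, wv));
                }
            _mm_storeu_ps(&out[y * out_w + x], acc);
        }
        /* Scalar tail for the output columns that do not fill a vector. */
        for (; x < out_w; x++) {
            float acc = 0.0f;
            for (int ky = 0; ky < k_h; ky++)
                for (int kx = 0; kx < k_w; kx++)
                    acc += in[(y + ky) * in_w + (x + kx)] * k[ky * k_w + kx];
            out[y * out_w + x] = acc;
        }
    }
}
```

On CPUs with AVX2 and FMA, the same loop structure can use the 256-bit `_mm256_*` intrinsics (8 floats per vector, with `_mm256_fmadd_ps` fusing the multiply and add) when compiled with `-mavx2 -mfma`. The scalar version is useful as a correctness reference when testing the vectorized one.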
For multi-core programming, refer to Intel's white paper on optimizing software for multi-core processors: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/multicore-optimizing-software.pdf
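
One common way to spread a convolution layer across cores is to parallelize over independent output work. The following is a minimal sketch that assumes OpenMP, which is a choice made for this example rather than necessarily the approach used in this repo or the white paper. It computes a multi-channel "valid" convolution and hands each (output channel, output row) pair to a thread with `#pragma omp parallel for collapse(2)`. Every iteration writes a disjoint slice of the output, so no locks are needed. The function name `conv_layer_omp` and the tensor layouts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-channel "valid" convolution layer (no padding, stride 1).
 * Row-major layouts (hypothetical, chosen for this sketch):
 *   in  : [c_in][in_h][in_w]
 *   w   : [c_out][c_in][k_h][k_w]
 *   out : [c_out][out_h][out_w]
 * Compile with -fopenmp to run in parallel; without it the pragma is
 * ignored and the loops run serially with the same result. */
void conv_layer_omp(const float *in, int c_in, int in_h, int in_w,
                    const float *w, int c_out, int k_h, int k_w,
                    float *out)
{
    const int out_h = in_h - k_h + 1, out_w = in_w - k_w + 1;
    assert(out_h > 0 && out_w > 0);

    /* Each (oc, y) iteration writes only row y of output channel oc,
     * so the iterations are independent and need no synchronization. */
#pragma omp parallel for collapse(2) schedule(static)
    for (int oc = 0; oc < c_out; oc++)
        for (int y = 0; y < out_h; y++)
            for (int x = 0; x < out_w; x++) {
                float acc = 0.0f;
                for (int ic = 0; ic < c_in; ic++)
                    for (int ky = 0; ky < k_h; ky++)
                        for (int kx = 0; kx < k_w; kx++)
                            acc += in[((size_t)ic * in_h + y + ky) * in_w + x + kx]
                                 * w[(((size_t)oc * c_in + ic) * k_h + ky) * k_w + kx];
                out[((size_t)oc * out_h + y) * out_w + x] = acc;
            }
}
```

For example, `gcc -O2 -fopenmp` enables the parallel loop, and the `OMP_NUM_THREADS` environment variable controls how many cores it uses. The inner `x` loop is where a SIMD kernel like the intrinsics version would slot in, combining per-core vectorization with multi-core parallelism.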