Notes, tutorials, code snippets and templates
Each notebook has two versions (all python scripts are unaffected by this):
- One where all the markdown comments are rendered in black & white. These are placed in the folder named GitHub_MD_rendering, where MD stands for MarkDown.
- One where all the markdown comments are rendered in colour.
- Automatic data profiling powered by Pandas
- Cheatsheet
- Complex selections
- Concatenating, merging and joining
- Efficiently iterating over rows in a Pandas DataFrame
- Example of data manipulation with pandas
- Feature engineering in Pandas
- How to use Pandas for plotting
- Introduction to Pandas object types
- Optimising Pandas - reduce memory footprint
- Pandas - code snippets
- Pandas group-by
- Pandas pivot and unstack table
- Performance aspects
- Simple Tricks To Speed up the Sum Calculation in Pandas
- polars
- Pandas is good for prototyping, but it can be very slow when used in a training pipeline. Typical training pipelines do a lot of indexing and row-wise operations, and Pandas is not optimized for this. For example, compare the performance of df.iloc[i] and array[i] to estimate the difference at the scale of many millions of calls. When column-wise operations are needed, we prefer polars, an optimized library written in Rust with an API similar to Pandas.
- Although it is written in Rust, Polars has a Python package, which makes it a potential alternative to Pandas. Polars has two different APIs: an eager API and a lazy API. Eager execution is similar to Pandas: the code runs directly and its results are returned immediately. Lazy execution defers the work until the result is explicitly requested.
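
The df.iloc[i] versus array[i] comparison mentioned in the polars notes can be tried with a small timing sketch like the one below. The toy data, the column names, the row index and the number of calls are all assumptions chosen for illustration. The absolute timings depend on the machine and on the library versions, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy data: 10,000 rows and 4 float columns.
rng = np.random.default_rng(0)
array = rng.random((10_000, 4))
df = pd.DataFrame(array, columns=["a", "b", "c", "d"])

n_calls = 2_000  # assumed number of repetitions, kept small so the script is quick
i = 1234         # arbitrary row position

# Time repeated single-row access through the pandas positional indexer ...
t_iloc = timeit.timeit(lambda: df.iloc[i], number=n_calls)
# ... and through plain NumPy indexing on the same data.
t_array = timeit.timeit(lambda: array[i], number=n_calls)

print(f"df.iloc[i]: {t_iloc / n_calls * 1e6:.2f} us per call")
print(f"array[i]:   {t_array / n_calls * 1e6:.2f} us per call")
print(f"ratio:      {t_iloc / t_array:.0f}x")
```

Multiplying the per-call difference by the number of row accesses in a real pipeline gives a rough estimate of the overhead at scale.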