Handwritten-style Notes: NumPy for Data Science & ML
Introduction to NumPy
- NumPy stands for 'Numerical Python'.
- It is the foundational package for numerical computing in Python.
- Provides arrays, matrices, and high-level mathematical functions that operate on them.
- For numerical data, NumPy arrays are faster and more memory efficient than Python lists: elements share one dtype and are stored in a contiguous block.
NumPy Arrays
- Main object: ndarray (N-dimensional array).
- Create using np.array([1, 2, 3])
- shape: tuple giving the size along each dimension, e.g. (rows, cols) for a 2D array.
- dtype: tells you the data type of elements (int, float, etc).
Examples:
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> a.shape -> (3,)
>>> a.dtype -> int64 (platform default; may be int32 on some systems)
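The same attributes on a 2D array, as a minimal sketch. The values and the names m and small are made up for illustration only.

```python
import numpy as np

# A 2D array: 2 rows, 3 columns (illustrative values)
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(m.ndim)   # number of dimensions -> 2
print(m.shape)  # size along each dimension -> (2, 3)
print(m.dtype)  # float64, because the inputs are Python floats

# dtype can be set explicitly when the default is not what you want
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)  # float32
```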
Array Creation
- np.zeros((2,3)) -> array of zeros
- np.ones((3,3)) -> array of ones
- np.arange(0, 10, 2) -> [0, 2, 4, 6, 8]
- np.linspace(0, 1, 5) -> [0. , 0.25, 0.5 , 0.75, 1. ]
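A short sketch of the creation functions above, mainly to show which endpoints are included. Variable names are illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))      # shape is passed as a tuple
ones = np.ones((3, 3))
evens = np.arange(0, 10, 2)   # stop value 10 is excluded
grid = np.linspace(0, 1, 5)   # stop value 1 is included, 5 points in total

print(zeros.shape, ones.shape)  # (2, 3) (3, 3)
print(evens)                    # [0 2 4 6 8]
print(grid.size, grid[-1])      # 5 1.0
```

Note: np.zeros, np.ones and np.linspace return floats by default, while np.arange with integer arguments returns integers.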
Array Operations
- Element-wise operations:
a + b, a - b, a * b, a / b
- Matrix multiplication (2D arrays): a @ b or np.dot(a, b)
- Useful functions: np.sum, np.mean, np.std, np.max, np.min
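Element-wise multiplication and matrix multiplication are easy to mix up, so here is a small sketch contrasting them, plus the axis argument of the aggregate functions. The 2x2 values are chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiplies matching entries
matmul = a @ b        # row-by-column matrix product

print(elementwise.tolist())  # [[5, 12], [21, 32]]
print(matmul.tolist())       # [[19, 22], [43, 50]]

# Aggregations accept an axis: axis=0 collapses rows, axis=1 collapses columns
total = np.sum(a)             # 10
col_sums = np.sum(a, axis=0)  # [4, 6]
row_means = np.mean(a, axis=1)  # [1.5, 3.5]
```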
Indexing & Slicing
- a[0] -> first element
- a[1:4] -> elements from index 1 to 3
- a[:,0] -> first column (a must be 2D)
- a[1,:] -> second row (a must be 2D)
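The indexing forms above, sketched on a 1D and a 2D array. The last step is an addition to the notes: basic slices are views, so writing through a slice changes the original array. Array values are illustrative.

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50])
first = v[0]      # 10
middle = v[1:4]   # [20, 30, 40]; index 4 is excluded

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
first_col = m[:, 0]    # [1, 4, 7]
second_row = m[1, :]   # [4, 5, 6]

# Basic slices are views, not copies: this also changes v[1]
middle[0] = 99
print(v.tolist())  # [10, 99, 30, 40, 50]
```

Use v[1:4].copy() when the slice should be independent of the original.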
Broadcasting
- Allows operations on arrays of different shapes.
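- Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.
- The smaller array is 'broadcast' (virtually stretched, without copying data) to match the larger shape.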
Example:
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.array([1,0,1])
>>> a + b -> [[2 2 4], [5 5 7]]
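A slightly longer sketch of the same idea: a row vector is added to every row, a column vector to every column, and a mismatched shape raises an error. The names row and col are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
row = np.array([1, 0, 1])      # shape (3,): added to every row
col = np.array([[10], [20]])   # shape (2, 1): added to every column

print((a + row).tolist())  # [[2, 2, 4], [5, 5, 7]]
print((a + col).tolist())  # [[11, 12, 13], [24, 25, 26]]

# Shapes (2, 3) and (2,) do not match on the last dimension, so this fails
failed = False
try:
    a + np.array([1, 2])
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```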
NumPy for Data Science / ML
- Data preprocessing: normalize, scale, clean data.
- Input to ML models: usually NumPy arrays.
- Pandas and Scikit-learn are built on NumPy; TensorFlow converts to and from NumPy arrays.
- Fast computations for big datasets.
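One common preprocessing step is standardization (zero mean, unit variance per feature). This is a minimal sketch on a made-up feature matrix X; it assumes no feature has zero standard deviation, otherwise the division would fail.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows), 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

mean = X.mean(axis=0)   # per-feature mean, shape (2,)
std = X.std(axis=0)     # per-feature standard deviation, shape (2,)

# Broadcasting applies the per-feature statistics to every row
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # close to 0 for each feature
print(X_scaled.std(axis=0))   # close to 1 for each feature
```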
Random Module
- np.random.rand(3,2) -> random floats in [0,1)
- np.random.randint(0, 10, (2,2)) -> integers 0-9
- np.random.seed(42) -> fixes the seed so runs are reproducible (newer code: rng = np.random.default_rng(42))
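A sketch of the calls above, followed by the same draws with the Generator API (np.random.default_rng), which the NumPy documentation recommends for new code. Variable names are illustrative.

```python
import numpy as np

# Legacy global-state API, as used in these notes
np.random.seed(42)
floats = np.random.rand(3, 2)             # shape (3, 2), values in [0, 1)
ints = np.random.randint(0, 10, (2, 2))   # upper bound 10 is excluded

# Re-seeding reproduces the same numbers
np.random.seed(42)
floats_again = np.random.rand(3, 2)
print(np.array_equal(floats, floats_again))  # True

# Generator API: the random state lives in the rng object, not globally
rng = np.random.default_rng(42)
g_floats = rng.random((3, 2))
g_ints = rng.integers(0, 10, size=(2, 2))
```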
Reshaping & Stacking
- reshape(): change array shape without changing the data (total number of elements must stay the same)
- vstack(), hstack(): combine arrays vertically/horizontally
- flatten(): return a 1D copy of the array
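The three operations above in one minimal sketch. The -1 argument to reshape is an addition to the notes: it asks NumPy to infer that dimension from the array size.

```python
import numpy as np

a = np.arange(6)          # [0, 1, 2, 3, 4, 5]
m = a.reshape(2, 3)       # 2 * 3 must equal a.size
auto = a.reshape(-1, 2)   # -1 is inferred, giving shape (3, 2)

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
v = np.vstack((x, y))     # stacked as rows: shape (2, 3)
h = np.hstack((x, y))     # joined end to end: shape (6,)

flat = m.flatten()        # 1D copy of m
flat[0] = 100             # does not affect m
print(m[0, 0])            # 0
```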
Useful Tips
- Prefer NumPy arrays for numerical data in ML pipelines.
- Check shape before feeding to model.
- Learn to vectorize -> avoid Python loops for speed.
- Practice slicing, reshaping, broadcasting!
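What "vectorize" means in practice, as a small sketch: the loop and the array expression compute the same result, but the array expression runs in compiled code instead of the Python interpreter. The formula x**2 + 1 is just an example.

```python
import numpy as np

x = np.arange(1_000, dtype=np.float64)

# Loop version: one Python-level operation per element
squares_loop = np.empty_like(x)
for i in range(x.size):
    squares_loop[i] = x[i] ** 2 + 1

# Vectorized version: one expression over the whole array
squares_vec = x ** 2 + 1

print(np.array_equal(squares_loop, squares_vec))  # True
```

To see the speed difference on your own machine, time both versions with the standard timeit module on a larger array.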