NumPy and Pandas
NumPy is a fundamental package for scientific computing with Python. It
provides support for arrays, matrices, and a large collection of
mathematical functions to operate on these data structures efficiently.
Key Features of NumPy
1. N-dimensional Array Object:
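○ The core of NumPy is the ndarray, a homogeneous n-dimensional array object.
○ Every element shares one data type (dtype), such as int64, float64, or bool, and the whole array can be processed with vectorized operations.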
2. Universal Functions (ufuncs):
○ Functions that operate element-wise on arrays.
○ Includes mathematical, logical, bitwise, and other functions.
3. Broadcasting:
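○ Allows arithmetic operations on arrays of different but compatible shapes: dimensions are compared from the last one backwards and must either match or be 1.
○ Avoids explicit Python loops and copies of the smaller array, which simplifies code and improves performance.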
4. Linear Algebra:
○ Provides tools for performing linear algebra operations, such as
matrix multiplication, eigenvalues, and singular value
decomposition.
5. Random Number Generation:
○ Generates random numbers from various probability distributions (uniform, normal, binomial, and more).
○ Useful for simulations and statistical computations.
6. Integration with Other Libraries:
○ Works seamlessly with other scientific computing libraries like
SciPy, Pandas, and Matplotlib.
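The broadcasting rule described above is easiest to see in a small example. In this sketch the variable names are illustrative. A row vector and a column vector are each added to a 2x3 matrix, and then a shape that breaks the rule raises an error.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)

# Shape (3,): trailing dimensions match (3 and 3), so the row is
# applied to every row of the matrix without being copied.
row = np.array([10, 20, 30])
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# Shape (2, 1): the size-1 dimension is stretched across the columns.
column = np.array([[100],
                   [200]])
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# Shape (2,): trailing dimensions 3 and 2 differ and neither is 1.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Broadcast error:", err)
```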
Installing NumPy
You can install NumPy using pip:
pip install numpy
Basic Operations with NumPy
1. Creating Arrays
import numpy as np
# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)
# Creating arrays with zeros, ones, and a range of numbers
zeros_array = np.zeros((3, 3))
ones_array = np.ones((2, 2))
range_array = np.arange(10)
print("Zeros Array:\n", zeros_array)
print("Ones Array:\n", ones_array)
print("Range Array:", range_array)
Output:
1D Array: [1 2 3 4 5]
2D Array:
[[1 2 3]
[4 5 6]]
Zeros Array:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Ones Array:
[[1. 1.]
[1. 1.]]
Range Array: [0 1 2 3 4 5 6 7 8 9]
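Every ndarray also records its shape, number of dimensions, and data type. The following sketch inspects those attributes and shows how the dtype and shape can be changed. The exact integer dtype (for example int64 or int32) depends on the platform.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", array_2d.shape)      # (2, 3): 2 rows, 3 columns
print("Dimensions:", array_2d.ndim)  # 2
print("Size:", array_2d.size)        # 6 elements in total
print("Dtype:", array_2d.dtype)      # an integer type, platform-dependent

# np.zeros and np.ones default to float64; pass dtype to override it.
print(np.zeros((3, 3)).dtype)        # float64
print(np.zeros((3, 3), dtype=int))   # integer zeros, printed without dots

# reshape changes the shape; the total number of elements must stay the same.
reshaped = np.arange(6).reshape(3, 2)
print(reshaped)
```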
2. Array Operations
# Arithmetic operations
array = np.array([1, 2, 3, 4])
print("Original Array:", array)
# Addition
array_add = array + 10
print("Array + 10:", array_add)
# Multiplication
array_mult = array * 2
print("Array * 2:", array_mult)
# Element-wise operations
array_square = array ** 2
print("Array squared:", array_square)
Output:
Original Array: [1 2 3 4]
Array + 10: [11 12 13 14]
Array * 2: [2 4 6 8]
Array squared: [ 1 4 9 16]
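The operations above combine an array with a single number. Two arrays of the same shape also combine element by element. Comparisons produce boolean arrays, often called masks, which can be used to select elements. The names `a`, `b`, and `mask` are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Same-shape arrays are combined position by position.
print("a + b:", a + b)     # [11 22 33 44]
print("a * b:", a * b)     # [ 10  40  90 160]

# A comparison returns a boolean array with one entry per element.
mask = a > 2
print("a > 2:", mask)      # [False False  True  True]

# Indexing with the mask keeps only the elements where it is True.
print("a[a > 2]:", a[mask])  # [3 4]
```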
3. Universal Functions (ufuncs)
# Using ufuncs for element-wise operations
array = np.array([1, 2, 3, 4])
# Sine function
array_sin = np.sin(array)
print("Sine of Array:", array_sin)
# Exponential function
array_exp = np.exp(array)
print("Exponential of Array:", array_exp)
# Square root function
array_sqrt = np.sqrt(array)
print("Square Root of Array:", array_sqrt)
Output:
Sine of Array: [ 0.84147098 0.90929743 0.14112001 -0.7568025 ]
Exponential of Array: [ 2.71828183 7.3890561 20.08553692 54.59815003]
Square Root of Array: [1. 1.41421356 1.73205081 2. ]
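The feature list also mentions logical and bitwise ufuncs, which the example above does not use. This sketch shows a few of them. It also shows that functions such as `np.sqrt` are `np.ufunc` objects, which carry methods like `reduce`.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

# np.sin, np.exp and np.sqrt are all ufunc objects.
print(isinstance(np.sqrt, np.ufunc))        # True

# Element-wise maximum of two arrays.
print(np.maximum(a, b))                     # [4 3 3 4]

# Logical ufunc: combines two boolean arrays element by element.
print(np.logical_and(a > 1, b > 1))         # [False  True  True False]

# Bitwise ufunc on integers: x & 1 is 1 for odd numbers, 0 for even ones.
print(np.bitwise_and(a, 1))                 # [1 0 1 0]

# ufunc.reduce applies the operation cumulatively: 1 + 2 + 3 + 4.
print(np.add.reduce(a))                     # 10
```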
4. Linear Algebra Operations
# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
matrix_product = np.dot(matrix_a, matrix_b)
print("Matrix Product:\n", matrix_product)
# Inverse of a matrix
matrix_inv = np.linalg.inv(matrix_a)
print("Inverse of Matrix A:\n", matrix_inv)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix_a)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
Output:
Matrix Product:
[[19 22]
[43 50]]
Inverse of Matrix A:
[[-2. 1. ]
[ 1.5 -0.5]]
Eigenvalues: [-0.37228132 5.37228132]
Eigenvectors:
[[-0.82456484 -0.41597356]
[ 0.56576746 -0.90937671]]
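The results above can be checked numerically. This sketch uses `np.allclose` to allow for floating-point rounding. It also covers singular value decomposition, which is mentioned in the feature list but not shown above, and `np.linalg.solve` for linear systems. The right-hand side `[5, 11]` is an arbitrary illustrative value.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# For 2D arrays, the @ operator gives the same result as np.dot.
product = matrix_a @ matrix_b

# A @ inv(A) should equal the identity matrix up to rounding error.
matrix_inv = np.linalg.inv(matrix_a)
print(np.allclose(matrix_a @ matrix_inv, np.eye(2)))   # True

# Each eigenvector column v satisfies A @ v = lambda * v.
# Multiplying by the eigenvalues scales column j by eigenvalues[j].
eigenvalues, eigenvectors = np.linalg.eig(matrix_a)
print(np.allclose(matrix_a @ eigenvectors, eigenvectors * eigenvalues))  # True

# Singular value decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(matrix_a)
print(np.allclose(U @ np.diag(S) @ Vt, matrix_a))     # True

# To solve A x = y, solve() is generally preferred over computing inv(A).
x = np.linalg.solve(matrix_a, np.array([5, 11]))
print(x)                                               # [1. 2.]
```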
5. Random Number Generation
# Generating random numbers
random_array = np.random.rand(5)
print("Random Array:", random_array)
# Generating random integers
random_integers = np.random.randint(1, 10, size=5)
print("Random Integers:", random_integers)
# Generating numbers from a normal distribution
normal_array = np.random.randn(5)
print("Normal Distribution Array:", normal_array)
Output: (Note: Output will vary each time due to random generation)
Random Array: [0.85953447 0.73381974 0.37786374 0.84847527 0.64217697]
Random Integers: [4 1 6 9 7]
Normal Distribution Array: [ 0.35743143 -1.32095611 -0.61792992 0.77700679
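`np.random.rand`, `randint` and `randn` belong to NumPy's legacy random API. NumPy's documentation recommends the `Generator` interface returned by `np.random.default_rng` for new code. This sketch shows rough equivalents of the three calls above. It also shows that a fixed seed makes the output reproducible. The seed value 42 is arbitrary.

```python
import numpy as np

# A Generator seeded with a fixed value produces the same sequence every run.
rng = np.random.default_rng(seed=42)

uniform = rng.random(5)                 # floats in [0.0, 1.0), like rand(5)
integers = rng.integers(1, 10, size=5)  # ints in [1, 10), like randint(1, 10, size=5)
normal = rng.standard_normal(5)         # samples from N(0, 1), like randn(5)

print("Uniform:", uniform)
print("Integers:", integers)
print("Normal:", normal)

# A second generator with the same seed repeats the first draw exactly.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```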
Pandas
Pandas is a powerful and widely used Python library for data manipulation
and analysis. It provides two main data structures, Series and DataFrame,
which are designed to make data cleaning, manipulation, and analysis fast and
easy. Let's explore some of its core functionality.
Key Features of Pandas
1. Data Structures:
○ Series: One-dimensional labeled array capable of holding any
data type.
○ DataFrame: Two-dimensional labeled data structure with
columns of potentially different types, similar to a table in a
database or an Excel spreadsheet.
2. Data Cleaning and Preparation:
○ Handling missing data, filtering, and cleaning data.
○ Data transformation and normalization.
3. Data Analysis and Exploration:
○ Aggregation, grouping, merging, and joining data.
○ Descriptive statistics and data summarization.
4. Time Series Analysis:
○ Tools for working with time-indexed data, resampling, and time-based aggregations.
5. Data Input and Output:
○ Reading from and writing to various file formats such as CSV,
Excel, SQL databases, and more.
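The time series tools in feature 4 are not covered in the basic operations below. This is a minimal sketch using made-up daily values. It groups them into two-day bins with `resample` and computes a three-day moving average with `rolling`.

```python
import pandas as pd

# Six days of made-up values indexed by date.
index = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=index)

# Two-day bins starting 2024-01-01, 2024-01-03 and 2024-01-05,
# each holding the sum of its days: 30, 70 and 110.
two_day_totals = sales.resample("2D").sum()
print(two_day_totals)

# Moving average over a 3-observation window.
# The first two entries are NaN because the window is not yet full.
moving_avg = sales.rolling(window=3).mean()
print(moving_avg)
```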
Installing Pandas
You can install Pandas using pip:
pip install pandas
Basic Operations with Pandas
1. Creating Series and DataFrames
import pandas as pd
# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:\n", series)
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'UK', 'Canada', 'Australia']}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)
Output:
Series:
0 1
1 2
2 3
3 4
4 5
dtype: int64
DataFrame:
Name Age Country
0 Alice 25 USA
1 Bob 30 UK
2 Charlie 35 Canada
3 David 40 Australia
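The Series above uses the default integer labels 0 to 4, but a Series can carry its own labels. This sketch builds a labeled Series and then inspects the size and column types of the same DataFrame as above. The Series name `ages` is illustrative.

```python
import pandas as pd

# A Series with custom labels supports lookup by label.
ages = pd.Series([25, 30, 35, 40], index=["Alice", "Bob", "Charlie", "David"])
print(ages["Bob"])   # 30

# The same DataFrame as in the example above.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Country': ['USA', 'UK', 'Canada', 'Australia']})

print(df.shape)      # (4, 3): 4 rows, 3 columns
print(df.dtypes)     # Age is an integer column; Name and Country hold strings
print(df.head(2))    # first two rows only
```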
2. Reading and Writing Data
# Reading from a CSV file
# Assuming 'data.csv' exists with appropriate data
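# A new name keeps df from step 1 available for the steps below
df_csv = pd.read_csv('data.csv')
print("DataFrame from CSV:\n", df_csv)
# Writing to a CSV file (index=False leaves out the row labels)
df_csv.to_csv('output.csv', index=False)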
Output:
DataFrame from CSV:
(output will depend on the contents of 'data.csv')
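The CSV example depends on a `data.csv` file. A CSV round trip can also be tried without touching the disk by writing to an in-memory text buffer from Python's `io` module. In this sketch, `people` is a small illustrative DataFrame.

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# to_csv and read_csv accept file-like objects as well as file paths.
buffer = io.StringIO()
people.to_csv(buffer, index=False)
print(buffer.getvalue())   # a header line "Name,Age", then one line per row

# Rewind the buffer to the start before reading it back.
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.equals(people))   # True: same values, columns and dtypes
```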
3. Data Selection and Filtering
# Selecting a single column
ages = df['Age']
print("Ages:\n", ages)
# Selecting multiple columns
subset = df[['Name', 'Country']]
print("Subset of DataFrame:\n", subset)
# Filtering rows based on a condition
filtered = df[df['Age'] > 30]
print("Filtered DataFrame:\n", filtered)
Output:
Ages:
0 25
1 30
2 35
3 40
Name: Age, dtype: int64
Subset of DataFrame:
Name Country
0 Alice USA
1 Bob UK
2 Charlie Canada
3 David Australia
Filtered DataFrame:
Name Age Country
2 Charlie 35 Canada
3 David 40 Australia
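Beyond square-bracket selection, pandas offers `.loc` for label-based selection and `.iloc` for position-based selection. Conditions can also be combined. This sketch reuses the DataFrame from step 1. Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparisons in Python.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Country': ['USA', 'UK', 'Canada', 'Australia']})

# .loc[rows, columns]: rows matching a condition, restricted to two columns.
print(df.loc[df['Age'] > 30, ['Name', 'Age']])

# .iloc uses integer positions: the first row, returned as a Series.
print(df.iloc[0])

# & means "and", | means "or"; this keeps Bob (30, UK) and David (40, Australia).
selected = df[(df['Age'] >= 30) & (df['Country'] != 'Canada')]
print(selected)
```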
4. Data Cleaning
# Handling missing values
# (a separate name keeps df from step 1 available for step 5)
df_missing = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})
print("Original DataFrame:\n", df_missing)
# Filling missing values with a constant
df_filled = df_missing.fillna(0)
print("DataFrame with filled values:\n", df_filled)
# Dropping every row that contains a missing value
df_dropped = df_missing.dropna()
print("DataFrame with dropped rows:\n", df_dropped)
Output:
Original DataFrame:
A B
0 1.0 NaN
1 2.0 4.0
2 NaN 5.0
DataFrame with filled values:
A B
0 1.0 0.0
1 2.0 4.0
2 0.0 5.0
DataFrame with dropped rows:
A B
1 2.0 4.0
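Filling with 0 and dropping whole rows are only two options. This sketch counts the missing values per column first. It then fills each gap with its column's mean and drops rows only when a chosen column is missing.

```python
import pandas as pd

df_missing = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})

# Number of missing values in each column (1 in A, 1 in B).
print(df_missing.isna().sum())

# fillna with a Series fills each column with its own value:
# A's gap becomes 1.5 (mean of 1 and 2), B's gap becomes 4.5 (mean of 4 and 5).
df_mean_filled = df_missing.fillna(df_missing.mean())
print(df_mean_filled)

# subset limits the check to column A, so only row 2 is dropped.
df_a_complete = df_missing.dropna(subset=['A'])
print(df_a_complete)
```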
5. Data Aggregation and Grouping
# Grouping data by a column and calculating aggregate statistics
grouped = df.groupby('Country').agg({'Age': 'mean'})
print("Grouped DataFrame:\n", grouped)
Output:
Grouped DataFrame:
Age
Country
Australia 40.0
Canada 35.0
UK 30.0
USA 25.0
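In the example above each country appears only once, so each group's mean is just that person's age. Grouping becomes more useful when keys repeat. This sketch uses a small hypothetical `staff` table, with salary in arbitrary units. Named aggregation computes several statistics at once, and each output column is given as `name=(input column, function)`.

```python
import pandas as pd

# Hypothetical data in which each team appears more than once.
staff = pd.DataFrame({'Team': ['Data', 'Web', 'Data', 'Web', 'Data'],
                      'Age': [25, 30, 35, 40, 30],
                      'Salary': [50, 60, 70, 80, 90]})

summary = staff.groupby('Team').agg(
    avg_age=('Age', 'mean'),        # Data: (25+35+30)/3 = 30.0, Web: 35.0
    max_salary=('Salary', 'max'),   # Data: 90, Web: 80
    headcount=('Age', 'size'),      # Data: 3 rows, Web: 2 rows
)
print(summary)
```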