Subject IP
NOTES
PANDAS
1. Introduction to Pandas
Pandas is a powerful, open-source Python library used for data analysis and manipulation.
It provides high-performance, easy-to-use data structures and data analysis tools.
The name "Pandas" is derived from "panel data," an econometrics term for multi-dimensional data sets.
To use Pandas in your Python program, you need to import it:
import pandas as pd
2. Why Pandas?
Data Handling: Efficiently reads and writes data in various formats (CSV, Excel, etc.).
Data Analysis: Performs calculations, statistical analysis, and data aggregation.
Data Manipulation: Allows for easy selection, filtering, sorting, reshaping, and combining of data.
Missing Data Handling: Provides tools to deal with missing values (NaN).
Time-Series Functionality: Offers advanced features for working with time-series data.
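The short sketch below shows data handling, manipulation and analysis together. The student data is made up, and an in-memory string (io.StringIO) stands in for a real CSV file on disk.

```python
import io

import pandas as pd

# Made-up CSV text, used here in place of a real file on disk
csv_text = """Name,Class,Marks
Asha,12A,88
Ravi,12B,72
Meena,12A,95
Karan,12B,64
"""

# Data handling: read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Data manipulation: keep rows with Marks >= 70, highest marks first
toppers = df[df["Marks"] >= 70].sort_values(by="Marks", ascending=False)
print(toppers)

# Data analysis / aggregation: average marks in each class
class_avg = df.groupby("Class")["Marks"].mean()
print(class_avg)

# Writing data: to_csv returns the CSV text when no file path is given
out_text = toppers.to_csv(index=False)
print(out_text)
```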
3. Pandas Data Structures
Pandas primarily uses two fundamental data structures:
Series:
One-dimensional (1-D) labeled array.
Can hold any data type (homogeneous data).
Data values are mutable (can be changed), but the size is immutable (cannot be changed after creation).
Can be thought of as a column in a spreadsheet or a single list with an index.
Creation: From lists, arrays, dictionaries, or scalar values.
Operations: Indexing, slicing, mathematical operations, statistical functions (e.g., sum())
DataFrame:
Two-dimensional (2-D) labeled data structure with columns of potentially different types (heterogeneous data).
Similar to a spreadsheet or a SQL table, with rows and columns.
The most commonly used Pandas object for tabular data.
Creation: From dictionaries of Series/lists, lists of dictionaries, NumPy arrays, or CSV/Excel files.
Operations:
Accessing Data: Using column names, row labels (index), loc (label-based), iloc (integer-location based).
Adding/Deleting Columns/Rows: Using assignment or methods like drop().
Data Manipulation: Filtering, sorting, grouping, merging.
Descriptive Statistics: head(), tail(), describe(), info().
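A minimal sketch of the descriptive-statistics methods listed above, using a small made-up table:

```python
import pandas as pd

# Small made-up table used only for illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Farah"],
    "Age": [25, 30, 22, 28, 35, 27],
})

print(df.head())       # first 5 rows (default n=5)
print(df.tail(2))      # last 2 rows
stats = df.describe()  # count, mean, std, min, quartiles, max of numeric columns
print(stats)
df.info()              # prints column names, non-null counts and dtypes
```

With a mix of text and numeric columns, describe() summarises only the numeric ones by default, so "Name" does not appear in stats.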
4. Key Concepts
Index: Labels used to identify rows in Series and DataFrames.
Column Names: Labels used to identify columns in DataFrames.
Missing Data (NaN): Represents "Not a Number" for missing values. Pandas provides methods to handle these.
Vectorization: Pandas operations are often optimized for performance through vectorized computations.
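The sketch below illustrates two of the key concepts: missing data (NaN) and vectorized operations. The marks and names are made up.

```python
import numpy as np
import pandas as pd

# np.nan marks a missing value (Ravi's marks are unknown)
marks = pd.Series([78, np.nan, 91, 65], index=["Asha", "Ravi", "Meena", "Karan"])

print(marks.isna())        # True where a value is missing
print(marks.count())       # counts only the non-missing values
print(marks.mean())        # NaN is skipped: (78 + 91 + 65) / 3
filled = marks.fillna(0)   # new Series with NaN replaced by 0
print(filled)

# Vectorization: one expression adds 5 to every element, no explicit loop.
# A missing value stays missing (NaN + 5 is still NaN).
bonus = marks + 5
print(bonus)
```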
Creating a Series.
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
✓ We can say that Series is a labeled one-dimensional array
which can hold any type of data.
✓ The data of a Series is always mutable, which means it can be changed.
✓ But the size of a Series is always immutable, which means it
cannot be changed.
✓ Series may be considered as a data structure with two
arrays, one of which works as the index (labels) and the other holds the actual data values.
Creating a series from Scalar value
To create a Series of repeated values from a scalar, an index must be provided. The
scalar value will be repeated once for each label in the index.
From a list.
Ser2 = pd.Series([12, 23, 34, 45, 67])
>>> print(Ser2)
0 12
1 23
2 34
3 45
4 67
From a range() object.
ser3 = pd.Series(range(4))
>>> print(ser3)
0 0
1 1
2 2
3 3
Customizing the index.
ser3.index = ['One', 'Two', 'Three', 'Four']
>>> print(ser3)
One 0
Two 1
Three 2
Four 3
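When customizing the index, the new list of labels must have exactly as many entries as the Series has values. A small sketch of what happens otherwise:

```python
import pandas as pd

ser3 = pd.Series(range(4))
ser3.index = ["One", "Two", "Three", "Four"]  # 4 labels for 4 values: OK

error_message = None
try:
    ser3.index = ["A", "B"]  # only 2 labels for 4 values
except ValueError as err:
    error_message = str(err)  # pandas reports a length mismatch

print(error_message)
print(ser3)  # the failed assignment leaves the old index in place
```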
You need to import the Pandas module first: import pandas as pd.
From a List/Array: s = pd.Series([10, 20, 30, 40])
From a Dictionary: s = pd.Series({'a': 10, 'b': 20, 'c': 30})
With a Specific Index: s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
From a Scalar Value: s = pd.Series(5, index=[1, 2, 3])
Examples of creation of series:-
1. Creation of empty series:-
import pandas as pd
S1=pd.Series()
print (S1)
2. Creation of series using list:-
import pandas as pd
S1=pd.Series([23,45,67,99])
print (S1)
output:-
0 23
1 45
2 67
3 99
We can also assign user defined labels to index
import pandas as pd
S1=pd.Series([34,44,23] , index=["ram" , "sham" , "ria"])
print (S1)
Output:-
ram 34
sham 44
ria 23
3. Creation of series using Dictionaries:-
import pandas as pd
D={ 2:"abc" , 5:"qwe" , 8:"tyu"}
S2=pd.Series(D)
print(S2)
Output:-
2 abc
5 qwe
8 tyu
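When a dictionary is passed together with an explicit index, only the listed keys are used, in the listed order, and a label that is not a key of the dictionary gets NaN. A short sketch reusing the dictionary D from above (the label 9 is an arbitrary example):

```python
import pandas as pd

D = {2: "abc", 5: "qwe", 8: "tyu"}

# Keys 8 and 2 are taken from D; 9 is not a key of D, so its value is NaN.
# Key 5 is left out because it is not in the index list.
S4 = pd.Series(D, index=[8, 2, 9])
print(S4)
```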
4. creation of series with scalar values:-
import pandas as pd
S3=pd.Series(5 , index=['YELLOW' , 'RED' , 'GREEN'])
print(S3)
OUTPUT:-
YELLOW 5
RED 5
GREEN 5
Accessing Elements
By Index Label: s['a'] (returns the value associated with label 'a')
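By Position: s.iloc[0] (returns the element at the first position; s[0] works on older pandas versions, but positional access with [] on a labeled index is deprecated in recent versions)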
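A sketch of label access, position access and slicing on a small made-up Series. Note that a label slice includes its end label, while a position slice excludes its end position.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s["b"])          # by label
print(s.iloc[0])       # by position
print(s.loc["b":"d"])  # label slice: end label 'd' IS included
print(s.iloc[1:3])     # position slice: end position 3 is NOT included
print(s[s > 25])       # boolean filtering: values greater than 25
```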
Accessing Data in a Series
Attributes of a Series
s.values: Returns the data as a NumPy array.
s.index: Returns the index labels.
s.dtype: Returns the data type of the Series elements.
s.shape: Returns the number of elements as a tuple (e.g., (5,)).
s.nbytes: Returns the memory occupied by the Series in bytes.
s.empty: Returns True if the Series is empty, False otherwise
Useful Methods
s.head(n): Returns the first n rows (default is 5).
s.tail(n): Returns the last n rows (default is 5).
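The attributes and methods above can be tried on a small Series. s.size (the number of elements) is not in the list above but is a standard Series attribute; the exact dtype printed depends on the data passed in.

```python
import pandas as pd

s = pd.Series([5, 10, 15, 20, 25, 30, 35])

print(s.values)   # the data as a NumPy array
print(s.index)    # default RangeIndex 0..6
print(s.dtype)    # data type of the elements, e.g. int64
print(s.shape)    # (7,)
print(s.size)     # 7
print(s.nbytes)   # number of elements * bytes per element
print(s.empty)    # False, because s has elements
print(s.head())   # first 5 elements (default n=5)
print(s.tail(3))  # last 3 elements
```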
Mathematical Operations
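Arithmetic between two Series is vectorized and element-wise: values are matched by index label, and a label present in only one of the Series gives NaN in the result.
Addition: s1 + s2
Subtraction: s1 - s2
Multiplication: s1 * s2
Division: s1 / s2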
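A minimal sketch of label alignment, using two made-up Series whose indexes only partly overlap. The add() method with fill_value treats a missing side as 0 instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])

total = s1 + s2                        # 'a' and 'd' exist in only one Series -> NaN
print(total)

safe_total = s1.add(s2, fill_value=0)  # a missing value is treated as 0
print(safe_total)

print(s1 * 2)                          # a scalar applies to every element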
DataFrames in Python
1. Introduction to DataFrames:
A DataFrame is a two-dimensional, labeled, heterogeneous data structure in Pandas.
It is essentially a tabular data structure with rows and columns, similar to a spreadsheet or a database table.
Each column can hold data of a different data type, but all values within a single column must be of the same data type.
DataFrames have two indices: a row index (axis 0) and a column index (axis 1). The labels in these indices can be numbers or strings.
2. Characteristics of DataFrames:
Value Mutable: The values within a DataFrame can be changed or updated.
Size Mutable: Rows and columns can be added or deleted from a DataFrame.
Heterogeneous: Different columns can store different data types.
3. Creating DataFrames:
From Dictionary of Series.
import pandas as pd
data = {'Name': pd.Series(['Alice', 'Bob', 'Charlie']),
'Age': pd.Series([25, 30, 22]),
'City': pd.Series(['NY', 'LA', 'Chicago'])}
df = pd.DataFrame(data)
print(df)
From Dictionary of Lists/Arrays.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['NY', 'LA', 'Chicago']}
df = pd.DataFrame(data)
print(df)
From a List of Dictionaries.
import pandas as pd
data = [{'Name': 'Alice', 'Age': 25},
{'Name': 'Bob', 'Age': 30},
{'Name': 'Charlie', 'Age': 22}]
df = pd.DataFrame(data)
print(df)
4. Accessing Data in DataFrames:
Accessing Columns.
print(df['Name']) # Using column name
print(df.Age) # Using dot notation (if column name is a valid identifier)
Accessing Rows by Label (loc).
print(df.loc[0]) # Accessing row with index label 0
print(df.loc[[0, 2]]) # Accessing multiple rows
Accessing Rows by Position (iloc)
print(df.iloc[0]) # Accessing row at positional index 0
print(df.iloc[[0, 2]]) # Accessing multiple rows by position
Accessing Specific Cells.
print(df.loc[0, 'Name']) # Accessing cell at row label 0, column 'Name'
print(df.iloc[1, 1]) # Accessing cell at row position 1, column position 1
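Rows and columns can also be sliced and filtered. This sketch rebuilds the Name/Age/City DataFrame from above; as with a Series, loc slices include the end label while iloc slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22],
                   "City": ["NY", "LA", "Chicago"]})

print(df.loc[0:1, "Name":"Age"])  # rows 0-1, columns Name..Age (both ends included)
print(df.iloc[0:1, 0:2])          # row 0 only, columns 0-1 (end excluded)
print(df[["Name", "City"]])       # several columns by a list of names
adults = df[df["Age"] > 24]       # boolean filtering of rows
print(adults)
```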
5. Important DataFrame Functions:
head(n): Returns the first n rows of the DataFrame (default n=5).
tail(n): Returns the last n rows of the DataFrame (default n=5).
shape: Returns a tuple representing the dimensions (rows, columns) of the DataFrame.
columns: Returns an Index object containing column labels.
index: Returns an Index object containing row labels.
sort_values(by='column_name', ascending=True/False): Sorts the DataFrame by the values in a specified column.
sort_index(ascending=True/False): Sorts the DataFrame by its index.
rename(columns={'old_name': 'new_name'}, index={'old_label': 'new_label'}): Renames columns or row labels.
pd.concat([df1, df2]): Concatenates DataFrames along an axis (rows by default).
pd.merge(df1, df2, on='common_column'): Merges DataFrames based on a common column.
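A sketch combining several of these functions on small made-up tables. sort_values() and rename() return new DataFrames, so each result is stored in a new variable.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df2 = pd.DataFrame({"Name": ["Charlie"], "Age": [22]})
cities = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                       "City": ["NY", "LA", "Chicago"]})

# concat stacks the rows; ignore_index=True renumbers the row labels 0, 1, 2
people = pd.concat([df1, df2], ignore_index=True)

# merge joins the two tables on the shared 'Name' column
full = pd.merge(people, cities, on="Name")

# sort_values / rename return new DataFrames; 'full' is unchanged
by_age = full.sort_values(by="Age", ascending=False)
renamed = by_age.rename(columns={"Age": "Years"})
print(renamed)
print(renamed.shape)
print(renamed.columns)
```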
6. Modifying DataFrames:
Adding a new column.
df['Country'] = ['USA', 'USA', 'USA']
Modifying existing values.
df.loc[0, 'Age'] = 26
Adding a new row (using loc; the older append() method was deprecated in pandas 1.4 and removed in pandas 2.0, so loc or pd.concat() is preferred):
df.loc[3] = ['David', 28, 'London', 'UK']
Deleting columns.
del df['City']
# or, since drop() returns a new DataFrame instead of changing df:
df = df.drop('City', axis=1)
Deleting rows.
df = df.drop(0, axis=0) # Deleting the row with index label 0
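Putting the modification steps together in one runnable sketch (same made-up Name/Age/City data as above). It shows that drop() leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22],
                   "City": ["NY", "LA", "Chicago"]})

df["Country"] = ["USA", "USA", "USA"]      # add a new column
df.loc[0, "Age"] = 26                      # change one cell
df.loc[3] = ["David", 28, "London", "UK"]  # add a new row with label 3

trimmed = df.drop("City", axis=1)  # returns a NEW DataFrame without 'City'
print("City" in df.columns)        # True: df itself still has 'City'

df = df.drop(0, axis=0)            # reassign to keep the deletion
print(df)
```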
Mathematical Functions on DataFrames
DataFrames in Pandas allow for various mathematical operations to be performed on their data, either
element-wise or across entire rows/columns. These operations are fundamental for data analysis and
manipulation.
1. Basic Arithmetic Operations:
These operations are performed element-wise on DataFrames or between a DataFrame and a scalar value.
Addition (+): Adds corresponding elements of two DataFrames or adds a scalar to each element.
Subtraction (-): Subtracts corresponding elements or subtracts a scalar from each element.
Multiplication (*): Multiplies corresponding elements or multiplies each element by a scalar.
Division (/): Divides corresponding elements or divides each element by a scalar.
Floor Division (//): Performs integer division.
Modulo (%): Returns the remainder of the division.
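For aggregate functions such as sum(), mean(), max() and min(), the axis parameter sets the direction:
axis=0 (default): Performs the operation column-wise (aggregates values within each column).
axis=1: Performs the operation row-wise (aggregates values within each row).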
Example with axis:
import pandas as pd
df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
# Sum of each row
row_sums = df.sum(axis=1)
print("Row Sums:\n", row_sums)
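The same small DataFrame can be used to try the element-wise operators and both axis directions:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

print(df + 5)    # scalar added to every element
print(df * 2)    # every element doubled
print(df // 3)   # floor division
print(df % 3)    # remainder of division by 3
print(df + df)   # element-wise, matched by row and column labels

col_sums = df.sum()          # axis=0 (default): one total per column
row_means = df.mean(axis=1)  # axis=1: one mean per row
print(col_sums)
print(row_means)
```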
IMPORTANT QUESTIONS
Important questions for Class 12 IP (Informatics Practices) focusing on Data Handling using Pandas typically
cover the following key areas:
1. Pandas Series:
Creation: Creating Series from lists, NumPy arrays, and dictionaries.
Attributes: Understanding index, values, dtype, size, nbytes, itemsize (itemsize appears in older textbooks but was removed from pandas in version 1.0).
Indexing and Slicing: Accessing elements using integer-based indexing (iloc), label-based indexing (loc), and
direct indexing with []. Slicing Series to extract subsets.
Operations: Performing mathematical operations (addition, subtraction, etc.), applying functions, and handling
missing values (NaN).
2. Pandas DataFrame:
Creation:
Creating DataFrames from dictionaries of Series, lists of dictionaries, and external files (CSV).
Attributes:
Understanding index, columns, shape, size, dtypes.
Indexing and Slicing:
Accessing rows and columns using loc, iloc, and direct column selection with []. Slicing DataFrames by rows and
columns.
Operations:
Adding and deleting rows and columns.
Modifying data in specific cells or entire rows/columns.
Performing calculations across rows/columns (e.g., sum(), mean()).
Note: this sample paper is based on the entire syllabus. You have to practice questions only on Pandas, data structures, Series, DataFrames and MySQL.