
FDS Exp4

Pandas is an open-source Python library for data manipulation and analysis. Built on NumPy, it provides high-level data structures such as Series and DataFrame. Its features include efficient data manipulation, missing-data handling, and powerful group-by capabilities, which make it a fundamental tool for data science. After installing and importing Pandas, users can read data from different formats and then filter, modify, and export it.

Uploaded by

harsh.pandey22

What is the Pandas Library?

Pandas is an open-source Python library widely used for data manipulation, analysis, and
preprocessing tasks. It is a fundamental library for data science and analytics and is built on top
of NumPy, providing high-level data structures and methods to work with structured data
efficiently.

Pandas primarily offers two data structures for handling data:-

Series: A one-dimensional labeled array capable of holding any data type.

DataFrame: A two-dimensional labeled data structure, similar to a table in relational databases
or an Excel spreadsheet, consisting of rows and columns.

Features of Pandas:-

• Fast and Efficient Data Manipulation: Pandas provides a variety of functions to
manipulate, clean, and analyze data efficiently.
• Handling Missing Data: Pandas can detect, fill, or remove missing values, making it easier
to preprocess datasets.
• Data Alignment and Merging: It supports database-like operations, such as merging,
joining, and reshaping data.
• Label-Based Slicing and Indexing: Pandas allows access to data using row/column labels
as well as positional indexing.
• Group By Functionality: It provides powerful group-by capabilities, allowing you to split
data, apply functions, and combine results.
• Data Cleaning: Pandas simplifies tasks like renaming columns, handling missing values,
or removing duplicates.
• Support for Time-Series Data: Pandas provides specialized tools for handling time-series
data, including date-based indexing, resampling, and rolling-window calculations.
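
The time-series tools named in the last bullet are not demonstrated elsewhere in this write-up. The following is a minimal sketch of date-based indexing, resampling, and a rolling window; the dates and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical daily readings indexed by date (illustrative values only)
dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([10, 12, 11, 15, 14, 13], index=dates)

# Date-based indexing: select a range of dates by label (both ends included)
first_three = ts.loc["2024-01-01":"2024-01-03"]

# Resampling: total of the values in each 2-day bin
two_day_sum = ts.resample("2D").sum()

# Rolling window: mean over the current day and the two days before it
rolling_mean = ts.rolling(window=3).mean()

print(first_three)
print(two_day_sum)
print(rolling_mean)
```

The first two entries of rolling_mean are NaN because a full three-day window is not yet available for them.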

1.Installing Pandas:-

Before using Pandas, make sure it is installed. If it is not, you can install it using pip:

pip install pandas

Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.1.4)
Requirement already satisfied: numpy<2,>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.2)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)

2.Importing Pandas:-

To start working with Pandas, you first need to import it into your Python script, conventionally under the alias pd:

import pandas as pd

3.Pandas Data Structures:-

Series: A Series is essentially a one-dimensional array, similar to a column in a spreadsheet or a
list in Python, but with a label for each element (together called the index).

import pandas as pd

# Creating a Series
data = [1, 3, 5, 7, 9]
series = pd.Series(data)
print(series)

• Indexing: You can access the elements of a Series using its index.
print(series[2]) # Outputs: 5

• Custom Index: You can also define custom labels for the Series index.
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(series['c']) # Outputs: 5
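
Once a Series carries custom labels, it helps to state explicitly whether a lookup is by label or by position. A short sketch, reusing the same values and labels as above, with .loc for labels and .iloc for positions:

```python
import pandas as pd

series = pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])

by_label = series.loc['c']          # label-based lookup
by_position = series.iloc[2]        # position-based lookup (third element)
label_slice = series.loc['b':'d']   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Both lookups return the same element here; the label slice returns the three values labeled 'b', 'c', and 'd'.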

DataFrame: A DataFrame is a two-dimensional data structure, similar to a table in a relational
database or an Excel spreadsheet. It consists of rows and columns.

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
}
df = pd.DataFrame(data)
print(df)

4.Reading Data into Pandas:- Pandas makes it easy to load data from various file formats, such
as CSV, Excel, and SQL databases.

• Reading CSV Files

# Reading data from a CSV file
df = pd.read_csv('data.csv')
print(df.head()) # Prints the first 5 rows of the DataFrame

• Reading Excel Files

# Reading data from an Excel file (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

• Reading from SQL Databases

import sqlite3

# Connecting to a SQL database and reading data into a DataFrame
conn = sqlite3.connect('database.db')
df = pd.read_sql_query('SELECT * FROM tablename', conn)
conn.close() # Close the connection once the data is loaded
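
The snippets above depend on files that may not exist on your machine. As a self-contained sketch, the example below substitutes an in-memory string for data.csv and an in-memory SQLite database for database.db. The table name people and all contents are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# CSV: an in-memory string stands in for 'data.csv' (hypothetical contents)
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Paris\nCharlie,35,London\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# SQL: an in-memory SQLite database stands in for 'database.db'
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 25), ("Bob", 30)])
conn.commit()

# Pass values through `params` instead of formatting them into the SQL string
sql_df = pd.read_sql_query("SELECT * FROM people WHERE Age > ?", conn, params=(26,))
conn.close()

print(csv_df)
print(sql_df)
```

Passing the threshold through params keeps the query text fixed, which is generally safer than building SQL with string formatting.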

5.Basic Operations with DataFrames:-

• Viewing Data:

.head(): Displays the first rows of the DataFrame (five by default).

.tail(): Displays the last rows of the DataFrame (five by default).

print(df.head()) # View the first 5 rows
print(df.tail()) # View the last 5 rows

• Inspecting Data:

.shape: Returns the dimensions of the DataFrame (rows, columns).

.columns: Returns the column names.

.info(): Provides a concise summary of the DataFrame, including the data types and
non-null counts.

.describe(): Provides descriptive statistics for numeric columns.

print(df.shape) # Get the shape (rows, columns)
print(df.columns) # Get the column names
df.info() # Get information about the DataFrame
print(df.describe()) # Get summary statistics
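
To make the inspection attributes concrete, here is a small sketch that applies them to the three-row DataFrame from section 3 and stores the results in variables. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

n_rows, n_cols = df.shape          # shape is a (rows, columns) tuple
column_names = list(df.columns)    # columns is an Index; list() gives plain names
age_stats = df['Age'].describe()   # count, mean, std, min, quartiles, max

print(n_rows, n_cols)
print(column_names)
print(age_stats)
```

Note that describe() on the whole DataFrame would report only the numeric column Age by default.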

6.Selecting Data from a DataFrame:-

You can select specific columns with square brackets, and specific rows with loc and iloc.

• Selecting Columns:
# Selecting a single column
age_column = df['Age']

# Selecting multiple columns
subset = df[['Name', 'City']]

• Selecting Rows:
– loc: Select rows and columns by label.
– iloc: Select rows and columns by position (index).
# Selecting a row by label using loc
row = df.loc[1] # Selects the row whose index label is 1

# Selecting a row by position using iloc
row = df.iloc[1] # Selects the second row (position 1)

# Selecting a range of rows by position
subset = df.iloc[0:2] # Selects the first two rows (the end position is excluded)
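
With the default integer index, loc[1] and iloc[1] return the same row, which hides the difference between them. The sketch below uses a hypothetical non-default index (10, 20, 30) so that labels and positions no longer coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default row labels
)

by_label = df.loc[20]         # row whose label is 20
by_position = df.iloc[1]      # second row by position
label_slice = df.loc[10:20]   # label slice includes the end label
pos_slice = df.iloc[0:1]      # position slice excludes the end position
cell = df.loc[30, 'Age']      # single value by row label and column label

print(by_label['Name'], by_position['Name'])
print(len(label_slice), len(pos_slice))
print(cell)
```

Here df.loc[1] would raise a KeyError because no row carries the label 1, while df.iloc[1] still works.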

7.Filtering and Querying Data:-

You can filter the rows of a DataFrame by applying conditions on the data.

# Filtering rows where Age > 30
filtered_df = df[df['Age'] > 30]

# Filtering rows with multiple conditions (each condition needs its own parentheses)
filtered_df = df[(df['Age'] > 25) & (df['City'] == 'New York')]

You can also use the .query() method for filtering:

# Using the query method
filtered_df = df.query('Age > 25 and City == "New York"')
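
Filter values often come from Python variables rather than literals. As a sketch, the example below filters the same way twice: once with a boolean mask using isin(), and once with query(), where the @ prefix refers to a local variable. The threshold and city list are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

min_age = 25                      # hypothetical threshold
cities = ['New York', 'London']   # hypothetical list of cities to keep

# Boolean mask: isin() tests membership in a list of values
mask_df = df[(df['Age'] > min_age) & df['City'].isin(cities)]

# query(): @name refers to a local Python variable
query_df = df.query('Age > @min_age and City in @cities')

print(mask_df)
print(query_df)
```

Both forms select the same rows; which one reads better is largely a matter of taste.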

8.Modifying Data:-

• Adding New Columns: You can add new columns to the DataFrame by assigning values
to a new column name.
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

• Modifying Existing Columns: You can modify columns by applying operations on them.
# Updating an existing column
df['Age'] = df['Age'] + 1 # Increase each age by 1
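
Beyond whole-column assignment, two other modifications come up often: deriving a column from an existing one and updating only the rows that match a condition. A minimal sketch follows; the Bonus column and the replacement salary are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# Derive a new column from an existing one (hypothetical bonus: one tenth of salary)
df['Bonus'] = df['Salary'] / 10

# Update only the rows that match a condition, using loc with a boolean mask
df.loc[df['Age'] > 30, 'Salary'] = 75000

# Rename a column; rename() returns a new DataFrame, so assign the result back
df = df.rename(columns={'Salary': 'AnnualSalary'})

print(df)
```

Using df.loc[mask, column] = value for conditional updates avoids the chained-assignment pitfalls of writing df[mask][column] = value.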

9.Handling Missing Data:- Pandas makes it easy to identify and handle missing data (NaN
values).

• Checking for Missing Data:


# Check for missing values in the DataFrame
print(df.isnull())
print(df.isnull().sum()) # Count missing values in each column

• Filling Missing Data: You can fill missing values using the .fillna() method.
# Fill missing values with a default value
# (assign the result back; calling fillna(inplace=True) on a selected column is deprecated)
df['Salary'] = df['Salary'].fillna(0)

• Dropping Missing Data: You can drop rows or columns with missing values
using .dropna().
# Drop rows with missing data
df.dropna(inplace=True)
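
The DataFrame built earlier contains no missing values, so the calls above have nothing to act on. The sketch below creates a small DataFrame with deliberately missing entries (illustrative data) and applies the same three steps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],          # one missing age
    'Salary': [50000, 60000, np.nan], # one missing salary
})

# Count missing values in each column
missing_per_column = df.isnull().sum()

# Fill a numeric column with its mean (work on a copy to keep df unchanged)
filled = df.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())

# Drop only the rows where 'Salary' is missing
dropped = df.dropna(subset=['Salary'])

print(missing_per_column)
print(filled)
print(dropped)
```

The subset argument limits dropna() to the listed columns, so a missing Age alone does not remove a row.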

10.Grouping and Aggregating Data:- You can group data based on specific columns and
perform aggregation operations like sum, mean, count, etc.

# Grouping by a column and calculating the mean of another column
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
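
In the three-row example every city appears only once, so each group mean is just the original value. The sketch below uses illustrative data with repeated cities and computes several aggregates in a single agg() call.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'London', 'London', 'London'],
    'Age': [30, 40, 20, 25, 30],
})

# Split by City, apply three aggregations to Age, combine into one table
summary = df.groupby('City')['Age'].agg(['mean', 'count', 'max'])

print(summary)
```

The result is indexed by city, with one column per aggregation.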

11.Merging and Joining DataFrames:- Pandas supports merging multiple DataFrames using the
pd.merge() function or the equivalent DataFrame .merge() method (similar to SQL joins).

# Merging two DataFrames (df1 and df2 must both contain an 'ID' column)
merged_df = pd.merge(df1, df2, on='ID', how='inner') # Inner join on the 'ID' column
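
The merge call above refers to df1 and df2 without defining them. As a self-contained sketch, the example below builds two small hypothetical DataFrames that share an ID column and compares inner, left, and outer joins.

```python
import pandas as pd

# Hypothetical inputs: IDs 2 and 3 appear in both tables
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['Paris', 'London', 'Tokyo']})

# Inner join: only IDs present in both tables
inner = pd.merge(df1, df2, on='ID', how='inner')

# Left join: every row of df1; City is NaN where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')

# Outer join: every ID from either table; indicator=True adds a '_merge'
# column recording where each row came from
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(inner)
print(left)
print(outer)
```

The _merge column takes the values left_only, right_only, or both, which is useful for checking how well two tables match.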

12.Exporting Data:- Pandas allows you to export DataFrames to various file formats.

• Exporting to CSV:
# Save DataFrame to a CSV file
df.to_csv('output.csv', index=False)

• Exporting to Excel:
# Save DataFrame to an Excel file (writing .xlsx needs the openpyxl package installed)
df.to_excel('output.xlsx', index=False)
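
One way to check an export is to read the result back and compare it with the original. The sketch below does this for CSV without touching the disk, relying on the fact that to_csv() returns the CSV text when no path is given. The data is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_csv() returns the CSV content as a string
csv_text = df.to_csv(index=False)

# Reading the text back should reproduce the original DataFrame
round_trip = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(round_trip.equals(df))
```

Passing index=False matters here: without it, the row index is written as an extra unnamed column and the round trip no longer matches.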
