Pandas
Pandas is a high-level data manipulation tool developed by Wes McKinney.
It is built on top of the NumPy package.
Pandas' key data structures are the Series and the DataFrame.
DataFrames let you store and manipulate tabular data in rows of observations and
columns of variables.
Pandas is an open-source Python package that is widely used for data
science/data analysis and machine learning tasks.
What Can You Do With DataFrames Using Pandas?
Data cleansing
Data fill
Data normalization
Merges and joins
Data visualization
Statistical analysis
Data inspection
Loading and saving data
Series: a Series is similar to a NumPy array, except that we can give it a named or
datetime index instead of a numerical index.
import numpy as np
import pandas as pd

# A Series can be built from a list, a NumPy array, or a dict.
# (renamed the misspelled `lable` -> labels and `dis` -> data_dict)
labels = ['a', 'b', 'c']
lst = [10, 20, 30]
arr = np.array([10, 20, 30])
data_dict = {'a': 10, 'b': 20, 'c': 30}

pd.Series(lst)                 # default integer index 0..2
pd.Series(lst, labels)         # named index 'a'..'c'
pd.Series(arr, labels)         # a NumPy array works the same way
pd.Series(data_dict)           # dict keys become the index
pd.Series([sum, print, len])   # a Series can even hold functions

# Arithmetic between Series aligns on the index; a label present in only
# one operand produces NaN in the result.
ser1 = pd.Series([1, 2, 3, 4], ['USA', 'CHAINA', 'FRANCE', 'GERMANY'])
ser2 = pd.Series([1, 2, 3, 4], ['USA', 'CHAINA', 'INDIA', 'SINGAPOOR'])
ser1
ser2
ser1['USA']
ser1 + ser2
A DataFrame is built directly on top of Series; it is widely used for financial data.
The numpy.random.randn() function creates an array of specified shape and fills it with random
values as per standard normal distribution.
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)  # fixed seed so the random values are reproducible
df = pd.DataFrame(randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])

# Columns are Series all sharing a common index.
df['W']
type(df['W'])        # pandas Series
type(df)             # pandas DataFrame
df.W                 # attribute access is case-sensitive: was `df.w` (AttributeError)
df[['W', 'X']]       # a list of column names returns a DataFrame

# Adding a new column to the DataFrame.
df['new'] = df['Y'] + df['Z']
# df.drop('new')     # KeyError: drop defaults to axis=0 (rows); 'new' is a column
df.drop('new', axis=1)                # returns a copy; df still has 'new'
df
df.drop('new', axis=1, inplace=True)  # now df itself is modified
df.drop('E', inplace=True)            # axis=0 is the default: drops row 'E'
# Selecting rows in two ways: .loc uses labels, .iloc uses integer position.
df.loc['A']
df.iloc[2]
df.loc[['A', 'B']]

# Subset of rows and columns.
# Select rows A, B and columns W, Y:
df.loc[['A', 'B'], ['W', 'Y']]
df.iloc[2:, :]
df.iloc[2:, 2:]
df.iloc[2:, :2]
df.iloc[:2, :2]
df.iloc[1:3, 1:3]   # was `f.iloc` — NameError typo, fixed to df
df.iloc[-2:, -2:]
df.iloc[0:2, 0:2]
# Conditional (boolean) selection.
df > 0
booldf = df > 0
df[booldf]          # same as df[df > 0]; False positions become NaN
df[df > 0]
df['W'] > 0         # boolean Series
df[df['W'] > 0]     # keeps only rows where column W is positive

resultdf = df[df['W'] < 0]
resultdf
resultdf[['X', 'Z']]

# Instead of doing it in two steps:
df[df['W'] < 0][['X', 'Z']]

# Combine conditions with & (and) / | (or) — not Python's `and`/`or`.
df[(df['W'] < 0) & (df['Y'] > 0)]
df[(df['W'] < 0) | (df['Y'] > 0)]

df.reset_index()    # returns a copy with the index moved into a column

# df has only 4 rows here (row 'E' was dropped earlier with inplace=True),
# so the new column must also have 4 values — the original 5-element list
# raised ValueError on assignment.
lst = ['TN', 'AP', 'KA', 'MH']
df['STATE'] = lst
# Multi-level (hierarchical) index DataFrame.
outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = list(zip(outside, inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

df = pd.DataFrame(randn(6, 2), hier_index, ['A', 'B'])
df.loc['G1']             # index on the outer level first
df.loc['G1']['A']
df.index.names           # unnamed by default: FrozenList([None, None])
df.index.names = ['Groups', 'Num']
df.loc['G2'].loc[2]['B']

# Cross section: grab a slice at any level of the MultiIndex.
df.xs('G1')
df.xs(1, level='Num')
df.xs(('G1', 2))
# Missing data.
d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]}
df = pd.DataFrame(d)
df.dropna()            # drop any row containing a NaN
df.dropna(axis=1)      # drop any column containing a NaN
df.dropna(thresh=2)    # keep rows with at least 2 non-NaN values

# Fill values.
df.fillna(value=0)
df['A'].fillna(df['A'].mean())
# Avoid inplace=True on a column selection (chained assignment, deprecated
# in modern pandas) — assign the result back instead:
df['A'] = df['A'].fillna(df['A'].mean())
# Grouping.
d = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
     'Person': ['RAM', 'SHAM', 'SUNIL', 'SUDEEP', 'RAHEEM', 'SHEETAL'],
     'Sales': [250, 400, 200, 150, 350, 100]}
df = pd.DataFrame(d)

# The original used `bycomp` without ever creating it — define it first.
bycomp = df.groupby('Company')

# mean/std need numeric_only=True in pandas >= 2.0 because the
# 'Person' column is non-numeric.
bycomp.mean(numeric_only=True)
bycomp.max()
bycomp.std(numeric_only=True)
bycomp.min()
bycomp.sum()
bycomp.sum().loc['FB']
bycomp.describe()
bycomp.describe().transpose()
df.groupby('Company').describe().transpose()['FB']
Merging, joining, and concatenation
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7']},
                   index=[4, 5, 6, 7])
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11']},
                   index=[8, 9, 10, 11])

# Concatenate — glue DataFrames together along an axis.
# The original passed df2 twice and never used df3; fixed to [df1, df2, df3].
pd.concat([df1, df2, df3])           # stack rows (axis=0)
pd.concat([df1, df2, df3], axis=1)   # side by side, aligned on the index
# SQL-style merge of two DataFrames on a shared key column.
left = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
})
right = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3'],
})
pd.merge(left, right, how='inner', on='key')
# Employee / department example: merge on a column, then join on the index.
emp = pd.DataFrame({
    'EMPNO': ['E001', 'E0002', 'E003', 'E004'],
    'ENAME': ['BABJEE', 'RAM', 'SUNIL', 'SHAM'],
    'DEPTNO': [10, 10, 20, 30],
})
dept = pd.DataFrame({'Dname': ['Accounts', 'Admin', 'It'], 'DEPTNO': [10, 20, 50]})
pd.merge(emp, dept, how='inner', on='DEPTNO')   # only DEPTNO 10 and 20 match

# join() aligns on the index instead of a key column.
emp = pd.DataFrame({
    'EMPNO': ['E001', 'E0002', 'E003', 'E004'],
    'ENAME': ['BABJEE', 'RAM', 'SUNIL', 'SHAM'],
}, index=[10, 10, 20, 30])
dept = pd.DataFrame({
    'DNAME': ['Accounts', 'Admin', 'It'],
    'LOCATION': ['CHENNAI', 'MUMBAI', 'PUNE'],
}, index=[10, 20, 50])
emp.join(dept, how='inner')
emp.join(dept, how='outer')
df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [444, 555, 666, 444],
                   'Col3': ['abc', 'def', 'ghi', 'xyz']})
df.head(2)                   # first 2 rows
df.tail(2)                   # last 2 rows
df['Col2'].unique()          # array of the distinct values
len(df['Col2'].unique())
df['Col2'].nunique()         # same count, directly
df['Col2'].value_counts()    # frequency of each value
df[df['Col1'] > 2]
df[(df['Col1'] > 2) & (df['Col2'] == 444)]
df['Col1'].sum()

# Custom function applied element-wise with .apply().
def times2(x):
    # Original read `retrun x*x` — a SyntaxError, and x*x squares rather
    # than doubling; fixed to match the function's name.
    return x * 2

df['Col1'].apply(times2)

# Calling built-in functions.
df['Col3'].apply(len)
df['Col2'].apply(lambda x: x * x)

df.drop('Col1', axis=1)      # returns a copy; df is unchanged
df.columns
df.index
df.sort_values(by='Col2', ascending=False)
df.isnull()
input and output
# `pwd` is an IPython magic, not Python — comment it out in a plain script.
# pwd

# Use raw strings for Windows paths so backslashes are never treated as
# escape sequences (the resulting strings are unchanged).
pd.read_csv(r'd:\demo\example.csv')
pd.read_excel(r'd:\demo\example.xlsx')
df.to_csv("d://demo/myoutput.csv", index=False)
pd.read_excel(r'd:\demo\example.xlsx', sheet_name='Sheet1')
df.to_excel(r'd:\demo\example1.xlsx', sheet_name='Sheet2', index=False)

# read_html scrapes every <table> on the page; match= filters by caption text.
# (The original split this string literal across two lines — a SyntaxError.)
table_MN = pd.read_html('https://en.wikipedia.org/wiki/Minnesota',
                        match='Election results from statewide races')

# Reading from a SQL database via SQLAlchemy.
import pandas as pd
from sqlalchemy import create_engine
cnx = create_engine('mysql+pymysql://root:admin123@localhost:3306/demo').connect()
sql = 'select * from customers'
df = pd.read_sql(sql, cnx)
The pandas-datareader is a sub-package that allows one to create a DataFrame from
various internet data sources, currently including:
Yahoo! Finance
Google Finance
St.Louis FED (FRED)
Kenneth French’s data library
World Bank
Google Analytics
# Install with: pip install pandas-datareader   (shell command, not Python)
import pandas_datareader.data as web
import datetime as dt

start = dt.datetime(2015, 1, 1)
end = dt.datetime(2015, 12, 31)
# The original line was missing its closing parenthesis — a SyntaxError.
# NOTE(review): the free Yahoo! endpoint has since been retired; this call
# may need yfinance or another source today — confirm before relying on it.
facebook = web.DataReader('FB', 'yahoo', start, end)
Pandas time series
The majority of data in financial analysis is time-series data.
Datetime index
import pandas as pd
import numpy as np
from datetime import datetime

# Two explicit timestamps to serve as the row index.
first_two = [datetime(2017, 1, 1), datetime(2017, 1, 2)]
dt_ind = pd.DatetimeIndex(first_two)

# Random 2x2 values, one row per timestamp.
data = np.random.randn(2, 2)
df = pd.DataFrame(data=data, index=dt_ind, columns=['a', 'b'])

# Position of the newest / oldest timestamp, and the newest timestamp itself.
df.index.argmax()
df.index.argmin()
df.index.max()
# Time resampling.
df = pd.read_csv("d://demo//walmart_stock.csv")
df.head()
df.info()                 # 'Date' is read in as a plain object/str column
df['Date'] = pd.to_datetime(df['Date'])
df.info()                 # 'Date' is now datetime64
df.set_index('Date', inplace=True)
# index is an attribute, not a method — `df.index()` raised TypeError.
df.index