Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
38 views1 page

Pandas DataFrame Notes - 12pages-Pages-4

The document provides various methods for selecting, modifying, and managing rows in a DataFrame using pandas. It covers techniques such as slicing by label/index, appending rows, dropping rows, boolean selection, and sorting. Additionally, it includes traps and considerations for handling row indices and duplicates.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views1 page

Pandas DataFrame Notes - 12pages-Pages-4

The document provides various methods for selecting, modifying, and managing rows in a DataFrame using pandas. It covers techniques such as slicing by label/index, appending rows, dropping rows, boolean selection, and sorting. Additionally, it includes traps and considerations for handling row indices and duplicates.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

Select a slice of rows by label/index

Working with rows [inclusive-from : inclusive–to [ : step]]


df = df['a':'c'] # rows 'a' through 'c'
Get the row index and labels Trap: cannot work for integer labelled rows – see
idx = df.index # get row index previous code snippet on integer position slicing.
label = df.index[0] # first row label
label = df.index[-1] # last row label Append a row of column totals to a DataFrame
l = df.index.tolist() # get as a list # Option 1: use dictionary comprehension
a = df.index.values # get as an array sums = {col: df[col].sum() for col in df}
sums_df = DataFrame(sums,index=['Total'])
Change the (row) index df = df.append(sums_df)
df.index = idx # new ad hoc index
df = df.set_index('A') # col A new index # Option 2: All done with pandas
df = df.set_index(['A', 'B']) # MultiIndex df = df.append(DataFrame(df.sum(),
df = df.reset_index() # replace old w new columns=['Total']).T)
# note: old index stored as a col in df
df.index = range(len(df)) # set with list Iterating over DataFrame rows
df = df.reindex(index=range(len(df))) for (index, row) in df.iterrows(): # pass
df = df.set_index(keys=['r1','r2','etc']) Trap: row data type may be coerced.
df.rename(index={'old':'new'}, inplace=True)
Sorting DataFrame rows values
Adding rows df = df.sort(df.columns[0],
df = original_df.append(more_rows_in_df) ascending=False)
Hint: convert row to a DataFrame and then append. df.sort(['col1', 'col2'], inplace=True)
Both DataFrames should have same column labels.
Sort DataFrame by its row index
Dropping rows (by name) df.sort_index(inplace=True) # sort by row
df = df.drop('row_label') df = df.sort_index(ascending=False)
df = df.drop(['row1','row2']) # multi-row
Random selection of rows
Boolean row selection by values in a column import random as r
df = df[df['col2'] >= 0.0] k = 20 # pick a number
df = df[(df['col3']>=1.0) | (df['col1']<0.0)] selection = r.sample(range(len(df)), k)
df = df[df['col'].isin([1,2,5,7,11])] df_sample = df.iloc[selection, :] # get copy
df = df[~df['col'].isin([1,2,5,7,11])] Note: this randomly selected sample is not sorted
df = df[df['col'].str.contains('hello')]
Trap: bitwise "or", "and" “not; (ie. | & ~) co-opted to be Drop duplicates in the row index
Boolean operators on a Series of Boolean df['index'] = df.index # 1 create new col
Trap: need parentheses around comparisons. df = df.drop_duplicates(cols='index',
take_last=True)# 2 use new col
Selecting rows using isin over multiple columns del df['index'] # 3 del the col
# fake up some data df.sort_index(inplace=True)# 4 tidy up
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = DataFrame(data) Test if two DataFrames have same row index
len(a)==len(b) and all(a.index==b.index)
# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for Get the integer position of a row or col index label
f = df[df[list(lf)].isin(lf).all(axis=1)] i = df.index.get_loc('row_label')
Trap: index.get_loc() returns an integer for a unique
Selecting rows using an index match. If not a unique match, may return a slice/mask.
idx = df[df['col'] >= 2].index
print(df.ix[idx]) Get integer position of rows that meet condition
a = np.where(df['col'] >= 2) #numpy array
Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]] Test if the row index values are unique/monotonic
start is 0; end is len(df)
if df.index.is_unique: pass # ...
df = df[:] # copy entire DataFrame b = df.index.is_monotonic_increasing
df = df[0:2] # rows 0 and 1 b = df.index.is_monotonic_decreasing
df = df[2:3] # row 2 (the third row)
df = df[-1:] # the last row
Find row index duplicates
df = df[:-1] # all but the last row
if df.index.has_duplicates:
df = df[::2] # every 2nd row (0 2 ..)
print(df.index.duplicated())
Trap: a single integer without a colon is a column label
Note: also similar for column label duplicates.
for integer numbered columns.
Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
4

You might also like