Working with rows

Get the row index and labels
idx = df.index # get row index
label = df.index[0] # first row label
label = df.index[-1] # last row label
l = df.index.tolist() # get as a list
a = df.index.values # get as an array

Change the (row) index
df.index = idx # new ad hoc index
df = df.set_index('A') # col A new index
df = df.set_index(['A', 'B']) # MultiIndex
df = df.reset_index() # replace old w new
# note: old index stored as a col in df
df.index = range(len(df)) # set with list
df = df.reindex(index=range(len(df)))
df = df.set_index(keys=['r1','r2','etc'])
df.rename(index={'old':'new'}, inplace=True)

Select a slice of rows by label/index
[inclusive-from : inclusive-to [: step]]
df = df['a':'c'] # rows 'a' through 'c'
Trap: this does not work for integer-labelled rows; see the code snippet on slicing rows by integer position.

Append a row of column totals to a DataFrame
# Option 1: use dictionary comprehension
sums = {col: df[col].sum() for col in df}
sums_df = DataFrame(sums, index=['Total'])
df = pd.concat([df, sums_df]) # df.append() was removed in pandas 2.0

# Option 2: All done with pandas
df = pd.concat([df, DataFrame(df.sum(), columns=['Total']).T])

Iterating over DataFrame rows
for (index, row) in df.iterrows():
    pass # work with index and row here
Trap: row data type may be coerced.
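The totals row and the iterrows() coercion trap can be seen in a small sketch. The data and the names with_totals, row_dtypes and first are invented for illustration, and itertuples() is shown only as one possible way around the coercion.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]}, index=['a', 'b'])

# Append a row of column totals (all done with pandas)
totals = df.sum().to_frame('Total').T
with_totals = pd.concat([df, totals])

# iterrows() yields each row as a Series with a single dtype,
# so the integer column 'n' comes back as floats here
row_dtypes = [row.dtype for _, row in df.iterrows()]

# itertuples() yields one named tuple per row and keeps
# each column's own type
first = next(df.itertuples())
```

If per-column types matter inside the loop, itertuples() is usually the safer choice.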

Adding rows
df = pd.concat([original_df, more_rows_in_df]) # df.append() was removed in pandas 2.0
Hint: convert the row to a DataFrame and then concatenate.
Both DataFrames should have the same column labels.

Sorting DataFrame rows by values
df = df.sort_values(df.columns[0], ascending=False)
df.sort_values(['col1', 'col2'], inplace=True)
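A minimal sketch of adding a row and then sorting by values, assuming a recent pandas where pd.concat() and sort_values() are the supported calls. The data and the names new_row, by_both and by_first_desc are made up for the example.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({'col1': [2, 1, 2], 'col2': [9, 8, 7]},
                  index=['a', 'b', 'c'])

# Convert the new row to a one-row DataFrame with the same column labels
new_row = pd.DataFrame({'col1': [1], 'col2': [5]}, index=['d'])
df = pd.concat([df, new_row])

# Sort by col1, breaking ties with col2 (both ascending by default)
by_both = df.sort_values(['col1', 'col2'])

# Sort by the first column, largest values first
by_first_desc = df.sort_values(df.columns[0], ascending=False)
```

Both sort_values() calls return a new DataFrame, so df itself keeps its original row order.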

Dropping rows (by name)
df = df.drop('row_label')
df = df.drop(['row1','row2']) # multi-row

Sort DataFrame by its row index
df.sort_index(inplace=True) # sort by row
df = df.sort_index(ascending=False)
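Dropping rows by label and sorting by the row index might look like this on a toy frame. The data and variable names are invented, and errors='ignore' is included only as an optional extra for labels that may be missing.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['c', 'a', 'd', 'b'])

dropped = df.drop(['c', 'd'])                 # new DataFrame; df is unchanged
sorted_up = df.sort_index()                   # a, b, c, d
sorted_down = df.sort_index(ascending=False)  # d, c, b, a

# errors='ignore' skips labels that are not present
# instead of raising a KeyError
safe = df.drop(['a', 'zzz'], errors='ignore')
```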

Boolean row selection by values in a column
df = df[df['col2'] >= 0.0]
df = df[(df['col3']>=1.0) | (df['col1']<0.0)]
df = df[df['col'].isin([1,2,5,7,11])]
df = df[~df['col'].isin([1,2,5,7,11])]
df = df[df['col'].str.contains('hello')]
Trap: the bitwise operators "or", "and" and "not" (i.e. | & ~) are co-opted to be Boolean operators on a Series of Booleans.
Trap: need parentheses around comparisons.

Selecting rows using isin over multiple columns
# fake up some data
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = DataFrame(data)

# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for
f = df[df[list(lf)].isin(lf).all(axis=1)]

Selecting rows using an index
idx = df[df['col'] >= 2].index
print(df.loc[idx]) # .ix has been removed from pandas

Random selection of rows
import random as r
k = 20 # pick a number
selection = r.sample(range(len(df)), k)
df_sample = df.iloc[selection, :] # get copy
Note: this randomly selected sample is not sorted.

Drop duplicates in the row index
df['index'] = df.index # 1 create new col
df = df.drop_duplicates(subset='index', keep='last') # 2 use new col
del df['index'] # 3 del the col
df.sort_index(inplace=True) # 4 tidy up

Test if two DataFrames have same row index
len(a)==len(b) and all(a.index==b.index)

Get the integer position of a row or col index label
i = df.index.get_loc('row_label')
Trap: index.get_loc() returns an integer for a unique match. If not a unique match, may return a slice/mask.

Get integer position of rows that meet condition
a = np.where(df['col'] >= 2)[0] # numpy array of positions
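Two of the less obvious idioms here, the multi-column isin and the get_loc() trap, are easier to follow with concrete values. This is only a sketch: the frames df_isin and df_dup are invented, and Index.duplicated() is shown as one possible shortcut for removing repeated index labels, not as the cheat sheet's own method.

```python
import pandas as pd

# Multi-column isin: keep rows where column 1 is in [1, 3]
# AND column 3 is in [8, 27]
df_isin = pd.DataFrame({1: [1, 2, 3], 2: [1, 4, 9], 3: [1, 8, 27]})
lf = {1: [1, 3], 3: [8, 27]}
mask = df_isin[list(lf)].isin(lf)   # one Boolean column per key in lf
f = df_isin[mask.all(axis=1)]       # rows where every check passed

# get_loc(): the label 'b' appears twice in this unsorted row index
df_dup = pd.DataFrame({'col': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'b'])
pos_unique = df_dup.index.get_loc('a')  # unique label: an integer
pos_dup = df_dup.index.get_loc('b')     # repeated label: a Boolean mask

# Keep the last row for each repeated label, then tidy up the order
deduped = df_dup[~df_dup.index.duplicated(keep='last')].sort_index()
```

Only the last row of df_isin passes both checks, because the first row fails on column 3 and the second fails on column 1.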

Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]]
start is 0; end is len(df)
df = df[:] # copy entire DataFrame
df = df[0:2] # rows 0 and 1
df = df[2:3] # row 2 (the third row)
df = df[-1:] # the last row
df = df[:-1] # all but the last row
df = df[::2] # every 2nd row (0 2 ..)
Trap: a single integer without a colon is a column label for integer numbered columns.

Test if the row index values are unique/monotonic
if df.index.is_unique: pass # ...
b = df.index.is_monotonic_increasing
b = df.index.is_monotonic_decreasing

Find row index duplicates
if df.index.has_duplicates:
    print(df.index.duplicated())
Note: also similar for column label duplicates.
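The difference between the exclusive end of a position slice and the inclusive end of a label slice is easy to miss, so here is a small sketch on invented data. The .iloc and .loc forms are added as an assumption about what reads more clearly; they select the same rows.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

by_position = df[0:2]     # positions 0 and 1: end is exclusive
by_label = df['a':'c']    # labels 'a', 'b' and 'c': end is inclusive

# The explicit indexers make the intent clearer
same_by_position = df.iloc[0:2]
same_by_label = df.loc['a':'c']

# Index checks from this section
is_unique = df.index.is_unique
is_increasing = df.index.is_monotonic_increasing
```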
Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]