Pulling Files In and Out
# Export the DataFrame to a CSV file
SP_hist.to_csv("/Users/maria/Desktop/csv_files/SP_hist.csv", index=True, sep=',')
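Reading the CSV back in is the mirror operation (a sketch; index_col=0 restores the index that index=True saved):
SP_hist = pd.read_csv("/Users/maria/Desktop/csv_files/SP_hist.csv", index_col=0)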
# Import the data from the Systembolaget API documentation (just downloaded the JSON)
import json

with open("/Users/maria/OneDrive/Documents_old/CodeOpDocs/Milestones/Group_Project/Drinks_data/assortment.json", 'r', encoding='utf-8') as our_file:
    our_file_as_dictionary = json.load(our_file)

# Print the loaded data
print(our_file_as_dictionary)

# Make the dictionary into a DataFrame
api_data = pd.DataFrame(our_file_as_dictionary)
LISTS
# Make a list of stocks that are both in my universe and in the S&P 500
combined = []
for stock in universe:
    if stock in SNP['Symbol'].values:
        combined.append(stock)
print(combined)
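The same filter can be written as a list comprehension, which is the more idiomatic form (a sketch using the same universe and SNP names):
combined = [stock for stock in universe if stock in SNP['Symbol'].values]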
# Put all owners on one line, separated by commas
owner_agg = reporting_owner.groupby('ACCESSION_NUMBER').agg({'RPTOWNERNAME': ','.join})
# Remove strange categories like flavored wine and fruity sparkling
removal_wines = ['Sparkling wines, flavored', 'Blå, sparkling', 'Fruit wine',
                 'Rosé wine', 'Sparkling wines, fruit wine', 'Sweet red wines',
                 'Rosé wines, Fruity & Flavorful', 'Flavored wine',
                 'Other fortified wines', 'Sweet white wines',
                 'MOUSSERANDE VINER röda', 'MOUSSERANDE VINER, smaksatt',
                 'Blå, mousserande', 'Fruktvin', 'Rosé Wines', 'CHAMPAGNE, söt',
                 'MOUSSERANDE VINER, fruktvin', 'Red Wines, Sweet',
                 'Rosé Wines, Fruity & Flavorful', 'Flavored Wine',
                 'Other Fortified Wines', 'White Wines, Sweet']
for i in removal_wines:
    full_wine = full_wine[full_wine['Headline'] != i]
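The loop can be replaced with a single vectorized filter using isin (a sketch with the same names):
full_wine = full_wine[~full_wine['Headline'].isin(removal_wines)]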
TUPLES
# Pull data out of the tuples
vivino_data['match_name'] = vivino_data['best_match'].str[0]
vivino_data['match_percentage'] = vivino_data['best_match'].str[1]
vivino_data['match_reference'] = vivino_data['best_match'].str[2]
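All three columns can also be unpacked in one step (a sketch, assuming every best_match value is a three-element tuple):
vivino_data[['match_name', 'match_percentage', 'match_reference']] = pd.DataFrame(
    vivino_data['best_match'].tolist(), index=vivino_data.index)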
DICTIONARIES
Using a dictionary to rename
# Merge the dictionaries into one
translation_dict_1.update(another_translation_dict)
translation_dict_1.update(yet_another_dictionary)

# Use dictionaries to translate both content and column headers
full_set = full_set.replace(translation_dict_1)
full_set = full_set.rename(columns=translation_dict_1)
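On Python 3.9+ the same merge can be written with the | operator, which leaves the input dictionaries untouched (a sketch; later dictionaries win on duplicate keys, matching the update order above):
translation_dict_1 = translation_dict_1 | another_translation_dict | yet_another_dictionary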
DataFrame
# Make a copy of the dataframe
piv_table_control = piv_table.copy()
Making a column filled with NA
piv_table_control['transacted_avg'] = np.nan
DF singular calculations
# Audit that the number of BAC shares really is that large
result = combo.groupby('ISSUERTRADINGSYMBOL')['TRANS_SHARES'].sum()
bac_count = result[result.index == 'BAC']
bac_count
# Calculate what share of the data points each transaction type makes up
combo['TRANS_CODE'].value_counts(normalize=True)
# Remove stocks where there is no data available
piv_table = piv_table.dropna(how='any', subset=['STOCK_t0'])
# Adjust the names used in graphs so all Berkshire/Buffett entries are the same
combo['RPTOWNERNAME'] = combo['RPTOWNERNAME'].str.replace(
    'BERKSHIRE HATHAWAY INC,BUFFETT WARREN E',
    'BUFFETT WARREN E,BERKSHIRE HATHAWAY INC')
combo
DF Column calculations
# Make a field saying whether the stock in the submission data is in the S&P 500
combo['is_sp500'] = combo['ISSUERTRADINGSYMBOL'].isin(combined)

# Calculate the cost of transacted shares per line item (to be used later for getting the average cost of transacted stock)
combo['value_transacted'] = combo['TRANS_SHARES'] * combo['TRANS_PRICEPERSHARE']
Renaming Columns
stock_prices = stock_prices.rename(columns={'variable': 'ref_time', 'value': 'price'})
# Calculate the max and min across the stock price columns
price_cols = ['STOCK_-1', 'STOCK_t0', 'STOCK_t1', 'STOCK_t2', 'STOCK_t3', 'STOCK_t4', 'STOCK_t5']
piv_table['max'] = np.amax(piv_table[price_cols], axis=1)
piv_table['min'] = np.amin(piv_table[price_cols], axis=1)
Groupby
# Total units sold per product group
full_wine.groupby('Product_Group_Details')['Units_sold'].sum()
Sort_values
# Order the wines ascending in price
full_wine = full_wine.sort_values(by='Actual_Price', ascending=True)
DF FULL calculations
Merge dataframes
combo = submission.merge(non_deriv_trans, on='ACCESSION_NUMBER', how='inner')
combo = combo.merge(owner_agg, on='ACCESSION_NUMBER', how='inner')
Pivot Table
piv_table_buys = pd.pivot_table(combo_buys,
                                index=['TRANS_DATE', 'ISSUERTRADINGSYMBOL'],
                                aggfunc={'TRANS_SHARES': 'sum', 'value_transacted': 'sum'})
Join DF
# Join the data again, stacking buys and sells ***VERTICAL (axis=0 stacks rows; axis=1 would join horizontally)
piv_table = pd.concat([piv_table_sells, piv_table_buys], axis=0)
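For reference, the two concat directions (a sketch with hypothetical frames a and b):
stacked = pd.concat([a, b], axis=0)        # rows of b appended below a
side_by_side = pd.concat([a, b], axis=1)   # columns of b appended beside a, aligned on the index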
iterrows:
# Calculate t-1 (the day before the transaction)
for idx, row in piv_table.iterrows():
    trans_date = row['TRANS_DATE']
    symbol = row['ISSUERTRADINGSYMBOL']
    try:
        value = data_neg_1.loc[trans_date, symbol]
        piv_table.loc[idx, "STOCK_-1"] = value
    except KeyError:
        piv_table.loc[idx, "STOCK_-1"] = np.nan
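Row loops over DataFrames are slow; the same lookup can be done with one merge (a sketch, assuming data_neg_1 has dates in the index and one column per ticker, and that TRANS_DATE and ISSUERTRADINGSYMBOL are regular columns of piv_table):
long_prices = data_neg_1.stack().reset_index()
long_prices.columns = ['TRANS_DATE', 'ISSUERTRADINGSYMBOL', 'STOCK_-1']
piv_table = piv_table.merge(long_prices, on=['TRANS_DATE', 'ISSUERTRADINGSYMBOL'], how='left')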
MELT to collapse columns
# Make this long form
stock_prices_returns = pd.melt(piv_combined_normalized_prices,
                               id_vars=['TRANS_DATE', 'ISSUERTRADINGSYMBOL', 'TRANS_SHARES', 'direction', 'is_control'],
                               value_vars=['rTime_0', 'rTime_1', 'rTime_2', 'rTime_3', 'rTime_4', 'rTime_5'])
Index calculations
# Make a new index that we can later reference (copy so the loop does not mutate the original 'data')
data_control = data.copy()
ref = -1
for idx, row in data_control.iterrows():
    ref = ref + 1
    data_control.loc[idx, "ref_num"] = ref

# Pull out the dates as our index and replace them with our ref_num
data_control['Date'] = data_control.index
data_control = data_control.set_index('ref_num')
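The loop can be skipped entirely: reset_index produces the same 0..n-1 counter in one call (a sketch; the names= argument needs pandas 1.5+):
data_control = data.copy().reset_index(names='Date')
data_control.index.name = 'ref_num'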
Using Apply:
# Bucket prices into labeled ranges (thresholds aligned with the labels)
def bucket(x):
    if x >= 500:
        return 'Over 500'
    elif x >= 250:
        return '250 to 499'
    elif x >= 200:
        return '200 to 249'
    elif x >= 150:
        return '150 to 199'
    elif x >= 100:
        return '100 to 149'
    elif x >= 75:
        return '75 to 99'
    else:
        return 'Less than 75'

bucket_wine['price_bucket'] = bucket_wine['Actual_Price'].apply(bucket)
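The same bucketing can be done vectorized with pd.cut (a sketch; right=False makes each bin include its lower edge, matching the thresholds above):
bins = [0, 75, 100, 150, 200, 250, 500, float('inf')]
labels = ['Less than 75', '75 to 99', '100 to 149', '150 to 199', '200 to 249', '250 to 499', 'Over 500']
bucket_wine['price_bucket'] = pd.cut(bucket_wine['Actual_Price'], bins=bins, labels=labels, right=False)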
# Use the match reference (as received from the fuzzy match) to pull out the correct dictionary and put it into best_match_details
vivino_data['best_match_details'] = vivino_data.apply(lambda row: row['wine_matches'][row['match_reference']], axis=1)

# Then search the dictionary for the correct field and make that into a column
vivino_data['vivino_name'] = vivino_data['best_match_details'].apply(lambda x: x['name'])
vivino_data['vivino_link'] = vivino_data['best_match_details'].apply(lambda x: x['link'])
vivino_data['vivino_country'] = vivino_data['best_match_details'].apply(lambda x: x['country'])
vivino_data['vivino_region'] = vivino_data['best_match_details'].apply(lambda x: x['region'])
vivino_data['vivino_average_rating'] = vivino_data['best_match_details'].apply(lambda x: x['average_rating'])
vivino_data['vivino_price'] = vivino_data['best_match_details'].apply(lambda x: x['price'])
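The six apply calls can be collapsed with pd.json_normalize (a sketch, assuming every best_match_details dict carries the same keys):
details = pd.json_normalize(vivino_data['best_match_details'].tolist()).add_prefix('vivino_')
details.index = vivino_data.index
vivino_data = vivino_data.join(details)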
DateTime
# Put the date into datetime format
combo['TRANS_DATE'] = pd.to_datetime(combo['TRANS_DATE'], format="%d-%b-%Y")

# Remove the time-zone data from 'data' so it can be indexed
data = data.tz_convert(None)
Shifting
# Get the stock price for the day before the transaction; the 5 days after follow the same pattern
data_neg_1 = data.shift(periods=1)
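For the days after the transaction, shift the other way (a sketch following the same pattern; the data_pos_* names are hypothetical):
data_pos_1 = data.shift(periods=-1)   # one day after
data_pos_5 = data.shift(periods=-5)   # five days after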
YFinance
import yfinance as yf

# Pull in stock data from Yahoo Finance via the yfinance package
data = yf.download(stock_selection, period="1y")
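If stock_selection is a list of tickers, yf.download returns a column MultiIndex of (price field, ticker); selecting one field gives the one-column-per-ticker shape used in the lookups above (an assumption about this workflow):
data = yf.download(stock_selection, period="1y")['Close']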
Statistics
from scipy import stats

# One-sided Mann-Whitney U test: are the experiment returns stochastically lower than the control returns?
stats.mannwhitneyu(Experiment_returns, Control_returns, alternative='less')
Random numbers
# Go back to piv_table_control and give our transactions some random dates
# Assign a random time variable within our reference range, corresponding to ref_num and the period sampled (randrange's stop is exclusive, so this yields 150-215)
import random

random.seed(42)
for idx, row in piv_table_control.iterrows():
    a = random.randrange(start=150, stop=216, step=1)
    piv_table_control.loc[idx, 'random_date_index'] = int(a)
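A vectorized alternative with NumPy avoids the row loop (a sketch with the same exclusive upper bound):
rng = np.random.default_rng(42)
piv_table_control['random_date_index'] = rng.integers(150, 216, size=len(piv_table_control))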
Working with JS files
# Unfortunately the data is in string format, and each string needs to become a dictionary before it can go into a DataFrame
# As rerunning the retrieval took forever, the stored strings are instead parsed with a different package
import ast

vivino_data['wine_matches'] = vivino_data['wine_matches'].apply(lambda x: ast.literal_eval(x))
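If the stored strings were strict JSON (double quotes, null instead of None), json.loads would do the same job; ast.literal_eval is the safer choice for Python-style reprs (a sketch):
vivino_data['wine_matches'] = vivino_data['wine_matches'].apply(json.loads)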
import subprocess

# Define the command to run the Node.js script
def retrieve_wine_matches(wine_name):
    command = [
        "node",
        "C:/Users/maria/OneDrive/Documents_old/CodeOpDocs/Milestones/Group_Project/Vivino_api/vivino-api/vivino.js",
        f"--name='{wine_name}'"
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        print("Node.js script output:", result.stdout)
        # Parse the output JSON file
        with open("C:/Users/maria/OneDrive/Documents_old/CodeOpDocs/Milestones/Group_Project/Drinks_data/vivino-out.json",
                  "r", encoding="utf-8", errors="ignore") as f:
            data = json.load(f)
        return data['vinos']
    except subprocess.CalledProcessError:
        data = np.nan
        return data
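The function is then applied per wine to build the wine_matches column used above (a sketch; full_name as the input column is an assumption):
vivino_data['wine_matches'] = vivino_data['full_name'].apply(retrieve_wine_matches)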
FUZZ Data
# 'process' and 'fuzz' come from a fuzzy-matching library such as thefuzz (formerly fuzzywuzzy)
def find_best_match(row):
    dataf_choices = pd.DataFrame(row['wine_matches'])
    the_name = row['full_name']
    if 'name' in dataf_choices.columns:
        match = process.extractOne(the_name, dataf_choices['name'], scorer=fuzz.token_set_ratio)
        return match
    else:
        return np.nan
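Applied row-wise, this fills the best_match column that the tuple extraction above unpacks; extractOne against a Series returns a (value, score, index) triple (a sketch):
vivino_data['best_match'] = vivino_data.apply(find_best_match, axis=1)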
API
REST API, SOAP, RPC, U
# Unique values from the "Bottle Type" column as a list
unique_values_list = df['Bottle Type'].dropna().unique().tolist()

# Create a new DataFrame from the unique values list
unique_values_df = pd.DataFrame(unique_values_list, columns=['Bottle Type'])
print(unique_values_df)

column_to_translate = list(unique_values_df["Bottle Type"].unique())
import deepl

# Your DeepL API key (replace with your own key)
DEEPL_API_KEY = '***'
dl_translator = deepl.Translator(DEEPL_API_KEY)

# Put the unique values in a dictionary as keys
translation_dict = {}
for type_name in column_to_translate:
    if type_name.strip() != "":  # Skip if the value is empty or just whitespace
        try:
            # Translate the text; .text pulls the translated string out of the TextResult object
            translated_result = dl_translator.translate_text(type_name, target_lang="EN-US").text
            translation_dict[type_name] = translated_result
        except deepl.DeepLException as e:
            print(f"Error translating value '{type_name}': {e}")
            translation_dict[type_name] = type_name
    else:
        # If the type_name is empty, keep it unchanged in the dictionary
        translation_dict[type_name] = type_name

# Display the translation dictionary
print("Translation dictionary:")
print(translation_dict)
Unit Tests (with a coverage of 98%)
Unit tests can also be written with pytest (pip install pytest).
Repository layout:
    src/
    tests/
    data/
    examples/
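A minimal pytest sketch (assumes the bucket function above lives in src/buckets.py; run with pytest, or pytest --cov=src for the coverage number, which needs the pytest-cov plugin):

# tests/test_buckets.py
from src.buckets import bucket

def test_bucket_boundaries():
    assert bucket(600) == 'Over 500'
    assert bucket(80) == '75 to 99'
    assert bucket(10) == 'Less than 75'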