Pandas Data Handling Guide
1. ----------------- csv
Out[9]: # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
Out[6]: # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
795 719 Diancie Rock Fairy 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60 170 130 80 6 True
In [7]: len(df)
Out[7]: 800
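The cell that loaded the file is not shown in this export. As a minimal sketch, assuming the data was read with `pd.read_csv`, the load and the `head`/`tail`/`len` checks could look like this. The in-memory text stands in for the real file, and `df_demo` is a hypothetical name; the four rows reuse values that appear in the outputs of this notebook.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; pd.read_csv also accepts a file path
# or a URL string in place of this buffer.
csv_text = """Name,Type 1,HP,Attack
Bulbasaur,Grass,45,49
Ivysaur,Grass,60,62
Venusaur,Grass,80,82
Charmander,Fire,39,52
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(2))  # first two rows
print(df_demo.tail(2))  # last two rows
print(len(df_demo))     # number of rows
```

Because `pd.read_csv` accepts a URL, the same call covers loading "via link".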
In [ ]: ## via link
0 Microbiome Project American Gut (Microbiome Project) https://github.com/biocore/American-Gut Biology GitHub NaN
2 Global Climate Global Climate Data Since 1929 http://en.tutiempo.net/climate Climate/Weather NaN 1929.0
3 CommonCraw 2012 3.5B Web Pages from CommonCraw 2012 http://www.bigdatanews.com/profiles/blogs/big-... Computer Networks NaN 2012.0
In [ ]:
2. --------------- txt
In [121]: df.head()
Out[121]: # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
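The loading cell for the text file is also missing. A plausible sketch, assuming the `.txt` file is tab-separated (the actual delimiter is not visible here), is to reuse `pd.read_csv` with an explicit `sep`. The buffer and the name `df_txt` are illustrative only.

```python
import io
import pandas as pd

# Stand-in for a tab-separated text file (the delimiter is an assumption).
txt_text = "Name\tType 1\tHP\nBulbasaur\tGrass\t45\nCharmander\tFire\t39\n"

# sep tells read_csv which character separates the fields.
df_txt = pd.read_csv(io.StringIO(txt_text), sep="\t")
print(df_txt.head())
```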
3. --------------- excel
In [1]: # df.head()
In [2]: # len(df)
Out[3]: list
In [4]: df_list
3 5 49ers[note 8] Giants Athletics Warriors
4 4 Cowboys[note 10] Rangers Mavericks
5 4 Commanders[note 11] Nationals[note 12] Wizards[note 13]
6 4 Eagles[note 14] Phillies[note 15] 76ers[note 16]
7 4 Dolphins Marlins Heat
8 4 Patriots[note 19] Red Sox[note 20] Celtics
9 4 Vikings[note 21] Twins Timberwolves[note 22]
10 4 Broncos Rockies Nuggets[note 24]
11 4 Cardinals Diamondbacks Suns
12 4 Lions[note 26] Tigers[note 27] Pistons[note 28]
13 3 — [note 29] Blue Jays Raptors[note 30]
14 3 Texans[note 31] Astros Rockets
15 3 Falcons Braves Hawks
16 3 Seahawks Mariners[note 33] [note 34]
17 3 Buccaneers Rays [note 35]
18 3 Steelers Pirates [note 37]
19 3 Browns[note 39] Guardians[note 40] Cavaliers[note 41]
20 2 [note 43] Cardinals[note 44] [note 45]
21 2 Panthers — Hornets[note 47]
22 2 Bengals[note 48] Reds[note 49] [note 50]
Out[6]:
   Unnamed: 0  Metropolitan area  Country        Pop. rank  Population (2022 est.)[8]  B4  NFL                  MLB                   NBA          NHL                               B6  MLS                         CFL
0  0           New York City      United States  1          19617869                   9   Giants Jets[note 1]  Yankees Mets[note 2]  Knicks Nets  Rangers Islanders Devils[note 3]  11  Red Bulls New York City FC  —
Out[4]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
In [5]: df.columns
Out[5]: Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
'Sp. Def', 'Speed', 'Generation', 'Legendary'],
dtype='object')
Out[6]: # 1
Name Bulbasaur
Type 1 Grass
Type 2 Poison
HP 45
Attack 49
Defense 49
Sp. Atk 65
Sp. Def 65
Speed 45
Generation 1
Legendary False
Name: 0, dtype: object
In [7]: df.loc[10:14]
Out[7]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
In [8]: df.iloc[10:14]
Out[8]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
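The two slices above look alike but do not return the same rows: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position. A small sketch on a hypothetical 20-row frame (`frame` and its `value` column are made up for the illustration):

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..19.
frame = pd.DataFrame({"value": range(20)})

by_label = frame.loc[10:14]      # labels 10, 11, 12, 13, 14 (end included)
by_position = frame.iloc[10:14]  # positions 10, 11, 12, 13 (end excluded)

print(len(by_label), len(by_position))  # 5 4
```

With the default integer index the labels and positions coincide, so the only visible difference is the extra last row returned by `loc`.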
In [73]: # example
df.head()
Out[73]: # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
Out[75]: 'Ivysaur'
In [76]: df.iloc[:, [1, 2, 5]]  # select the columns at positions 1, 2 and 5
0 Bulbasaur Grass 49
1 Ivysaur Grass 62
2 Venusaur Grass 82
4 Charmander Fire 52
In [77]: df[['Name', 'Type 1', 'Attack']]  # select columns by name
0 Bulbasaur Grass 49
1 Ivysaur Grass 62
2 Venusaur Grass 82
4 Charmander Fire 52
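Both cells return the same three columns because, in this dataset, `Name`, `Type 1` and `Attack` sit at positions 1, 2 and 5. The sketch below checks that on a two-row stand-in built from values shown in this notebook (`df_demo` is a hypothetical name). It also shows one way to read a single cell by position, which is an assumption about how a scalar such as `'Ivysaur'` could have been obtained.

```python
import pandas as pd

# Two rows taken from the outputs in this notebook.
df_demo = pd.DataFrame({
    "#": [1, 4],
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", None],
    "HP": [45, 39],
    "Attack": [49, 52],
})

by_position = df_demo.iloc[:, [1, 2, 5]]            # columns by position
by_name = df_demo[["Name", "Type 1", "Attack"]]     # columns by name
print(by_position.equals(by_name))                  # True

# A single cell: row position 1, column position 1.
single_value = df_demo.iloc[1, 1]
print(single_value)                                 # Charmander
```

Selecting by name is usually the safer choice, since it keeps working if the column order changes.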
In [15]: len(df)
Out[15]: 800
In [14]: a = df.loc[df['Attack'] > 150]  # select the rows that satisfy a condition
len(a)
Out[14]: 18
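The filter works in two steps: the comparison builds a boolean Series (one True/False per row), and `loc` keeps the rows where it is True. A minimal sketch on four rows from this notebook, with a lower threshold so that some rows match (`df_demo`, `mask`, `strong` and `middle` are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
    "Attack": [49, 62, 82, 52],
})

# Step 1: the comparison yields one boolean per row.
mask = df_demo["Attack"] > 60
print(mask.tolist())   # [False, True, True, False]

# Step 2: loc keeps only the rows where the mask is True.
strong = df_demo.loc[mask]
print(len(strong))     # 2

# Conditions are combined with & (and) or | (or); each one needs parentheses.
middle = df_demo.loc[(df_demo["Attack"] > 50) & (df_demo["Attack"] < 80)]
print(middle["Name"].tolist())  # ['Ivysaur', 'Charmander']
```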
Out[80]: # Name Type 1 Type 2 HP Defense Sp. Atk Sp. Def Speed Generation Legendary
In [81]: # df.head()  # the Attack column is still in df (but not in b)
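The cell that created `b` is not shown. One common way to get a frame without a column, assumed here, is `DataFrame.drop`, which returns a new frame and leaves the original untouched unless the result is assigned back. A sketch on a small stand-in (`df_demo` is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
})

# drop returns a new DataFrame; df_demo itself is not modified.
b = df_demo.drop(columns=["Attack"])

print("Attack" in df_demo.columns)  # True
print("Attack" in b.columns)        # False
```

To remove the column from the original as well, assign the result back: `df_demo = df_demo.drop(columns=["Attack"])`.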
Out[16]: 'Grass'
Out[17]: 39
Out[84]: # 4
Name Charmander
Type 1 Fire
Type 2 NaN
HP 39
Attack 52
Defense 43
Sp. Atk 60
Sp. Def 50
Speed 65
Generation 1
Legendary False
Name: 4, dtype: object
Out[18]: 0 45
1 60
2 80
3 80
4 39
..
795 50
796 50
797 80
798 80
799 80
Name: HP, Length: 800, dtype: int64
Out[19]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
Out[20]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
Out[21]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
552 493 Arceus Normal NaN 120 120 120 120 120 120 4 True
groupby
Take a categorical variable.
See how other continuous variables are distributed across its categories.
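As a minimal sketch of that idea: group by the categorical column, pick the numeric columns of interest, and apply an aggregate such as `mean`. The stand-in frame reuses four rows from this notebook and groups by `Type 1`; the choice of `mean` and the names `df_demo` and `means` are assumptions, since the original cells are not visible in this export.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "HP": [45, 60, 80, 39],
    "Attack": [49, 62, 82, 52],
})

# One row per category, one column per selected numeric variable.
means = df_demo.groupby("Type 1")[["HP", "Attack"]].mean()
print(means)

# Other aggregates work the same way, e.g. the number of rows per category.
print(df_demo.groupby("Type 1").size())
```

On the full dataset, the same pattern with `Generation` as the grouping column gives one mean per generation.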
In [96]: df
Out[96]: # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60 170 130 80 6 True
Attack Defense
Generation
1 76.638554 70.861446
2 72.028302 73.386792
3 81.625000 74.100000
4 82.867769 78.132231
5 82.066667 72.327273
6 75.804878 76.682927
Speed
Generation
1 72.584337
2 61.811321
3 66.925000
4 71.338843
5 68.078788
6 66.439024
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
6 7 g
7 8 h
8 9 i
9 10 j
0 1 a 0
1 2 b 10
2 3 c 20
3 4 d 30
4 5 e 40
5 6 f 50
6 7 g 60
7 8 h 70
8 9 i 80
9 10 j 90
10 11 k 100
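The two outputs above show a small frame before and after it gained a third column and an eleventh row. The cells that did this are missing, so the following is only an assumed reconstruction: the column names `num`, `letter` and `score` are hypothetical, and `pd.concat` is one possible way to add the row (`DataFrame.append` was removed in pandas 2.0).

```python
import string
import pandas as pd

# Ten rows: 1..10 paired with the letters a..j.
small = pd.DataFrame({
    "num": list(range(1, 11)),
    "letter": list(string.ascii_lowercase[:10]),
})

# Add a column: assigning a list of the right length creates it.
small["score"] = [10 * i for i in range(10)]  # 0, 10, ..., 90

# Add a row: build a one-row frame and concatenate it.
new_row = pd.DataFrame({"num": [11], "letter": ["k"], "score": [100]})
extended = pd.concat([small, new_row], ignore_index=True)

print(extended.tail(2))
```

`ignore_index=True` renumbers the result 0..10 instead of keeping the 0 label of the one-row frame.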
In [ ]: