Chapter 2

Boolean operators with Numpy

Earlier, you saw that comparison operators like < and >= work with Numpy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not.

To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea:

np.logical_and(my_house > 13,
               your_house < 15)

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
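
A short sketch, not part of the original exercise, of the remaining pieces: np.logical_not(), the element-wise operators &, | and ~ that Numpy also provides for boolean arrays, and what happens if you use the plain keyword and on arrays. It reuses the my_house and your_house arrays from above.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# np.logical_not() flips every element of a boolean array
not_big = np.logical_not(my_house > 13)
print(not_big.tolist())  # [False, False, True, True]

# &, | and ~ are the element-wise operator forms of logical_and/or/not.
# The parentheses are required: & and | bind tighter than > and <.
either = (my_house > 18.5) | (my_house < 10)
both = (my_house < 11) & (your_house < 11)
flipped = ~(my_house > 13)
print(either.tolist())   # [False, True, False, True]
print(both.tolist())     # [False, False, False, True]
print(flipped.tolist())  # [False, False, True, True]

# The keyword `and` tries to turn a whole array into a single True/False,
# which Numpy refuses to do for arrays with more than one element
try:
    (my_house > 13) and (your_house < 15)
except ValueError as err:
    print("ValueError:", err)
```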

Driving right (1)

Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)? The code that imports this data from a CSV file into a Pandas DataFrame is the first part of the solution below.

In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True.

drives_right is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from cars.

Extract the drives_right column as a Pandas Series and store it as dr.
Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.
Print sel, and check that drives_right is True for all observations.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Extract drives_right column as Series: dr
dr = cars['drives_right']

# Use dr to subset cars: sel
sel = cars[dr]

# Print sel
print(sel)
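
To see what dr contains without cars.csv at hand, here is a sketch on a small stand-in DataFrame. The rows and values are made up for illustration (this is not the course's cars.csv); only the column names follow the text. Indexing a DataFrame with a boolean Series keeps exactly the rows whose label maps to True, and .all() turns the "check that drives_right is True everywhere" step into an assertion.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv: same column names, illustrative values
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 18, 200],
     "country": ["United States", "Australia", "India", "Russia"],
     "drives_right": [True, False, False, True]},
    index=["US", "AUS", "IN", "RU"],
)

dr = cars["drives_right"]
print(dr.dtype)  # bool

# Only the rows whose drives_right value is True survive the selection
sel = cars[dr]
print(sel)

# Programmatic version of the visual check
assert sel["drives_right"].all()
print(list(sel.index))  # ['US', 'RU']
```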

Driving right (2)

The code in the previous example worked fine, but you created an intermediate variable, dr, that isn't actually needed. You can achieve the same result without it: put the code that computes dr straight into the square brackets that select observations from cars.

Convert the code to a one-liner that calculates the variable sel as before.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Convert code to a one-liner
sel = cars[cars['drives_right']]

# Print sel
print(sel)

Cars per capita (1)

Let's stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars?

Similar to the previous example, you'll want to build up a boolean Series that you can then use to subset the cars DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine.

Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
Print out car_maniac to see if you got it right.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars['cars_per_cap']
many_cars = cpc > 500
car_maniac = cars[many_cars]

# Print car_maniac
print(car_maniac)

Cars per capita (2)

Remember np.logical_and(), np.logical_or() and np.logical_not(), the Numpy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations.

Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening.

cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 10, cpc < 80)
medium = cars[between]

Use the code sample above to create a DataFrame medium that includes all the observations of cars that have a cars_per_cap between 100 and 500.
Print out medium.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Import numpy, you'll need this
import numpy as np

# Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]

# Print medium
print(medium)
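
The same filter can also be written with the & operator or with the pandas Series.between() method. A minimal sketch on a made-up stand-in for cars (illustrative values only). It assumes pandas 1.3 or later, where between() takes inclusive as a string; "neither" excludes both endpoints, matching > 100 and < 500.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars.csv with illustrative values
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 18, 200, 70],
     "country": ["United States", "Australia", "India", "Russia", "Morocco"]},
    index=["US", "AUS", "IN", "RU", "MOR"],
)
cpc = cars["cars_per_cap"]

# 1. The Numpy function from the exercise (returns a boolean Series)
medium_np = cars[np.logical_and(cpc > 100, cpc < 500)]

# 2. The element-wise & operator (parentheses required)
medium_amp = cars[(cpc > 100) & (cpc < 500)]

# 3. Series.between(); inclusive="neither" assumes pandas >= 1.3
medium_btw = cars[cpc.between(100, 500, inclusive="neither")]

print(medium_np)
```

All three select the same rows; & and between() avoid the extra Numpy import.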

Basic while loop

Below you can find the example from the video where the error variable, initially equal to 50.0, is divided by 4 and printed out on every run:

error = 50.0
while error > 1 :
    error = error / 4
    print(error)

This example will come in handy, because it's time to build a while loop yourself! We're going to code a while loop that implements a very basic control system for an inverted pendulum. If there's an offset from standing perfectly straight, the while loop will incrementally fix this offset.

Note that if your while loop takes too long to run, you might have made a mistake. In particular, remember to indent the contents of the loop!

Create the variable offset with an initial value of 8.
Code a while loop that keeps running as long as offset is not equal to 0. Inside the while loop:
    Print out the sentence "correcting...".
    Next, decrease the value of offset by 1. You can do this with offset = offset - 1.
    Finally, still within your loop, print out offset so you can see how it changes.

# Initialize offset
offset = 8

# Code the while loop
while offset != 0 :
    print("correcting...")
    offset = offset - 1
    print(offset)
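
The loop above only terminates because offset starts out positive: with, say, offset = -6, subtracting 1 moves it away from 0 forever. Below is a sketch of a variant (an extension, not part of this exercise) that steps toward 0 from either side. It is wrapped in a hypothetical helper, correct(), that returns the values it printed so the behavior is easy to check. It still assumes an integer offset; a value like 2.5 would jump over 0 and oscillate between 0.5 and -0.5 forever.

```python
def correct(offset):
    """Move an integer offset one step toward 0 per iteration (hypothetical helper)."""
    history = []
    while offset != 0:
        print("correcting...")
        if offset > 0:
            offset = offset - 1
        else:
            offset = offset + 1
        print(offset)
        history.append(offset)
    return history

correct(8)   # counts down: 7, 6, ..., 0
correct(-6)  # counts up: -5, -4, ..., 0
```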

Indexes and values (2)

For non-programmers, output such as room 0: 11.25 looks strange. Wouldn't it be better if the count started at 1?
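
One way to get there, sketched with the room areas from the house list used further down (the variable name areas is an assumption here): enumerate() yields index/value pairs, and you can either print index + 1 or pass start=1 so the count begins at 1.

```python
# Room areas; the values match the house list used later in this chapter
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Option 1: shift the 0-based index by hand
for index, area in enumerate(areas):
    print("room " + str(index + 1) + ": " + str(area))

# Option 2: let enumerate() start counting at 1
labels = ["room " + str(index) + ": " + str(area)
          for index, area in enumerate(areas, start=1)]
print(labels[0])   # room 1: 11.25
print(labels[-1])  # room 5: 9.5
```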
423+
424+
425+
426+
427+
428+
Loop over list of lists
429+
430+
Remember the house variable from the Intro to Python course? Have a look at its definition on the right. It's basically a list of lists, where each sublist contains the name and area of a room in your house.
431+
432+
It's up to you to build a for loop from scratch this time!
433+
434+
435+
Write a for loop that goes through each sublist of house and prints out the x is y sqm, where x is the name of the room and y is the area of the room.
436+
# house list of lists
437+
house = [["hallway", 11.25],
438+
["kitchen", 18.0],
439+
["living room", 20.0],
440+
["bedroom", 10.75],
441+
["bathroom", 9.50]]
442+
443+
# Build a for loop from scratch
444+
445+
for x,y in house:
446+
print('the '+str(x)+' is '+ str(y) +' sqm')
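
The loop header for x, y in house relies on unpacking: each sublist has exactly two elements, so Python assigns the first to x and the second to y. The sketch below spells that out with explicit indexing; the loop variable name room is an illustrative choice.

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

lines = []
for room in house:
    # room is one whole sublist, e.g. ["hallway", 11.25]; index it by position
    line = "the " + room[0] + " is " + str(room[1]) + " sqm"
    lines.append(line)
    print(line)

# Unpacking in the loop header produces the same strings as indexing
assert lines == ["the " + x + " is " + str(y) + " sqm" for x, y in house]
```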

Loop over dictionary

In Python 3, you need the items() method to loop over the key-value pairs of a dictionary:

world = { "afghanistan":30.55,
          "albania":2.77,
          "algeria":39.21 }

for key, value in world.items() :
    print(key + " -- " + str(value))

Remember the europe dictionary that contained the names of some European countries as keys and their capitals as corresponding values? Go ahead and write a loop to iterate over it!

Write a for loop that goes through each key:value pair of europe. On each iteration, "the capital of x is y" should be printed out, where x is the key and y is the value of the pair.

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }

# Iterate over europe
for key, value in europe.items():
    print('the capital of ' + key + ' is ' + value)
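
A note on order: from Python 3.7 onward, dictionaries preserve insertion order, so items() yields the pairs in the order they were written. If you want the countries alphabetically instead, wrap items() in sorted(), which sorts the (key, value) tuples by key first. A short sketch:

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin',
          'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw', 'austria': 'vienna'}

# Insertion order (guaranteed since Python 3.7)
in_order = [key for key, value in europe.items()]
print(in_order[:3])  # ['spain', 'france', 'germany']

# Alphabetical by country: sorted() compares the (key, value) tuples, key first
for key, value in sorted(europe.items()):
    print('the capital of ' + key + ' is ' + value)
```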

Loop over Numpy array

If you're dealing with a 1D Numpy array, looping over all elements can be as simple as:

for x in my_array :
    ...

If you're dealing with a 2D Numpy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax:

for x in np.nditer(my_array) :
    ...

Two Numpy arrays that you might recognize from the intro course are available in your Python session: np_height, a Numpy array containing the heights of Major League Baseball players, and np_baseball, a 2D Numpy array that contains both the heights (first column) and weights (second column) of those players.

# Import numpy as np
import numpy as np

# np_height and np_baseball are predefined in the exercise session

# For loop over np_height (1D, so a plain for loop is enough)
for x in np_height:
    print(str(x) + ' inches')

# For loop over np_baseball (2D, so use np.nditer to reach every element)
for x in np.nditer(np_baseball):
    print(x)
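
To see the difference the text describes, here is a sketch on a tiny made-up 2D array (the numbers are placeholders, not the MLB data): a plain for loop hands you whole rows, while np.nditer() visits each element. For a freshly created array like this one, nditer goes through the elements row by row.

```python
import numpy as np

# Placeholder 2D array: 3 rows, columns are (height, weight)
arr = np.array([[74, 180],
                [72, 215],
                [75, 210]])

# Plain for loop: each x is a 1D row
rows = [x.tolist() for x in arr]
print(rows)      # [[74, 180], [72, 215], [75, 210]]

# np.nditer: each x is a single element (a 0-d array), converted with int()
elements = [int(x) for x in np.nditer(arr)]
print(elements)  # [74, 180, 72, 215, 75, 210]
```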

Loop over DataFrame (1)

Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available:

for lab, row in brics.iterrows() :
    ...

In this and the following exercises you will be working on the cars DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Iterate over rows of cars
for lab, row in cars.iterrows():
    print(lab)
    print(row)
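
What iterrows() hands you on each pass can be checked directly. A sketch on a two-row stand-in for cars (values made up for illustration): lab is the index label, and row is a Series whose index is the DataFrame's column names.

```python
import pandas as pd

# Hypothetical two-row stand-in for cars.csv
cars = pd.DataFrame(
    {"cars_per_cap": [809, 18],
     "country": ["United States", "India"],
     "drives_right": [True, False]},
    index=["US", "IN"],
)

for lab, row in cars.iterrows():
    # lab is the row label; row is a Series indexed by the column names
    print(lab, type(row).__name__, list(row.index))

labels = [lab for lab, row in cars.iterrows()]
```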

Loop over DataFrame (2)

The row data that's generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets:

for lab, row in brics.iterrows() :
    print(row['country'])

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Adapt for loop
for lab, row in cars.iterrows():
    print(lab + ':' + str(row['cars_per_cap']))

Add column (1)

In the video, Hugo showed you how to add the length of the country names of the brics DataFrame in a new column:

for lab, row in brics.iterrows() :
    brics.loc[lab, "name_length"] = len(row["country"])

You can do similar things on the cars DataFrame.

Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the "country" column. You can use the string method upper() for this.
To see if your code worked, print out cars. Don't indent this code, so that it's not part of the for loop.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows() :
    cars.loc[lab, 'COUNTRY'] = row['country'].upper()

# Print cars
print(cars)

Add column (2)

Using iterrows() to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you're creating a new Pandas Series.

If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you'll want to use apply().

Compare the iterrows() version with the apply() version to get the same result in the brics DataFrame:

for lab, row in brics.iterrows() :
    brics.loc[lab, "name_length"] = len(row["country"])

brics["name_length"] = brics["country"].apply(len)

We can do a similar thing to call the upper() method on every name in the country column. However, upper() is a method rather than a standalone function like len(), so you can't pass it on its own; instead, pass str.upper, the method accessed through the str class.

Replace the for loop with a one-liner that uses .apply(str.upper). The call should give the same result: a column COUNTRY should be added to cars, containing an uppercase version of the country names.
As usual, print out cars to see the fruits of your hard labor.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Use .apply(str.upper) instead of the for loop
cars['COUNTRY'] = cars['country'].apply(str.upper)

# Print cars
print(cars)
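
A quick way to convince yourself that the one-liner matches the loop is to run both on a small stand-in DataFrame (the country values are made up for illustration). The sketch also shows pandas' vectorized string accessor, Series.str.upper(), an alternative to apply(str.upper) for this particular task.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv
cars = pd.DataFrame({"country": ["United States", "Australia", "India"]},
                    index=["US", "AUS", "IN"])

# Loop version from Add column (1)
looped = cars.copy()
for lab, row in looped.iterrows():
    looped.loc[lab, "COUNTRY"] = row["country"].upper()

# apply() version: str.upper is called once per country name
applied = cars.copy()
applied["COUNTRY"] = applied["country"].apply(str.upper)

# Vectorized string accessor
accessor = cars.copy()
accessor["COUNTRY"] = accessor["country"].str.upper()

print(applied)
```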