Update Chapter 2

gxzpython · web-flow · commit 1cf47cd7ea89 · 2020-09-23T23:08:11.000-05:00
diff --git a/Chapter 2 b/Chapter 2
@@ -245,5 +245,372 @@ print(cars.loc[:,['cars_per_cap','drives_right']])
 
 
 
+Boolean operators with Numpy
+
+Before, comparison operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not.
+
+To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea:
+
+np.logical_and(my_house > 13, 
+               your_house < 15)
+               
+               
+# Create arrays
+import numpy as np
+my_house = np.array([18.0, 20.0, 10.75, 9.50])
+your_house = np.array([14.0, 24.0, 14.25, 9.0])
+
+# my_house greater than 18.5 or smaller than 10
+print(np.logical_or(my_house>18.5,my_house<10))
+
+# Both my_house and your_house smaller than 11
+print(np.logical_and(my_house<11, your_house<11))
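
The claim that the plain and keyword does not work on arrays can be checked directly. This is a minimal sketch that reuses the my_house and your_house arrays from above; the names plain_and_failed and both are only illustrative.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# The plain `and` keyword needs one truth value per operand,
# which is ambiguous for an array with several elements
try:
    my_house > 13 and your_house < 15
    plain_and_failed = False
except ValueError:
    plain_and_failed = True

# np.logical_and() combines the two boolean arrays element by element
both = np.logical_and(my_house > 13, your_house < 15)

print(plain_and_failed)
print(both)
```

Only the first position satisfies both conditions here, so both holds a single True followed by three False values.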
+
+
+
+
+Driving right (1)
+
+Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)? The code that imports this data in CSV format into Python as a DataFrame is available on the right.
+
+In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True.
+
+drives_right is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from cars.
+
+
+Extract the drives_right column as a Pandas Series and store it as dr.
+Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.
+Print sel, and assert that drives_right is True for all observations.
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+print(cars)
+
+# Extract drives_right column as Series: dr
+dr = cars['drives_right']
+
+# Use dr to subset cars: sel
+sel = cars[dr]
+
+# Print sel
+print(sel)
+
+
+
+
+Driving right (2)
+
+The code in the previous example worked fine, but it created a new variable dr that you don't actually need. You can achieve the same result without this intermediate variable: put the code that computes dr straight into the square brackets that select observations from cars.
+
+
+Convert the code on the right to a one-liner that calculates the variable sel as before.
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Convert code to a one-liner
+sel = cars[cars['drives_right']]
+
+# Print sel
+print(sel)
+
+
+
+
+
+
+
+
+Cars per capita (1)
+
+Let's stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars?
+
+Similar to the previous example, you'll want to build up a boolean Series that you can then use to subset the cars DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine.
+
+
+
+Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
+Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
+Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
+Print out car_maniac to see if you got it right.
+
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Create car_maniac: observations that have a cars_per_cap over 500
+cpc = cars['cars_per_cap']
+many_cars = cpc > 500
+car_maniac = cars[many_cars]
+
+# Print car_maniac
+print(car_maniac)
+
+
+
+Cars per capita (2)
+
+Remember np.logical_and(), np.logical_or() and np.logical_not(), the Numpy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations.
+
+Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening.
+
+cpc = cars['cars_per_cap']
+between = np.logical_and(cpc > 10, cpc < 80)
+medium = cars[between]
+
+
+Use the code sample above to create a DataFrame medium, that includes all the observations of cars that have a cars_per_cap between 100 and 500.
+Print out medium.
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Import numpy, you'll need this
+import numpy as np
+
+# Create medium: observations with cars_per_cap between 100 and 500
+cpc = cars['cars_per_cap']
+between = np.logical_and(cpc > 100, cpc < 500)
+medium = cars[between]
+
+# Print medium
+print(medium)
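
Since cars.csv is not part of these notes, the filtering steps can be tried on a small stand-in DataFrame. This is a sketch only: the row labels and numbers below are made up for illustration and are not the real cars data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for cars.csv; labels and values are illustrative only
cars = pd.DataFrame(
    {"cars_per_cap": [800, 150, 40, 450],
     "drives_right": [True, False, True, True]},
    index=["AA", "BB", "CC", "DD"],
)

# Step 1: extract the column as a Series
cpc = cars["cars_per_cap"]

# Step 2: build a boolean Series with one entry per row
between = np.logical_and(cpc > 100, cpc < 500)

# Step 3: use the boolean Series to keep only the matching rows
medium = cars[between]
print(medium)

# The & operator on two boolean Series selects the same rows
same_rows = cars[(cpc > 100) & (cpc < 500)]
print(list(same_rows.index))
```

Both selections keep the rows labelled BB and DD, the only ones with a value strictly between 100 and 500.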
+
+
+
+
+Basic while loop
+
+Below you can find the example from the video where the error variable, initially equal to 50.0, is divided by 4 and printed out on every run:
+
+error = 50.0
+while error > 1 :
+    error = error / 4
+    print(error)
+
+This example will come in handy, because it's time to build a while loop yourself! We're going to code a while loop that implements a very basic control system for an inverted pendulum. If there's an offset from standing perfectly straight, the while loop will incrementally fix this offset.
+
+Note that if your while loop takes too long to run, you might have made a mistake. In particular, remember to indent the contents of the loop!
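
To see why the error example stops on its own, the intermediate values can be collected in a list instead of printed one by one. This is a small sketch; the list name values is not part of the original example.

```python
error = 50.0
values = []

# The condition is re-checked before every run of the loop body
while error > 1:
    error = error / 4
    values.append(error)

print(values)  # [12.5, 3.125, 0.78125]
```

After the third division the value drops below 1, so the condition is False and the loop ends.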
+
+
+
+Create the variable offset with an initial value of 8.
+Code a while loop that keeps running as long as offset is not equal to 0. Inside the while loop:
+Print out the sentence "correcting...".
+Next, decrease the value of offset by 1. You can do this with offset = offset - 1.
+Finally, still within your loop, print out offset so you can see how it changes.
+
+
+# Initialize offset
+offset = 8
+
+# Code the while loop
+while offset != 0 :
+    print("correcting...")
+    offset = offset - 1
+    print(offset)
+
+
+Indexes and values (2)
+
+For non-programmer folks, room 0: 11.25 is strange. Wouldn't it be better if the count started at 1?
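
One possible way to start the count at 1 is the start argument of enumerate(). The sketch below assumes a list named areas; only the value 11.25 comes from the text above, the other numbers are placeholders.

```python
# Hypothetical list of room areas; only 11.25 appears in the text above
areas = [11.25, 18.0, 20.0]

lines = []
# start=1 makes enumerate() count from 1 instead of 0
for index, area in enumerate(areas, start=1):
    lines.append("room " + str(index) + ": " + str(area))

print("\n".join(lines))
```

Adding 1 to the index inside the loop body would give the same labels; start=1 just keeps the loop body shorter.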
+
+
+
+
+
+Loop over list of lists
+
+Remember the house variable from the Intro to Python course? Have a look at its definition on the right. It's basically a list of lists, where each sublist contains the name and area of a room in your house.
+
+It's up to you to build a for loop from scratch this time!
+
+
+Write a for loop that goes through each sublist of house and prints out "the x is y sqm", where x is the name of the room and y is the area of the room.
+
+# house list of lists
+house = [["hallway", 11.25], 
+         ["kitchen", 18.0], 
+         ["living room", 20.0], 
+         ["bedroom", 10.75], 
+         ["bathroom", 9.50]]
+         
+# Build a for loop from scratch
+
+for x,y in house:
+    print('the '+str(x)+' is '+ str(y) +' sqm')
+    
+    
+
+
+
+Loop over dictionary
+
+In Python 3, you need the items() method to loop over the key:value pairs of a dictionary:
+
+world = { "afghanistan":30.55, 
+          "albania":2.77,
+          "algeria":39.21 }
+
+for key, value in world.items() :
+    print(key + " -- " + str(value))
+
+Remember the europe dictionary that contained the names of some European countries as key and their capitals as corresponding value? Go ahead and write a loop to iterate over it!
+
+Write a for loop that goes through each key:value pair of europe. On each iteration, "the capital of x is y" should be printed out, where x is the key and y is the value of the pair.
+
+
+
+# Definition of dictionary
+europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
+          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
+          
+# Iterate over europe
+for key,value in europe.items():
+    print('the capital of ' + str(key) + ' is ' + str(value))
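
The reason items() is needed becomes visible when a dictionary is looped over directly. This is a minimal sketch using the world dictionary from the example above; keys_only and pairs are illustrative names.

```python
world = {"afghanistan": 30.55,
         "albania": 2.77,
         "algeria": 39.21}

# Looping over the dictionary itself yields only the keys
keys_only = [entry for entry in world]

# items() yields (key, value) tuples that can be unpacked in the for statement
pairs = [(key, value) for key, value in world.items()]

print(keys_only)
print(pairs)
```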
+
+
+Loop over Numpy array
+
+If you're dealing with a 1D Numpy array, looping over all elements can be as simple as:
+
+for x in my_array :
+    ...
+
+If you're dealing with a 2D Numpy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax:
+
+for x in np.nditer(my_array) :
+    ...
+
+Two Numpy arrays that you might recognize from the intro course are available in your Python session: np_height, a Numpy array containing the heights of Major League Baseball players, and np_baseball, a 2D Numpy array that contains both the heights (first column) and weights (second column) of those players.
+
+
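+# Import numpy as np
+import numpy as np
+
+# np_height and np_baseball are assumed to be preloaded in the session
+
+# For loop over np_height (1D, so a plain for loop is enough)
+for x in np_height:
+    print(str(x) + ' inches')
+
+# For loop over np_baseball (2D, so use np.nditer)
+for x in np.nditer(np_baseball):
+    print(x)
+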
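
The difference between a plain for loop and np.nditer() on a 2D array can be seen on a tiny example. The array below is a made-up stand-in for np_baseball, which is not defined in these notes.

```python
import numpy as np

# Made-up stand-in: two rows of (height, weight)
grid = np.array([[74, 180],
                 [72, 215]])

# A plain for loop over a 2D array yields one 1D row per iteration
rows = [row.tolist() for row in grid]

# np.nditer() visits every individual element
elements = [int(x) for x in np.nditer(grid)]

print(rows)
print(elements)
```

With the plain loop there are two iterations, one per row; with np.nditer() there are four, one per element.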
+Loop over DataFrame (1)
+
+Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available:
+
+for lab, row in brics.iterrows() :
+    ...
+
+In this and the following exercises you will be working on the cars DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world.
+
+
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Iterate over rows of cars
+for lab,row in cars.iterrows():
+    print(lab)
+    print(row)
+    
+    
+    
+    
+Loop over DataFrame (2)
+
+The row data that's generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets:
+
+for lab, row in brics.iterrows() :
+    print(row['country'])
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Adapt for loop
+for lab, row in cars.iterrows():
+    print(lab+':'+ str(row['cars_per_cap']))
+
+
+
+Add column (1)
+
+In the video, Hugo showed you how to add the length of the country names of the brics DataFrame in a new column:
+
+for lab, row in brics.iterrows() :
+    brics.loc[lab, "name_length"] = len(row["country"])
+
+You can do similar things on the cars DataFrame.
+
+
+
+
+Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the "country" column. You can use the string method upper() for this.
+To see if your code worked, print out cars. Don't indent this code, so that it's not part of the for loop.
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Code for loop that adds COUNTRY column
+for lab,row in cars.iterrows() :
+    cars.loc[lab,'COUNTRY']=row['country'].upper()
+
+
+# Print cars
+print(cars)
+
+
+
+Add column (2)
+
+Using iterrows() to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you're creating a new Pandas Series.
+
+If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you'll want to use apply().
+
+Compare the iterrows() version with the apply() version to get the same result in the brics DataFrame:
+
+for lab, row in brics.iterrows() :
+    brics.loc[lab, "name_length"] = len(row["country"])
+
+brics["name_length"] = brics["country"].apply(len)
+
+We can do a similar thing to call the upper() method on every name in the country column. However, upper() is a method, so we'll need a slightly different approach:
+
+Replace the for loop with a one-liner that uses .apply(str.upper). The call should give the same result: a column COUNTRY should be added to cars, containing an uppercase version of the country names.
+As usual, print out cars to see the fruits of your hard labor.
+
+
+
+# Import cars data
+import pandas as pd
+cars = pd.read_csv('cars.csv', index_col = 0)
+
+# Use .apply(str.upper)
+cars['COUNTRY'] = cars['country'].apply(str.upper)
+
+# Print cars
+print(cars)
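
To check that the iterrows() loop and the apply() one-liner produce the same column, both can be run on copies of a small stand-in DataFrame. The frame below is made up for illustration, since cars.csv is not included here; looped and applied are illustrative names.

```python
import pandas as pd

# Made-up stand-in for the cars data; only the country column matters here
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "Egypt"]},
    index=["US", "JAP", "EG"],
)

# Version 1: row-by-row loop, building a new Series on every iteration
looped = cars.copy()
for lab, row in looped.iterrows():
    looped.loc[lab, "COUNTRY"] = row["country"].upper()

# Version 2: apply str.upper to every value of the column in one call
applied = cars.copy()
applied["COUNTRY"] = applied["country"].apply(str.upper)

print(list(looped["COUNTRY"]))
print(list(applied["COUNTRY"]))
```

Both versions end up with the same uppercase names; the apply() version avoids the per-row Series that iterrows() creates.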
+
+
+
+
+    
+    
+
+
+
+
+
+
+