1) i. Write a NumPy program to convert a list and a tuple into arrays.
Program:
import numpy as np
# Convert a list to a NumPy array
list_data = [1, 2, 3, 4, 5]
array_from_list = np.array(list_data)
print("Array from list:", array_from_list)
# Convert a tuple to a NumPy array
tuple_data = (10, 20, 30, 40, 50)
array_from_tuple = np.array(tuple_data)
print("Array from tuple:", array_from_tuple)
output:
Array from list: [1 2 3 4 5]
Array from tuple: [10 20 30 40 50]
ii. Write a NumPy program to convert the values of Centigrade degrees into
Fahrenheit degrees and vice versa. Values have to be stored in a NumPy
array.
Program:
import numpy as np
# Function to convert Centigrade to Fahrenheit
def centigrade_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32
# Function to convert Fahrenheit to Centigrade
def fahrenheit_to_centigrade(fahrenheit):
    return (fahrenheit - 32) * 5/9
# Create a NumPy array of Centigrade temperatures
centigrade_values = np.array([0, 10, 20, 30, 40, 50])
# Convert Centigrade to Fahrenheit
fahrenheit_values = centigrade_to_fahrenheit(centigrade_values)
# Create a NumPy array of Fahrenheit temperatures
fahrenheit_array = np.array([32, 50, 68, 86, 104, 122])
# Convert Fahrenheit to Centigrade
centigrade_from_fahrenheit = fahrenheit_to_centigrade(fahrenheit_array)
# Print the results
print("Centigrade values:", centigrade_values)
print("Converted Fahrenheit values:", fahrenheit_values)
print("\nFahrenheit values:", fahrenheit_array)
print("Converted Centigrade values:", centigrade_from_fahrenheit)
output:
Centigrade values: [ 0 10 20 30 40 50]
Converted Fahrenheit values: [ 32.  50.  68.  86. 104. 122.]

Fahrenheit values: [ 32  50  68  86 104 122]
Converted Centigrade values: [ 0. 10. 20. 30. 40. 50.]
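A quick way to sanity-check the two conversion functions is to apply one after the other and confirm the original values come back. The sketch below repeats the functions from the program above; the names `round_trip` and `is_consistent` are introduced here only for illustration.

```python
import numpy as np

def centigrade_to_fahrenheit(celsius):
    return (celsius * 9 / 5) + 32

def fahrenheit_to_centigrade(fahrenheit):
    return (fahrenheit - 32) * 5 / 9

centigrade_values = np.array([0, 10, 20, 30, 40, 50])

# Convert to Fahrenheit and back again
round_trip = fahrenheit_to_centigrade(centigrade_to_fahrenheit(centigrade_values))

# np.allclose compares with a small tolerance, which suits floating-point results
is_consistent = np.allclose(round_trip, centigrade_values)
print("Round trip matches the original values:", is_consistent)
```

Note that the result is a float array even though the input holds integers, because `/` is true division in NumPy.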
2. i. Write a NumPy program to find the real and imaginary parts of an array of
complex numbers.
Program:
import numpy as np
# Create a NumPy array of complex numbers
complex_array = np.array([2 + 3j, 4 - 5j, -1 + 2j, 3 + 4j])
# Extract the real parts of the complex numbers
real_parts = np.real(complex_array)
# Extract the imaginary parts of the complex numbers
imaginary_parts = np.imag(complex_array)
# Print the results
print("Complex array:", complex_array)
print("Real parts:", real_parts)
print("Imaginary parts:", imaginary_parts)
output:
Complex array: [ 2.+3.j  4.-5.j -1.+2.j  3.+4.j]
Real parts: [ 2.  4. -1.  3.]
Imaginary parts: [ 3. -5.  2.  4.]
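The same parts are also available as the `.real` and `.imag` attributes of the array. The sketch below, using the same sample values, rebuilds the complex array from its parts as a check; `rebuilt` and `magnitudes` are illustrative names.

```python
import numpy as np

complex_array = np.array([2 + 3j, 4 - 5j, -1 + 2j, 3 + 4j])

# Attribute access is equivalent to np.real() and np.imag()
real_parts = complex_array.real
imaginary_parts = complex_array.imag

# Recombine the parts to confirm nothing was lost
rebuilt = real_parts + 1j * imaginary_parts
print("Rebuilt equals original:", np.array_equal(rebuilt, complex_array))

# The magnitude of each number combines both parts: sqrt(real**2 + imag**2)
magnitudes = np.abs(complex_array)
print("Magnitudes:", magnitudes)
```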
ii. Write a NumPy program to convert a NumPy array into a CSV file.
Program:
import numpy as np
# Create a NumPy array
array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Save the NumPy array into a CSV file
np.savetxt('array_data.csv', array_data, delimiter=',', fmt='%d')
print("Array has been saved to 'array_data.csv'.")
output:
Array has been saved to 'array_data.csv'.
Contents of array_data.csv:
1,2,3
4,5,6
7,8,9
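To confirm that the file was written correctly, it can be read back with `np.loadtxt` and compared with the original array. This is a minimal sketch; it writes to a temporary directory (an assumption made here so the example cleans up after itself) rather than to the working directory used above.

```python
import os
import tempfile
import numpy as np

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "array_data.csv")
    # Write the array as comma-separated integers
    np.savetxt(csv_path, array_data, delimiter=",", fmt="%d")
    # Read it back using the same delimiter
    loaded = np.loadtxt(csv_path, delimiter=",", dtype=int)

arrays_match = np.array_equal(array_data, loaded)
print("Loaded array matches the original:", arrays_match)
```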
3. i. Write a NumPy program to perform the basic arithmetic operations
Program:
import numpy as np
# Create two NumPy arrays
array1 = np.array([10, 20, 30, 40, 50])
array2 = np.array([1, 2, 3, 4, 5])
# Addition
addition_result = array1 + array2
# Subtraction
subtraction_result = array1 - array2
# Multiplication
multiplication_result = array1 * array2
# Division
division_result = array1 / array2
# Exponentiation (array1 raised to the power of array2)
exponentiation_result = array1 ** array2
# Print the results
print("Array 1:", array1)
print("Array 2:", array2)
print("\nAddition (Array1 + Array2):", addition_result)
print("Subtraction (Array1 - Array2):", subtraction_result)
print("Multiplication (Array1 * Array2):", multiplication_result)
print("Division (Array1 / Array2):", division_result)
print("Exponentiation (Array1 ** Array2):", exponentiation_result)
output:
Array 1: [10 20 30 40 50]
Array 2: [1 2 3 4 5]
Addition (Array1 + Array2): [11 22 33 44 55]
Subtraction (Array1 - Array2): [ 9 18 27 36 45]
Multiplication (Array1 * Array2): [ 10  40  90 160 250]
Division (Array1 / Array2): [10. 10. 10. 10. 10.]
Exponentiation (Array1 ** Array2): [       10       400     27000   2560000 312500000]
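Element-wise powers grow quickly, so it can be worth checking them against Python's arbitrary-precision integers. The sketch below fixes the dtype to `int64` (an assumption made for the example; the default integer type depends on the platform and NumPy version) and compares the two results.

```python
import numpy as np

array1 = np.array([10, 20, 30, 40, 50], dtype=np.int64)
array2 = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# Element-wise exponentiation in NumPy (fixed-width integers)
numpy_powers = array1 ** array2

# The same calculation with plain Python integers, which cannot overflow
python_powers = [int(base) ** int(exponent) for base, exponent in zip(array1, array2)]

print("NumPy: ", numpy_powers.tolist())
print("Python:", python_powers)
print("Results agree:", numpy_powers.tolist() == python_powers)
```

If the two lists ever differ, the fixed-width integer type has overflowed and a wider dtype or floating point is needed.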
ii. Write a NumPy program to transpose an array.
Program:
import numpy as np
# Create a 2D NumPy array
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Transpose the array
transposed_array = np.transpose(array)
# Alternatively, you can also use the shorthand `.T` to transpose
# transposed_array = array.T
# Print the original and transposed arrays
print("Original Array:")
print(array)
print("\nTransposed Array:")
print(transposed_array)
output:
Original Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed Array:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
4) i. Using NumPy, create an array with 5 dimensions and verify that it has 5
dimensions.
Program:
import numpy as np
# Create a 5-dimensional NumPy array with random integers
array_5d = np.random.randint(1, 10, size=(2, 3, 4, 5, 6))
# Verify the number of dimensions using .ndim
print("Array Shape:", array_5d.shape)
print("Number of Dimensions:", array_5d.ndim)
output:
Array Shape: (2, 3, 4, 5, 6)
Number of Dimensions: 5
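Random data is not required to get five dimensions. As an alternative sketch, the `ndmin` argument of `np.array` pads the shape with leading axes of length 1, and `reshape` can spread existing values over five axes. The variable names below are illustrative.

```python
import numpy as np

# ndmin=5 prepends axes of length 1 until the array has 5 dimensions
array_ndmin = np.array([1, 2, 3], ndmin=5)
print("Shape with ndmin:", array_ndmin.shape)
print("Dimensions:", array_ndmin.ndim)

# reshape distributes 24 values over five axes (1 * 2 * 3 * 4 * 1 = 24)
reshaped = np.arange(24).reshape(1, 2, 3, 4, 1)
print("Shape after reshape:", reshaped.shape)
print("Dimensions:", reshaped.ndim)
```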
ii. Using NumPy, Sort a boolean array.
Program:
import numpy as np
# Create a boolean NumPy array
boolean_array = np.array([True, False, True, False, True, False])
# Sort the boolean array
sorted_array = np.sort(boolean_array)
# Print the original and sorted arrays
print("Original Boolean Array:", boolean_array)
print("Sorted Boolean Array:", sorted_array)
output:
Original Boolean Array: [ True False  True False  True False]
Sorted Boolean Array: [False False False  True  True  True]
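`np.sort` places `False` before `True` because `False` compares as 0 and `True` as 1. The following sketch shows two related operations on the same sample array: reversing the sorted result to get `True` first, and counting the `True` values.

```python
import numpy as np

boolean_array = np.array([True, False, True, False, True, False])

# Ascending sort puts False (0) before True (1); reverse it for descending order
descending = np.sort(boolean_array)[::-1]
print("Descending order:", descending)

# Counting True values does not require sorting at all
true_count = np.count_nonzero(boolean_array)
print("Number of True values:", true_count)
```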
5) i. Create your own simple Pandas DataFrame and print its values.
Program:
import pandas as pd
# Create a simple dictionary with data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
output:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eve   29      Phoenix
ii. Create your own DataFrame from a dict of ndarrays/lists.
Program:
import pandas as pd
import numpy as np
# Create a dictionary with NumPy arrays or lists
data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': np.array([1000, 600, 300, 250, 100]),
    'Stock': np.array([50, 200, 150, 80, 500])
}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
output:
    Product  Price  Stock
0    Laptop   1000     50
1     Phone    600    200
2    Tablet    300    150
3   Monitor    250     80
4  Keyboard    100    500
6. Perform appending, slicing, addition and deletion of rows with a Pandas
DataFrame.
Program:
import pandas as pd
# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# 1. Appending a new row to the DataFrame
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
new_row = {'Name': 'Eve', 'Age': 29, 'City': 'Phoenix'}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print("\nDataFrame after appending a new row:")
print(df)
# 2. Slicing the DataFrame (selecting specific rows)
sliced_df = df[1:3] # Selecting rows 1 and 2 (indexing starts from 0)
print("\nSliced DataFrame (rows 1 to 2):")
print(sliced_df)
# 3. Adding a new row with 'loc'
df.loc[len(df)] = ['Frank', 30, 'Dallas']
print("\nDataFrame after adding a new row with 'loc':")
print(df)
# 4. Deleting a row (deleting row with index 2)
df = df.drop(2)
print("\nDataFrame after deleting row with index 2:")
print(df)
output:
Original DataFrame:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston

DataFrame after appending a new row:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eve   29      Phoenix

Sliced DataFrame (rows 1 to 2):
      Name  Age         City
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

DataFrame after adding a new row with 'loc':
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eve   29      Phoenix
5    Frank   30       Dallas

DataFrame after deleting row with index 2:
    Name  Age         City
0  Alice   24     New York
1    Bob   27  Los Angeles
3  David   32      Houston
4    Eve   29      Phoenix
5  Frank   30       Dallas
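After a row is dropped, the remaining index labels keep their old values, so the index has a gap where the deleted row used to be. If consecutive labels are wanted, `reset_index(drop=True)` renumbers the rows. The sketch below uses a smaller, illustrative DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22]
})

# Dropping label 1 leaves the labels 0 and 2 behind
df = df.drop(1)
print("Index after drop:", list(df.index))

# drop=True discards the old labels instead of keeping them as a new column
renumbered = df.reset_index(drop=True)
print("Index after reset:", list(renumbered.index))
print(renumbered)
```

This matters when a later step such as `df.loc[len(df)] = ...` assumes the labels run from 0 to `len(df) - 1`; with a gap in the index, that assignment can overwrite an existing row instead of adding one.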
7. i. Using Pandas, create a DataFrame with a list of dictionaries, row indices,
and column indices.
Program:
import pandas as pd
# Create a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 24, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 27, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 22, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 32, 'City': 'Houston'}
]
# Define custom row indices and column indices
row_indices = ['A', 'B', 'C', 'D']
column_indices = ['Name', 'Age', 'City']
# Create the DataFrame
df = pd.DataFrame(data, index=row_indices, columns=column_indices)
# Print the DataFrame
print(df)
output:
      Name  Age         City
A    Alice   24     New York
B      Bob   27  Los Angeles
C  Charlie   22      Chicago
D    David   32      Houston
ii. Use index label to delete or drop rows from a Pandas DataFrame.
Program:
import pandas as pd
# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
# Set custom row indices
df.index = ['A', 'B', 'C', 'D']
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# 1. Drop a row by index label (e.g., drop row with index 'B')
df_dropped = df.drop('B')
print("\nDataFrame after dropping row with index 'B':")
print(df_dropped)
# 2. Drop multiple rows by index labels (e.g., drop rows with index 'A' and 'D')
df_dropped_multiple = df.drop(['A', 'D'])
print("\nDataFrame after dropping rows with index 'A' and 'D':")
print(df_dropped_multiple)
# 3. Drop a row in-place (this will modify the original DataFrame)
df.drop('C', inplace=True)
print("\nDataFrame after dropping row with index 'C' in-place:")
print(df)
output:
Original DataFrame:
      Name  Age         City
A    Alice   24     New York
B      Bob   27  Los Angeles
C  Charlie   22      Chicago
D    David   32      Houston

DataFrame after dropping row with index 'B':
      Name  Age      City
A    Alice   24  New York
C  Charlie   22   Chicago
D    David   32   Houston

DataFrame after dropping rows with index 'A' and 'D':
      Name  Age         City
B      Bob   27  Los Angeles
C  Charlie   22      Chicago

DataFrame after dropping row with index 'C' in-place:
    Name  Age         City
A  Alice   24     New York
B    Bob   27  Los Angeles
D  David   32      Houston
8. Using the Pandas library,
i. Load the iris.csv file.
ii. Convert it into a DataFrame and read it.
iii. Display only the records with species "Iris-setosa".
Program:
import pandas as pd
# Step 1: Load the iris CSV file into a Pandas DataFrame
# Replace 'iris.csv' with the correct file path if necessary
df = pd.read_csv('iris.csv')
# Step 2: Display the entire DataFrame or the first few rows to ensure it's loaded correctly
print("First few records of the DataFrame:")
print(df.head())
# Step 3: Display only the records with species 'Iris-setosa'
setosa_df = df[df['species'] == 'Iris-setosa']
# Display the filtered DataFrame
print("\nRecords with species 'Iris-setosa':")
print(setosa_df)
output:
First few records of the DataFrame:
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Records with species 'Iris-setosa':
    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
...
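The filter only works if the column is really named `species` and the label is spelled `Iris-setosa`, which depends on the copy of the file being used. The sketch below shows one way to check this before filtering. It reads a small hand-typed sample (an assumption standing in for the real iris.csv) from an in-memory string so that it runs without the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: a few rows in the same column layout
sample_csv = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
df = pd.read_csv(sample_csv)

# Inspect the column names and the distinct species labels first
print(df.columns.tolist())
species_counts = df['species'].value_counts()
print(species_counts)

# Boolean-mask filtering keeps only the rows where the condition is True
setosa_df = df[df['species'] == 'Iris-setosa']
print(setosa_df)
```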
9. Using the diabetes data set from UCI, perform univariate analysis.
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Step 1: Load the diabetes dataset from the UCI repository
# You can replace this URL with the actual URL of the dataset or load it from a local file.
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
           'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
df = pd.read_csv(url, names=columns)
# Step 2: Check the first few rows of the dataset
print(df.head())
# Step 3: Summary statistics for numerical features
print("\nSummary Statistics:")
print(df.describe())
# Step 4: Visualizing the distribution of each feature (Univariate Analysis)
# Histograms for all features
df.hist(bins=20, figsize=(15,10))
plt.tight_layout()
plt.show()
# Step 5: Boxplots for all features to check for outliers
plt.figure(figsize=(15, 10))
sns.boxplot(data=df)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Step 6: Checking the distribution of 'Outcome' (Diabetes status)
sns.countplot(x='Outcome', data=df)
plt.title('Distribution of Outcome (Diabetes Status)')
plt.show()
output:
10. Using the Pima Indians Diabetes data set, perform bivariate
analysis.
Program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset from the UCI repository or local file
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
           'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
df = pd.read_csv(url, names=columns)
# Display first few rows of the dataset
print(df.head())
# Step 1: Correlation Heatmap to analyze relationships between numerical features
plt.figure(figsize=(10, 8))
correlation_matrix = df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap of Diabetes Dataset')
plt.show()
# Step 2: Scatter plots between features and target variable 'Outcome'
plt.figure(figsize=(15, 10))
# Plotting scatter plot for 'Glucose' vs 'Outcome'
plt.subplot(2, 3, 1)
sns.scatterplot(x='Glucose', y='Outcome', data=df)
plt.title('Glucose vs Outcome')
# Plotting scatter plot for 'BMI' vs 'Outcome'
plt.subplot(2, 3, 2)
sns.scatterplot(x='BMI', y='Outcome', data=df)
plt.title('BMI vs Outcome')
# Plotting scatter plot for 'Age' vs 'Outcome'
plt.subplot(2, 3, 3)
sns.scatterplot(x='Age', y='Outcome', data=df)
plt.title('Age vs Outcome')
# Plotting scatter plot for 'Insulin' vs 'Outcome'
plt.subplot(2, 3, 4)
sns.scatterplot(x='Insulin', y='Outcome', data=df)
plt.title('Insulin vs Outcome')
# Plotting scatter plot for 'BloodPressure' vs 'Outcome'
plt.subplot(2, 3, 5)
sns.scatterplot(x='BloodPressure', y='Outcome', data=df)
plt.title('BloodPressure vs Outcome')
# Plotting scatter plot for 'Pregnancies' vs 'Outcome'
plt.subplot(2, 3, 6)
sns.scatterplot(x='Pregnancies', y='Outcome', data=df)
plt.title('Pregnancies vs Outcome')
plt.tight_layout()
plt.show()
# Step 3: Pairplot to visualize the relationships between multiple features and 'Outcome'
sns.pairplot(df, hue='Outcome', diag_kind='hist', markers=["o", "s"])
plt.suptitle('Pairplot of Features with Outcome', y=1.02)
plt.show()
output:
11. Perform Multiple Regression analysis on your own dataset (for example, a
car dataset with the fields Company Name, Model, Volume, Weight, CO2),
using more than one independent variable to predict a value based on two or
more variables.
Program:
# Import necessary libraries
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Step 1: Create or Load your Dataset
# Sample data representing car information
data = {
    'Company Name': ['Toyota', 'Honda', 'Ford', 'BMW', 'Audi'],
    'Model': ['Corolla', 'Civic', 'Focus', 'X5', 'A4'],
    'Volume': [1.8, 2.0, 1.5, 3.0, 2.5],  # Engine volume in liters
    'Weight': [1300, 1200, 1400, 2000, 1800],  # Weight in kilograms
    'CO2': [120, 110, 140, 200, 180]  # CO2 emissions in grams per km
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Step 2: Preprocess the Data
# Since we are predicting CO2 based on Volume and Weight, we can drop
# 'Company Name' and 'Model' for now
df = df.drop(columns=['Company Name', 'Model'])
# Independent variables (Volume, Weight)
X = df[['Volume', 'Weight']]
# Dependent variable (CO2)
y = df['CO2']
# Step 3: Add a constant to the independent variables (for intercept)
X = sm.add_constant(X)
# Step 4: Perform Multiple Regression using statsmodels
model = sm.OLS(y, X).fit()
# Step 5: Display the summary of the regression analysis
print("Multiple Regression Analysis Summary (statsmodels):")
print(model.summary())
# Step 6: Perform Multiple Regression using scikit-learn
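# Split the data into training and testing sets
# Note: with only five rows, test_size=0.2 leaves a single test sample, so the
# R-squared computed below is undefined (NaN); use a larger dataset for a meaningful score
X_train, X_test, y_train, y_test = train_test_split(
    df[['Volume', 'Weight']], df['CO2'], test_size=0.2, random_state=42)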
# Initialize the Linear Regression model
regressor = LinearRegression()
# Train the model
regressor.fit(X_train, y_train)
# Predict on the test set
y_pred = regressor.predict(X_test)
# Step 7: Evaluate the model
print("\nMultiple Regression Analysis using scikit-learn:")
print(f"Coefficients: {regressor.coef_}")
print(f"Intercept: {regressor.intercept_}")
# Calculate R-squared value and Mean Squared Error (MSE)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R-squared: {r2}")
print(f"Mean Squared Error: {mse}")
# Step 8: Plotting the results
plt.scatter(y_test, y_pred)
plt.xlabel("Actual CO2")
plt.ylabel("Predicted CO2")
plt.title("Actual vs Predicted CO2")
plt.show()
output:
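As a cross-check on the library results, the same least-squares fit can be computed directly with NumPy. This is a minimal sketch on the five sample cars from the program above, fitted on all rows without a train/test split; `design`, `coefficients`, and `residual_sum` are names introduced here for illustration.

```python
import numpy as np

volume = np.array([1.8, 2.0, 1.5, 3.0, 2.5])
weight = np.array([1300, 1200, 1400, 2000, 1800], dtype=float)
co2 = np.array([120, 110, 140, 200, 180], dtype=float)

# Design matrix: a column of ones for the intercept, then one column per predictor
design = np.column_stack([np.ones_like(volume), volume, weight])

# Solve min ||design @ b - co2||^2 for b = [intercept, volume coef, weight coef]
coefficients, residuals, rank, singular_values = np.linalg.lstsq(design, co2, rcond=None)
intercept, volume_coef, weight_coef = coefficients
print(f"Intercept: {intercept:.4f}")
print(f"Volume coefficient: {volume_coef:.4f}")
print(f"Weight coefficient: {weight_coef:.4f}")

# With an intercept in the model, the residuals of a least-squares fit sum to zero
predicted = design @ coefficients
residual_sum = np.sum(co2 - predicted)
print(f"Sum of residuals: {residual_sum:.2e}")
```

The coefficients should agree with the statsmodels fit, which also uses every row. The scikit-learn model is trained on a subset of the rows, so its coefficients will generally differ.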
12. Perform Bivariate analysis using the pandas DataFrame that contains
information about two variables: (1) Hours spent studying and (2) Exam score
received by 20 different students.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
# Step 1: Create the DataFrame
data = {
    'Hours Studying': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'Exam Score': [35, 40, 50, 60, 65, 70, 75, 80, 85, 88, 90, 92, 94, 95, 96, 98, 99, 99, 100, 100]
}
# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)
# Step 2: Descriptive Statistics
print("Descriptive Statistics:")
print(df.describe())
# Step 3: Calculate Correlation
correlation, _ = pearsonr(df['Hours Studying'], df['Exam Score'])
print(f"\nCorrelation between Hours Studying and Exam Score: {correlation:.2f}")
# Step 4: Scatter Plot
plt.figure(figsize=(8, 6))
plt.scatter(df['Hours Studying'], df['Exam Score'], color='blue', label='Data Points')
plt.title('Hours Studying vs Exam Score')
plt.xlabel('Hours Studying')
plt.ylabel('Exam Score')
plt.grid(True)
plt.legend()
plt.show()
# Step 5: Linear Regression Line (Fit a regression line)
sns.regplot(x='Hours Studying', y='Exam Score', data=df, scatter_kws={'color':'blue'},
            line_kws={'color':'red'})
plt.title('Linear Regression Line: Hours Studying vs Exam Score')
plt.xlabel('Hours Studying')
plt.ylabel('Exam Score')
plt.show()
output:
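The Pearson coefficient reported by `pearsonr` can be reproduced from its definition: the sum of products of the centered values divided by the square root of the product of their sums of squares. The sketch below computes it by hand with NumPy on the same data and compares it with `np.corrcoef`; `r_manual` and `r_numpy` are illustrative names.

```python
import numpy as np

hours = np.arange(1, 21)
scores = np.array([35, 40, 50, 60, 65, 70, 75, 80, 85, 88,
                   90, 92, 94, 95, 96, 98, 99, 99, 100, 100])

# Center both variables around their means
hours_centered = hours - hours.mean()
scores_centered = scores - scores.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(hours_centered * scores_centered) / np.sqrt(
    np.sum(hours_centered**2) * np.sum(scores_centered**2))

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(hours, scores)[0, 1]

print(f"Manual Pearson r: {r_manual:.4f}")
print(f"np.corrcoef r:    {r_numpy:.4f}")
```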
13. Perform Univariate analysis with the following pandas DataFrame:
'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2]
'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6]
'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Step 1: Create the DataFrame
data = {
    'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2],
    'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6],
    'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]
}
# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)
# Step 2: Descriptive Statistics for each column
print("Descriptive Statistics:")
print(df.describe())
# Step 3: Visualizing the Distribution of each variable
# Plot histograms for each variable
plt.figure(figsize=(12, 6))
# Histogram for 'points'
plt.subplot(1, 3, 1)
sns.histplot(df['points'], kde=True, color='blue', bins=10)
plt.title('Distribution of Points')
plt.xlabel('Points')
plt.ylabel('Frequency')
# Histogram for 'assists'
plt.subplot(1, 3, 2)
sns.histplot(df['assists'], kde=True, color='green', bins=10)
plt.title('Distribution of Assists')
plt.xlabel('Assists')
plt.ylabel('Frequency')
# Histogram for 'rebounds'
plt.subplot(1, 3, 3)
sns.histplot(df['rebounds'], kde=True, color='red', bins=10)
plt.title('Distribution of Rebounds')
plt.xlabel('Rebounds')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
# Step 4: Box plots to visualize outliers
plt.figure(figsize=(12, 6))
# Box plot for 'points'
plt.subplot(1, 3, 1)
sns.boxplot(y=df['points'], color='blue')
plt.title('Boxplot of Points')
# Box plot for 'assists'
plt.subplot(1, 3, 2)
sns.boxplot(y=df['assists'], color='green')
plt.title('Boxplot of Assists')
# Box plot for 'rebounds'
plt.subplot(1, 3, 3)
sns.boxplot(y=df['rebounds'], color='red')
plt.title('Boxplot of Rebounds')
plt.tight_layout()
plt.show()
# Step 5: Skewness and Kurtosis
from scipy.stats import skew, kurtosis
# Skewness and Kurtosis for 'points'
points_skew = skew(df['points'])
points_kurt = kurtosis(df['points'])
# Skewness and Kurtosis for 'assists'
assists_skew = skew(df['assists'])
assists_kurt = kurtosis(df['assists'])
# Skewness and Kurtosis for 'rebounds'
rebounds_skew = skew(df['rebounds'])
rebounds_kurt = kurtosis(df['rebounds'])
print("\nSkewness and Kurtosis:")
print(f"Points: Skewness = {points_skew:.2f}, Kurtosis = {points_kurt:.2f}")
print(f"Assists: Skewness = {assists_skew:.2f}, Kurtosis = {assists_kurt:.2f}")
print(f"Rebounds: Skewness = {rebounds_skew:.2f}, Kurtosis = {rebounds_kurt:.2f}")
output:
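Skewness and kurtosis are ratios of central moments, so they can also be computed with NumPy alone. The sketch below uses the biased (population) moment formulas and reports excess kurtosis (kurtosis minus 3), which is the convention `scipy.stats.skew` and `scipy.stats.kurtosis` follow with their default arguments. The helper names `sample_skewness` and `excess_kurtosis` are introduced here for illustration.

```python
import numpy as np

def sample_skewness(values):
    """Biased sample skewness: m3 / m2**1.5, where mk is the k-th central moment."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations**2)
    m3 = np.mean(deviations**3)
    return m3 / m2**1.5

def excess_kurtosis(values):
    """Biased excess kurtosis: m4 / m2**2 - 3 (a normal distribution gives 0)."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations**2)
    m4 = np.mean(deviations**4)
    return m4 / m2**2 - 3.0

points = np.array([1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2])
points_skew = sample_skewness(points)
points_kurt = excess_kurtosis(points)
print(f"Points: Skewness = {points_skew:.2f}, Kurtosis = {points_kurt:.2f}")
```

A positive skewness for 'points' reflects the long right tail created by the two largest values, 13 and 14.2.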
14. i) Using various functions in numpy library, mathematically calculate the
values for a normal distribution and create Histograms to plot the probability
distribution curve.
Program:
import numpy as np
import matplotlib.pyplot as plt
# Step 1: Parameters for the normal distribution
mu = 0 # Mean of the distribution
sigma = 1 # Standard deviation
size = 10000 # Number of data points to generate
# Step 2: Generate random samples from a normal distribution
data = np.random.normal(mu, sigma, size)
# Step 3: Plot the histogram
plt.figure(figsize=(10, 6))
count, bins, ignored = plt.hist(data, bins=30, density=True, alpha=0.6, color='g')
# Step 4: Calculate the Probability Density Function (PDF)
# Define the normal distribution function
def normal_distribution(x, mu, sigma):
    return (1/np.sqrt(2 * np.pi * sigma**2)) * np.exp(-0.5 * ((x - mu) / sigma)**2)
# Step 5: Generate points for the normal distribution curve
x_values = np.linspace(min(bins), max(bins), 100)
pdf_values = normal_distribution(x_values, mu, sigma)
# Step 6: Plot the PDF curve over the histogram
plt.plot(x_values, pdf_values, 'k', linewidth=2)
plt.title("Normal Distribution with Histogram")
plt.xlabel("Data points")
plt.ylabel("Density")
plt.grid(True)
plt.show()
output:
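One way to check the hand-written density function is to confirm that the area under the curve is approximately 1. The sketch below evaluates the same formula on a fine grid spanning six standard deviations either side of the mean (an assumed range, chosen so the tails are negligible) and sums the values numerically.

```python
import numpy as np

def normal_distribution(x, mu, sigma):
    return (1 / np.sqrt(2 * np.pi * sigma**2)) * np.exp(-0.5 * ((x - mu) / sigma)**2)

mu = 0
sigma = 1

# Evaluate the PDF on an evenly spaced grid covering mu +/- 6 sigma
x_values = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 2001)
pdf_values = normal_distribution(x_values, mu, sigma)

# Approximate the integral as the sum of the PDF values times the grid spacing
dx = x_values[1] - x_values[0]
area = np.sum(pdf_values) * dx
print(f"Area under the PDF: {area:.6f}")

# The peak of the curve sits at the mean
peak_location = x_values[np.argmax(pdf_values)]
print(f"Peak located at x = {peak_location:.2f}")
```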
14. ii) Using the plt.contour(), plt.contourf(), plt.imshow(), plt.colorbar(), and plt.clabel()
functions, visualize a contour plot.
Program:
import numpy as np
import matplotlib.pyplot as plt
# Create some sample data
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X**2 + Y**2) / (X**2 + Y**2)
# Create a contour plot
plt.contour(X, Y, Z, levels=20, cmap='viridis')
# Create a filled contour plot
plt.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
# Add a colorbar
plt.colorbar()
# Add labels to the contour lines
plt.clabel(plt.contour(X, Y, Z, levels=20, colors='k'), inline=True, fontsize=10)
# Display the plot
plt.show()
output:
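The task also lists `plt.imshow()`, which the program above does not call. The sketch below is one possible way to include it: the same `Z` grid is drawn as an image, with labelled contour lines on top. It uses the equivalent Axes methods (`ax.imshow`, `ax.contour`, `ax.clabel`) and selects the non-interactive "Agg" backend, which is an assumption made so that the example runs without a display; remove that line and call `plt.show()` to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Same sample data as in the program above
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X**2 + Y**2) / (X**2 + Y**2)

fig, ax = plt.subplots()

# imshow draws Z as an image; extent maps pixel positions to x/y coordinates,
# and origin='lower' puts the first row of Z at the bottom, matching contour()
image = ax.imshow(Z, extent=[-3, 3, -3, 3], origin='lower', cmap='viridis')
fig.colorbar(image, ax=ax)

# Overlay contour lines and label them
contour_lines = ax.contour(X, Y, Z, levels=10, colors='k', linewidths=0.5)
ax.clabel(contour_lines, inline=True, fontsize=8)

# Render the figure (use plt.show() instead in an interactive session)
fig.canvas.draw()
```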
15. Make a three-dimensional plot with 50 randomly generated data points for x,
y, and z. Set the point color to red and the point size to 50.
Program:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate 50 random data points for x, y, and z
np.random.seed(42) # Set a seed for reproducibility
x = np.random.rand(50) * 10
y = np.random.rand(50) * 10
z = np.random.rand(50) * 10
# Create a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the points with specified color and size
ax.scatter(x, y, z, c='red', s=50)
# Set labels for axes
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
# Show the plot
plt.show()
output: