DECISION TREE
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import matplotlib.pyplot as plt
# Load the dataset
data = {
'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild',
'Mild', 'Hot', 'Mild'],
'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal',
'Normal', 'Normal', 'High', 'Normal', 'High'],
'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak',
'Strong', 'Strong', 'Weak', 'Strong'],
'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)
# Convert categorical variables to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Wind'])
# Separate features and target variable
X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize Decision Tree classifier
decision_tree = DecisionTreeClassifier()
# Train the model
decision_tree.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = decision_tree.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Print confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Convert the feature-name Index to a plain list for plot_tree
feature_names = X.columns.tolist()
# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=feature_names, class_names=['No', 'Yes'], filled=True)
plt.show()
Output
Accuracy: 1.0
Confusion Matrix:
[[1 0]
 [0 2]]

Classification Report:
              precision    recall  f1-score   support

          No       1.00      1.00      1.00         1
         Yes       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
Step-by-Step Explanation
1. Import Necessary Libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import matplotlib.pyplot as plt
Explanation:
pandas: Library for data manipulation and analysis.
train_test_split: Function to split the dataset into training and testing sets.
DecisionTreeClassifier: Class for decision tree classification model.
plot_tree: Function to visualize the decision tree.
confusion_matrix, accuracy_score, classification_report: Functions to evaluate the model's
performance.
matplotlib.pyplot: Library for plotting graphs.
2. Load the Dataset:
data = {
'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild',
'Mild', 'Hot', 'Mild'],
'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal',
'Normal', 'Normal', 'High', 'Normal', 'High'],
'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak',
'Strong', 'Strong', 'Weak', 'Strong'],
'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
df = pd.DataFrame(data)
Explanation:
We define a dictionary containing the "Play Tennis" dataset.
Then we convert this dictionary to a pandas DataFrame.
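A quick inspection step (not part of the original listing) can confirm the DataFrame was built as expected before encoding:
# Optional check: inspect the raw DataFrame before encoding
print(df.shape)                          # expected: (14, 5) -- 14 examples, 4 features + target
print(df.head())                         # first few rows of the Play Tennis data
print(df['PlayTennis'].value_counts())   # class balance: 9 Yes vs 5 No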
3. Data Preprocessing:
df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Wind'])
X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']
Explanation:
We use one-hot encoding to convert categorical variables into numerical format.
X contains the features, and y contains the target variable.
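To see exactly what one-hot encoding produces, an optional check like the following (not in the original code) prints the resulting feature columns; the names and order in the comment are what pd.get_dummies typically generates for this data:
# Optional check: list the columns created by one-hot encoding
print(X.columns.tolist())
# Roughly: ['Outlook_Overcast', 'Outlook_Rainy', 'Outlook_Sunny',
#           'Temperature_Cool', 'Temperature_Hot', 'Temperature_Mild',
#           'Humidity_High', 'Humidity_Normal', 'Wind_Strong', 'Wind_Weak']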
4. Split Data into Training and Testing Sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Explanation:
We split the dataset into training and testing sets using the train_test_split function.
We use 80% of the data for training and 20% for testing.
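Because the dataset has only 14 rows, a 20% test split leaves just 3 test examples. An optional check (not in the original code) makes the split sizes explicit:
# Optional check: confirm the split sizes (14 rows -> 11 train / 3 test)
print(X_train.shape, X_test.shape)   # expected: (11, 10) (3, 10)
print(y_test.tolist())               # the three held-out labels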
5. Initialize and Train Decision Tree Model:
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, y_train)
Explanation:
We initialize a DecisionTreeClassifier object.
Then we train the model using the training data.
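DecisionTreeClassifier() with no arguments uses the Gini criterion and grows the tree until the leaves are pure. If a reproducible or more constrained tree is wanted, constructor arguments such as those below can be passed; the variable name tuned_tree and the values shown are illustrative only, not part of the original exercise:
# Illustrative only: a constrained, reproducible tree (example values)
tuned_tree = DecisionTreeClassifier(
    criterion='entropy',   # split on information gain instead of Gini impurity
    max_depth=3,           # cap the depth to reduce overfitting
    min_samples_leaf=1,    # minimum number of samples required at a leaf
    random_state=42        # make any tie-breaking reproducible
)
tuned_tree.fit(X_train, y_train)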
6. Make Predictions and Evaluate Model:
y_pred = decision_tree.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
Explanation:
We make predictions on the testing data using the predict method.
Then we calculate accuracy using accuracy_score.
We also compute the confusion matrix and classification report.
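With only 3 test examples, a single accuracy figure is very noisy. As an optional sanity check (not part of the original listing), cross-validation over all 14 rows gives a more stable estimate; this sketch uses scikit-learn's standard cross_val_score helper:
from sklearn.model_selection import cross_val_score

# Optional: 5-fold cross-validation on the full dataset for a steadier estimate
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print("CV accuracy per fold:", cv_scores)
print("Mean CV accuracy:", cv_scores.mean())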
7. Print Model Evaluation Metrics:
print("Accuracy:", accuracy)
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)
Explanation:
We print the accuracy, confusion matrix, and classification report to evaluate the
model's performance.
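If a labelled plot of the confusion matrix is preferred over the raw array, scikit-learn 1.0+ provides ConfusionMatrixDisplay; this optional snippet is not part of the original listing:
from sklearn.metrics import ConfusionMatrixDisplay

# Optional: plot the confusion matrix with class labels instead of printing a raw array
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()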
8. Plot the Decision Tree:
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=X.columns, class_names=['No', 'Yes'], filled=True)
plt.show()
Explanation:
Finally, we plot the decision tree using the plot_tree function to visualize the model's
decision-making process.
We specify feature names and class names for better interpretation of the tree.
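When a plotting window is inconvenient (for example, on a remote server), the same learned rules can be printed as plain text with export_text; this is an optional alternative to plot_tree, not part of the original code:
from sklearn.tree import export_text

# Optional: print the learned decision rules as indented text
print(export_text(decision_tree, feature_names=X.columns.tolist()))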
**************************