Scikit-Learn

Summary
Ch 1. Basic Concepts
    Ch 1.1 Loading Data
        sklearn.datasets (Dataset Loading & Fetching)
        sklearn.model_selection (Data Splitting)
    Ch 1.2 Data Preprocessing
        sklearn.preprocessing (Feature Scaling & Transformation)
    Ch 1.3 Feature Selection
        sklearn.feature_selection (Feature Selection Methods)
    Ch 1.4 Model Training
        sklearn.linear_model (Regression & Classification Models)
        sklearn.ensemble (Ensemble Models)
    Ch 1.5 Model Evaluation
        sklearn.metrics (Performance Metrics)
Ch 2. Clustering
    Ch 2.1 Clustering KMeans
        sklearn.cluster (K-Means Clustering)
        Other Common Methods in KMeans
    Ch 2.2 Clustering MeanShift
        sklearn.cluster (MeanShift Clustering)
        Other Common Methods in MeanShift
    Ch 2.3 Clustering DBSCAN
        sklearn.cluster (DBSCAN Clustering)
        Other Common Methods in DBSCAN
    Ch 2.4 Clustering GMM
        sklearn.mixture (Gaussian Mixture Model Clustering)
        Other Common Methods in GMM
Ch 3. Classifying
    Ch 3.1 Classifying KNN
        sklearn.neighbors (K-Nearest Neighbors Classification)
        Other Common Methods in KNN
    Ch 3.2 Classifying Naive Bayes
        sklearn.naive_bayes (Naive Bayes Classification)
        Other Common Methods in Naive Bayes
    Ch 3.3 Classifying Logistic Reg
        sklearn.linear_model (Logistic Regression)
        Other Common Methods in Logistic Regression
    Ch 3.4 Classifying SVM
        sklearn.svm (Support Vector Machine Classification)
        Other Common Methods in SVM
    Ch 3.5 Classifying Decision Tree
        sklearn.tree (Decision Tree Classification)
        Other Common Methods in Decision Tree
    Ch 3.6 Classifying MLP
        sklearn.neural_network (Multi-Layer Perceptron Classification)
        Other Common Methods in MLP
Ch 4. Regression
    Ch 4.1 Regression KNN
        sklearn.neighbors (K-Nearest Neighbors Regression)
        Other Common Methods in KNN Regression
    Ch 4.2 Regression LR
        sklearn.linear_model (Linear Regression)
        Other Common Methods in Linear Regression
    Ch 4.3 Regression SVM
        sklearn.svm (Support Vector Machine Regression)
        Other Common Methods in SVM Regression
    Ch 4.4 Regression Decision Tree
    Ch 4.5 Regression MLP
Ch 5. Dimensionality Reduction

Ch 1. Basic Concepts
Ch 1.1 Loading Data
sklearn.datasets (Dataset Loading & Fetching)
To work with datasets, import: import sklearn.datasets as ds

 data = ds.load_iris()
o Returns: A dictionary-like dataset with .data, .target, and .feature_names.
 data = ds.load_digits()
o Returns: A dataset for handwritten digit recognition.
 data = ds.load_wine()
o Returns: A dataset for wine classification.
 data = ds.load_breast_cancer()
o Returns: A dataset for diagnosing breast cancer.
 data = ds.fetch_openml(name, version)
o Returns: A dataset from OpenML in Pandas or NumPy format.
o Parameters:
 name (str) – Name of the dataset.
 version (int, default=None) – Version of the dataset (if multiple exist).
 X, y = ds.make_classification(n_samples, n_features, n_classes)
o Returns: A synthetic dataset for classification.
o Parameters:
 n_samples (int, default=100) – Number of generated samples.
 n_features (int, default=20) – Total number of input features.
 n_classes (int, default=2) – Number of distinct target classes.

 Example:
iris = ds.load_iris()

sklearn.model_selection (Data Splitting)


To split datasets, import: import sklearn.model_selection as ms

 X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size, random_state)


o Returns: Training and test sets.
o Parameters:
 test_size (float, default=0.25) – Proportion of data for testing.
 random_state (int, default=None) – Ensures reproducibility.

 Example:
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
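
Putting Ch 1.1 together, a minimal runnable sketch (the iris loader and an 80/20 split with random_state=42 are illustrative choices):

import sklearn.datasets as ds
import sklearn.model_selection as ms

iris = ds.load_iris()                        # Bunch object with .data, .target, .feature_names
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = ms.train_test_split(
    X, y, test_size=0.2, random_state=42)    # 80% train, 20% test
print(X_train.shape, X_test.shape)           # (120, 4) (30, 4)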

Ch 1.2 Data Preprocessing


sklearn.preprocessing (Feature Scaling & Transformation)
To preprocess data, import: import sklearn.preprocessing as pp

 scaler = pp.StandardScaler()
o Returns: A StandardScaler instance for normalizing features.

 X_scaled = scaler.fit_transform(X)

o Returns: The scaled feature matrix.

 encoder = pp.OneHotEncoder()

o Returns: An encoder instance for categorical variables.

 X_encoded = encoder.fit_transform(X_categorical)

o Returns: A transformed sparse matrix representing one-hot encoded categorical values.


o Parameters:
 X_categorical (array-like) – Input categorical features to encode.

 Example:
X_scaled = pp.StandardScaler().fit_transform(X)
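
A small illustrative sketch of both transformers (the numeric and categorical toy arrays below are made up for demonstration):

import numpy as np
import sklearn.preprocessing as pp

X_num = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_scaled = pp.StandardScaler().fit_transform(X_num)     # each column gets zero mean, unit variance

X_cat = np.array([['red'], ['green'], ['red']])          # categorical input must be 2D
X_encoded = pp.OneHotEncoder().fit_transform(X_cat)      # sparse one-hot matrix
print(X_encoded.toarray())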

Ch 1.3 Feature Selection


sklearn.feature_selection (Feature Selection Methods)
To select important features, import: import sklearn.feature_selection as fs

 selector = fs.SelectKBest(score_func, k)

o Returns: A selector instance that picks the top k features.


o Parameters:
 score_func (callable, e.g., f_classif) – Scoring function to evaluate features.
 k (int, default=10) – Number of top features to select.

 X_selected = selector.fit_transform(X, y)

o Returns: The dataset with selected features.

 Example:
X_selected = fs.SelectKBest(fs.f_classif, k=5).fit_transform(X, y)
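
A short sketch using a synthetic classification dataset (make_classification and its parameter values are illustrative choices, not part of the original example):

import sklearn.datasets as ds
import sklearn.feature_selection as fs

X, y = ds.make_classification(n_samples=200, n_features=10, n_classes=2, random_state=0)
selector = fs.SelectKBest(fs.f_classif, k=5)
X_selected = selector.fit_transform(X, y)    # keeps the 5 highest-scoring features
print(X_selected.shape)                       # (200, 5)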

Ch 1.4 Model Training


sklearn.linear_model (Regression & Classification Models)
To train models, import: import sklearn.linear_model as lm

 model = lm.LogisticRegression()

o Returns: A logistic regression model instance.

 model.fit(X_train, y_train)

o Returns: The fitted model itself (self); trains it on the training data.


sklearn.ensemble (Ensemble Models)
To use ensemble models, import: import sklearn.ensemble as en

 model = en.RandomForestClassifier(n_estimators)
o Returns: A random forest classifier.
o Parameters:
 n_estimators (int, default=100) – Number of trees in the forest.

 Example:
model = lm.LogisticRegression().fit(X_train, y_train)
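
A minimal training sketch combining both modules (the breast cancer dataset and the hyperparameter values are arbitrary illustrative choices):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.linear_model as lm
import sklearn.ensemble as en

X, y = ds.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)

# max_iter raised to help convergence on unscaled features
log_reg = lm.LogisticRegression(max_iter=1000).fit(X_train, y_train)
forest = en.RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)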

Ch 1.5 Model Evaluation


sklearn.metrics (Performance Metrics)
To evaluate models, import: import sklearn.metrics as mt

 score = mt.accuracy_score(y_true, y_pred)

o Returns: The accuracy of classification.


o Parameters:
 y_true (array-like) – True labels.
 y_pred (array-like) – Predicted labels.

 cm = mt.confusion_matrix(y_true, y_pred)

o Returns: The confusion matrix.


o Parameters:
 y_true (array-like) – True labels.
 y_pred (array-like) – Predicted labels.

 r2 = mt.r2_score(y_true, y_pred)

o Returns: The R² score for regression.

 Example:
score = mt.accuracy_score(y_test, model.predict(X_test))
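
A self-contained sketch with toy labels (the values below are invented purely to show the calls):

import sklearn.metrics as mt

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(mt.accuracy_score(y_true, y_pred))      # 0.8
print(mt.confusion_matrix(y_true, y_pred))    # rows = true class, columns = predicted class
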
Ch 2. Clustering
Ch 2.1 Clustering KMeans
sklearn.cluster (K-Means Clustering)
To use KMeans, first import: import sklearn.cluster as cl

 kmn = cl.KMeans(n_clusters, init, n_init, max_iter, random_state)


o Returns: A KMeans clustering model instance.
o Parameters:
 n_clusters (int, default=8) – The number of clusters to form.
 init ({'k-means++', 'random'} or ndarray, default='k-means++') – Initialization method for cluster centers.
 n_init (int, default=10) – Number of times KMeans runs with different centroid seeds.
 max_iter (int, default=300) – Maximum iterations per run.
 random_state (int, default=None) – Controls random number generation for centroid initialization.

 Example:
kmn = cl.KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300, random_state=42)

Other Common Methods in KMeans:


 kmn.fit(X) – Trains the KMeans model.
 labels = kmn.predict(X) – Assigns clusters to data points.

 labels = kmn.fit_predict(X) – Directly assigns cluster labels while training.

 labels = kmn.labels_ – Retrieves assigned cluster labels.

 centroids = kmn.cluster_centers_ – Retrieves centroid coordinates.

 inertia = kmn.inertia_ – Measures clustering performance via squared distances.
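
Putting the KMeans calls together, a runnable sketch (the blobs generated by make_blobs are illustrative synthetic data):

import sklearn.datasets as ds
import sklearn.cluster as cl

X, _ = ds.make_blobs(n_samples=300, centers=3, random_state=42)   # toy 2-D data
kmn = cl.KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300, random_state=42)
labels = kmn.fit_predict(X)          # trains the model and returns the cluster label of each point
print(kmn.cluster_centers_)          # one centroid per cluster
print(kmn.inertia_)                  # sum of squared distances to the closest centroid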

Ch 2.2 Clustering MeanShift


sklearn.cluster (MeanShift Clustering)
To use MeanShift, first import: import sklearn.cluster as cl

 ms = cl.MeanShift(bandwidth, bin_seeding, cluster_all)


o Returns: A MeanShift clustering model instance.
o Parameters:
 bandwidth (float, default=None) – Kernel size controlling cluster granularity.
 bin_seeding (bool, default=False) – If True, uses initial bins for seed selection.
 cluster_all (bool, default=True) – If True, assigns all points to a cluster.

 Example: ms = cl.MeanShift(bandwidth=2.0, bin_seeding=True, cluster_all=True)

Other Common Methods in MeanShift:


 ms.fit(X) – Trains the MeanShift model.
 labels = ms.predict(X) – Predicts cluster assignments.

 labels = ms.fit_predict(X) – Assigns cluster labels during training.


 centroids = ms.cluster_centers_ – Retrieves estimated cluster centers.
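
A short MeanShift sketch (synthetic blobs; estimate_bandwidth is a helper from sklearn.cluster used here as one reasonable way to pick the bandwidth):

import sklearn.datasets as ds
import sklearn.cluster as cl

X, _ = ds.make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
bandwidth = cl.estimate_bandwidth(X, quantile=0.2)      # data-driven bandwidth estimate
msh = cl.MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = msh.fit_predict(X)
print(len(msh.cluster_centers_), "clusters found")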

Ch 2.3 Clustering DBSCAN


sklearn.cluster (DBSCAN Clustering)
To use DBSCAN, first import: import sklearn.cluster as cl

 db = cl.DBSCAN(eps, min_samples, metric, algorithm)


o Returns: A DBSCAN clustering model instance.
o Parameters:
 eps (float, default=0.5) – Maximum distance for points to be in the same cluster.
 min_samples (int, default=5) – Minimum points required to form a dense region.
 metric (str, default='euclidean') – Distance metric to compute point similarity.
 algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm used for nearest neighbor search.

 Example: db = cl.DBSCAN(eps=0.3, min_samples=10, metric='euclidean', algorithm='auto')

Other Common Methods in DBSCAN:


 db.fit(X) – Trains the DBSCAN model.
 labels = db.fit_predict(X) – Assigns cluster labels during training.

 labels = db.labels_ – Retrieves assigned cluster labels.

 core_samples = db.core_sample_indices_ – Gets indices of core points.
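
A DBSCAN sketch on non-convex synthetic data (make_moons is chosen for illustration because density-based clustering handles such shapes well):

import sklearn.datasets as ds
import sklearn.cluster as cl

X, _ = ds.make_moons(n_samples=300, noise=0.05, random_state=42)
db = cl.DBSCAN(eps=0.3, min_samples=10)
labels = db.fit_predict(X)
print(set(labels))                   # -1 marks noise points; other integers are cluster ids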

Ch 2.4 Clustering GMM


sklearn.mixture (Gaussian Mixture Model Clustering)
To use GMM, first import: import sklearn.mixture as mix

 gmm = mix.GaussianMixture(n_components, covariance_type, tol, max_iter, random_state)


o Returns: A Gaussian Mixture Model clustering instance.
o Parameters:
 n_components (int, default=1) – Number of mixture components (clusters).
 covariance_type ({'full', 'tied', 'diag', 'spherical'}, default='full') – Specifies the form of the covariance matrix.
 tol (float, default=1e-3) – Convergence threshold.
 max_iter (int, default=100) – Maximum iterations for Expectation-Maximization (EM).
 random_state (int, default=None) – Controls random number generation.

 Example:
gmm = mix.GaussianMixture(n_components=3, covariance_type='full', max_iter=200, random_state=42)

Other Common Methods in GMM:


 gmm.fit(X) – Trains the Gaussian Mixture Model.
 labels = gmm.predict(X) – Assigns cluster labels to data points.

 probs = gmm.predict_proba(X) – Returns probabilities of each point belonging to a cluster.

 means = gmm.means_ – Retrieves cluster means.

 covariances = gmm.covariances_ – Retrieves covariance matrices for clusters.
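
A GMM sketch on synthetic blobs (the data and parameter values are illustrative):

import sklearn.datasets as ds
import sklearn.mixture as mix

X, _ = ds.make_blobs(n_samples=300, centers=3, random_state=42)
gmm = mix.GaussianMixture(n_components=3, covariance_type='full', random_state=42)
gmm.fit(X)
labels = gmm.predict(X)              # hard cluster assignments
probs = gmm.predict_proba(X)         # soft assignments: one probability per component
print(gmm.means_.shape)              # (3, 2) for 2-D blob data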


Ch 3. Classifying
Ch 3.1 Classifying KNN
sklearn.neighbors (K-Nearest Neighbors Classification)
To use KNN, first import: import sklearn.neighbors as nb

 knn = nb.KNeighborsClassifier(n_neighbors, weights, algorithm, metric)


o Returns: A KNN classification model instance.
o Parameters:
 n_neighbors (int, default=5) – Number of nearest neighbors to consider.
 weights ({'uniform', 'distance'} or callable, default='uniform') – Determines how neighbors are weighted.
 algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm used to compute nearest neighbors.
 metric (str, default='minkowski') – Distance metric for neighbor calculation.

 Example:
knn = nb.KNeighborsClassifier(n_neighbors=3, weights='distance', algorithm='auto', metric='euclidean')

Other Common Methods in KNN:


 knn.fit(X_train, y_train) – Trains the KNN model.
 y_pred = knn.predict(X_test) – Predicts class labels for new data.

 probs = knn.predict_proba(X_test) – Returns class probabilities for each sample.

 score = knn.score(X_test, y_test) – Returns the model accuracy.
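
A complete KNN classification sketch (iris data from Ch 1.1; the split and hyperparameters are arbitrary illustrative choices):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.neighbors as nb

X, y = ds.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
knn = nb.KNeighborsClassifier(n_neighbors=3, weights='distance')
knn.fit(X_train, y_train)
print(knn.predict_proba(X_test[:3]))     # class probabilities for the first three test samples
print(knn.score(X_test, y_test))         # accuracy on the held-out set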

Ch 3.2 Classifying Naive Bayes


sklearn.naive_bayes (Naive Bayes Classification)
To use Naive Bayes, first import: import sklearn.naive_bayes as nb

 nbc = nb.GaussianNB()
o Returns: A Gaussian Naive Bayes classifier instance.

 Example:
nbc = nb.GaussianNB()

 nbc = nb.MultinomialNB(alpha, fit_prior)


o Returns: A multinomial Naive Bayes classifier for discrete features.
o Parameters:
 alpha (float, default=1.0) – Additive smoothing parameter.
 fit_prior (bool, default=True) – Whether to learn class priors from the data.

 Example:
nbc = nb.MultinomialNB(alpha=0.5, fit_prior=True)

Other Common Methods in Naive Bayes:


 nbc.fit(X_train, y_train) – Trains the Naive Bayes model.
 y_pred = nbc.predict(X_test) – Predicts class labels.

 probs = nbc.predict_proba(X_test) – Returns class probabilities.

 score = nbc.score(X_test, y_test) – Computes model accuracy.
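
A GaussianNB sketch (the wine dataset and the split are illustrative choices):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.naive_bayes as nb

X, y = ds.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
nbc = nb.GaussianNB()
nbc.fit(X_train, y_train)
print(nbc.predict(X_test[:3]))           # predicted class labels
print(nbc.score(X_test, y_test))         # accuracy on the held-out set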

Ch 3.3 Classifying Logistic Reg


sklearn.linear_model (Logistic Regression)
To use Logistic Regression, first import: import sklearn.linear_model as lm

 lr = lm.LogisticRegression(penalty, C, solver, max_iter, random_state)


o Returns: A Logistic Regression model instance.
o Parameters:
 penalty ({'l1', 'l2', 'elasticnet', 'none'}, default='l2') – Regularization type.
 C (float, default=1.0) – Inverse of regularization strength.
 solver ({'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs') – Algorithm for optimization.
 max_iter (int, default=100) – Maximum number of iterations.
 random_state (int, default=None) – Controls random number generation.

 Example:
lr = lm.LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=200, random_state=42)

Other Common Methods in Logistic Regression:


 lr.fit(X_train, y_train) – Trains the Logistic Regression model.
 y_pred = lr.predict(X_test) – Predicts class labels.

 probs = lr.predict_proba(X_test) – Returns class probabilities.

 coefficients = lr.coef_ – Retrieves learned feature coefficients.

 score = lr.score(X_test, y_test) – Computes model accuracy.
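
A Logistic Regression sketch (breast cancer data; max_iter is raised beyond the example above only to avoid convergence warnings on unscaled features):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.linear_model as lm

X, y = ds.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
lr = lm.LogisticRegression(C=1.0, solver='lbfgs', max_iter=5000, random_state=42)
lr.fit(X_train, y_train)
print(lr.coef_.shape)                    # (1, 30): one coefficient per feature for a binary problem
print(lr.score(X_test, y_test))          # accuracy on the held-out set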

Ch 3.4 Classifying SVM


sklearn.svm (Support Vector Machine Classification)
To use SVM, first import: import sklearn.svm as svm

 svm_model = svm.SVC(C, kernel, gamma, degree, probability, random_state)


o Returns: An SVM classifier instance.
o Parameters:
 C (float, default=1.0) – Regularization strength (higher values reduce misclassification).
 kernel ({'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf') – Kernel function used in decision boundary.
 gamma ({'scale', 'auto'} or float, default='scale') – Kernel coefficient for non-linear models.
 degree (int, default=3) – Degree of the polynomial kernel (used only if kernel='poly').
 probability (bool, default=False) – If True, enables probability estimates.
 random_state (int, default=None) – Controls random number generation.

 Example:
svm_model = svm.SVC(C=1.0, kernel='rbf', gamma='scale', probability=True, random_state=42)

Other Common Methods in SVM:


 svm_model.fit(X_train, y_train) – Trains the SVM model.
 y_pred = svm_model.predict(X_test) – Predicts class labels.

 probs = svm_model.predict_proba(X_test) – Returns class probabilities (only if probability=True).

 score = svm_model.score(X_test, y_test) – Computes model accuracy.
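
An SVC sketch (iris data; probability=True is kept so predict_proba is available, as noted above):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.svm as svm

X, y = ds.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
svm_model = svm.SVC(C=1.0, kernel='rbf', gamma='scale', probability=True, random_state=42)
svm_model.fit(X_train, y_train)
print(svm_model.predict_proba(X_test[:3]))   # only available because probability=True
print(svm_model.score(X_test, y_test))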

Ch 3.5 Classifying Decision Tree


sklearn.tree (Decision Tree Classification)
To use Decision Trees, first import: import sklearn.tree as tr

 dt = tr.DecisionTreeClassifier(criterion, max_depth, min_samples_split, random_state)


o Returns: A Decision Tree classifier instance.
o Parameters:
 criterion ({'gini', 'entropy', 'log_loss'}, default='gini') – Function used to measure the quality of splits.
 max_depth (int, default=None) – Maximum depth of the tree.
 min_samples_split (int or float, default=2) – Minimum number of samples required to split an internal node.
 random_state (int, default=None) – Controls random number generation for reproducibility.

 Example:
dt = tr.DecisionTreeClassifier(criterion='entropy', max_depth=5, min_samples_split=4, random_state=42)

Other Common Methods in Decision Tree:


 dt.fit(X_train, y_train) – Trains the Decision Tree model.
 y_pred = dt.predict(X_test) – Predicts class labels.

 probs = dt.predict_proba(X_test) – Returns class probabilities.

 score = dt.score(X_test, y_test) – Computes model accuracy.

 feature_importance = dt.feature_importances_ – Retrieves feature importance scores.
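
A Decision Tree sketch (iris data; the hyperparameters are illustrative):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.tree as tr

X, y = ds.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
dt = tr.DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=42)
dt.fit(X_train, y_train)
print(dt.feature_importances_)           # relative importance of each input feature
print(dt.score(X_test, y_test))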

Ch 3.6 Classifying MLP


sklearn.neural_network (Multi-Layer Perceptron Classification)
To use MLP, first import: import sklearn.neural_network as nn

 mlp = nn.MLPClassifier(hidden_layer_sizes, activation, solver, alpha, learning_rate, max_iter, random_state)
o Returns: An MLP classifier instance.
o Parameters:
 hidden_layer_sizes (tuple, default=(100,)) – Number of neurons in hidden layers.
 activation ({'identity', 'logistic', 'tanh', 'relu'}, default='relu') – Activation function for hidden layers.
 solver ({'lbfgs', 'sgd', 'adam'}, default='adam') – Optimization algorithm for weight updates.
 alpha (float, default=0.0001) – L2 regularization parameter.
 learning_rate ({'constant', 'invscaling', 'adaptive'}, default='constant') – Learning rate schedule.
 max_iter (int, default=200) – Maximum number of training iterations.
 random_state (int, default=None) – Controls random number generation.

 Example:
mlp = nn.MLPClassifier(hidden_layer_sizes=(50, 50), activation='relu', solver='adam', max_iter=300,
random_state=42)
Other Common Methods in MLP:
 mlp.fit(X_train, y_train) – Trains the MLP model.
 y_pred = mlp.predict(X_test) – Predicts class labels.

 probs = mlp.predict_proba(X_test) – Returns class probabilities.

 score = mlp.score(X_test, y_test) – Computes model accuracy.

 loss_curve = mlp.loss_curve_ – Retrieves loss values over training iterations.
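
An MLP classification sketch (digits data; the StandardScaler step is an added assumption because MLPs usually train better on scaled inputs):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.preprocessing as pp
import sklearn.neural_network as nn

X, y = ds.load_digits(return_X_y=True)
X = pp.StandardScaler().fit_transform(X)     # scale the 64 pixel features
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
mlp = nn.MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=300, random_state=42)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))             # accuracy on the held-out set
print(len(mlp.loss_curve_), "training iterations recorded")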


Ch 4. Regression
Ch 4.1 Regression KNN
sklearn.neighbors (K-Nearest Neighbors Regression)
To use KNN for regression, first import: import sklearn.neighbors as nb

 knn_reg = nb.KNeighborsRegressor(n_neighbors, weights, algorithm, metric)


o Returns: A KNN regression model instance.
o Parameters:
 n_neighbors (int, default=5) – Number of nearest neighbors to consider.
 weights ({'uniform', 'distance'} or callable, default='uniform') – Determines how neighbors are weighted.
 algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm used to compute nearest neighbors.
 metric (str, default='minkowski') – Distance metric for neighbor calculation.

 Example:
knn_reg = nb.KNeighborsRegressor(n_neighbors=3, weights='distance', algorithm='auto', metric='euclidean')

Other Common Methods in KNN Regression:


 knn_reg.fit(X_train, y_train) – Trains the KNN regressor.
 y_pred = knn_reg.predict(X_test) – Predicts target values.

 score = knn_reg.score(X_test, y_test) – Computes the R² regression score.
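
A KNN regression sketch (make_regression produces illustrative synthetic data):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.neighbors as nb

X, y = ds.make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
knn_reg = nb.KNeighborsRegressor(n_neighbors=3, weights='distance')
knn_reg.fit(X_train, y_train)
print(knn_reg.score(X_test, y_test))     # R² on the held-out set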

Ch 4.2 Regression LR
sklearn.linear_model (Linear Regression)
To use Linear Regression, first import: import sklearn.linear_model as lm

 lr_reg = lm.LinearRegression(fit_intercept, normalize, copy_X, n_jobs)


o Returns: A Linear Regression model instance.
o Parameters:
 fit_intercept (bool, default=True) – If False, forces model to pass through the origin.
 normalize (bool, default=False) – If True, normalizes features before training (deprecated).
 copy_X (bool, default=True) – If False, allows modifying X during fitting.
 n_jobs (int, default=None) – Number of parallel computations (if None, uses one core).

 Example:
lr_reg = lm.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)

Other Common Methods in Linear Regression:


 lr_reg.fit(X_train, y_train) – Trains the Linear Regression model.
 y_pred = lr_reg.predict(X_test) – Predicts target values.

 coefficients = lr_reg.coef_ – Retrieves learned feature coefficients.

 intercept = lr_reg.intercept_ – Retrieves the model intercept.

 score = lr_reg.score(X_test, y_test) – Computes the R² regression score.
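
A Linear Regression sketch (synthetic data from make_regression; the deprecated normalize parameter is left out):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.linear_model as lm

X, y = ds.make_regression(n_samples=200, n_features=3, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
lr_reg = lm.LinearRegression(fit_intercept=True)
lr_reg.fit(X_train, y_train)
print(lr_reg.coef_, lr_reg.intercept_)   # learned slope per feature and the intercept
print(lr_reg.score(X_test, y_test))      # R² on the held-out set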


Ch 4.3 Regression SVM
sklearn.svm (Support Vector Machine Regression)
To use SVM for regression, first import: import sklearn.svm as svm

 svr = svm.SVR(kernel, C, epsilon, gamma, degree)


o Returns: An SVM regression model instance.
o Parameters:
 kernel ({'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf') – Kernel function used for regression.
 C (float, default=1.0) – Regularization strength (higher values fit the training data more closely).
 epsilon (float, default=0.1) – Defines margin within which no penalty is given.
 gamma ({'scale', 'auto'} or float, default='scale') – Kernel coefficient for non-linear models.
 degree (int, default=3) – Degree of the polynomial kernel (used only if kernel='poly').

 Example:
svr = svm.SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma='scale')

Other Common Methods in SVM Regression:


 svr.fit(X_train, y_train) – Trains the SVR model.
 y_pred = svr.predict(X_test) – Predicts target values.

 score = svr.score(X_test, y_test) – Computes the R² regression score.
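
An SVR sketch (synthetic regression data; note that SVR is sensitive to feature and target scale, so scores on raw synthetic data may be modest):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.svm as svm

X, y = ds.make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
svr = svm.SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma='scale')
svr.fit(X_train, y_train)
print(svr.score(X_test, y_test))         # R² on the held-out set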

Ch 4.4 Regression Decision Tree


sklearn.tree (Decision Tree Regression)
To use Decision Trees for regression, first import: import sklearn.tree as tr

 dt_reg = tr.DecisionTreeRegressor(criterion, max_depth, min_samples_split, random_state)


o Returns: A Decision Tree regression model instance.
o Parameters:
 criterion ({'squared_error', 'friedman_mse', 'absolute_error', 'poisson'}, default='squared_error') – Function
used to measure the quality of a split.
 max_depth (int, default=None) – Maximum depth of the tree.
 min_samples_split (int or float, default=2) – Minimum number of samples required to split an internal node.
 random_state (int, default=None) – Controls random number generation for reproducibility.

 Example:
dt_reg = tr.DecisionTreeRegressor(criterion='squared_error', max_depth=5, min_samples_split=4,
random_state=42)

Other Common Methods in Decision Tree Regression:


 dt_reg.fit(X_train, y_train) – Trains the Decision Tree regressor.
 y_pred = dt_reg.predict(X_test) – Predicts target values.

 score = dt_reg.score(X_test, y_test) – Computes the R² regression score.

 feature_importance = dt_reg.feature_importances_ – Retrieves feature importance scores.
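
A Decision Tree regression sketch (synthetic data; criterion='squared_error' assumes scikit-learn 1.0+, older versions used 'mse'):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.tree as tr

X, y = ds.make_regression(n_samples=300, n_features=4, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
dt_reg = tr.DecisionTreeRegressor(criterion='squared_error', max_depth=5, random_state=42)
dt_reg.fit(X_train, y_train)
print(dt_reg.feature_importances_)       # relative importance of each feature
print(dt_reg.score(X_test, y_test))      # R² on the held-out set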


Ch 4.5 Regression MLP
sklearn.neural_network (Multi-Layer Perceptron Regression)
To use MLP for regression, first import: import sklearn.neural_network as nn

 mlp_reg = nn.MLPRegressor(hidden_layer_sizes, activation, solver, alpha, learning_rate, max_iter, random_state)
o Returns: An MLP regression model instance.
o Parameters:
 hidden_layer_sizes (tuple, default=(100,)) – Number of neurons in hidden layers.
 activation ({'identity', 'logistic', 'tanh', 'relu'}, default='relu') – Activation function for hidden layers.
 solver ({'lbfgs', 'sgd', 'adam'}, default='adam') – Optimization algorithm for weight updates.
 alpha (float, default=0.0001) – L2 regularization parameter.
 learning_rate ({'constant', 'invscaling', 'adaptive'}, default='constant') – Learning rate schedule.
 max_iter (int, default=200) – Maximum number of training iterations.
 random_state (int, default=None) – Controls random number generation.

 Example:
mlp_reg = nn.MLPRegressor(hidden_layer_sizes=(50, 50), activation='relu', solver='adam', max_iter=300,
random_state=42)

Other Common Methods in MLP Regression:


 mlp_reg.fit(X_train, y_train) – Trains the MLP model.
 y_pred = mlp_reg.predict(X_test) – Predicts target values.

 score = mlp_reg.score(X_test, y_test) – Computes the R² regression score.

 loss_curve = mlp_reg.loss_curve_ – Retrieves loss values over training iterations.
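
An MLP regression sketch (synthetic data; scaling the inputs and raising max_iter are added assumptions to help convergence):

import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.preprocessing as pp
import sklearn.neural_network as nn

X, y = ds.make_regression(n_samples=300, n_features=5, noise=5.0, random_state=42)
X = pp.StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.2, random_state=42)
mlp_reg = nn.MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=42)
mlp_reg.fit(X_train, y_train)
print(mlp_reg.score(X_test, y_test))     # R² on the held-out set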


Ch 5. Dimensionality Reduction
sklearn.decomposition (Principal Component Analysis - PCA)
To use PCA, first import: import sklearn.decomposition as dc

 pca = dc.PCA(n_components, svd_solver, whiten, random_state)


o Returns: A PCA transformation instance.
o Parameters:
 n_components (int, float, default=None) – Number of principal components to retain.
 svd_solver ({'auto', 'full', 'arpack', 'randomized'}, default='auto') – Algorithm used for Singular Value
Decomposition.
 whiten (bool, default=False) – If True, normalizes transformed features.
 random_state (int, default=None) – Controls random number generation.

 Example:
pca = dc.PCA(n_components=2, svd_solver='auto', whiten=True, random_state=42)

sklearn.manifold (t-SNE for Non-Linear Dimensionality Reduction)


To use t-SNE, first import: import sklearn.manifold as mf

 tsne = mf.TSNE(n_components, perplexity, learning_rate, n_iter, random_state)


o Returns: A t-SNE transformation instance.
o Parameters:
 n_components (int, default=2) – Number of dimensions to reduce data to.
 perplexity (float, default=30.0) – Controls balance between local and global aspects of data.
 learning_rate (float, default=200.0) – Step size during optimization.
 n_iter (int, default=1000) – Number of optimization iterations.
 random_state (int, default=None) – Controls random number generation.

 Example:
tsne = mf.TSNE(n_components=2, perplexity=30, learning_rate=200, n_iter=1000, random_state=42)

Other Common Methods in Dimensionality Reduction:


 pca.fit(X) – Computes the principal components.
 X_transformed = pca.transform(X) – Applies PCA transformation.

 X_transformed = tsne.fit_transform(X) – Applies t-SNE transformation.

 explained_variance = pca.explained_variance_ratio_ – Retrieves variance explained by each component.
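
An end-to-end sketch of both reducers on the digits dataset (the dataset choice is illustrative; note that t-SNE only offers fit_transform, as listed above):

import sklearn.datasets as ds
import sklearn.decomposition as dc
import sklearn.manifold as mf

X, y = ds.load_digits(return_X_y=True)        # 64-dimensional digit images
pca = dc.PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)           # variance captured by each of the two components

tsne = mf.TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)                 # t-SNE has no separate transform()
print(X_tsne.shape)                            # (1797, 2)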


Evaluation Metrics in Scikit-Learn
Clustering Metrics
Used to assess the quality of clusters when no ground truth labels are available.
 silhouette_score(X, labels) – Measures how well-separated clusters are. Ranges from -1 (poor clustering) to 1 (well-clustered).
 davies_bouldin_score(X, labels) – Lower values indicate better-defined clusters.
 adjusted_rand_score(labels_true, labels_pred) – Compares predicted labels to ground truth (if available). A score near 0 indicates random labeling and 1 a perfect match (small negative values are possible).

from sklearn.metrics import silhouette_score, davies_bouldin_score, adjusted_rand_score

sil_score = silhouette_score(X, labels)
db_score = davies_bouldin_score(X, labels)
ari_score = adjusted_rand_score(y_true, labels_pred)

Classification Metrics
Used to evaluate model performance when the ground truth labels are available.
 accuracy_score(y_true, y_pred) – Measures overall correctness (ratio of correct predictions).
 precision_score(y_true, y_pred, average='macro') – Measures how many predicted positives are actually correct.
 recall_score(y_true, y_pred, average='macro') – Measures how many actual positives were correctly identified.
 f1_score(y_true, y_pred, average='macro') – Harmonic mean of precision and recall, balancing both.
 confusion_matrix(y_true, y_pred) – Displays true positive, false positive, false negative, and true negative values.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average='macro')
rec = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
cm = confusion_matrix(y_test, y_pred)

Regression Metrics
Used to measure the error between predicted and actual continuous values.
 r2_score(y_true, y_pred) – Measures how well the model explains variance (ranges from -∞ to 1).
 mean_squared_error(y_true, y_pred) – Penalizes large errors by squaring them (lower is better).
 mean_absolute_error(y_true, y_pred) – Measures average absolute differences (less sensitive to large errors).
 median_absolute_error(y_true, y_pred) – Measures median of absolute errors, robust to outliers.

from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, median_absolute_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
med_ae = median_absolute_error(y_test, y_pred)

Dimensionality Reduction Metrics
Used to assess how much information is retained after reducing dimensions.
 explained_variance_ratio_ (PCA) – Measures the proportion of variance retained per component.
 Reconstruction error (PCA) – Measures information loss from compression; PCA exposes no such attribute, but it can be computed by comparing X with pca.inverse_transform(pca.transform(X)).
 kl_divergence_ (t-SNE) – Measures the difference between original and reduced distributions (lower is better).

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X)
explained_variance = pca.explained_variance_ratio_
