Data Science Cheatsheet
I. Core Concepts & Mathematics
Linear Algebra:
• Vectors: Operations (addition, scalar multiplication, dot product, cross product), Norms (L1, L2), Linear Independence, Span, Basis.
• Matrices: Operations (addition, multiplication, transpose, inverse), Determinant, Trace, Rank, Eigenvalues, Eigenvectors, Matrix Decompositions (SVD, PCA - brief mention, full details in Machine Learning).
• Systems of Linear Equations: Gaussian Elimination, LU Decomposition (briefly).
• Vector Spaces and Subspaces.

Calculus:
• Derivatives: Rules (power, product, quotient, chain), Partial Derivatives, Gradients, Jacobian, Hessian. Understanding optimization (finding minima/maxima).
• Integrals: Basic integration rules, Definite vs. Indefinite integrals (briefly, less critical for daily DS work than derivatives).
• Limits and Continuity (fundamental concepts, but rarely used explicitly in daily DS work).
• Taylor Series: (Important for understanding some model approximations).
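As a quick illustration of these operations, the sketch below uses NumPy for the vector/matrix operations listed above, plus a central-difference approximation of a gradient (the function `f` is a made-up example, not from the cheatsheet):

```python
import numpy as np

# Vectors: dot product and norms
v = np.array([3.0, 4.0])
w = np.array([1.0, 2.0])
print(np.dot(v, w))          # 3*1 + 4*2 = 11.0
print(np.linalg.norm(v, 1))  # L1 norm: |3| + |4| = 7.0
print(np.linalg.norm(v, 2))  # L2 norm: sqrt(9 + 16) = 5.0

# Matrices: determinant, rank, eigendecomposition, SVD
A = np.array([[2.0, 0.0], [0.0, 3.0]])
print(np.linalg.det(A))              # 6.0
print(np.linalg.matrix_rank(A))      # 2
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues 2 and 3
U, S, Vt = np.linalg.svd(A)          # singular values 3 and 2

# Numerical partial derivatives (gradient) of f(x, y) = x^2 + 3y
def f(p):
    return p[0] ** 2 + 3 * p[1]

def grad(f, p, h=1e-6):
    g = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        g[i] = (f(p + step) - f(p - step)) / (2 * h)
    return g

print(grad(f, np.array([2.0, 1.0])))  # ~ [4.0, 3.0]
```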
Probability & Statistics:
• Basic Probability: Definitions (sample space, events, probability axioms), Conditional Probability, Independence, Bayes' Theorem.
• Random Variables: Discrete vs. Continuous, Probability Distributions (PDF, PMF, CDF), Expectation, Variance, Standard Deviation, Covariance, Correlation.
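A short worked example of Bayes' Theorem (the test characteristics below are made-up numbers for illustration):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: a disease test with 99% sensitivity,
# 95% specificity, and 1% prevalence.
p_disease = 0.01
p_pos_given_disease = 0.99  # sensitivity
p_pos_given_healthy = 0.05  # false-positive rate (1 - specificity)

# Total probability of a positive test (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.167
```

Despite the accurate test, the posterior is only about 17% because the prior (prevalence) is so low.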
• Common Distributions:
  - Discrete: Bernoulli, Binomial, Poisson, Geometric.
  - Continuous: Uniform, Normal (Gaussian), Exponential, Chi-squared, t-distribution, F-distribution.
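All of these distributions are available in `scipy.stats`; a minimal sketch, assuming SciPy is installed:

```python
from scipy import stats

# Binomial: P(X = 3) in 10 trials with p = 0.5
print(stats.binom.pmf(3, n=10, p=0.5))    # 0.1171875

# Poisson: P(X = 2) with rate lambda = 4
print(stats.poisson.pmf(2, mu=4))         # ~ 0.1465

# Standard normal: density at the mean, and P(X <= 1.96)
print(stats.norm.pdf(0, loc=0, scale=1))  # ~ 0.3989
print(stats.norm.cdf(1.96))               # ~ 0.9750
```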
• Descriptive Statistics: Measures of Central Tendency (Mean, Median, Mode), Measures of Dispersion (Variance, Standard Deviation, Range, IQR), Quantiles, Percentiles.
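These descriptive measures are one-liners in NumPy; a sketch on a small made-up sample:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.mean(data))    # 5.0
print(np.median(data))  # 4.5
print(np.var(data))     # population variance: 4.0
print(np.std(data))     # 2.0

# IQR: difference between the 75th and 25th percentiles
q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)          # 1.5
```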
Inferential Statistics:
• Hypothesis Testing: Null and Alternative Hypotheses, p-value, Significance Level, Type I and Type II Errors, t-tests, ANOVA, Chi-squared tests.
• Confidence Intervals: Interpretation and calculation.
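A sketch of a two-sample t-test and a confidence interval using `scipy.stats` on synthetic data (the group means and sizes are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data: two groups with different true means
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=1.0, scale=1.0, size=100)

# Two-sample t-test; null hypothesis: equal means
t_stat, p_value = stats.ttest_ind(a, b)
print(p_value < 0.05)  # True: reject the null at the 5% level

# 95% confidence interval for the mean of b (t-distribution)
mean_b, sem_b = np.mean(b), stats.sem(b)
low, high = stats.t.interval(0.95, df=len(b) - 1, loc=mean_b, scale=sem_b)
print(low < mean_b < high)  # True
```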
• Sampling: Simple Random Sampling, Stratified Sampling, etc. (brief overview).
• Central Limit Theorem (crucial for understanding sampling distributions).
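The CLT is easy to see in simulation: sample means from a skewed population still cluster normally around the population mean. A sketch with arbitrary simulation sizes:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 10,000 samples of size 50 from a skewed (exponential)
# population, then look at the distribution of the sample means.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(round(sample_means.mean(), 2))  # ~ 1.0 (the population mean)
print(round(sample_means.std(), 2))   # ~ 1/sqrt(50) = 0.14 (standard error)
```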
• Maximum Likelihood Estimation (MLE).
• Bayesian Statistics (basics): Prior, Likelihood, Posterior, MAP estimation.
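The prior/likelihood/posterior/MAP pipeline can be sketched with the classic beta-binomial model (the prior and data below are made-up numbers):

```python
# Coin with unknown bias theta.
# Prior: Beta(2, 2); data: 7 heads, 3 tails in 10 flips.
# Beta is conjugate to the binomial, so the posterior is
# Beta(2 + 7, 2 + 3) = Beta(9, 5).
alpha_prior, beta_prior = 2, 2
heads, tails = 7, 3
alpha_post = alpha_prior + heads
beta_post = beta_prior + tails

# MAP estimate = mode of Beta(a, b) = (a - 1) / (a + b - 2)
theta_map = (alpha_post - 1) / (alpha_post + beta_post - 2)
print(round(theta_map, 3))  # 8/12 -> 0.667
```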
Optimization
• Gradient Descent: Variants (Batch, Stochastic, Mini-batch), Learning Rate, Momentum, Adam, RMSprop.
• Convex Optimization: (brief overview - knowing when a problem is convex is helpful).
• Regularization Concepts: L1 (Lasso), L2 (Ridge) (can also be listed in Machine Learning).
• Constrained Optimization (Lagrange Multipliers).
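A minimal sketch of batch gradient descent on a least-squares objective (the learning rate, iteration count, and synthetic data are arbitrary choices for illustration):

```python
import numpy as np

# f(w) = ||Xw - y||^2 / (2n); gradient = X.T @ (Xw - y) / n
def gradient_descent(X, y, lr=0.1, n_iters=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n  # full-batch gradient
        w -= lr * grad                # step against the gradient
    return w

# Recover known weights [2, -1] from noiseless synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])
print(gradient_descent(X, y))  # ~ [ 2. -1.]
```

Stochastic and mini-batch variants differ only in computing the gradient on one sample or a small subset per step instead of the full data.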
Machine Learning

Supervised Learning:
• Regression: Linear Regression (Simple, Multiple), Polynomial Regression, Regularization (Lasso, Ridge, Elastic Net). Evaluation Metrics: MSE, RMSE, MAE, R-squared, Adjusted R-squared.
• Classification: k-Nearest Neighbors (k-NN), Logistic Regression, Support Vector Machines (SVM): Kernels (Linear, Polynomial, RBF), Decision Trees, Random Forests, Gradient Boosting Machines (GBM): XGBoost, LightGBM, CatBoost, Naive Bayes. Evaluation Metrics: Accuracy, Precision, Recall, F1-score, ROC AUC, Confusion Matrix, PR AUC.

Unsupervised Learning:
• Clustering: k-Means Clustering, Hierarchical Clustering, DBSCAN. Evaluation Metrics: Silhouette Score, Davies-Bouldin Index.
• Dimensionality Reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Linear Discriminant Analysis (LDA), Autoencoders (brief mention, more detail in Deep Learning section).

Model Selection & Evaluation:
• Cross-Validation: k-fold, Leave-One-Out, Stratified k-fold.
• Hyperparameter Tuning: Grid Search, Random Search, Bayesian Optimization.
• Regularization: Overfitting and Underfitting (mentioned in specific algorithms, but good to have a general understanding).

Feature Engineering:
• Scaling: Standardization, Normalization.
• Feature Selection: Filter, Wrapper, Embedded.

Time Series:
• Stationarity, ARIMA, Exponential Smoothing.
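Much of the workflow above (scaling, classification, k-fold cross-validation, grid search) fits in a few lines of scikit-learn; a sketch on the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline: standardization + L2-regularized logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# k-fold cross-validation (5 folds here); returns accuracy per fold
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # mean accuracy across folds

# Grid search over the regularization strength C
grid = GridSearchCV(
    model,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

The pipeline ensures the scaler is fit only on each training fold, avoiding leakage into the validation folds.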
Deep Learning
• Neural Networks: Perceptron, MLP, Activation Functions (ReLU, Sigmoid, Softmax), Backpropagation, Loss Functions.
• CNNs: Convolutional Layers, Pooling.
• RNNs: LSTM, GRU.
• Other: Batch Normalization, Dropout, Transfer Learning.

Programming Language & Tools
Python:
• Data Structures: Lists, Dictionaries.
• Control Flow: Loops, Conditionals.
• Functions, Lambdas.
Libraries:
• NumPy: Array operations.
• Pandas: DataFrames.
• Scikit-learn: Machine learning.
• Matplotlib/Seaborn: Visualization.
Data Handling:
• Wrangling: Missing Values (Imputation), Outliers, Data Cleaning.
• Formats: CSV, JSON, SQL.
• SQL: SELECT, WHERE, GROUP BY, JOIN.
• Git: add, commit, push, pull, branch, merge.
• Environments: Conda, Virtualenv.
Visualization:
• Plots: Histograms, Scatter Plots, Box Plots, Heatmaps.
• Libraries: Matplotlib, Seaborn, Plotly.

Communication, Deployment & Ethics
Communication & Deployment:
• Storytelling, Presentation, Reports.
• Deployment: Model serialization, REST APIs, Cloud.
Ethics:
• Bias, Fairness, Privacy, Transparency.

By Shailesh (@beginnersblog.org)