Data Science
1.Python Statistics for Data Science Course
Module 1: Understanding the Data
Topics:
• Introduction to Data Types
• Numerical parameters to represent data
• Mean
• Mode
• Median
• Sensitivity
• Information Gain
• Entropy
• Statistical parameters to represent data
Module 2: Probability and its Uses
Topics:
• Uses of probability
• Need of probability
• Bayesian Inference
• Density Concepts
• Normal Distribution Curve
Module 3: Statistical Inference
Topics:
• Point Estimation
• Confidence Margin
• Hypothesis Testing
• Levels of Hypothesis Testing
Module 4: Testing the Data
Learning Objectives:
At the end of this module, you should be able to:
• Understand Parametric and Non-parametric Testing
• Learn various types of parametric testing
• Discuss experimental designing
• Explain A/B testing
Topics:
• Parametric Test
• Parametric Test Types
• Non-Parametric Test
• Experimental Designing
• A/B testing
Module 5: Data Clustering
Topics:
• Association and Dependence
• Causation and Correlation
• Covariance
• Simpson’s Paradox
• Clustering Techniques
Module 6: Regression Modelling
Topics:
• Logistic and Regression Techniques
• Problem of Collinearity
• WOE and IV
• Residual Analysis
• Heteroscedasticity
• Homoscedasticity
2.R Statistics for Data Science Course
Module 1: Understanding the Data
Topics:
• Introduction to Data Types
• Numerical parameters to represent data
• Mean
• Mode
• Median
• Sensitivity
• Information Gain
• Entropy
• Statistical parameters to represent data
Module 2: Probability and its Uses
Topics
• Uses of probability
• Need of probability
• Bayesian Inference
• Density Concepts
• Normal Distribution Curve
Module 3: Statistical Inference
Topics
• Point Estimation
• Confidence Margin
• Hypothesis Testing
• Levels of Hypothesis Testing
Module 4: Testing the Data
Topics
• Parametric Test
• Parametric Test Types
• Non-Parametric Test
• A/B testing
Module 5: Data Clustering
Topics
• Association and Dependence
• Causation and Correlation
• Covariance
• Simpson’s Paradox
• Clustering Techniques
Module 6: Regression Modelling
Topics
• Logistic and Regression Techniques
• Problem of Collinearity
• WOE and IV
• Residual Analysis
• Heteroscedasticity
• Homoscedasticity
3.Data Science
Module 1: Introduction to Data Science
Topics
• What is Data Science?
• What does Data Science involve?
• Era of Data Science
• Business Intelligence vs Data Science
• Life cycle of Data Science
• Tools of Data Science
• Introduction to Big Data and Hadoop
• Introduction to R
• Introduction to Spark
• Introduction to Machine Learning
Module 2: Statistical Inference
Topics:
• What is Statistical Inference?
• Terminologies of Statistics
• Measures of Center
• Measures of Spread
• Probability
• Normal Distribution
• Binary Distribution
Module 3: Data Extraction, Wrangling and Exploration
Topics
• Data Analysis Pipeline
• What is Data Extraction
• Types of Data
• Raw and Processed Data
• Data Wrangling
• Exploratory Data Analysis
• Visualization of Data
Module 4: Introduction to Machine Learning
Topics
• What is Machine Learning?
• Machine Learning Use-Cases
• Machine Learning Process Flow
• Machine Learning Categories
• Supervised Learning algorithms: Linear Regression and Logistic Regression
Module 5: Classification Techniques
Topics
• What is Classification and what are its use cases?
• What is Decision Tree?
• Algorithm for Decision Tree Induction
• Creating a Perfect Decision Tree
• Confusion Matrix
• What is Random Forest?
• What is Naïve Bayes?
• Support Vector Machine: Classification
Module 6: Unsupervised Learning
Topics
• What is Clustering & its use cases?
• What is K-means Clustering?
• What is C-means Clustering?
• What is Canopy Clustering?
• What is Hierarchical Clustering?
Module 7: Recommender Engines
Topics
• What are Association Rules & their Use Cases?
• What is Recommendation Engine & its Workings?
• Types of Recommendations
• User-Based Recommendation
• Item-Based Recommendation
• Difference: User-Based and Item-Based Recommendation
• Recommendation Use Cases
Module 8: Text Mining
Topics
• The concepts of text-mining
• Use cases
• Text Mining Algorithms
• Quantifying text
• TF-IDF
• Beyond TF-IDF
Module 9: Time Series
Topics
• What is Time Series data?
• Time Series variables
• Different components of Time Series data
• Visualize the data to identify Time Series Components
• Implement ARIMA model for forecasting
• Exponential smoothing models
• Identifying the time series scenarios in which different Exponential Smoothing models can be applied
• Implement respective ETS model for forecasting
Module 10: Deep Learning
Topics
• Reinforcement Learning
• Reinforcement Learning Process Flow
• Reinforcement Learning Use Cases
• Deep Learning
• Biological Neural Networks
• Understand Artificial Neural Networks
• Building an Artificial Neural Network
• How ANN works
• Important Terminologies of ANNs
4.Python for Data Science
Module 1: Introduction to Python
Topics
• Overview of Python
• The Companies using Python
• Different Applications where Python is used
• Discuss Python Scripts on UNIX/Windows
• Values, Types, Variables
• Operands and Expressions
• Conditional Statements
• Loops
• Command Line Arguments
• Writing to the screen
Module 2: Sequences and File Operations
Topics
• Python files I/O Functions
• Numbers
• Strings and related operations
• Tuples and related operations
• Lists and related operations
• Dictionaries and related operations
• Sets and related operations
Module 3: Deep Dive – Functions, OOPs, Modules, Errors and Exceptions
Topics
• Functions
• Function Parameters
• Global Variables
• Variable Scope and Returning Values
• Lambda Functions
• Object-Oriented Concepts
• Standard Libraries
• The Import Statements
• Module Search Path
• Ways to Install Packages
• Errors and Exception Handling
• Handling Multiple Exceptions
Module 4: Introduction to NumPy, Pandas and Matplotlib
Topics
• NumPy - arrays
• Operations on arrays
• Indexing, slicing and iterating
• Reading and writing arrays on files
• Pandas - data structures & index operations
• Reading and Writing data from Excel/CSV formats into Pandas
• matplotlib library
• Grids, axes, plots
• Markers, colors, fonts and styling
• Types of plots - bar graphs, pie charts, histograms
• Contour plots
Module 5: Data Manipulation
Topics
• Basic Functionalities of a data object
• Merging of Data objects
• Concatenation of data objects
• Types of Joins on data objects
• Exploring a Dataset
• Analyzing a dataset
Module 6: Introduction to Machine Learning with Python
Topics
• Python Revision (NumPy, Pandas, scikit-learn, matplotlib)
• What is Machine Learning?
• Machine Learning Use-Cases
Module 7: Supervised Learning - I
Topics
• What is Classification and what are its use cases?
• What is Decision Tree?
• Algorithm for Decision Tree Induction
• Creating a Perfect Decision Tree
• Confusion Matrix
• What is Random Forest?
Module 8: Dimensionality Reduction
Topics
• Introduction to Dimensionality
• Why Dimensionality Reduction
• PCA
• Factor Analysis
• Scaling dimensional model
• LDA
Module 9: Supervised Learning - II
Topics
• What is Naïve Bayes?
• How does Naïve Bayes work?
• Implementing Naïve Bayes Classifier
• What is Support Vector Machine?
• How does Support Vector Machine work?
• Hyperparameter Optimization
• Grid Search vs Random Search
• Implementation of Support Vector Machine for Classification
Module 10: Unsupervised Learning
Topics
• What is Clustering & its Use Cases?
• What is K-means Clustering?
• How does K-means algorithm work?
• How to do optimal clustering
• What is C-means Clustering?
• What is Hierarchical Clustering?
• How does Hierarchical Clustering work?
Module 11: Association Rules Mining and Recommendation Systems
Topics
• What are Association Rules?
• Association Rule Parameters
• Calculating Association Rule Parameters
• Recommendation Engines
• How do Recommendation Engines work?
• Collaborative Filtering
• Content-Based Filtering
Module 12: Reinforcement Learning
Topics
• What is Reinforcement Learning
• Why Reinforcement Learning
• Elements of Reinforcement Learning
• Exploration vs Exploitation dilemma
• Epsilon Greedy Algorithm
• Markov Decision Process (MDP)
• Q values and V values
• Q-Learning
• α values
Module 13: Time Series Analysis
Topics
• What is Time Series Analysis?
• Importance of TSA
• Components of TSA
• White Noise
• AR model
• MA model
• ARMA model
• ARIMA model
• Stationarity
• ACF & PACF
Module 14: Model Selection and Boosting
Topics
• What is Model Selection?
• The need for Model Selection
• Cross-Validation
• What is Boosting?
• How do Boosting Algorithms work?
• Types of Boosting Algorithms
• Adaptive Boosting
5.Apache Spark and Scala
Module 1: Introduction to Big Data Hadoop and Spark
Topics
• What is Big Data?
• Big Data Customer Scenarios
• Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
• How Hadoop Solves the Big Data Problem?
• What is Hadoop?
• Hadoop’s Key Characteristics
• Hadoop Ecosystem and HDFS
• Hadoop Core Components
• Rack Awareness and Block Replication
• YARN and its Advantage
• Hadoop Cluster and its Architecture
• Hadoop: Different Cluster Modes
• Big Data Analytics with Batch & Real-time Processing
• Why is Spark needed?
• What is Spark?
• How does Spark differ from other frameworks?
• Spark at Yahoo!
Module 2: Introduction to Scala and Apache Spark
Topics
• What is Scala?
• Scala in other Frameworks
• Basic Scala Operations
• Control Structures in Scala
• Collections in Scala- Array
• Why Scala for Spark?
• Introduction to Scala REPL
• Variable Types in Scala
• Foreach loop, Functions and Procedures
• ArrayBuffer, Map, Tuples, Lists, and more
Module 3: Functional Programming and OOPs Concepts in Scala
Topics
• Functional Programming
• Anonymous Functions
• Getters and Setters
• Properties with only Getters
• Singletons
• Overriding Methods
• Higher Order Functions
• Class in Scala
• Custom Getters and Setters
• Auxiliary Constructor and Primary Constructor
• Extending a Class
• Traits as Interfaces and Layered Traits
Module 4: Deep Dive into Apache Spark Framework
Topics
• Spark’s Place in Hadoop Ecosystem
• Spark Components & its Architecture
• Spark Deployment Modes
• Introduction to Spark Shell
• Writing your first Spark Job Using SBT
• Submitting Spark Job
• Spark Web UI
• Data Ingestion using Sqoop
Module 5: Playing with Spark RDDs
Topics
• Challenges in Existing Computing Methods
• Probable Solution & How RDD Solves the Problem
• What is RDD, Its Functions, Transformations & Actions?
• Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How It Helps Achieve Parallelization
• Passing Functions to Spark
Module 6: DataFrames and Spark SQL
Topics
• Need for Spark SQL
• What is Spark SQL?
• Spark SQL Architecture
• SQL Context in Spark SQL
• User Defined Functions
• Data Frames & Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources
• Spark – Hive Integration
Module 7: Machine Learning using Spark MLlib
Topics
• Why Machine Learning?
• What is Machine Learning?
• Where is Machine Learning Used?
• Face Detection: USE CASE
• Different Types of Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib
Module 8: Deep Dive into Spark MLlib
Topics
• Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random Forest
• Unsupervised Learning - K-Means Clustering & How It Works with MLlib
• Analysis on US Election Data using MLlib (K-Means)
Module 9: Understanding Apache Kafka & Apache Flume
Topics
• Need for Kafka
• Core Concepts of Kafka
• Where is Kafka Used?
• What is Kafka?
• Kafka Architecture
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Need of Apache Flume
• What is Apache Flume?
• Flume Sources
• Flume Channels
• Integrating Apache Flume and Apache Kafka
• Basic Flume Architecture
• Flume Sinks
• Flume Configuration
Module 10: Apache Spark Streaming- Processing Multiple Batches
Topics
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary?
• What is Spark Streaming?
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators
Module 11: Apache Spark Streaming- Data Sources
Topics
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
• Perform Twitter Sentiment Analysis Using Spark Streaming
Module 12: In Class Project
Learning Objectives
Work on an end-to-end Financial domain project covering all the major concepts of Spark taught during the course.
Module 13: Spark GraphX (Self-Paced)
6.Deep Learning with TensorFlow 2.0
Module 1: Introduction to Deep Learning
Topics
• What is Deep Learning?
• Curse of Dimensionality
• Machine Learning vs. Deep Learning
• Use cases of Deep Learning
• Human Brain vs. Neural Network
• What is Perceptron?
• Learning Rate
• Epoch
• Batch Size
• Activation Function
• Single Layer Perceptron
Module 2: Getting Started with TensorFlow 2.0
Topics
• Introduction to TensorFlow 2.x
• Installing TensorFlow 2.x
• Defining Sequential model layers
• Activation Function
• Layer Types
• Model Compilation
• Model Optimizer
• Model Loss Function
• Model Training
• Digit Classification using Simple Neural Network in TensorFlow 2.x
• Improving the model
• Adding Hidden Layer
• Adding Dropout
• Using Adam Optimizer
Module 3: Convolutional Neural Network
Topics
• Image Classification Example
• What is Convolution
• Convolutional Layer Network
• Convolutional Layer
• Filtering
• ReLU Layer
• Pooling
• Data Flattening
• Fully Connected Layer
• Predicting a cat or a dog
• Saving and Loading a Model
• Face Detection using OpenCV
Module 4: Regional CNN
Topics
• Regional-CNN
• Selective Search Algorithm
• Bounding Box Regression
• SVM in RCNN
• Pre-trained Model
• Model Accuracy
• Model Inference Time
• Model Size Comparison
• Transfer Learning
• Object Detection – Evaluation
• mAP
• IoU
• RCNN – Speed Bottleneck
• Fast R-CNN
• RoI Pooling
• Fast R-CNN – Speed Bottleneck
• Faster R-CNN
• Feature Pyramid Network (FPN)
• Region Proposal Network (RPN)
• Mask R-CNN
Module 5: Boltzmann Machine & Autoencoder
Topics
• What is Boltzmann Machine (BM)?
• Identify the issues with BM
• Why did RBM come into picture?
• Step by step implementation of RBM
• Distribution of Boltzmann Machine
• Understanding Autoencoders
• Architecture of Autoencoders
• Brief on types of Autoencoders
• Applications of Autoencoders
Module 6: Generative Adversarial Network (GAN)
Module 7: Emotion and Gender Detection
Module 8: Introduction to RNN and GRU
Module 9: LSTM
Module 10: Auto Image Captioning Using CNN LSTM
Topics
• Auto Image Captioning
• COCO dataset
• Pre-trained model
• Inception V3 model
• Architecture of Inception V3
• Modify last layer of pre-trained model
• Freeze model
• CNN for image processing
• LSTM for text processing
7.Tableau Training
Module 1: Data Preparation using Tableau Prep
Topics:
• Data Visualization
• Business Intelligence tools
• Introduction to Tableau
• Tableau Architecture
• Tableau Server Architecture
• VizQL
• Introduction to Tableau Prep
• Tableau Prep Builder User Interface
• Data Preparation techniques using Tableau Prep Builder tool
Module 2: Data Connection with Tableau Desktop
Topics:
• Features of Tableau Desktop
• Connect to data from File and Database
• Types of Connections
• Joins and Unions
• Data Blending
• Tableau Desktop User Interface
• Basic project: Create a workbook and publish it on Tableau Online
Module 3: Basic Visual Analytics
Topics:
• Visual Analytics
• Basic Charts: Bar Chart, Line Chart, and Pie Chart
• Hierarchies
• Data Granularity
• Highlighting
• Sorting
• Filtering
• Grouping
• Sets
Module 4: Calculations in Tableau
Topics:
• Types of Calculations
• Built-in Functions (Number, String, Date, Logical and Aggregate)
• Operators and Syntax Conventions
• Table Calculations
• Level Of Detail (LOD) Calculations
• Using R within Tableau for Calculations
Module 5: Advanced Visual Analytics
Topics:
• Parameters
• Tooltips
• Trend lines
• Reference lines
• Forecasting
• Clustering
Module 6: Level of Detail (LOD) Expressions in Tableau
Topics:
• Use Case I - Count Customer by Order
• Use Case II - Profit per Business Day
• Use Case III - Comparative Sales
• Use Case IV - Profit Vs Target
• Use Case V - Finding the second order date
• Use Case VI - Cohort Analysis
Module 7: Geographic Visualizations in Tableau
Topics:
• Introduction to Geographic Visualizations
• Manually assigning Geographical Locations
• Types of Maps
• Spatial Files
• Custom Geocoding
• Polygon Maps
• Web Map Services
• Background Images
Module 8: Advanced Charts in Tableau
Topics:
• Box and Whisker Plot
• Bullet Chart
• Bar in Bar Chart
• Gantt Chart
• Waterfall Chart
• Pareto Chart
• Control Chart
• Funnel Chart
• Bump Chart
• Step and Jump Lines
• Word Cloud
• Donut Chart
Module 9: Dashboards and Stories
Topics:
• Introduction to Dashboards
• The Dashboard Interface
• Dashboard Objects
• Building a Dashboard
• Dashboard Layouts and Formatting
• Interactive Dashboards with actions
• Designing Dashboards for devices
• Story Points
Module 10: Get Industry Ready
Topics:
• Tableau Tips and Tricks
• Choosing the right type of Chart
• Format Style
• Data Visualization best practices
• Prepare for Tableau Interview
Module 11: Exploring Tableau Online
Topics:
• Publishing Workbooks to Tableau Online
• Interacting with Content on Tableau Online
• Data Management through Tableau Catalog
• AI-Powered features in Tableau Online (Ask Data and Explain Data)
• Understand Scheduling
• Managing Permissions on Tableau Online
• Data Security with Filters in Tableau Online