Training Table of Contents (TOC): Databricks ML and Generative AI for Engineers
Overview
Duration: 40 Hours (5 Days, 8 Hours/Day)
Objective: Provide hands-on training that enables engineers to understand and work with Databricks Machine Learning and Generative AI tools and techniques.
Day 1: Foundations of Databricks and Machine Learning
Objective: Familiarize participants with the Databricks platform and the basics of Machine Learning.
1. Introduction to Databricks (2 Hours)
- What is Databricks, and what is its role in ML?
- Overview of the Databricks Workspace
- Introduction to Spark and Delta Lake
- Hands-On: Setting up a Databricks Workspace, creating and managing notebooks, and exploring cluster management.
2. Basics of Machine Learning (2 Hours)
- What is Machine Learning? Types and Use Cases.
- Overview of the ML workflow in Databricks.
- Hands-On: Loading datasets into Databricks and performing basic data exploration with Pandas and PySpark.
3. Feature Engineering and Data Preprocessing (2 Hours)
- Handling missing values, scaling, and encoding.
- Introduction to PySpark MLlib for feature transformations.
- Hands-On: Cleaning and preprocessing data in Databricks notebooks.
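The three preprocessing steps named above (handling missing values, scaling, and encoding) can be sketched without any libraries. This is only an illustration on hypothetical toy records; in a Databricks notebook the same steps would more likely use PySpark MLlib transformers such as Imputer, MinMaxScaler, StringIndexer, and OneHotEncoder. The helper names and data below are assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25.0, "city": "Paris"},
    {"age": None, "city": "Tokyo"},
    {"age": 35.0, "city": "Paris"},
]

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 vectors, one slot per sorted category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors

ages = impute_mean([r["age"] for r in rows])
scaled_ages = min_max_scale(ages)
categories, encoded_cities = one_hot([r["city"] for r in rows])

print(ages)
print(scaled_ages)
print(categories, encoded_cities)
```

Imputation runs before scaling here so that the filled-in value is rescaled together with the observed ones.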
4. Introduction to ML Models (2 Hours)
- Overview of regression, classification, and clustering.
- Hands-On: Building a simple regression model in Databricks and evaluating its fit with MLlib regression metrics such as RMSE.
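As a rough preview of the regression exercise, the sketch below fits a one-variable linear model with the closed-form least-squares solution and scores it with root mean squared error (RMSE). It is library-free on purpose and uses made-up data; the hands-on session itself would rely on MLlib, and the function names here are assumptions for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical data, roughly following y = 2x + 1 with noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
error = rmse(ys, predictions)

print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={error:.3f}")
```

For brevity the model is scored on the data it was fitted to; a real evaluation would hold out a separate test split.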
Day 2: Databricks ML in Practice
Objective: Build ML pipelines and workflows on the Databricks platform.
1. Building ML Pipelines (3 Hours)
- What are ML Pipelines?
- Using PySpark MLlib for pipeline creation.
- Hands-On: Creating and executing an end-to-end ML pipeline, visualizing pipeline outputs.
2. Hyperparameter Tuning and Model Optimization (2 Hours)
- Basics of hyperparameter tuning.
- Using MLflow on Databricks to track experiments.
- Hands-On: Setting up and tracking multiple model runs in MLflow and comparing model performance.
3. Model Deployment Basics (3 Hours)
- Introduction to Databricks Model Serving.
- Hands-On: Deploying a trained model with Databricks Model Serving and testing the deployed model with sample inputs.
Day 3: Introduction to Generative AI
Objective: Learn foundational concepts and simple applications of Generative AI.
1. What is Generative AI? (1 Hour)
- Overview of Generative AI and its applications.
- Key types of generative models (GANs, VAEs, Transformers).
2. Introduction to Pretrained Generative Models (2 Hours)
- Using pretrained models for Generative AI tasks.
- Hands-On: Loading a GPT-like text-generation model from Hugging Face and generating simple text outputs.
3. Basics of Neural Networks (3 Hours)
- What are Neural Networks?
- Basics of feedforward networks and backpropagation.
- Hands-On: Building a simple neural network with PyTorch or TensorFlow in Databricks.
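To make the feedforward and backpropagation topics concrete, here is a minimal sketch of a tiny network (2 inputs, 2 sigmoid hidden units, 1 linear output) written in plain Python. The weights, input, and learning rate are arbitrary assumptions chosen for illustration; the hands-on session would use PyTorch or TensorFlow, which compute these gradients automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Feedforward pass: 2 inputs -> 2 sigmoid hidden units -> 1 linear output."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output

def loss(params, x, target):
    """Squared-error loss for a single example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2

def backward(params, x, target):
    """Backpropagation: gradients of the loss via the chain rule."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    d_out = output - target                       # dL/d(output)
    grad_w2 = [d_out * h for h in hidden]
    grad_b2 = d_out
    # Through each sigmoid, using sigmoid'(z) = h * (1 - h).
    d_hidden = [d_out * w * h * (1.0 - h) for w, h in zip(w2, hidden)]
    grad_w1 = [[d * xi for xi in x] for d in d_hidden]
    grad_b1 = d_hidden
    return grad_w1, grad_b1, grad_w2, grad_b2

def sgd_step(params, grads, lr):
    """One gradient-descent update; returns new parameters."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    new_w1 = [[w - lr * g for w, g in zip(row, g_row)]
              for row, g_row in zip(w1, g_w1)]
    new_b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g_w2)]
    return new_w1, new_b1, new_w2, b2 - lr * g_b2

# Arbitrary starting weights and a single hypothetical training example.
params = ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [0.7, -0.4], 0.05)
x, target = [1.0, 2.0], 1.0

# Check one backprop gradient against a central finite difference.
def perturbed(delta):
    w1, b1, w2, b2 = params
    w1_new = [row[:] for row in w1]
    w1_new[0][0] += delta
    return w1_new, b1, w2, b2

eps = 1e-6
numeric = (loss(perturbed(eps), x, target)
           - loss(perturbed(-eps), x, target)) / (2 * eps)
analytic = backward(params, x, target)[0][0][0]

# A few gradient-descent steps on the single example.
initial_loss = loss(params, x, target)
trained = params
for _ in range(50):
    trained = sgd_step(trained, backward(trained, x, target), lr=0.1)
final_loss = loss(trained, x, target)

print(f"analytic={analytic:.6f} numeric={numeric:.6f}")
print(f"loss before={initial_loss:.4f} after={final_loss:.6f}")
```

Comparing the analytic gradient with a finite-difference estimate is a common way to confirm that a hand-written backward pass is correct.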
4. Introduction to Transfer Learning (2 Hours)
- How transfer learning works in Generative AI.
- Hands-On: Fine-tuning a pretrained generative model on custom data.
Day 4: Practical Generative AI with Databricks
Objective: Apply Generative AI techniques to practical tasks.
1. Generative AI for Text (3 Hours)
- Fine-tuning GPT-like models for domain-specific tasks.
- Hands-On: Fine-tuning a text-generation model on small custom datasets, generating
summaries or custom outputs.
2. Generative AI for Images (3 Hours)
- Introduction to GANs and image generation.
- Hands-On: Generating simple synthetic images using a pretrained GAN model.
3. Basic Evaluation Techniques for Generative AI (2 Hours)
- Evaluating the quality of generative outputs.
- Hands-On: Comparing generated outputs using metrics like BLEU (text) or FID (images).
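As an illustration of how a text metric like BLEU works, the sketch below computes a simplified sentence-level score: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, n-grams up to length 2, and no smoothing, so its numbers will not match standard BLEU implementations (which default to 4-grams and corpus-level statistics). The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU without smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)

reference = "the cat sat on the mat".split()
exact = simple_bleu("the cat sat on the mat".split(), reference)
partial = simple_bleu("the cat is on the mat".split(), reference)

print(f"exact match: {exact:.3f}")
print(f"one word changed: {partial:.3f}")
```

The clipping step is what stops a degenerate output such as "the the the" from earning full unigram credit.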
Day 5: Bringing It All Together
Objective: Integrate Databricks ML and Generative AI knowledge into real-world workflows.
1. End-to-End ML Workflow (3 Hours)
- Designing and executing a complete ML project.
- Hands-On: Preprocessing data, training an ML model, and deploying it.
2. End-to-End Generative AI Workflow (3 Hours)
- Solving a real-world Generative AI problem.
- Hands-On: Fine-tuning and deploying a generative model (text or image).
3. Q&A and Discussion (2 Hours)
- Answering questions and gathering feedback.
- Discussion on real-world challenges and applications.
- Career and project opportunities in Databricks ML and Generative AI.