Practical Problems & Projects
11-767: On-Device Machine Learning
Prof. Emma Strubell
Recognizing People in Photos
Through Private On-Device Machine
Learning
Floris Chabert, Jingwen Zhu, Brett Keating, and Vinay Sharma
Apple Research Blog July 2021
https://machinelearning.apple.com/research/recognizing-people-photos
Yonatan Bisk & Emma Strubell
Task
Find faces of contacts
Why is this hard?
Lighting, perspective,
skin color, age, gender, …
Why is this hard On-Device?
Motivation
On-device face recognition is privacy-preserving
Context:
• Competitors in the market (e.g. Google) use cloud-based services, so your data is shared.
• Apple has its own Neural Engine for acceleration.
• Quality vs. battery.
What is a naive algorithm we might use?
Inference Pipeline
Notes:
1. Two feature representations (2 × model)
2. Agglomerative Clustering (naively expensive)
3. Use of external metadata (can correct for a weak model)
Clustering
1. Conservative embedding clusters (very few merges - within moments?)
Relies on hand-tuned weighting for face (vs mean face) and body
$D_{ij} = \min(F_{ij},\ \alpha \cdot F_{ij} + \beta \cdot T_{ij})$ where F and T are face and body, respectively
2. Agglomerative Clustering (Faces only)
First pass (ideal): “median distance between the members of two HAC clusters”
After a threshold: “random sampling” ← maintains linear runtime (no guarantees; see the sketch below)
Note: clustering runs periodically, typically overnight while the device is charging
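As a rough illustration of the two clustering stages above, here is a minimal Python sketch; the cosine metric, α, β, and the merge threshold are illustrative assumptions, not the values used in the paper:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def combined_distance(F, T, alpha=0.5, beta=0.5):
    """Stage 1 distance: D_ij = min(F_ij, alpha*F_ij + beta*T_ij),
    where F and T are precomputed face and body distance matrices."""
    return np.minimum(F, alpha * F + beta * T)

def hac_median(face_emb, threshold=0.4):
    """Stage 2: agglomerative clustering on face embeddings only, merging the
    pair of clusters with the smallest median member-to-member distance until
    that distance exceeds the threshold. This naive version is far from linear
    time; the slide's "random sampling" trick caps how many pairs are compared."""
    D = squareform(pdist(face_emb, metric="cosine"))
    clusters = [[i] for i in range(len(face_emb))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.median(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge cluster b into cluster a
    return clusters
```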
Assigning Identity
• Every cluster has c “canonical” exemplars:
$D = [X^1_0, X^1_1, \ldots, X^1_c,\; X^2_0, X^2_1, \ldots, X^2_c,\; \ldots,\; X^K_0, \ldots, X^K_c]$
• Construct a representation for the input as a function of the dictionary (existing clusters): $\min_x \|y - D \cdot x\|_2^2 + \lambda \cdot \|x\|_1$
This reduces to a convex optimization (L1-regularized least squares) over the values in x
• So the representation is quickly learnable (optimally)
• Now the values in $x_j$ for a given $X^i_*$ define the cluster (see the sketch below)
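A minimal sketch of this convex-coding assignment using scikit-learn's Lasso, which solves exactly this L1-regularized least-squares objective; the dictionary layout, λ, and the scoring rule (assign to the cluster whose exemplars get the most coefficient mass) are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def assign_identity(y, cluster_exemplars, lam=0.01):
    """y: (d,) embedding of a new face.
    cluster_exemplars: list of (n_k, d) arrays of canonical exemplars, one per
    existing cluster. Solves min_x ||y - D x||_2^2 + lam * ||x||_1 over the
    stacked dictionary and picks the cluster whose block of x has most mass."""
    D = np.concatenate(cluster_exemplars, axis=0)       # (sum_k n_k, d)
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    lasso.fit(D.T, y)                                   # columns of D.T are exemplars
    x = lasso.coef_                                     # sparse coefficients
    bounds = np.cumsum([0] + [len(E) for E in cluster_exemplars])
    scores = [np.abs(x[bounds[k]:bounds[k + 1]]).sum()
              for k in range(len(cluster_exemplars))]
    return int(np.argmax(scores)), x
```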
Network Design
“highest accuracy possible while running efficiently on-device, with low latency and a thin memory profile”
• Skipping important details here because the model is largely based on MobileNet, which will be discussed later, BUT:
• Double channels “within limits of computation”
• Bottleneck expansions are smaller, and attention is added at every layer (see the sketch below)
• PReLU activations
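To make these bullets concrete, here is a hedged PyTorch sketch of a MobileNet-style inverted bottleneck with a modest expansion factor, a squeeze-and-excite attention block, and PReLU; the blog does not publish exact layer specs, so all sizes and the choice of SE attention are assumptions:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention: adds very few parameters relative to the convolutions."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc1 = nn.Conv2d(ch, ch // reduction, 1)
        self.fc2 = nn.Conv2d(ch // reduction, ch, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)                  # global average pool
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        return x * s                                          # reweight channels

class InvertedBottleneck(nn.Module):
    """Inverted bottleneck with a smaller expansion (2x rather than 6x),
    attention at every layer, and PReLU activations."""
    def __init__(self, ch, expansion=2):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.PReLU(hidden),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.PReLU(hidden),
            SqueezeExcite(hidden),
            nn.Conv2d(hidden, ch, 1, bias=False), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)                              # residual connection
```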
Network Design
“highest accuracy possible while running efficiently on-device, with low latency and a thin memory profile”
• Wider networks give roughly the same performance as deeper ones (but are faster)
Zagoruyko, S., Komodakis, N.: Wide Residual Networks
• Attention adds performance with little to no new parameters
Performance of Attention
Training (focus on normalization and cosine similarity)
A margin ensures extra weight on hard examples (see the sketch below)
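The normalization/cosine/margin recipe is typically implemented as an additive cosine-margin softmax (CosFace/ArcFace style); the blog does not give the exact loss, so the scale and margin below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineMarginLoss(nn.Module):
    """L2-normalize embeddings and class weights, subtract a margin from the
    target-class cosine, then scale: the model must beat the other classes by
    at least the margin, which concentrates the loss on hard examples."""
    def __init__(self, dim, num_classes, scale=30.0, margin=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.scale, self.margin = scale, margin

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))  # (B, C) cosines
        target = F.one_hot(labels, cos.size(1)).to(cos.dtype)
        logits = self.scale * (cos - self.margin * target)          # margin on target class only
        return F.cross_entropy(logits, labels)
```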
Other Considerations
1. Filtering Unclear Faces (no details)
2. Augmentations: “pixel-level changes such as color jitter or grayscale conversion, structural changes like left-right flipping or distortion, Gaussian blur, random compression artifacts and cutout regularization” (see the sketch below)
3. COVID-19: “we designed a synthetic mask augmentation. We used face landmarks to generate a realistic shape corresponding to a face mask. We then overlaid random samples from clothing and other textures in the inferred mask area over the input face”
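A sketch of the quoted augmentations with torchvision; parameters are guesses, cutout is approximated with RandomErasing, and the compression-artifact and synthetic-mask augmentations (which need face landmarks) are omitted:

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    transforms.RandomGrayscale(p=0.1),                    # grayscale conversion
    transforms.RandomHorizontalFlip(p=0.5),               # left-right flipping
    transforms.RandomPerspective(distortion_scale=0.2, p=0.3),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25, scale=(0.02, 0.2)),  # cutout-style regularization
])
```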
Qualitative
Key Components
• Optimized clustering (constant time)
• Assignment via Convex coding (minimal updates)
• Wider (shallower) networks
Questions:
1. Was the attention worth it?
2. Was this only possible because of the neural engine?
Course Project
Anatomy of the Course Project
We provide:
• Lab 2: Benchmarking
• Lab 3: Quantization
• Lab 4: Pruning
You decide:
• Hardware: Laptop, robot, RPi…
• Model: ResNet, Transformer, encoder vs. decoder…
• Data: Language, vision, … Same as training data, or a transfer/adaptation setting?
Example Projects
• AnySurface: Converting any surface into a controller by compressing a UNet, running on RPi4.
• Speech-to-text translation: Automatic speech recognition and translation on RPi4.
• Im2Cal: Estimating food calories from images by compressing SegFormer, on RPi4.
• Hey where’s that thing: Temporal localization in
videos by compressing 2D-TAN on laptop.
• Shazaam: On-device music recognition w/
FAISS, separable convolutions.
Example Projects
• Plant Jones: Smart assistant, who is also a plant.
• v1.0 (2015): Find tweets with positive/negative sentiment about water; post positive-sentiment ones when well watered, negative-sentiment ones when thirsty (dry).
• v2.0 (2023): Use an LLM to generate thirst-related conversation. Also:
— Custom wake-word detection (“hey plant!”)
— Text-to-speech
— Speech-to-text
— Tiny LCD screen mouth
• This is an example baseline using open-source software and libraries, implemented with out-of-the-box tools over about a week.
Axes to Consider
• Theory or practice? Resource optimized vs resource constrained?
• Target hardware:
CPU + RAM vs GPU/M1 + Shared RAM vs GPU+CPU + Separate RAM
• Hardware support: Logic, quantization, sparse ops, batching…
• Novelty: Reproduction vs transfer (new data/hardware) vs novel?
• In-distribution or transfer: Fitting to in-distribution data, vs. adapting to a new
task or domain?
Efficiency in Theory versus Practice
Resource Optimized:
• Magnitude pruning
• Server
• Quantization (3-bit)
Resource Constrained:
• Structured pruning / layer pruning
• Edge device
• Quantization (8-bit), if hardware supports it (see the sketch below)
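As one concrete instance from the resource-constrained column, PyTorch post-training dynamic quantization converts Linear layers to int8 on CPU; whether it actually speeds anything up depends on the hardware's int8 kernels (the toy model is a placeholder):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for whatever you actually benchmark.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Post-training dynamic quantization: int8 weights, activations quantized on the fly.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)   # same outputs, smaller/faster Linear layers on supported CPUs
```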
Target hardware considerations
In addition to devices, we can provide: $100 AWS and $50 OpenAI credits
per student.
• Where do you store model weights, activations, gradients?
How does this impact latency?
• Trade-off between storage size, speed, and on-the-fly computation
• Do I want on-device training? Fine-tuning?
• How heavy is the OS? How heavy are USB vs GPIO?
• Does your hardware support efficient batched computation? Efficient low-bitwidth computation? Efficient control flow? (See the timing sketch below.)
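A crude timing sketch for the batching question above; the model, sizes, and iteration counts are placeholders, and a real benchmark (Lab 2) would also control for thermal state and power:

```python
import time
import torch
import torch.nn as nn

def latency_per_example(model, batch_size, d=512, n_iters=50):
    """Rough CPU wall-clock latency per example; if this barely drops as the
    batch grows, batched computation is not being exploited by the hardware."""
    x = torch.randn(batch_size, d)
    with torch.no_grad():
        model(x)                                  # warm-up
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / (n_iters * batch_size)

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()
for bs in (1, 8, 64):
    print(f"batch {bs:3d}: {latency_per_example(model, bs) * 1e6:.1f} us/example")
```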
Project Ideas
Resource Optimized
• Does efficient method X published in a CV venue apply to NLP, or vice versa?
• Does theoretically proven idea Y published in ML venue apply to larger, more
complex models and datasets?
Resource Constrained
• Does “efficient” method Z, evaluated on GPU/TPU, work on CPU/Edge? Under memory constraints? Power constraints?
• Can you further optimize an already-efficient model?
Can you compress a huge model enough to fit it on device?
All of the above
• Compare existing methods across different metrics: Pareto optimality,
generalization, fairness, …
Learning Goals
Project is not:
• Entrepreneurship 101
• Multimodal Machine Learning (amazing class)
• Graded based on model performance
• Real world robotics
Project is:
• Measuring Efficiency and Power
• Adjusting data for 👆
• Changing architectures for 👆
• Producing Pareto curves for 👆
Plotting Goals
Where to start?
• What pre-trained models exist for my task?
• What is a baseline I can feasibly train/evaluate in a few hours?
• How can I sub-sample my data to create a feasible train/test set? (See the sketch at the end.)
• Single domain? Limited label space? Simplified task?
• Goal: performance that is non-trivial; it does not need to be competitive
• What is unique about my data/task/… that makes me think I can
compress my models?
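One hedged sketch of sub-sampling to a limited label space with a cap per class (function and parameter names are placeholders):

```python
import random
from collections import defaultdict

def subsample(examples, keep_labels, per_class=200, seed=0):
    """examples: iterable of (x, label) pairs. Keep only the chosen classes and
    at most `per_class` examples of each, so baselines train in hours, not days."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in examples:
        if y in keep_labels:
            by_label[y].append((x, y))
    subset = []
    for items in by_label.values():
        rng.shuffle(items)
        subset.extend(items[:per_class])
    rng.shuffle(subset)
    return subset
```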