Mathematical Algorithms

for Artificial Intelligence and Big Data

Thomas Strohmer
Department of Mathematics
University of California, Davis

Spring 2017
Course Objective

Experiments, observations, and numerical simulations in many

areas of science nowadays generate massive amounts of data.

This rapid growth heralds an era of "data-centric science,"

which requires new paradigms addressing how data are
acquired, processed, distributed, and analyzed.

This course covers mathematical concepts and algorithms

(many of them very recent) that can deal with some of the
challenges posed by Artificial Intelligence and Big Data.
Details about this Big Data course

This course is about mathematical methods for Big Data

Prerequisite:
Linear algebra and basic programming experience
(preferably Matlab) are required. A solid foundation in
undergraduate mathematics is recommended.

What this class is not about:

Formal software development
Database theory
Specific applications
Heuristic methods that lack mathematical foundations
(well, except for deep learning ...)
Textbooks

There is no required textbook. The following books contain

some material on these topics (but there is no need to buy
these books):
C. Bishop. Pattern Recognition and Machine Learning.
F. Cucker and D. X. Zhou. Learning Theory: An Approximation
Theory Viewpoint.
S. Foucart and H. Rauhut. A mathematical introduction to
compressive sensing.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of
Statistical Learning: Data Mining, Inference and Prediction.
Michael W. Mahoney. Randomized Algorithms for Matrices
and Data.
Textbook in development

Notes from the book draft will be made available.

Grading Scheme

50% Homework: will be assigned about every other week.

A subset of these problems will be graded.
50% Final Project
Final Project:
Write an 8-page (or so) report on one of the following topics:
Describe how some of the methods you learned in this
course will be used in your research.
Find a practical application yourself (not copying from
papers/books) using the methods you learned in this
course; describe how to use them; include numerical
demonstrations.
Find an interesting data set and present a careful
numerical comparison of existing algorithms related to one
of the topics of this course.
If in doubt, please ask me!
Teaching Assistants

Shuyang Ling
Yang Li

Goal and challenges of Big Data

Goal: The goal is to turn data into information

Challenges: Capture, curation, time-limitations, storage,

search, sharing, transfer, analysis, and visualization of the data.

Data can be massive, non-static, multi-modal, incomplete,

noisy, non-random, unstructured, dynamic, streaming, ...
“Data is the new (crude) oil for the economy!”

You are not Google’s customer.

You are Google’s commodity (crude oil)

Big Data Everywhere!

Lots of data is being collected and warehoused

Web data (often user-provided)
e-commerce, purchases at stores
Medical data, health care
Bank/Credit Card transactions
Social Network
Traffic, GPS, ...
Scientific experiments
...
How much data?

YouTube contains 120 million videos

and 72 hours of video uploaded
every minute.
Google processes 3.5 billion
requests per day
There is currently an estimate of 3.8
trillion photographs, 10% of them
taken in the last year.
Facebook has about 140 billion
images with about 300 million new
images a day.
2.5PB are flowing through Walmart’s
databases
NYSE collects 1 TB each day.
How much data?

CERN’s Large Hadron Collider

generates 15 PB a year
The BRAIN initiatives produce
terabytes of data a day
The Large Synoptic Survey
Telescope in Chile will collect
30TB per night. Headed by
Tony Tyson from UC Davis
How much data?

Governments (USA, China, Russia, UK,

Israel, Germany, ...) collect ??? PB /day

The CIA (via In-Q-Tel) was an early

investor in Facebook

Somewhere in Nevada is a storage facility the size of eight football

fields that collects all the emails sent in the USA.
More Data ...

Experts now predict that 40 zettabytes of data will be in

existence by 2020.

Big Data does not just mean massive amounts of data

Big Data also means complex data
Heterogeneous data
Incomplete data
Unstructured/semi-structured Data
Graph Data
Social Network, Semantic Web
Streaming Data
Big Data is not new

Seismic data acquisition and processing

Census
Wall Street hedge funds (e.g. Renaissance Technologies)
Governments
Banks, insurance companies
Scientific Research
Big Data Tasks

Discovery of useful, possibly unexpected, patterns in data

Non-trivial extraction of implicit, previously unknown and
potentially useful information from data
Finding outliers (security threat, credit card theft, ...)
Clustering
Classification
Object recognition
Visualization, dimension reduction
“Data cleaning”: denoising, smoothing, grouping, ...
Association Rule Mining (Customers who buy X often
buy Y, Customer 123 likes product p10)
Collaborative filtering: users collaborate in filtering
information to find information of interest (Amazon, Netflix)
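
The association-rule item above ("customers who buy X often buy Y") can be made concrete with two standard quantities, support and confidence. The sketch below is only an illustration: the function names and the shopping baskets are made up for this example and do not come from the course material.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(X and Y) / support(X)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

print(support(baskets, {"bread"}))                 # 0.75
print(confidence(baskets, {"bread"}, {"butter"}))  # 2 of the 3 bread baskets
```

A rule "X implies Y" is usually reported only when both its support and its confidence exceed chosen thresholds; the thresholds are a modelling choice, not something fixed by the method.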
Meta Data Analysis

The idea is 100 years old (see Karl Pearson), but its full
potential is only now being unleashed.

Example:
In a recent analysis researchers developed a framework for
comparing classifiers common in Machine Learning (Boosted
decision trees, Random Forests, SVM, KNN, PAM and DLDA)
based on a standard series of datasets.

Result: A simple (but mathematically rigorous) method gave

better classification results across the data sets than the
“glamorous” methods.

The dawning Age of Big Data will make it not just possible but
very common (and perhaps necessary?) to validate methods
via such meta data analyses.
Big Data Startups

Crunchbase records more than 2900 Startups and

Angellist more than 3500 Startups in "Big Data"

Two random examples (out of 1000+?) of Bay area startups:

Forensic Logic (Walnut Creek): Crime analysis
23andMe (Mountain View): Genomics

Two startups by mathematicians:

ThetaRay: Cybersecurity (R.R. Coifman, Amir Averbuch)
Ayasdi: Topological data analysis (Gunnar Carlsson)
Many Data Initiatives Nationwide

Campus-wide initiatives at NYU, Columbia, Michigan, Harvard,

MIT, Berkeley, ...

New Master’s Degree programs in Data Science, for example at

Berkeley, NYU, Stanford, UC Davis, ...

New Alan Turing Institute for Data Sciences in the UK

For a long list across the world see

http://data-science-university-programs.silk.co
Topic Overview (tentative)

Basic goals of AI and Machine Learning

Curses and blessings of dimensionality,
Surprises in high dimensions
Singular Value Decomposition,
Principal Component Analysis
Data Clustering: k-means, graph Laplacian
Linear dimension reduction, random projections
Nonlinear dimension reduction, diffusion maps,
manifold learning, intrinsic geometry of data,
Some basics on Deep Learning
High-dim. probability; Curses and blessings

Things in high dimension can behave very differently than in

low dimension.

A cube in high dimensions does not look like this:

A cube in high dimensions looks like this:
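
One standard calculation that shows how different a high-dimensional cube is, offered here as an illustration and not as a description of the slide's figure: the ball inscribed in the cube [-1, 1]^d occupies a vanishing fraction of the cube's volume as d grows, while the corners move away from the center like sqrt(d). The sketch uses the closed form pi^(d/2) / Gamma(d/2 + 1) for the volume of the unit ball; the function names are chosen for this example.

```python
import math


def inscribed_ball_fraction(d):
    """Volume of the unit ball in R^d divided by the volume of the cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball / 2 ** d


def corner_distance(d):
    """Distance from the center of [-1, 1]^d to any of its 2^d corners."""
    return math.sqrt(d)


for d in (2, 3, 10, 20):
    print(d, inscribed_ball_fraction(d), corner_distance(d))
```

In two dimensions the disc fills pi/4 of the square; by d = 20 the fraction is below one part in ten million, so almost all of the cube's volume sits far from the center, toward the corners.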

SVD and PCA

Singular Value Decomposition,
Principal Component Analysis
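
As a small illustration of how the two topics connect: the principal directions of a data set are the right singular vectors of the centered data matrix, and a singular value s corresponds to a variance of s^2 / (n - 1). The sketch below finds only the leading direction, using power iteration on the sample covariance matrix instead of a full SVD so that it runs with the Python standard library alone. The function name and the toy data are assumptions made for this example.

```python
import math
import random


def top_principal_component(points, iterations=200, seed=0):
    """Return (unit direction, variance along it) for a list of equal-length rows."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector
    # (assuming the largest eigenvalue is strictly larger than the others).
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x.
data = [(t, 2 * t) for t in range(-5, 6)]
direction, variance = top_principal_component(data)
print(direction, variance)
```

For the toy data the returned direction is parallel to (1, 2), up to sign, and all of the variance lies along it. For real data sets a library SVD routine is the usual choice; power iteration is used here only to keep the example self-contained.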

Dimension reduction

Linear dimension reduction and random projections

Johnson-Lindenstrauss projections
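
A minimal sketch of a random projection in the spirit of the Johnson-Lindenstrauss lemma, which says that n points can be mapped to roughly log(n) / eps^2 dimensions while all pairwise distances change by at most a factor of 1 ± eps. The sketch assumes a Gaussian projection matrix scaled by 1/sqrt(k), which is one common construction among several; the function names, the dimensions and the random test points are chosen for this example only.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project each point to k dimensions with a scaled Gaussian random matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)  # keeps squared lengths unchanged in expectation
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Five random points in 300 dimensions, projected down to 200.
rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(5)]
projected = random_projection(points, 200, seed=2)

for i in range(len(points)):
    for j in range(i + 1, len(points)):
        ratio = distance(projected[i], projected[j]) / distance(points[i], points[j])
        print(i, j, round(ratio, 3))
```

The printed ratios should all be close to 1. Note that the projection matrix is drawn without looking at the data, which is what makes the method attractive for very large data sets.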
Clustering

A basic task in data analysis is clustering:

k-means: advantages and limitations

Graph Laplacian, spectral clustering
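
A minimal sketch of k-means in its usual form (Lloyd's algorithm), included to make the "advantages and limitations" concrete. The function name, the explicit `initial_centers` parameter and the toy points are assumptions for this example; practical implementations normally pick the initial centers at random or with a seeding scheme.

```python
def kmeans(points, initial_centers, iterations=100):
    """Lloyd's algorithm. Returns (centers, labels) for a list of coordinate tuples."""
    centers = [list(c) for c in initial_centers]
    k = len(centers)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

good_centers, good_labels = kmeans(points, [(0, 0), (11, 11)])
print(good_centers, good_labels)

# Same data, but both starting centers are taken from the first group.
bad_centers, bad_labels = kmeans(points, [(0, 1), (1, 0)])
print(bad_centers, bad_labels)
```

The first call recovers the two groups. The second call stops at a different, worse partition, which illustrates one known limitation: Lloyd's algorithm only finds a local optimum and depends on the starting centers. The advantage is that each iteration is simple and cheap.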

Diffusion maps

What is a diffusion map?

Manifold learning

Intrinsic geometry of data

Nonlinear dimension reduction

Deep Learning
Deep Learning: neural networks with more than one hidden layer

Deep networks achieve state-of-the-art results in several

complex object recognition tasks

They learn a huge network of filter banks and non-linearities on

large datasets

Heuristic method, a lot of trial-and-error

Almost no mathematical theory (yet)
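
To show what "layers and non-linearities" means in the smallest possible case, here is a sketch of a forward pass through a tiny network with one hidden ReLU layer. It is only an illustration: the weights are picked by hand, not learned, and the layers are dense, not the convolutional filter banks used in object recognition. All names are chosen for this example.

```python
def relu(values):
    """Rectified linear unit, applied to each entry."""
    return [max(0.0, v) for v in values]


def dense(weights, biases, inputs):
    """Affine map: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x):
    """Two inputs -> two hidden ReLU units -> one output (hand-picked weights)."""
    hidden = relu(dense([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], x))
    return dense([[1.0, -2.0]], [0.0], hidden)[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))
```

The network outputs 1 exactly when one of the two inputs is 1 (the XOR function). No purely linear map of the inputs can do this, so the example shows why the non-linearity between the layers matters.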

And last but not least

Algorithms for AI and Big Data are powerful.

Use your power responsibly and carefully.

Einstein: “Not everything that can be counted, counts.

And not everything that counts, can be counted.”
