
Problem statement for a Roboadvisor

Given:
1. Elements x (e.g., customers, investments),
2. Categories k (e.g., high/medium/low credit risk, invest or don't invest, expected return),
3. Performance metrics (e.g., precision, accuracy, recall, training time), and
4. An objective to allocate x to k at a given level of performance.
Determine:
1. Whether the state of the art in machine learning can meet the objective, and
2. If not, an extrapolated timeframe in which the objective could be achieved.
Solution (special case):
For the special case where the only metric is training time and a machine learning algorithm f has already been selected, we can determine whether the training-time objective can be met and, if not, the timeframe in which it could be, by extrapolating the necessary CPU and/or IOPS capability.
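A minimal sketch of this extrapolation, assuming (purely for illustration) that CPU/IOPS capability grows by a constant factor per year; the function name and the 1.3x annual growth rate are assumptions, not measured trends:

```python
import math

def years_until_feasible(current_time_s, target_time_s, annual_speedup=1.3):
    """Estimate the years until training time drops to the target,
    assuming hardware capability grows by `annual_speedup` per year.
    Returns 0.0 if the objective is already met on current hardware."""
    if current_time_s <= target_time_s:
        return 0.0
    # Solve annual_speedup ** years == current / target for years.
    ratio = current_time_s / target_time_s
    return math.log(ratio) / math.log(annual_speedup)

# e.g., training now takes 100 h but the objective is 10 h:
years = years_until_feasible(100 * 3600, 10 * 3600)
```

With the assumed 1.3x/year growth, a 10x gap closes in roughly nine years; the same arithmetic applies to an IOPS-bound workload.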
Solution (general case):
For the general case, there is currently no solution.
First, there is no theory that can predict the set of attributes (a.k.a. features) necessary to categorize x into k at a given level of performance (see the quote in the Appendix). Predicting feature discovery against time is harder still.
Second, there is no theory that can predict whether a machine learning algorithm meeting the objective exists, or how long it would take to invent such an algorithm.
Lastly, performance may be limited by other factors:
- The number of categories
- The degree of separation between categories (e.g., delineating between cat and dog is easier than between cat and kitten)
- Limitations of the learning algorithm selected
- Quality of the training data
Predicting how to address these factors is difficult, and the fixes may not be time-dependent.
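The "degree of separation" factor can be made concrete with a toy score. The sketch below, using synthetic one-dimensional features and a Fisher-style ratio (both illustrative assumptions, not part of the original text), shows why cat vs. dog is easier to delineate than cat vs. kitten:

```python
import random
import statistics

def separation(a, b):
    """Fisher-style separation between two 1-D feature samples:
    squared distance between class means over the summed variances.
    Higher values mean the categories are easier to delineate."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / (va + vb)

random.seed(0)
# Hypothetical 1-D feature (say, body size) for each class.
cats    = [random.gauss(0.0, 1.0) for _ in range(500)]
dogs    = [random.gauss(5.0, 1.0) for _ in range(500)]  # far from cats
kittens = [random.gauss(0.5, 1.0) for _ in range(500)]  # overlaps cats

# The cat/dog separation dwarfs the cat/kitten separation.
assert separation(cats, dogs) > separation(cats, kittens)
```

When this score is small, no choice of algorithm will reach high accuracy on that feature alone; the fix is better features, not more training time.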
Given the current intractability of the problem, an approach based on [1]
seems the best alternative:
A. Define x, k, M (the training dataset), and the target performance metrics.
B. Until success or failure, iterate:
   1. Identify algorithm(s) and attributes.
   2. Train against the dataset.
   3. Generate performance charts (example in the figure below).
   4. Test: are performance limits being approached asymptotically?
      a. Yes: revisit step 1.
      b. No: continue with step 2.
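The loop above can be sketched as follows. The function names, the callback interface, and the epsilon-threshold plateau test are all assumptions made for illustration; the source prescribes only the steps, not an implementation:

```python
def plateaued(scores, window=3, eps=1e-3):
    """Step B.4: treat the performance curve as asymptotic when the
    last `window` round-over-round gains all fall below `eps`."""
    if len(scores) < window + 1:
        return False
    gains = [scores[i] - scores[i - 1]
             for i in range(len(scores) - window, len(scores))]
    return all(g < eps for g in gains)

def iterate(train_step, target, max_rounds=50):
    """Driver for steps B.1-B.4. `train_step` is a hypothetical
    callback that runs one training round (step B.2) and returns the
    current performance metric; revisiting algorithm/attribute choice
    (step B.1) is left to the caller."""
    scores = []
    for _ in range(max_rounds):
        scores.append(train_step())       # step B.2: train
        if scores[-1] >= target:
            return "success", scores
        if plateaued(scores):             # step B.4: asymptote test
            return "revisit step 1", scores
    return "failure", scores
```

A saturating learning curve that flattens below the target triggers "revisit step 1", i.e., new algorithms or attributes are needed rather than more rounds of training.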


Of course, this approach is empirically guided trial and error; however, it reflects the current state of machine learning theory.

Figure 1. Example performance charts [1]

REFERENCES
[1] Sukumar, Sreenivas R. and Del-Castillo-Negrete, Carlos E., "Machine Learning for Big Data: A Study to Understand Limits of Performance at Scale," ORNL/TM-2015/344, December 21, 2015, http://info.ornl.gov/sites/publications/files/Pub56662.pdf, retrieved November 26, 2016.

APPENDIX: Quote from [1]


"Today, data scientists do not have approximations or a deterministic theoretical bound on the number of dimensions d required for a feature matrix M. There are no guarantees on the expected accuracy and precision and no automated way to design feature spaces given raw datasets."
