Roboadvisor Performance Prediction
Given:
1. Elements x (ex: customers, investments),
2. Categories k (ex: high/medium/low credit risk, invest or don't invest,
expected return),
3. Performance metrics (ex: precision, accuracy, recall, training time), and
4. An objective to allocate x to k at a given level of performance
Determine:
1. Whether the state of the art in machine learning can meet the objective
2. If not, the timeframe in which the objective can be achieved, found by
extrapolation.
Solution (special case):
For the special case where the only metric is training time, and a machine
learning algorithm f has been selected, we can determine whether the
training-time target can be met and, if not, the timeframe in which it could
be, based on an extrapolation of the necessary CPU and/or IOPS capability.
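As a rough illustration of this extrapolation, the sketch below assumes that training time scales inversely with hardware capability (CPU or IOPS) and that capability doubles at a fixed interval. The function name, the two-year doubling period, and the example numbers are hypothetical and are not taken from the source.

```python
import math

def years_until_feasible(current_hours, target_hours, doubling_years=2.0):
    """Estimate the years until training fits within target_hours.

    Assumption (not from the source): hardware capability doubles every
    `doubling_years`, and training time shrinks in proportion.
    """
    if current_hours <= 0 or target_hours <= 0 or doubling_years <= 0:
        raise ValueError("all arguments must be positive")
    if current_hours <= target_hours:
        return 0.0  # the objective is already met on today's hardware
    speedup_needed = current_hours / target_hours
    # Each doubling period halves training time, so count the doublings.
    return doubling_years * math.log2(speedup_needed)

# Hypothetical example: training takes 80 hours today, the target is 10 hours.
# An 8x speedup is 3 doublings, so 6.0 years at one doubling per 2 years.
print(years_until_feasible(80.0, 10.0))
```

The estimate is only as good as the assumed growth rate; a different doubling period changes the answer proportionally.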
Solution (general case):
For the general case, there is currently no solution.
First, there is no theory that can predict the set of attributes (aka features)
necessary to categorize x into k with a given level of performance (see quote
in Appendix). Further, predicting when such features will be discovered is
harder still.
Second, there is no theory that can predict whether a machine learning
algorithm meeting the objective exists, or how long it would take to invent
such an algorithm.
Lastly, performance may be limited by other factors:
1. The number of categories
2. The degree of separation between categories (ex: distinguishing a cat
from a dog is easier than distinguishing a cat from a kitten)
3. Limitations of the learning algorithm selected
4. Quality of the training data
Predicting how to address these factors is difficult, and the fixes may not be
time-dependent.
Given the current intractability of the problem, an approach based on [1]
seems the best alternative:
A. Define x, k, and M (the training dataset), and target performance metrics
B. Until success or failure, iterate:
1. Identify algorithm(s) and attributes
2. Train against the dataset
3. Generate performance charts (example in figure below)
4. Test: are performance limits being approached asymptotically?
A. Yes: revisit step 1
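The test in step 4 can be approximated numerically by checking whether recent gains on the performance curve have fallen below a tolerance. This is a minimal sketch of one such check, not the method of [1]; the function name, window size, tolerance, target, and scores are all hypothetical.

```python
def is_plateauing(scores, window=3, tol=0.005):
    """Return True if the last `window` changes in score are all below `tol`.

    `scores` holds a metric (ex: accuracy) measured at increasing
    training-set sizes or iterations. This is a crude proxy for
    "approaching a performance limit asymptotically".
    """
    if len(scores) < window + 1:
        return False  # not enough points to judge
    recent = scores[-(window + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(abs(gain) < tol for gain in gains)

# Hypothetical accuracy values at growing training-set sizes.
curve = [0.62, 0.74, 0.81, 0.842, 0.845, 0.846, 0.847]
target = 0.90  # hypothetical target performance

if is_plateauing(curve) and curve[-1] < target:
    print("plateau below target: revisit step 1")
```

A fixed tolerance is a simplification; fitting a saturating curve to the scores would give a firmer estimate of the limit, at the cost of more assumptions.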
REFERENCES
[1] Sukumar, Sreenivas R. and Del-Castillo-Negrete, Carlos E., Machine
Learning for Big Data: A Study to Understand Limits of Performance at
Scale, ORNL/TM-2015/344, 12/21/2015,
http://info.ornl.gov/sites/publications/files/Pub56662.pdf, retrieved
11/26/2016.