Detect outliers with 3 methods: LOF, DBSCAN and one-class SVM
- Required packages can be installed with the following command:
pip install -r requirements.txt
- consumption_data.xls is provided. There are 4 columns with 940 entries. The first column denotes entry ID, which is ignored in detecting outliers. Therefore, the data entries are 3-dimensional.
- Get numpy array data with size [940, 3] with the following code (check out dataset.py for implementation):
from dataset import get_dataset
data = get_dataset()
- Data visualization:
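For readers without the repository at hand, the loading step described above (drop the ID column, keep the 3 feature columns) can be sketched roughly as follows. This is not the code in dataset.py: `drop_id_column` and `load_features` are hypothetical names, and the loader assumes pandas with an .xls reader is installed and that the sheet has a header row.

```python
import numpy as np


def drop_id_column(table):
    """Return the feature columns of a 2-D table whose first column is an entry ID."""
    arr = np.asarray(table, dtype=float)
    if arr.ndim != 2 or arr.shape[1] < 2:
        raise ValueError("expected a 2-D table with an ID column plus features")
    return arr[:, 1:]  # keep every column except the first


def load_features(path="consumption_data.xls"):
    """Hypothetical loader; not the repository's get_dataset()."""
    # Assumption: pandas and an .xls reader (such as xlrd) are installed,
    # and the first row of the sheet holds column names.
    import pandas as pd
    return drop_id_column(pd.read_excel(path).to_numpy())


# Synthetic table with the layout described above: ID column + 3 features.
ids = np.arange(940).reshape(-1, 1)
features = np.random.default_rng(0).normal(size=(940, 3))
data = drop_id_column(np.hstack([ids, features]))
print(data.shape)  # (940, 3)
```

The synthetic table only checks the shape logic; with the real file, `load_features()` would be called instead.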
For detailed descriptions please see report.pdf.
- Check out lof.py for implementation.
- Result:
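As a rough illustration of the LOF technique (not the code in lof.py), here is a minimal sketch using scikit-learn's `LocalOutlierFactor`. The synthetic data, `n_neighbors=20` and `contamination=0.05` are assumptions chosen for the example, not settings taken from this project.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic stand-in for the [940, 3] array: one dense cluster plus 3 far-away points.
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
far_points = np.array([[10.0, 10.0, 10.0],
                       [-12.0, 9.0, -11.0],
                       [15.0, -14.0, 13.0]])
data = np.vstack([inliers, far_points])

# Assumed hyperparameters; tune them for real data.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(data)           # -1 marks outliers, 1 marks inliers
scores = -lof.negative_outlier_factor_   # larger value = more anomalous
outlier_idx = np.flatnonzero(labels == -1)
print(outlier_idx)
```

LOF compares each point's local density with that of its neighbours, so the isolated points receive the largest scores. The `contamination` value only sets where the score threshold is placed.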
- Check out dbscan.py for implementation.
- Result:
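As a rough illustration of DBSCAN-based outlier detection (not the code in dbscan.py), here is a minimal sketch using scikit-learn's `DBSCAN`, where points assigned to no cluster are treated as outliers. The synthetic data, `eps=0.6` and `min_samples=5` are assumptions for the example only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic stand-in data: two tight clusters plus 2 isolated points.
cluster_a = rng.normal(loc=0.0, scale=0.3, size=(150, 3))
cluster_b = rng.normal(loc=5.0, scale=0.3, size=(150, 3))
far_points = np.array([[2.5, 10.0, -5.0],
                       [-6.0, -6.0, 8.0]])
data = np.vstack([cluster_a, cluster_b, far_points])

# Assumed hyperparameters; eps depends on the scale of the features.
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(data)
outlier_idx = np.flatnonzero(labels == -1)   # label -1 means "noise"
n_clusters = len(set(labels) - {-1})
print(n_clusters, outlier_idx)
```

Because `eps` is a distance threshold, features on very different scales usually need to be standardized before clustering.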
- Check out svdd.py for implementation.
- Result with Gaussian kernel:
- Result with linear kernel:
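As a rough illustration of the two kernel choices (not the code in svdd.py), here is a minimal sketch using scikit-learn's `OneClassSVM` with a Gaussian (RBF) kernel and a linear kernel. The synthetic data, `nu=0.1` and `gamma="scale"` are assumptions for the example only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in data: one cluster plus 2 isolated points.
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
far_points = np.array([[8.0, 8.0, 8.0],
                       [-9.0, 7.0, -8.0]])
data = np.vstack([inliers, far_points])

results = {}
for kernel in ("rbf", "linear"):
    # Assumed hyperparameters: nu bounds the fraction of flagged training
    # points; gamma is only used by the RBF kernel.
    model = OneClassSVM(kernel=kernel, nu=0.1, gamma="scale")
    results[kernel] = model.fit(data).predict(data)  # -1 outlier, 1 inlier

for kernel, labels in results.items():
    print(kernel, int((labels == -1).sum()), "points flagged")
```

With a linear kernel the model separates the data from the origin with a single hyperplane, so the points it flags can differ substantially from the Gaussian-kernel result.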
Zhongyu Chen