Eye-movement event detection using random forest.
This is an updated version of the original IRF code by r-zemblys. It is compatible with Python 3 and uses current versions of the required packages. Although the update aims for compatibility and modernization, there is no guarantee that it performs or behaves identically to the original. If you use this updated version, please cite the original paper by Zemblys et al.:
@article{zemblys2018irf,
title={Using machine learning to detect events in eye-tracking data},
author={Zemblys, Raimondas and Niehorster, Diederick C and Komogortsev, Oleg and Holmqvist, Kenneth},
journal={Behavior research methods},
volume={50},
number={1},
pages={160--181},
year={2018},
}
See also ./doc/IRF_replication_report.pdf for more information on the dataset and the changes made to the post-processing routine.
IRF was developed in Python 3.11, using a number of packages for data manipulation and for training machine learning algorithms. This section describes how to set up the required software, how to use the IRF algorithm, and how to train your own classifier.
An easy way to prepare your Python environment is to use Anaconda, an open-source package and environment management system that runs on Windows, macOS and Linux. To install Anaconda, follow the instructions at https://www.anaconda.com/download/, then open your terminal and type:
conda create --name irf python=3.11
conda activate irf
The next step is to install all required Python libraries. Run the following commands in your terminal window:
pip install tqdm
pip install numpy
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install astropy
To check whether your environment is set up correctly, run:
python run_irf.py --help
You should see output similar to the following:
usage: run_irf.py [-h] [--ext EXT] [--output_dir OUTPUT_DIR]
                  [--workers WORKERS] [--save_csv]
                  clf root dataset

Eye-movement event detection using Random Forest.

positional arguments:
  clf                   Classifier
  root                  The path containing eye-movement data
  dataset               The directory containing experiment data

options:
  -h, --help            show this help message and exit
  --ext EXT             File type
  --output_dir OUTPUT_DIR
                        The directory to save output
  --workers WORKERS     Number of workers to use
  --save_csv            Save output as csv file

This package includes a hand-labeled eye-movement dataset called lookAtPoint_EL (see ./etdata/). To parse this data with IRF, you need a pretrained model, which will be made available for download. Unzip it, place it in the ./models/ directory, and run:
python run_irf.py irf_2018-03-26_20-46-41 etdata lookAtPoint_EL
You can also pass a custom --output_dir; otherwise the output folder defaults to ./etdata/lookAtPoint_EL_irf.
After running this command, you will get messages like:
etdata/lookAtPoint_EL_irf/i2mc/lookAtPoint_EL_S1_i2mc.mat does not exist. Run i2mc extractor first!
etdata/lookAtPoint_EL_irf/i2mc/lookAtPoint_EL_S2_i2mc.mat does not exist. Run i2mc extractor first!
...
One of the features (i2mc) requires third-party software. Running IRF for the first time converts the data into the format required by the I2MC algorithm. Open ./util_lib/I2MC-Dev/I2MC_rz.m in MATLAB, edit folders.data to point to your output directory, and run the code. It will extract and save the i2mc features. Note that the I2MC code uses random initializations to compute data clusters, so the i2mc feature will differ slightly each time you recompute it. If you need to reproduce your classification, reuse the same previously extracted i2mc data.
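Before switching to MATLAB, it can help to list which recordings still lack i2mc features. The sketch below is a hypothetical helper, not part of IRF; it assumes the file naming pattern visible in the messages above, i.e. <output_dir>/i2mc/<recording>_i2mc.mat.

```python
from pathlib import Path


def missing_i2mc_files(output_dir, recordings):
    """Return the expected i2mc .mat paths that do not exist yet.

    Hypothetical helper; assumes the naming pattern seen in IRF's messages:
    <output_dir>/i2mc/<recording>_i2mc.mat
    """
    i2mc_dir = Path(output_dir) / "i2mc"
    expected = [i2mc_dir / f"{name}_i2mc.mat" for name in recordings]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    out = Path("etdata/lookAtPoint_EL_irf")
    for path in missing_i2mc_files(out, ["lookAtPoint_EL_S1", "lookAtPoint_EL_S2"]):
        print(f"{path} does not exist. Run i2mc extractor first!")
```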
Now run the same command again:
python run_irf.py irf_2018-03-26_20-46-41 etdata lookAtPoint_EL
IRF will parse your data and save it as structured numpy arrays. It can also save the output in tab-delimited text format: just add the --save_csv parameter when running IRF.
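If you saved the output with --save_csv, a quick sanity check is to count the detected events. The sketch below is hypothetical: it assumes the tab-delimited file has a header row with an evt column holding the event labels of IRF's internal format (described below), and it treats each run of consecutive identical labels as one event.

```python
import csv
from collections import Counter
from itertools import groupby

# Event labels as documented for IRF's internal data format.
EVENT_NAMES = {0: "Undefined", 1: "Fixation", 2: "Saccade",
               3: "Post-saccadic oscillation", 4: "Smooth pursuit", 5: "Blink"}


def summarize_events(path):
    """Count events per type in a tab-delimited IRF output file.

    Assumes a header row containing an 'evt' column; check this against the
    files your version of run_irf.py actually writes.
    """
    with open(path, newline="") as f:
        labels = [int(float(row["evt"])) for row in csv.DictReader(f, delimiter="\t")]
    # Each run of consecutive identical labels counts as one event.
    runs = Counter(label for label, _ in groupby(labels))
    return {EVENT_NAMES.get(label, str(label)): n for label, n in runs.items()}
```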
The internal data format used by IRF is a structured numpy array with the following format:
import numpy as np

dtype = np.dtype([
    ('t', np.float64),     # time in seconds
    ('x', np.float32),     # horizontal gaze direction in degrees
    ('y', np.float32),     # vertical gaze direction in degrees
    ('status', np.bool_),  # status flag; False means trackloss
    ('evt', np.uint8),     # event label:
                           #   0: Undefined
                           #   1: Fixation
                           #   2: Saccade
                           #   3: Post-saccadic oscillation
                           #   4: Smooth pursuit
                           #   5: Blink
])
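As a minimal sketch of the conversion, assuming your recordings already provide time in seconds and gaze in degrees, the hypothetical helper to_etdata below packs them into this layout and marks samples with NaN gaze as trackloss. It uses np.bool_, which works across NumPy versions. How IRF expects the arrays to be stored on disk (for example .npy files written with np.save, matching the --ext option) is an assumption to verify against your copy of the code.

```python
import numpy as np

# IRF's internal layout (np.bool_ is the portable spelling of the bool type).
ETDATA_DTYPE = np.dtype([
    ("t", np.float64),     # time in seconds
    ("x", np.float32),     # horizontal gaze in degrees
    ("y", np.float32),     # vertical gaze in degrees
    ("status", np.bool_),  # False means trackloss
    ("evt", np.uint8),     # event label, 0 = Undefined
])


def to_etdata(t, x, y, evt=None):
    """Pack time (s) and gaze (deg) sequences into an IRF-style structured array.

    Samples where x or y is NaN are flagged as trackloss (status=False).
    Event labels stay 0 (Undefined) unless evt is given.
    """
    t = np.asarray(t, dtype=np.float64)
    data = np.zeros(t.size, dtype=ETDATA_DTYPE)
    data["t"] = t
    data["x"] = x
    data["y"] = y
    data["status"] = ~(np.isnan(data["x"]) | np.isnan(data["y"]))
    if evt is not None:
        data["evt"] = evt
    return data


if __name__ == "__main__":
    fs = 500.0  # example sampling rate in Hz
    t = np.arange(5) / fs
    data = to_etdata(t, [0.1, 0.2, np.nan, 0.3, 0.3], [0.0, 0.0, np.nan, 0.1, 0.1])
    print(data["status"])
```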
To process your own data, you first need to convert it to the structured array format described above. Note that the dataset folder must contain a db_config.json file describing the geometry of the setup: physical screen dimensions in mm, eye distance in mm, and screen resolution in pixels. For example:
"geom": {
"screen_width": 533.0,
"screen_height": 301.0,
"eye_distance": 565.0,
"display_width_pix": 1920.0,
"display_height_pix": 1080.0
}
The geometry must also be defined in ./util_lib/I2MC-Dev/I2MC_rz.m. Note that the dimensions there are in cm, not mm! After preparing your data, run IRF as described above.
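The sketch below shows one common way to go from pixel coordinates to degrees using the db_config.json geometry, and how the same physical sizes look in cm for I2MC_rz.m. It is an illustration under stated assumptions, not IRF's own conversion: it treats the screen as flat, places the eye opposite the screen centre, and keeps the screen convention that y grows downwards.

```python
import math

# Example geometry, taken from the db_config.json snippet above.
GEOM = {
    "screen_width": 533.0,
    "screen_height": 301.0,
    "eye_distance": 565.0,
    "display_width_pix": 1920.0,
    "display_height_pix": 1080.0,
}


def pix_to_deg(x_pix, y_pix, geom):
    """Convert a gaze position in pixels to degrees relative to screen centre.

    Assumes a flat screen with the eye opposite its centre; y keeps the
    screen convention (positive downwards).
    """
    mm_per_px_x = geom["screen_width"] / geom["display_width_pix"]
    mm_per_px_y = geom["screen_height"] / geom["display_height_pix"]
    # Offset from the screen centre in mm.
    dx = (x_pix - geom["display_width_pix"] / 2) * mm_per_px_x
    dy = (y_pix - geom["display_height_pix"] / 2) * mm_per_px_y
    d = geom["eye_distance"]
    return math.degrees(math.atan2(dx, d)), math.degrees(math.atan2(dy, d))


def geom_in_cm(geom):
    """Physical sizes converted from mm to cm (pixel counts are dropped)."""
    return {k: v / 10.0 for k, v in geom.items() if not k.endswith("_pix")}


if __name__ == "__main__":
    print(pix_to_deg(1920, 540, GEOM))  # right edge, vertical centre
    print(geom_in_cm(GEOM))
```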
To train your own classifier, place your training data in the dataset/train directory and your validation data in the dataset/val directory. The dataset directory must also contain a db_config.json file describing the geometry of the setup. Training and validation data must be in the structured numpy array format described above.
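A small pre-flight check can catch layout mistakes before a long training run. The helper below is hypothetical, not part of IRF; it checks for db_config.json and the train/ and val/ folders described above, and assumes the data files use the .npy extension.

```python
from pathlib import Path


def check_training_layout(dataset_dir):
    """Return a list of problems found in a training dataset directory.

    Expects <dataset>/db_config.json, <dataset>/train/ and <dataset>/val/.
    Treating *.npy as the data file extension is an assumption.
    """
    root = Path(dataset_dir)
    problems = []
    if not (root / "db_config.json").is_file():
        problems.append("missing db_config.json")
    for split in ("train", "val"):
        folder = root / split
        if not folder.is_dir():
            problems.append(f"missing {split}/ directory")
        elif not any(folder.glob("*.npy")):
            problems.append(f"no .npy files in {split}/")
    return problems
```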
You can use the ./utils_lib/data_prep/augment.py script to prepare the lookAtPoint_EL dataset for training IRF. Running the script augments the data by resampling it to various sampling rates and adding noise, and then splits it into training/validation and testing sets. Remember to copy db_config.json to lookAtPoint_EL/training/.
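To illustrate what resampling and adding noise means for the structured arrays, here is a simplified sketch. It is not the algorithm used by augment.py: gaze is linearly interpolated, status and event labels are copied from the nearest original sample, and the noise is white Gaussian with a standard deviation given in degrees.

```python
import numpy as np


def resample_with_noise(data, fs_new, noise_sd=0.0, seed=None):
    """Resample an IRF-style structured array to fs_new Hz and add gaze noise.

    Simplified illustration, not augment.py: x/y are linearly interpolated,
    status/evt come from the nearest original sample, and white Gaussian
    noise with standard deviation noise_sd (degrees) is added to x and y.
    """
    t = data["t"]
    t_new = np.arange(t[0], t[-1], 1.0 / fs_new)
    out = np.zeros(t_new.size, dtype=data.dtype)
    out["t"] = t_new

    # Index of the nearest original sample for each new timestamp.
    right = np.clip(np.searchsorted(t, t_new), 1, t.size - 1)
    left = right - 1
    nearest = np.where(t_new - t[left] < t[right] - t_new, left, right)
    out["status"] = data["status"][nearest]
    out["evt"] = data["evt"][nearest]

    rng = np.random.default_rng(seed)
    for axis in ("x", "y"):
        resampled = np.interp(t_new, t, data[axis])
        out[axis] = resampled + rng.normal(0.0, noise_sd, t_new.size)
    return out
```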
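You can adjust the training parameters in config.json. The # comments in the listing below are explanatory only; standard JSON does not allow comments, so they should not appear in the actual file: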
{
"events": [1, 2, 3], #event labels to use; only fixation (1), saccade (2) and pso (3) are tested
"n_trees": 32, #number of trees to use
"extr_kwargs": { #feature extraction parameters
"w": 100, #context size for calculating features; in ms
"w_vel": 12,
"w_dir": 22,
"interp": false, #not used
"print_et": false #not used
},
"features": [ #features to use
"fs",
"disp",
"vel",
"acc",
"mean-diff",
"med-diff",
"rms",
"std",
"bcea",
"rms-diff",
"std-diff",
"bcea-diff",
"rayleightest",
"i2mc"
]
}
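Because the listing above contains # comments, it will not parse as plain JSON. The sketch below strips such comments (naively, assuming # never occurs inside a string value) and converts the w context size from ms to samples. That conversion is one plausible reading of the "in ms" comment, not necessarily how IRF computes it internally; the helper names are hypothetical.

```python
import json
import re


def load_config(text):
    """Parse config.json-style text after removing '#' comments.

    Naive: assumes '#' never appears inside a JSON string value.
    """
    return json.loads(re.sub(r"#[^\n]*", "", text))


def window_in_samples(w_ms, fs):
    """Samples spanned by a window of w_ms milliseconds at fs Hz (assumed conversion)."""
    return int(round(w_ms * fs / 1000.0))


def uses_i2mc(config):
    """True if the i2mc feature is enabled, i.e. I2MC must be run in MATLAB first."""
    return "i2mc" in config.get("features", [])


if __name__ == "__main__":
    example = """{
      "n_trees": 32,   # number of trees to use
      "extr_kwargs": {"w": 100},  # context size in ms
      "features": ["vel", "i2mc"]
    }"""
    cfg = load_config(example)
    print(window_in_samples(cfg["extr_kwargs"]["w"], 500.0))  # 100 ms at 500 Hz -> 50
    print(uses_i2mc(cfg))
```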
Now run:
python run_training.py etdata/lookAtPoint_EL training
This performs feature extraction, trains the IRF classifier and saves it to the ./models/irf_datetime directory. Note that if the i2mc feature is used, the training script will stop; in that case, run ./util_lib/I2MC-Dev/I2MC_rz.m to extract the i2mc features before actually training the classifier, and then rerun the training script.
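For orientation, the n_trees setting corresponds conceptually to the number of trees in a random forest. The sketch below uses scikit-learn's RandomForestClassifier on a synthetic per-sample feature matrix; it is a simplified illustration, not IRF's training pipeline, which also performs feature extraction and post-processing. The helper name and the synthetic features are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_sample_classifier(features, labels, n_trees=32, seed=0):
    """Fit a random forest that assigns an event label to each sample.

    features: (n_samples, n_features) array, e.g. one column per feature
    listed in config.json; labels: event codes (1, 2, 3) per sample.
    """
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(features, labels)
    return clf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic features: [velocity in deg/s, dispersion]; fixations slow, saccades fast.
    X = np.vstack([rng.normal([5, 0.1], 1, (200, 2)), rng.normal([300, 2.0], 20, (200, 2))])
    y = np.array([1] * 200 + [2] * 200)
    clf = train_sample_classifier(X, y)
    print(clf.predict([[4.0, 0.0], [310.0, 2.0]]))
```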