Find mislabeled frames? #1253
Hi, I have a dataset of about 1k frames, possibly with multiple instances per frame, but it was labeled by non-experts. I would like to see which frames from the training, validation, and test sets have the largest errors, since those would hopefully point to mislabeled frames.

From the documentation, it seems like the useful metric could be "dist.dists"? If I just load the model metrics (with sleap.load_metrics), can I identify which frames are "bad"? I've tried, but I can't see how. Another option from the documentation is to load the model, load the whole set of labels, and call model.predict(labels) to get predictions.

Thanks
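For reference, the two options I mean look roughly like this (untested, with placeholder paths):

import sleap

# Option 1: load the metrics saved alongside the trained model.
metrics = sleap.load_metrics("models/my_model", split="val")
dists = metrics["dist.dists"]  # per-instance node distances, but which frames are they?

# Option 2: load the model and re-run inference on the labeled frames.
predictor = sleap.load_model("models/my_model")
labels = sleap.load_file("labels.v001.slp")
labels_pr = predictor.predict(labels)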
Hi @adam-matic,

I just took a look at sleap/nn/evals.py to check how the dist.dists are ordered, and it looks like we reorder the labeled frames by video before computing the distances between instances (see lines 90 to 91 at b305eda).

If you have just a single video, that makes things a bit easier, since the LabeledFrames (and the dist.dists matrix) will be ordered just as they are in the original Labels.labeled_frames (assuming that all ground truth frames have a corresponding predicted frame, i.e. a "positive pair").
Thus, we could just use the index of the dists matrix to also index into our ground truth labels to find the corresponding frame:

# Untested code, but should work as a template if anything.
from sleap import Labels, load_metrics

# Ground truth split saved in the trained model's directory.
labels_gt = Labels.load_file("path/to/trained/model/directory/labels_gt.split_of_interest.slp")
metrics = load_metrics("path/to/trained/model/directory", "replace_with_split_of_interest")
dists = metrics["dist.dists"]

# Pick any row of dists and look up the ground truth frame at the same index.
idx_dists = 4
dist_metric_for_single_frame = dists[idx_dists]
corresponding_labeled_frame = labels_gt.labeled_frames[idx_dists]
frame_idx = corresponding_labeled_frame.frame_idx
frame_video = corresponding_labeled_frame.video
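To surface the worst frames, you could then rank the rows of dists by their mean node distance (again untested, and assuming, as above, that row i of dists lines up with labels_gt.labeled_frames[i]):

import numpy as np

# Mean distance per row of dists, ignoring nodes that are NaN (not visible).
mean_dists = np.nanmean(dists, axis=-1)

# Worst matches first; rows that are all NaN (instances the model missed
# entirely) end up at the front here, which is arguably what we want.
worst_first = np.argsort(mean_dists)[::-1]
for i in worst_first[:10]:
    lf = labels_gt.labeled_frames[i]
    print(lf.video.filename, lf.frame_idx, mean_dists[i])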
If you have multiple videos, this reordering makes it a bit more difficult to invert from dist.dists back to the labeled frames, but not impossible. To recreate our map, we can rely on the predictability of the reordering:

# labels_gt and dists are the same as above.
reordered_labels = []
for video in labels_gt.videos:
    # Labels.find(video) returns the labeled frames for that video, in order.
    reordered_labels.extend(labels_gt.find(video))

idx_dists = 4
dist_metric_for_single_frame = dists[idx_dists]
corresponding_labeled_frame = reordered_labels[idx_dists]

Arguably, re-running the evaluation on your own .slp files might be better, depending on how comfortable you are with using the SLEAP API (and possibly hacking at the code a bit). If you want to hack away, it might be easiest to just create and return a map for reference while computing the distance metrics - which would be a great idea to integrate into SLEAP if users want this sort of information! The relevant file is sleap/nn/evals.py and the relevant functions are evaluate, find_frame_pairs, match_frame_pairs, and of course compute_dists.

Let me know what you decide to do and if you need any help.

Thanks,
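P.S. For reference, re-running the evaluation on your own files could look roughly like this (untested; the paths are placeholders and labels_pr would come from running inference on the same frames):

from sleap import Labels
from sleap.nn.evals import evaluate

# Ground truth labels and predictions for the same frames.
labels_gt = Labels.load_file("path/to/labels_gt.slp")
labels_pr = Labels.load_file("path/to/labels_pr.slp")

# evaluate() computes the same metrics dictionary that gets saved with a trained model.
metrics = evaluate(labels_gt, labels_pr)
dists = metrics["dist.dists"]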
Here is my current solution:

from sleap.nn.evals import *


def evaluate_with_refs(
    labels_gt: Labels,
    labels_pr: Labels,
    oks_stddev: float = 0.025,
    oks_scale: Optional[float] = None,
    match_threshold: float = 0,
    user_labels_only: bool = True,
) -> List[Dict]:
    """Like sleap.nn.evals.evaluate, but keeps a reference to each matched instance pair."""
    frame_pairs = find_frame_pairs(labels_gt, labels_pr, user_labels_only=user_labels_only)
    if not frame_pairs:
        print("no frame pairs")
        return []

    positive_pairs, false_negatives = match_frame_pairs(
        frame_pairs,
        stddev=oks_stddev,
        scale=oks_scale,
        threshold=match_threshold,
        user_labels_only=user_labels_only,
    )

    dists2 = []
    for instance_gt, instance_pr, _ in positive_pairs:
        points_gt = instance_gt.points_array
        points_pr = instance_pr.points_array
        # Per-node Euclidean distance between ground truth and prediction.
        d = np.linalg.norm(points_pr - points_gt, axis=-1)
        d2 = {
            "mean_dist": np.mean(d),
            "dist": d,
            "instance_gt": instance_gt,
            "instance_pr": instance_pr,
        }
        dists2.append(d2)
    return dists2

Then I do:

model_path = "models/model_folder"
labels_gt = Labels.load_file(model_path + "/labels_gt.train.slp")
labels_pr = Labels.load_file(model_path + "/labels_pr.train.slp")

dists = evaluate_with_refs(labels_gt, labels_pr)

# Sort matched pairs from worst to best mean distance.
dists.sort(key=lambda x: x["mean_dist"], reverse=True)

I got some pretty bad ones in the first ten or so.
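To see where those are in the videos, I print the video and frame index for the worst pairs (relying on instance.frame pointing back to the LabeledFrame the instance belongs to):

# Worst ten matches with their video and frame index.
# Assumes instance.frame references the owning LabeledFrame.
for d in dists[:10]:
    lf = d["instance_gt"].frame
    print(lf.video.filename, lf.frame_idx, d["mean_dist"])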