
PASS_TO_PASS and FAIL_TO_PASS columns in the dataset contain many malformed entries #505

Description

@nihaljn

Describe the bug

The SWE-bench Verified dataset on Hugging Face has two columns, FAIL_TO_PASS and PASS_TO_PASS. The dataset description says these are lists of test identifiers, but in many cases they are not.

For example, for the task django__django-16950, the columns contain the following values:

FAIL_TO_PASS - ["If form data is provided, a parent's auto-generated alternate key is"]
PASS_TO_PASS - ["#24377 - Inlines with a model field default should ignore that default", "#24377 - If we're adding a new object, a parent's auto-generated pk", "#24958 - Variant of test_inlineformset_factory_nulls_default_pks for"]
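A rough heuristic for spotting entries like these is sketched below. It assumes that a well-formed Django test identifier has the shape `test_method (dotted.module.ClassName)`; that shape and the helper name `looks_like_django_test_id` are assumptions for illustration, not the dataset's documented format.

```python
import re

# Assumption: a well-formed Django test id looks like
# "test_method (dotted.module.ClassName)". This is a heuristic only.
DJANGO_TEST_ID = re.compile(r"^\w+ \([\w.]+\)$")


def looks_like_django_test_id(entry: str) -> bool:
    """Return True if `entry` matches the assumed Django test-id shape."""
    return DJANGO_TEST_ID.match(entry) is not None


# Values copied from the django__django-16950 row quoted in this issue.
fail_to_pass = ["If form data is provided, a parent's auto-generated alternate key is"]
pass_to_pass = [
    "#24377 - Inlines with a model field default should ignore that default",
    "#24377 - If we're adding a new object, a parent's auto-generated pk",
    "#24958 - Variant of test_inlineformset_factory_nulls_default_pks for",
]

# Print the verdict next to each entry.
for entry in fail_to_pass + pass_to_pass:
    print(looks_like_django_test_id(entry), entry)
```

Under this heuristic all four quoted entries are flagged (the function returns False for each), while a string such as `test_foo (app.tests.FooTests)` would pass.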

I have observed malformed entries in these columns for many instances, though I have not yet quantified how many.

I would like to evaluate the F2P (FAIL_TO_PASS) and P2P (PASS_TO_PASS) tests separately, so corrected lists for these columns would be appreciated.

Is this a preprocessing issue? If so, could you point me to where the test identifiers are extracted?

Steps/Code to Reproduce

The malformed values can be seen directly on Hugging Face in the princeton-nlp/SWE-bench_Verified dataset.
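A minimal script for pulling the same row locally is sketched below. It assumes the `datasets` library is installed, that the Verified rows live in the `test` split, and that both columns are stored as JSON-encoded strings; none of this is confirmed in the issue, and the helpers `decode_tests` and `find_instance` are hypothetical names.

```python
import json

DATASET = "princeton-nlp/SWE-bench_Verified"
COLUMNS = ("FAIL_TO_PASS", "PASS_TO_PASS")


def decode_tests(raw):
    """Return a FAIL_TO_PASS / PASS_TO_PASS value as a Python list.

    Assumption: the column holds a JSON-encoded string such as '["a", "b"]';
    a value that is already a list is returned as a copy.
    """
    return json.loads(raw) if isinstance(raw, str) else list(raw)


def find_instance(rows, instance_id):
    """Return the first row whose instance_id matches, or None."""
    return next((row for row in rows if row["instance_id"] == instance_id), None)


if __name__ == "__main__":
    # Network access and `pip install datasets` are required for this part.
    from datasets import load_dataset

    rows = load_dataset(DATASET, split="test")
    row = find_instance(rows, "django__django-16950")
    if row is None:
        print("instance not found")
    else:
        for column in COLUMNS:
            print(column, decode_tests(row[column]))
```

Keeping the download under the `__main__` guard lets the two helpers be reused or tested on in-memory rows without fetching the dataset.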

Expected Results

The FAIL_TO_PASS and PASS_TO_PASS columns should be lists of strings, where each string is a test case identifier.
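The expected format can be expressed as a small validator, sketched here under stated assumptions: the raw value is either a JSON-encoded string or an already-decoded list, and what counts as a "test case identifier" differs per repository, so it is left to a caller-supplied predicate. The function name `check_test_column` is hypothetical.

```python
import json
from typing import Callable


def check_test_column(raw, is_test_id: Callable[[str], bool]) -> list:
    """Return a list of problems; an empty list means the value looks valid.

    Checks the expected format: a list of strings, each accepted by the
    caller-supplied, repository-specific predicate `is_test_id`.
    """
    try:
        value = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(value, list):
        return [f"expected a list, got {type(value).__name__}"]
    problems = []
    for i, item in enumerate(value):
        if not isinstance(item, str):
            problems.append(f"item {i} is {type(item).__name__}, not str")
        elif not is_test_id(item):
            problems.append(f"item {i} is not a test identifier: {item!r}")
    return problems
```

For instance, with a naive predicate such as `lambda s: s.startswith("test_")`, the value `'["#24377 - Inlines with a model field default should ignore that default"]'` yields one problem, while a list of `test_...` names yields none. A real check would need a stricter, per-repository predicate.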

Actual Results

Some strings are not test identifiers at all but arbitrary text fragments, as in the example above.

System Information

No response

Labels: bug (Something isn't working)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions