Update seqspec check for checking files in IGVF portal #74
base: main
mingjiecn
commented
Aug 28, 2025
- Add IGVF-specific schemas for the different filters. For files in the IGVF portal, some fields are no longer required, and the IGVF schema adds a pattern for read_id.
- Make sure spec versions higher than 0.3.0 still work with seqspec check and seqspec upgrade.
…terlab#58)
* update seqspec check
* add spec parameter back to check function
* support gzipped yaml file for function load_spec
* fix bug in function run_check
* support gzipped yaml file for function load_spec
Hi Mingjie, thank you for the PR. It’s important that seqspec works well for the IGVF consortium. I have a concern about the schema changes you proposed. The schema defined in the schema folder is meant to describe the structure of any seqspec file. Adding a new schema specific to IGVF introduces a second definition of what a valid seqspec file looks like. This creates ambiguity: should files be validated against the IGVF schema or the standard seqspec schema? Maintaining two diverging schemas makes future development harder and risks breaking consistency across the tool. For that reason, I think we should avoid adding a schema that is specific to IGVF. Could you clarify which specific attributes of the seqspec file need to change to support IGVF? It may be possible to add tooling that allows standard seqspec files to work with the IGVF portal.
Hi, Sina. For the schema seqspec_igvf_onlist_skip.schema.json, we changed some fields from required to not required. For seqspec_igvf.schema.json, we made the same change and, in addition, added a pattern for read_id: https://github.com/pachterlab/seqspec/blob/b1f71df6650220f89def56f7f6c2e97a5945bef5/seqspec/schema/seqspec_igvf.schema.json#L314C12-L314C19.
Hi Mingjie, I’ve been thinking a lot about this and I have a few thoughts. The proposed changes to the schema you made are listed below:

Secondly, could you explain the rationale for removing the md5 requirement for the onlist files? Historically I’ve run into issues where I’ve gotten the wrong onlist file, and having the md5 match the file helps with that. If you’d like, I could implement in seqspec format the ability to autocompute the md5 for the onlist file and populate it in the spec (keeping it required for seqspec check).

Lastly, I don’t think it’s a good idea to bake the IGVF regex check into the base schema. If you’d like, I could add a --filter-list parameter that optionally takes a list of function names, like check_igvf_ids, each of which performs its validation on the Python side. This keeps the base schema as is and allows users to validate against a custom set of validators. What do you think?
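For concreteness, the two ideas in this comment (validators selected by name, and an autocomputed onlist md5) could look roughly like the sketch below. It is only an illustration under stated assumptions, not seqspec's actual code: the spec is treated as a plain dict with a `reads` list, the helper names `register`, `run_filters` and `md5_of_file` are hypothetical, and `ASSUMED_ID_PATTERN` is a placeholder rather than the real IGVF read_id pattern from the linked schema.

```python
import hashlib
import re
from typing import Callable, Dict, List

# A validator takes a spec (assumed here to be a plain dict) and returns error strings.
Validator = Callable[[dict], List[str]]

# Registry mapping validator names to functions, so a CLI option such as
# --filter-list could select validators by name (hypothetical design).
VALIDATORS: Dict[str, Validator] = {}


def register(func: Validator) -> Validator:
    """Register a validator under its function name."""
    VALIDATORS[func.__name__] = func
    return func


# Placeholder pattern for illustration only; NOT the real IGVF read_id pattern.
ASSUMED_ID_PATTERN = re.compile(r"IGVF[A-Z0-9]+")


@register
def check_igvf_ids(spec: dict) -> List[str]:
    """Report every read whose read_id does not match the assumed pattern."""
    errors = []
    for read in spec.get("reads", []):
        read_id = read.get("read_id", "")
        if not ASSUMED_ID_PATTERN.fullmatch(read_id):
            errors.append(f"read_id {read_id!r} does not match the assumed IGVF pattern")
    return errors


def run_filters(spec: dict, filter_list: List[str]) -> List[str]:
    """Run only the named validators and collect their errors."""
    errors: List[str] = []
    for name in filter_list:
        if name not in VALIDATORS:
            raise ValueError(f"unknown validator: {name}")
        errors.extend(VALIDATORS[name](spec))
    return errors


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 of a file in chunks, e.g. to populate an onlist md5 field."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this layout the base schema stays untouched: an empty filter list runs no extra checks, and portal-specific rules live in ordinary Python functions that users opt into by name.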