@ian-whaling
Contributor

No description provided.

@ian-whaling
Contributor Author

  1. In what assay is this region_type used?
    Bulk and single-cell CRISPR perturbation screens: Perturb-seq, TAP-seq, CRISPR FACS, CRISPR FlowFISH, and proliferation screens.
  2. In what ways will the identification and extraction of the region_type be useful for sequence processing?
    In CRISPR-based assays the main feature is not a transcript (i.e. cDNA), but a guide RNA spacer. Adding sgrna_target allows tools like seqspec index -t kb to extract this region as a feature.
  3. What seqspec tools need to be modified to take advantage of this new region_type?
    seqspec_index.py
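
To illustrate the answers above, here is a minimal sketch of how an index formatter could place an sgrna_target region in the feature slot of the `barcode:umi:feature` string that `kallisto bus -x` accepts. This is not the code in seqspec_index.py: `RegionCoord`, `technology_string`, the three type sets, and the read layout in the example are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical groupings for this sketch; the real seqspec logic may differ.
BARCODE_TYPES = {"barcode"}
UMI_TYPES = {"umi"}
FEATURE_TYPES = {"cdna", "gdna", "protein", "tag", "sgrna_target"}


@dataclass
class RegionCoord:
    read_index: int   # position of the FASTQ file on the command line
    region_type: str  # e.g. "barcode", "umi", "sgrna_target"
    start: int        # 0-based start within the read
    stop: int         # exclusive end within the read


def technology_string(coords):
    """Build a 'barcode:umi:feature' string of file,start,stop triplets."""
    parts = []
    for kinds in (BARCODE_TYPES, UMI_TYPES, FEATURE_TYPES):
        hits = [c for c in coords if c.region_type in kinds]
        if not hits:
            raise ValueError(f"no region of type {sorted(kinds)} found")
        # Several regions in one slot are joined with commas.
        parts.append(",".join(f"{c.read_index},{c.start},{c.stop}" for c in hits))
    return ":".join(parts)


# Made-up layout: barcode and UMI on read 0, guide spacer on read 1.
example = [
    RegionCoord(0, "barcode", 0, 16),
    RegionCoord(0, "umi", 16, 28),
    RegionCoord(1, "sgrna_target", 0, 20),
]
print(technology_string(example))  # 0,0,16:0,16,28:1,0,20
```

Without sgrna_target in the feature set, the sketch raises an error for this layout because no region qualifies as the feature, which is the gap the new region_type is meant to close.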

@sbooeshaghi
Collaborator

Hi @ian-whaling, thank you for suggesting a new region_type and for updating the relevant code. A few points:

  1. What is the rationale for selecting the term sgrna_target in this context? Would the term feature_barcode work instead? My understanding is that much of the single-cell tooling uses “feature” to describe genes, peaks, barcodes, guides, etc., and pairing it with barcode would make it applicable to other assays that don’t use CRISPR but have barcodes referring to a feature (e.g., guide, protein tag).
  2. Do you anticipate the feature_barcode region_type to be used in other tools within seqspec index?
  3. I’ve recently rewritten and standardized much of the seqspec index code in devel. Your changes will need to be rebased or re-applied against that version. The update should be straightforward since the code you edited is unchanged in those updates.

Thank you again!

@ian-whaling
Contributor Author

@sbooeshaghi

  1. In CRISPR assays, the “guide” (sgRNA) is a functional, sequence-based targeting molecule, not a barcode. Its sequence can serve as an identifier much like a barcode, but its purpose is to mediate targeting, not labeling. Calling it a "feature barcode" may imply that its sole role is identification, which I think could be misleading. Also, the region_type enum already contains other modality-specific functional types (cdna, gdna, protein, atac, tag) rather than only abstract descriptors. That said, my main concern for our usage is that the region_type we introduce is unique to the sgRNA target region (whatever it may be called) so that the region can be passed to kallisto, and I think feature_barcode would also suffice for that.
  2. Only format_kallisto_bus and format_kallisto_bus_force_single.
  3. Rebased.

sbooeshaghi merged commit 8abaf96 into pachterlab:devel on Aug 14, 2025
@sbooeshaghi
Collaborator

Thank you for contributing. I've merged these changes into the seqspec devel branch.

mingjiecn pushed a commit to IGVF-DACC/seqspec that referenced this pull request Aug 19, 2025
* added sgrna_target as region_type

* fixed format_kallisto_bus_force_single
mingjiecn pushed a commit to IGVF-DACC/seqspec that referenced this pull request Aug 22, 2025
* added sgrna_target as region_type

* fixed format_kallisto_bus_force_single
mingjiecn pushed a commit to IGVF-DACC/seqspec that referenced this pull request Aug 22, 2025
* added sgrna_target as region_type

* fixed format_kallisto_bus_force_single
mingjiecn added a commit to IGVF-DACC/seqspec that referenced this pull request Aug 23, 2025
sbooeshaghi added a commit that referenced this pull request Aug 24, 2025
…ase 0.4.0 (#73)

See docs/CHANGELOG.md for more details.

* update schema (#52)

* update file_exsits function to check file url in igvf portal (#53)

* adding seqspec spec tokenization

* allow https for remote onlist (#54)

* added regions files for some popular commercial methods

* corrected two file names

* Update seqspec check so we can run it directly in python script (#58)

* update seqspec check

* add spec parameter back to check function

* added python usage to docs

* support gzipped yaml file for function load_spec (#60)

* support gzipped yaml file for function load_spec

* fix bug in function run_check

* support gzipped yaml file for function load_spec

* added region files for additional methods

* enabled skipping checks with seqspec check

* updating seqspec-html to print read info

* CHECK-161-onlist (#63)

* ignore onlist in seqspec check when needed

* fix bug in seqspec check

* code review

* Devel fix tests (#65)

* Update tox to use newer python interpreter versions

* validate_check_args now returns a list

* Deal with ascii and png display function name changes

* protocols and kits are a controlled vocabulary now.

* Clear out the environment variables before running the tests

In case they happen to be set

* Rename files from .txt to .tsv to better match DACC conventions

* Set a specific seqspec version as the structure keeps changing

* Structure now needs a files attribute

* Update test for remote access

needs to change more text in the example, and suppress the new call to
the network

* Update for more detailed onlist structure

many calls to create Onlists needed more attributes

* update onlist test to use preferred -i argument

-r was deprecated

* Reduce code repetition by using to_dict in __repr__

They return the dictionaries in the same order, so we might as well have
fewer places to update

* Be robust to missing values for the File and Read objects

All of the attributes for the File object are retrieved with getattr,
and the Read.files attribute introduced with 0.3 is protected with
getattr.

You do need to provide a default value with getattr if you want to
avoid an attribute error

* Add files object to example seqspec

* Make plot_png work better if there's only one modality (#66)

With constrained_layout and a single modality the height of the bar
graph showing the regions collapsed into almost a line.

This plots it without constrained_layout, and adjusts the title offset
as needed.

This fixes #44

* made internal api more consistent, added -t kb-single to seqspec index to force single end reads for read with max size

* continued making internal api more consistent

* upgraded build system to pyproject.toml, removed requirements.txt and dev-requirements.txt, simplified release process in the Makefile, simplified version tracking with pyproject.toml through setuptools_scm

* cleaned up pyproject.toml

* updated to pyproject packaging, removed setup.py/cfg, requirements.txt and MANIFEST.in

* fixing python version and removing mcp requirement

* fixing pip install with pyproject.toml

* changing Assay/Region/File/Read/etc classes to be derived from pydantic Base Class. this removes the need to specify yaml tags. These now get stripped. Changed formatter from black and flake8 to ruff.

* fixed seqspec modify when sequence is empty, removed parent_id implicit in spec, it wasn't being used anywhere, removed seqspec convert from the cli (currently not implemented)

* added bead_TSO to validator

* verified check works on 10x_rna_5prime.spec.yaml

* - fixed bug in read get file by id
- updated seqspec index to initialize pydantic models with named args
- removed tox, changed to pytest as a test manager
- cleaned up internal api for seqspec onlist (todo, add subcommands list, download, join)

* added 'loose' loading of a spec file followed by conversion to a validated version (so subsequent loads of the file work). TODO consider using loose loading only for format and check commands, and strict loading for every other command.

* set loose loading only for seqspec check and seqspec format. capture loading validation errors and print to stdout when trying to load with strict mode

* complete test rewrite, currently passing

* made region.regions no longer optional, defaults to an empty list, updated associated functions accordingly, added extensive tests

* cleaned up internal api, added some error handling in assay, can now map primer id from a read to any level in the library spec. no changes made to seqspec index, but greatly expands style of specs that are compatible.

* cleaning up pyproject.toml

* updated repr for Assay/Read/Region, cleaned up print code internal to use updated functions, updated seqspec index to fix file name usage when -s file is specified, updated some tests that were previously incorrect in seqspec index which used read ids instead of region ids. seqspec index was changed to fix the behavior when asking for region indices

* removing comments from seqspec check

* added repr for file object, deprecated -r argument in seqspec index/onlist/find, updated seqspec index to use consistent internal types, expanded tests for index

* relaxed check_primer_ids_in_libspec_leaves in seqspec check since updates to index no longer require primer_id to be in leaves

* added doc regions

* updated region and assay doc examples

* added --no-overlap to seqspec index so the set of region ids contained within each read are fully unique

* add region_type: sgrna_target (#72)

* added sgrna_target as region_type

* fixed format_kallisto_bus_force_single

* updating gitignore

* added seqspec build, change internal api of seqspec format, removed spec_fn from seqspec check, made seqspec insert and seqspec modify consistent (taking in list of *Input objects), annotated *Input objects for llm usage

* added check_region_against_subregion_length and check_region_against_subregion_sequence which checks that the min/max length of a region are equal to the sum of the sub regions and that the sequence of the region is equal to the concatenation of the subregions. suggestions by Zhewei Shen and Ian Whaling based on a spec submitted by Alex Barrera to IGVF portal (https://data.igvf.org/configuration-files/IGVFFI9197UDXC/)

* made check_sequence_types checks more robust, fixed spec loading to not overwrite sequences if present in region

* added check_read_length_against_library to check that the read lengths don't exceed sequenceable range given by the library elements after or before the primer id (based on the strandedness of the read)

* updated documentation for consistency with updated api

* updated list of checks in the documentation

* seqspec plot png now layers on the sequencing reads onto the library spec

* removed unnecessary ghost primer in the spec

* attempt to fix bug #68 for seqspec onlist, fixing broken tests

* updated changelog, preparation for release

* added a dev guide (in progress)

* updated seqspec build cli help text

* updated dev docs and dev flow, prepping for release

---------

Co-authored-by: Mingjie Li <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Diane Trout <[email protected]>
Co-authored-by: Ian Whaling <[email protected]>
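
The commit list above describes check_region_against_subregion_length and check_region_against_subregion_sequence as verifying that a region's min/max length equals the sum of its subregions and that its sequence equals their concatenation. A minimal sketch of that idea follows. It is not the seqspec implementation: the `Region` dataclass here is a hypothetical stand-in, and `check_against_subregions` is an illustrative name that folds both checks into one function.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    # Hypothetical stand-in for a spec region, not the seqspec class.
    region_id: str
    min_len: int
    max_len: int
    sequence: str
    regions: list = field(default_factory=list)


def check_against_subregions(region):
    """Return a list of error strings for this region and its descendants."""
    errors = []
    if not region.regions:
        return errors  # leaf regions have nothing to compare against

    sub_min = sum(r.min_len for r in region.regions)
    sub_max = sum(r.max_len for r in region.regions)
    sub_seq = "".join(r.sequence for r in region.regions)

    if region.min_len != sub_min:
        errors.append(f"{region.region_id}: min_len {region.min_len} != {sub_min}")
    if region.max_len != sub_max:
        errors.append(f"{region.region_id}: max_len {region.max_len} != {sub_max}")
    if region.sequence != sub_seq:
        errors.append(f"{region.region_id}: sequence differs from subregions")

    # Apply the same checks to nested regions.
    for sub in region.regions:
        errors.extend(check_against_subregions(sub))
    return errors


barcode = Region("barcode", 16, 16, "N" * 16)
umi = Region("umi", 12, 12, "N" * 12)
good = Region("read_1", 28, 28, "N" * 28, [barcode, umi])
bad = Region("read_1", 28, 30, "N" * 28, [barcode, umi])

print(check_against_subregions(good))  # []
print(len(check_against_subregions(bad)))  # 1
```

Returning a list of messages rather than raising on the first mismatch lets a caller report every inconsistency in a spec in one pass.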