Large-scale comparison of ICT expression in healthy endothelium.
Endothelion is a project dedicated to exploring a special subset of the transportome--the ensemble of ion channels, pumps, and solute carriers (SLCs) that control inorganic ion transport and homeostasis--in healthy human endothelium.
Phase One focuses on characterizing the transportome within individual endothelial models, including both established cell lines and tissue samples. By analyzing publicly available RNA-Seq datasets, each gene of interest (GOI) is assessed for presence or absence and visualized with intuitive charts. This phase provides a clear snapshot of which transportome elements are expressed in each model, forming a foundation for further functional studies.
Phase Two moves beyond single models to examine differential expression across endothelial cells from different anatomical regions of healthy tissues. Quantitative comparison of RNA-Seq data highlights genes that are enriched or depleted in specific vascular beds, uncovering regional specialization within the endothelium. This analysis aims to reveal patterns in transportome composition that may underlie functional differences between endothelial populations.
For the purposes of the Endothelion project, the list of GOIs is compiled by querying our Membrane Transport Protein Database (MTP-DB). This gene set can be recreated on the fly by running the Kerblam! workflow:
kerblam run make_geneset

The resulting list comprises 672 transportome elements selected for their relevance to inorganic ion transport and homeostasis, and includes the following categories:
- all known human ion channels (436 genes, i.e., the complete channelome, including possible auxiliary or modulatory subunits),
- the entire set of aquaporins (14 genes),
- all ATPase pumps (90 genes),
- solute carriers (SLCs) specific for inorganic solutes (81 genes),
- a set of 51 receptors (GPCRs and RTKs) with roles in inorganic ion dynamics.
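As a quick sanity check, the category sizes listed above can be added up and compared with the stated total. This is only a minimal arithmetic sketch: the variable names are illustrative, and it assumes the five categories do not overlap.

```shell
# Category sizes copied from the list above (variable names are illustrative)
channels=436
aquaporins=14
pumps=90
slcs=81
receptors=51

# Sum of the five categories, assuming they are mutually exclusive
total=$((channels + aquaporins + pumps + slcs + receptors))
echo "Expected gene set size: $total"   # prints: Expected gene set size: 672
```

The same number can then be compared with the length of the list produced by the make_geneset workflow.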
Note
Pay attention to protein vs. gene nomenclature for VEGFRs, because the gene symbols are easily confused:
- FLT1 -is for-> VEGFR-1
- FLT2 -is for-> this symbol doesn't exist any more (but it was the old name of FGFR1)
- FLT3 -is for-> CD135 (i.e., the RTK for the cytokine Flt3 ligand, FLT3LG)
- FLT4 -is for-> VEGFR-3
- KDR -is for-> VEGFR-2
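The mapping above can be kept at hand as a small lookup helper when reading expression tables indexed by gene symbol. The function name `vegfr_protein_name` is hypothetical and not part of the Endothelion code base; the sketch simply encodes the five entries of the note.

```shell
# Hypothetical helper: translate a gene symbol into the protein name given in the note above.
# Returns a non-zero status for symbols that the note does not cover.
vegfr_protein_name() {
    case "$1" in
        FLT1) echo "VEGFR-1" ;;
        KDR)  echo "VEGFR-2" ;;
        FLT4) echo "VEGFR-3" ;;
        FLT3) echo "CD135" ;;                        # not a VEGFR
        FLT2) echo "retired symbol (old name of FGFR1)" ;;
        *)    return 1 ;;
    esac
}

# Example: vegfr_protein_name KDR
```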
The entire analysis workflow is implemented within the open-source Kerblam! project management framework (Visentin et al., 2025), to ensure full transparency and reproducibility of results.
You can retrieve the processed expression tables (i.e., outputs of the x.FASTQ pipeline) for all included studies directly from the corresponding Zenodo repository by running:

kerblam data fetch

You can replicate all the steps of the Endothelion pipeline for transportome profiling in this way:

kerblam run hCMEC_D3

- jq (>= 1.7.1)
- kerblam (>= 1.0.0-rc.1) [optional]
- limma (>= 3.58.1)
- sva (>= 3.50.0)
- ggplot2 (>= 3.5.0)
- Hmisc (>= 5.2-3)
- tidyr (>= 1.3.1)
- dplyr (>= 1.1.4)
- purrr (>= 1.0.2)
- httr (>= 1.4.7)
- DBI (>= 1.2.3)
- RSQLite (>= 2.3.5)
- AnnotationDbi (>= 1.60.2)
- org.Hs.eg.db (>= 3.16.0)
- PCAtools (>= 2.10.0)
- r4tcpl (>= 1.5.1)
- rlang (>= 1.1.3)
- magrittr (>= 2.0.3)
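Before running the workflows, it may help to confirm that the command-line tools from the list above are reachable. The following is a minimal sketch with a hypothetical helper, `missing_tools`. It only checks that commands are present on `PATH`, not their versions, and it does not cover the R packages, which would have to be checked from within R. `Rscript` in the usage line is an assumption, since the list names R packages but not the R executable.

```shell
# Hypothetical helper: print every named command that is not found on PATH.
# Returns a non-zero status if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example: missing_tools jq kerblam Rscript
```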
- DB: NCBI SRA
- Search: hCMEC D3
- Results: 12 BioProjects
- Zenodo: https://zenodo.org/records/12729454
- Kerblam! workflow: hCMEC_D3
| ENA BioProject ID | Study Alias | Ctrl Runs | Library | Median Read Length | Average Depth | Uniquely Mapped Reads | Platform | Reference |
|---|---|---|---|---|---|---|---|---|
| PRJNA307652 | GSE76528 | 8 | PE | 2 × 51 bp | 57.1 M | 78.9 % | Illumina HiSeq 2000 | PMID: 26973449 |
| PRJNA575504 | GSE138309 | 3 | PE | 2 × 78 bp | 22.9 M | 91.3 % | Illumina NextSeq 550 | PMID: 32757312 |
| PRJNA578611 | GSE139133 | 2 | PE | 2 × 150 bp | 24.3 M | 95.3 % | Illumina NovaSeq 6000 | PMID: 32985481 |
| PRJNA777606 | GSE187565 | 2 | PE | 2 × 150 bp | 27.8 M | 94.2 % | Illumina NovaSeq 6000 | PMID: 40097733 |
| PRJNA847413 | GSE205739 | 4 | PE | 2 × 150 bp | 23.3 M | 60.2 % | Illumina NovaSeq 6000 | NA |
| PRJEB48614 | E-MTAB-11129 | 3 | PE | 2 × 41 bp | 23.1 M | 85.9 % | Illumina NextSeq 500 | PMID: 35967327 |
| PRJNA667281 | -- | 3 | PE | 2 × 150 bp | 22.1 M | 96.2 % | Illumina NovaSeq 6000 | PMID: 33631268 |
| PRJNA896725 | -- | 5 | PE | 2 × 150 bp | 25.6 M | 94.1 % | Illumina NovaSeq 6000 | PMID: 38638822 |
| ENA BioProject ID | Study Alias | Reason for Exclusion |
|---|---|---|
| PRJNA802135 | GSE195781 | duplicated/bad runs |
| PRJNA607654 | GSE145581 | miRNA-Seq |
| PRJNA1073892 | GSE255171 | regulatory T cells (Tregs) |
| PRJNA307651 | GSE76530 | miRNA-Seq |
Of the 5 control runs in PRJNA802135/GSE195781 (2022), 3 turned out to be identical to runs already published by the same group of authors in an earlier study (2016) under a different ID, PRJNA307652/GSE76528. Namely:
| PRJNA802135 (2022) | PRJNA307652 (2016) |
|---|---|
| SRR17833475 | SRR3085451 |
| SRR17833476 | SRR3085449 |
| SRR17833478 | SRR3085446 |
When the corresponding FASTQ files from the two studies are compared, the individual reads appear to be identical, and the difference in file size depends only on the different pattern used for the header line (Field 1) of each read.
# Check it out
zcat SRR17833478_1.fastq.gz | wc -l
zcat SRR3085446_1.fastq.gz | wc -l
zcat SRR17833478_1.fastq.gz | head
zcat SRR3085446_1.fastq.gz | head
zcat SRR17833478_1.fastq.gz | tail
zcat SRR3085446_1.fastq.gz | tail

This explains the batch effect that affects the PRJNA802135 control samples (see PCA and hierarchical clustering), the abnormal standard deviation levels produced when all the samples are pooled together, and the implausible levels of correlation between the 2 datasets (especially when the 2 different-looking runs are removed from PRJNA802135).
As for the 2 non-duplicated samples of PRJNA802135, they showed a very poor quality profile, with severe adapter contamination and alignment rates of just 20%.
For all these reasons, the PRJNA802135 study was removed entirely from the global cohort.
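The duplicate check described above was done by visual inspection, but it can be made more systematic by comparing only the sequence and quality lines of two FASTQ files and ignoring the read headers. The sketch below is not part of the Endothelion workflows. The helper names `fastq_payload_cksum` and `same_reads` are hypothetical, and the code assumes standard 4-line FASTQ records with reads stored in the same order in both files. It also skips line 3 of each record, because that line can repeat the read header.

```shell
# Hypothetical helper: checksum of the sequence (line 2) and quality (line 4)
# lines of a gzipped FASTQ file, skipping the header (line 1) and the
# separator (line 3) of each 4-line record.
fastq_payload_cksum() {
    gzip -dc "$1" | awk 'NR % 4 == 2 || NR % 4 == 0' | cksum
}

# Succeeds when two gzipped FASTQ files carry the same reads in the same order,
# whatever their header lines look like.
same_reads() {
    [ "$(fastq_payload_cksum "$1")" = "$(fastq_payload_cksum "$2")" ]
}

# Example:
# same_reads SRR17833478_1.fastq.gz SRR3085446_1.fastq.gz && echo "identical reads"
```

`cksum` is used here only because it is available on any POSIX system; a stronger hash such as `sha256sum` can be swapped in where available.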