Abstract
Cannabis sativa is a fascinating, yet under-researched, species. To facilitate the global expansion of C. sativa cultivation, a greater understanding of flowering time control is crucial. The PEBP gene family consists of universal promoters and repressors of flowering, with homologs of FLOWERING LOCUS T (FT) being highly conserved key regulators of flowering. FT encodes florigen, and balancing the florigen and anti-florigen signals is key for fine-tuning a crop’s flowering to local climatic conditions. Here, we provide an in-depth characterisation of the PEBP gene family in C. sativa and the closely related species H. lupulus. Phylogenetic analysis reveals expansion of FT and TFL1/CEN clades in the Cannabaceae. The retention of the duplicated PEBP genes may be of functional significance, with divergent sequences and expression patterns hinting at signatures of sub-functionalisation. We speculate that duplicated PEBP genes have been crucial to the evolution of photoperiod insensitivity and sexually dimorphic flowering in C. sativa, and that harnessing the available genetic variation for these traits will be key for establishing C. sativa as a crop for the future.
Background
Perhaps one of the most infamous families of flowering plants is the Cannabaceae. This fascinating family includes Cannabis sativa (C. sativa) and Humulus lupulus (H. lupulus), both of which have cultural importance given the metabolites produced in glandular trichomes on female flowers [1, 2]. Both species have tremendous commercial value, with H. lupulus used in brewing beer and uses of C. sativa spanning pharmaceuticals, biofuel and building materials [3]. However, the historic prohibition of C. sativa has had the long-term impact of hindering in-depth research and breeding. While genetic studies on C. sativa remain in their infancy, the crop is quickly gaining attention [4,5,6,7,8,9,10].
Several interesting biological phenomena exist in the Cannabaceae, the genetic control of which can be uncovered via comparative genomic studies. Separated by 20 million years of evolution, C. sativa and H. lupulus are both dioecious but differ in their life history strategies with C. sativa being an annual and H. lupulus a perennial [11,12,13]. Furthermore, in both species, male individuals flower before females and the photoperiod strongly impacts plant development, with C. sativa and H. lupulus being among the first species for which the influence of photoperiod on flowering was demonstrated (Tournois 1912, cited in [14, 15]). Both C. sativa and H. lupulus are short-day plants, but as H. lupulus is perennial, it requires long days to sufficiently develop vegetatively before transitioning to a reproductive state [11]. Therefore, understanding the genetic control of flowering, especially in the annual C. sativa, will be essential for expanding the geographic range of cultivation.
Since the 1930s, classical studies sought to uncover the underlying mechanism by which plants transition to a flowering state. The flowering stimulus was theorised to be a graft-transmissible flowering hormone, called florigen [16, 17]. Eventually, extensive molecular genetic studies in the model plant Arabidopsis thaliana demonstrated that florigen is the protein product of FLOWERING LOCUS T (FT) [18]. FT is expressed in the leaf phloem companion cells and the mobile protein subsequently translocates to the shoot apical meristem where it interacts with transcription factors to initiate the floral transition [19].
FT belongs to the phosphatidylethanolamine-binding protein (PEBP) gene family. These genes have well-studied, diverse roles in plant growth and development, with the most prominent function being flowering time control [20]. There are three subclades of PEBP genes: FT-like, TFL1/CEN-like and MFT-like. Broadly speaking, FT-like genes are flowering promoters (florigens) whereas TFL1/CEN-like genes are flowering repressors (anti-florigens), and thus the balance between these opposing signals is a major determinant of reproductive development [20, 21]. Phylogenetically, MFT-like genes constitute the sister subfamily to both FT-like and TFL1/CEN-like genes, with a role in seed germination and dormancy [22, 23]. However, functional diversification within these subclades has been reported, and amino acid substitutions at known positions can be indicative of whether a PEBP gene is a promoter or repressor of flowering [20, 24].
PEBP genes have been targeted throughout domestication in many different species [25,26,27,28,29]. Previous phylogenetic studies of the PEBP gene family in other plant families like the Poaceae, Brassicaceae and Rosaceae [30,31,32] demonstrated that the copy number of PEBP genes can vary substantially. For example, within the Rosaceae, duplications of PEBP genes in the genome of Malus domestica have occurred likely due to an independent whole genome duplication unique to the Malus genus, resulting in eight PEBP genes [30]. Contrastingly, Prunus persica, which is also in the Rosaceae, possesses only five PEBP genes.
An angiosperm-wide analysis demonstrated that FT and TFL1/CEN clades have expanded differentially with a stark increase in copy number in monocots [33]. Increases in PEBP gene number are evident in Sorghum bicolor (19 genes), Oryza sativa (19 genes), and Zea mays (23 genes) [34,35,36]. Most monocots have five FT subclades, and preliminary analysis suggests sub-/neo-functionalisation within these lineages [33]. For example, in Hordeum vulgare HvFT4 is involved not only in flowering but also in spikelet development [37].
Previous research has identified an apparent duplication of a PEBP gene, CsFT1, as a candidate gene contributing to photoperiod insensitivity (“autoflowering”) in C. sativa [8]. Furthermore, sexual dimorphism regarding flowering time in C. sativa means that investigating the PEBP gene family will likely yield fascinating insights [8].
To better understand flowering dynamics in C. sativa we characterised the PEBP gene family in the Cannabaceae. We provide a detailed overview of the PEBP gene family in C. sativa and H. lupulus and also explore PEBP diversity in newly released diverse C. sativa genomes. We find an unexpectedly large number of PEBP genes, specifically an expansion of both the florigen (FT) and anti-florigen (TFL1/CEN) clades in the Cannabaceae and hypothesise that functional diversification has occurred in this gene family in C. sativa. We hypothesise that PEBP gene duplications are part of the driving force behind photoperiod insensitivity and sexually dimorphic flowering in C. sativa and thus are key molecular breeding targets for the expansion of global C. sativa cultivation.
Methods
Identification of PEBP genes in C. sativa and H. lupulus
Annotated PEBP genes in C. sativa were obtained from the ‘CBDRx’ reference genome gene annotation (‘CBDRx’-cs10 v2; https://www.ncbi.nlm.nih.gov/assembly/GCF_900626175.2/, last accessed on 10 Jan. 2023) [38] and verified using BLAST with A. thaliana FT (AT1G65480) protein as query (e-value < 1e-5). Additionally, the ‘FINOLA’ genome (GCA_003417725.2) [39] was searched for PEBP-like genes using NCBI BLAST with the ‘CBDRx’ PEBP genomic sequence as a query (e-value < 2e-68). The top hits were retained for each ‘CBDRx’ PEBP gene, checking the NCBI MSA viewer (v1.25.0) to ensure only full-length hits were retained.
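The hit-filtering step can be illustrated with a minimal sketch. It assumes the BLAST results were exported in the default 12-column tabular format (`-outfmt 6`) and that "top hit" means the highest bit score per query. Both are assumptions for illustration, as are the function names and the example contig names; the original searches were run through NCBI BLAST and inspected in the MSA viewer.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def top_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, Hit]:
    """Keep, per query, the highest-bitscore hit with an e-value below the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    # Hypothetical hits, not real results.
    example = [
        "CsFT1\tcontigA\t98.0\t500\t10\t0\t1\t500\t1\t500\t1e-120\t900",
        "CsFT1\tcontigB\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-30\t200",
    ]
    for query, hit in top_hits(parse_outfmt6(example), max_evalue=2e-68).items():
        print(query, hit.subject, hit.evalue)
```

A full-length check, as done here with the MSA viewer, would still be needed after this filter, because a short fragment can also score a low e-value.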
For C. sativa genomes not on NCBI (https://resources.michael.salk.edu/resources/cannabis_genomes/index.html, last accessed on 18 Aug., 2023), BLAST databases were constructed and searched using NCBI BLAST+ blastp (v2.10.1) (e-value < 1) with ‘CBDRx’ PEBP proteins as query: male genomes ‘Boone County’ (BCMa/BCMb), ‘Golden Redwood’ (GRMa/GRMb), ‘Ace High 3–2’ (AH3Ma/AH3Mb) and ‘Kompolti’ (KOMPa/KOMPb); female genomes ‘White Widow’ (WHWa/WHWb) and YunMa (YMv2a/YMv2b); monoecious genomes ‘Santhica 27’−2 (SAN2a/SAN2b) and KCDora (KCDv1a/KCDv1b) (Table S1, Figure S2). Nucleotide BLAST databases were generated for ‘Kompolti’ and ‘Boone County’ using NCBI BLAST+ makeblastdb to check for the presence of a previously identified helitron-like sequence in CsFT1a (QKVJ02000894.1|:191,793–192,866) [8].
For H. lupulus the most recent unmasked “Cascade” genome assembly and annotation [2] was downloaded from HopBase (http://hopbase.cqls.oregonstate.edu/dovetailDownloads/dovetailCascadeMasked.php, last accessed on 18 Aug., 2023). Dc-megablast (v2.10.1) was used with ‘CBDRx’ PEBP coding sequences as queries and an e-value cutoff of 0.001.
For genomes without a full annotation (H. lupulus and C. sativa ‘FINOLA’), AUGUSTUS (http://bioinf.uni-greifswald.de/augustus/) and FGENESH+ (http://www.softberry.com/berry.phtml?topic=fgenes_plus&group=programs&subgroup=gfs, last accessed on 16 Aug. 2023) were used to predict protein sequences from BLAST hits [40, 41]. The presence of a PEBP domain was verified using the NCBI conserved domain database [42]. Protein sequences were aligned with MAFFT and visualised in Jalview to check for an intact PEBP domain and the presence of a start codon [43, 44]. For ‘FINOLA’ PEBP genes, any amino acid substitutions in comparison to ‘CBDRx’ were confirmed using the ‘FINOLA’ Illumina WGS short read data (SRS17330655, SRS17330649) that was generated previously [8].
To identify haplotigs (alleles assembled as two separate genetic loci), normalised coverage analysis was conducted. For H. lupulus publicly available Illumina WGS short read data (DRR024451) was obtained from NCBI SRA and mapped to the Dovetail “Cascade” reference genome (http://hopbase.cqls.oregonstate.edu/dovetailDownloads/dovetailCascadeMasked.php, last accessed on 8 May 2023) using BWA-MEM (Galaxy version v0.7.17.2) [45].
Coverage analysis was further conducted for ‘FINOLA’ using the aforementioned WGS short-read data [8] that were mapped to the ‘FINOLA’ genome (GCA_003417725.2, last accessed on 2 Jul., 2020) using BWA-MEM (Galaxy version v0.7.17.2) [45].
To get average coverage of blast hits, normalised relative to the whole genome coverage, samtools depth (-a, v1.10) and samtools bedcov (Galaxy Version 2.0.3) were used. An average coverage cutoff > 0.8 was used. All megablast and coverage analyses were done using the Galaxy platform [46].
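The normalisation described above can be sketched as follows. This is a minimal illustration that assumes per-base depths in the three-column form produced by `samtools depth -a` (contig, 1-based position, depth) and assumes the cutoff is applied as a simple greater-than comparison; the function names and the toy depths are hypothetical.

```python
from typing import Iterable, List, Tuple

# (contig, 1-based position, depth), as in `samtools depth -a` output
Depth = Tuple[str, int, int]


def mean_depth(records: Iterable[Depth]) -> float:
    """Average depth over the given per-base records."""
    depths = [depth for _, _, depth in records]
    if not depths:
        raise ValueError("no positions to average")
    return sum(depths) / len(depths)


def normalised_coverage(records: List[Depth], contig: str, start: int, end: int) -> float:
    """Mean depth of a region divided by the mean depth of the whole genome."""
    genome_mean = mean_depth(records)
    region = [r for r in records if r[0] == contig and start <= r[1] <= end]
    return mean_depth(region) / genome_mean


def passes_cutoff(value: float, cutoff: float = 0.8) -> bool:
    """True if the normalised coverage exceeds the cutoff used in the text (> 0.8)."""
    return value > cutoff


if __name__ == "__main__":
    # Toy genome: one scaffold at 30x and one at 15x (hypothetical values).
    toy = [("sc1", i, 30) for i in range(1, 101)] + [("sc2", i, 15) for i in range(1, 101)]
    for contig in ("sc1", "sc2"):
        cov = normalised_coverage(toy, contig, 1, 50)
        print(contig, round(cov, 2), passes_cutoff(cov))
```

For a real genome the per-base records would not be held in a list; streaming the genome-wide mean once and then querying regions with `samtools bedcov`, as done here, is the more practical route.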
Phylogenetic analysis of PEBP genes
For comparative sequence analysis of PEBP genes, a recent phylogenetic study was used to identify a Rosales species closely related to C. sativa with a high-quality reference genome, and Prunus persica was selected [47, 48]. For P. persica gene identifiers were obtained from the literature [], and sequences were obtained from NCBI. For A. thaliana, protein sequences were accessed from TAIR (The Arabidopsis Information Resource, www.arabidopsis.org, last accessed on 5 Sept., 2022). For Glycine max PEBP gene IDs were obtained from the literature [24, 49], the newest gene annotation ID was verified in Soybase (https://www.soybase.org/, last accessed 23 November 2023) and NCBI was used to acquire the newest RefSeq (Glycine_max_v4.0, GCF_000004515.6) protein sequence. Maximum likelihood phylogenetic trees were created as previously described [8]. In brief, protein sequences were aligned in MAFFT [44] and ALISTAT (v.1.3) was used to mask residues with low completeness (< 0.5) in the alignment [50]. IQ-TREE was used for tree construction [51] with branch support values from Ultrafast bootstraps (UFBoot) and a Shimodaira–Hasegawa approximate likelihood ratio test (SH-aLRT) (1000 replicates each). A clade was deemed reliable if SH-aLRT and UFBoot supports were > 80% and > 95%, respectively. FigTree (v1.4.4, http://tree.bio.ed.ac.uk/software/figtree/, last accessed on 18 Aug. 2023) and ggtree [52] were used to visualise the trees.
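The reliability criterion can be expressed as a short check on node labels. The sketch below assumes labels are written as `SH-aLRT/UFBoot` (for example `99.7/99`), which is how IQ-TREE annotates nodes when both tests are requested; the function names are illustrative.

```python
from typing import Tuple


def parse_support(label: str) -> Tuple[float, float]:
    """Split an 'SH-aLRT/UFBoot' node label such as '99.7/99' into two floats."""
    sh_alrt, ufboot = label.split("/")
    return float(sh_alrt), float(ufboot)


def is_reliable(label: str, min_sh_alrt: float = 80.0, min_ufboot: float = 95.0) -> bool:
    """Apply the thresholds from the text: SH-aLRT > 80% and UFBoot > 95%."""
    sh_alrt, ufboot = parse_support(label)
    return sh_alrt > min_sh_alrt and ufboot > min_ufboot


if __name__ == "__main__":
    for label in ("99.7/99", "93.7/79"):
        print(label, is_reliable(label))
```

Both thresholds must be met, so a clade with a high SH-aLRT value but a UFBoot value at or below 95% is not counted as reliable.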
Synteny and gene structure analysis
Genomic sequences of various C. sativa PEBP genes were obtained and compared using dot plots generated with D-GENIES (https://dgenies.toulouse.inra.fr/, last accessed on 12 Dec. 2023). Synteny plots were generated using MCScanX and TBtools [53, 54]. The MCScanX output was manually edited to ensure all verified HlPEBP hits were linked with their respective best CsPEBP blast hit in the synteny plot. For the ‘Kompolti’ synteny plots, MCScanX output was also edited to allow links between copies of duplicated genes. Publicly available gene annotation files (GFF3) were used for all genomes. The H. lupulus genome annotation was downloaded from HopBase (https://hopbase.org/content/cascadeDovetail/geneData/transdecoder/transdecoderOutput/transcripts.fasta.transdecoder.genomeCentric.gff3.gz, last accessed 18 Dec., 2023). Transcripts corresponding to the HlPEBP genes were identified and, if no transcript was found, the GFF was manually edited (Table S2). For ‘FINOLA’ a GTF was constructed from RNA-seq data (see StringTie method). Gene structure plots accompanied by phylogenetic trees were visualised in TBtools, with MAFFT used to align protein sequences and IQ-TREE to generate the tree [44, 51, 54].
RNA-seq analysis
RNA-seq analysis was conducted as outlined previously [8, 55]. Reads were mapped to ‘Kompolti’ using HISAT2 v2.2.1 [56]. The RSeQC Infer Experiment tool was used to determine the strandedness of mapped datasets [57]. StringTie was used to assemble transcripts and calculate gene expression in transcripts per million (TPM) [58]. Morpheus (https://software.broadinstitute.org/morpheus, last accessed on 13 November 2023) was used to generate heatmaps from TPM values. RNA-seq data is available on NCBI from bioprojects PRJNA956491 and PRJNA1126191.
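For readers unfamiliar with the unit, the sketch below shows the standard definition of TPM computed from read counts and transcript lengths. It is only an illustration of the unit: StringTie derives TPM internally from its own coverage estimates, and the function name and input values here are hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from read counts and transcript lengths (bp)."""
    # Step 1: length-normalise each count (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: scale so that all values in the sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths for two genes.
    values = tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000})
    print(values)
```

Because every sample sums to one million, TPM values are comparable as proportions within a sample, which suits heatmaps of relative expression.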
Results
PEBP genes identified in C. sativa and H. lupulus
In previous work, we identified 12 PEBP genes in C. sativa [8]. More specifically, we found four FT-like genes, five TFL1/CEN-like genes, and three MFT-like genes (Fig. 1). To confirm the phylogenetic history and verify the number of the C. sativa PEBP genes identified, we analysed the PEBP genes in the closely related species H. lupulus.
To identify the PEBP genes in H. lupulus, dc-megablast was employed with 26 raw hits obtained when using the 12 C. sativa PEBP genes as query sequence (Table S2). Given the likelihood that “haplotigs” (haplotypes assembled as separate contigs) exist in the scaffold level H. lupulus assembly, the raw hits were subsequently verified using coverage and sequence analysis. After these filtering steps, 13 PEBP genes were verified to likely exist in H. lupulus (Fig. 1, Table S2).
In our phylogenetic analysis, we also included PEBP genes from the model plant A. thaliana and from Prunus persica (peach). Prunus persica was chosen because like C. sativa it belongs to the Rosales [30, 48].
The number of TFL1/CEN/BFT-like genes is the same in C. sativa and H. lupulus, with five genes found in both species. A separation into a BFT and a TFL1/CEN clade is visible (Fig. 1). The BFT subclade follows, albeit with low support values, the species phylogeny: A. thaliana BFT is sister to all other BFT-like genes, followed by P. persica, C. sativa and H. lupulus. The three BFT-like genes found in C. sativa and H. lupulus show a 1-to-1 relationship. This indicates that BFT-like genes underwent two rounds of duplications in the lineage leading to C. sativa and H. lupulus after divergence from the P. persica lineage. In eudicots separate TFL and CEN clades can also be identified [33]. Though a CEN clade is resolved in our phylogeny (containing AtATC, PpCEN, HlCEN4 and CsCEN4), a clear TFL1 clade could not be reconstructed, although from an evolutionary perspective it may appear plausible that CsTFL1 is a TFL1 ortholog.
There are two notable discrepancies in PEBP gene number between H. lupulus and C. sativa. Six FT-like genes were identified in H. lupulus, with apparent HlFT2 and HlFT4 duplications in comparison to C. sativa which has four FT-like genes (Figs. 1, 2). The overall FT clade has strong support (99.7/99), while the internal separation into FT2 and FT1/3/4 subclades is sustained by lower support values (Fig. 1). Within the FT2 subclade are CsFT2 and HlFT2, which appear orthologous to FT from A. thaliana. The FT1/3/4 subclade contains only H. lupulus and C. sativa genes and is not well supported (93.7/79, see support values in Methods) (Fig. 1). The multiple C. sativa and H. lupulus FT-like genes found in this subclade indicate that multiple rounds of gene duplications took place. The HlFT3 ortholog identified in the current H. lupulus assembly is a partial sequence and may represent a pseudogene.
Fig. 2 PEBP genes are syntenic between H. lupulus and C. sativa. Inter-genomic synteny blocks between the 10 largest scaffolds (sc) in H. lupulus (a) and the ten chromosomes of C. sativa accessions ‘CBDRx’ (female) (b) and ‘Kompolti’ (male). ‘Kompolti’ is a phased genome assembly, and thus both chromosomal haplotypes (c, d) were analysed. H. lupulus chromosomes are ordered as they relate to the C. sativa chromosomes [2]. Black arrows denote the chromosomal direction in a given assembly with respect to the ‘CBDRx’ genome. Inverted red triangles denote PEBP gene location. Grey lines between genomes denote synteny blocks from blast, with black lines denoting links between PEBP genes. Red lines denote the pseudoautosomal regions of the X and Y chromosomes of ‘Kompolti’.
Two and three MFT-like genes were identified in H. lupulus and C. sativa, respectively. Two separate MFT subclades exist in angiosperms [33], and this is supported in the present analysis. The MFT1 subclade follows the species phylogeny: A. thaliana MFT is sister to all other MFT-like genes, followed by orthologs from P. persica, C. sativa and H. lupulus. The other MFT subclade contains C. sativa CsMFT3 and H. lupulus HlMFT2 only, which may indicate that the CsMFT3 orthologs in A. thaliana and P. persica were lost (Fig. 1). This is further supported when Cucumis sativus is included in the phylogeny, as this more distantly related species to C. sativa also has both MFT clades (Figure S1).
CsMFT1 and CsMFT2 in the ‘CBDRx’ C. sativa genome are almost identical in sequence and none of the other analysed C. sativa genomes displays this supposed duplication (see below), hence these paralogs might be haplotigs. To test for this, an additional coverage analysis was conducted for CsMFT1 and CsMFT2 using ‘FINOLA’ (which lacks the CsMFT1-CsMFT2 duplication) and ‘Felina 32’ (of unknown CsMFT genotype) short-read sequencing data mapped to ‘CBDRx’. In general, lower normalised coverage was found for CsMFT1 and CsMFT2 (0.3–0.7) than for CsMFT3 (0.6–0.9) (Table S4). However, the results remain ambiguous as no short-read sequencing data was available for ‘CBDRx’. Therefore, the CsMFT1 and CsMFT2 duplication in ‘CBDRx’ may be genuine or an assembly artefact.
To understand the evolution of the PEBP gene family in the Cannabaceae, inter-genomic synteny analysis was conducted between C. sativa and H. lupulus (Fig. 2a, b). A high level of macrosynteny exists in C. sativa and H. lupulus. Large interspecies collinear blocks exist for the PEBP genes, with all PEBP genes appearing in similar chromosomal locations in both species (Fig. 2a, b). This analysis also indicated that the species-specific H. lupulus duplications observed in the phylogenetic tree (HlFT2a/2b and HlFT4a/4b) appear to be tandem duplications.
PEBP copy number variation in other C. sativa genomes
Previously, we identified a duplication of CsFT1 in the photoperiod-insensitive C. sativa accession ‘FINOLA’ [8]. Therefore, the PEBP genes in the publicly available ‘FINOLA’ genome [39] were analysed in greater detail. Using the same procedure as outlined above for H. lupulus, 15 PEBP genes were identified: eight FT-like (four more than in the ‘CBDRx’ reference genome), five TFL1/CEN-like and two MFT-like genes (Table S5). The duplication of CsFT1 in ‘FINOLA’ was previously reported [8]. The additional discrepancy in gene number is due to four CsFT2 genes in ‘FINOLA’ compared to one in ‘CBDRx’.
To assess whether the CsFT1 and CsFT2 duplications found in ‘FINOLA’ are widespread in other C. sativa accessions, additional C. sativa genome assemblies that are fully phased (i.e. both chromosomal haplotypes are available) were analysed (https://resources.michael.salk.edu/resources/cannabis_genomes/index.html). Eight genomes were preliminarily analysed, selected to encompass the diversity within the C. sativa gene pool (chemotype, flowering phenotype, sex, crop end-use) (Table S1, Figure S2). After preliminary phylogenetic analysis (Figure S2), a representative subset of four genomes was selected to supplement the in-depth sequence analysis on the grain cultivar ‘FINOLA’ and drug-type ‘CBDRx’: male hemp ‘Kompolti’, male feral hemp ‘Boone County’, a monoecious fibre-type ‘Santhica 27’ and a female drug-type ‘White Widow’.
The CsFT1 duplication previously identified in ‘FINOLA’ was also detected in the accessions ‘Boone County’ and ‘Kompolti’ (Figure S3). In all analysed genomes, one CsFT2 gene was detected, located on the sex chromosome X (CsFT2aX) (Fig. 2b, c). In the male genomes ‘Boone County’ and ‘Kompolti’, three and four CsFT2 duplicates were identified on chromosome Y, respectively (CsFT2aY, bY, cY) (Fig. 2d). CsFT2aY is located at a similar position on the Y chromosome as CsFT2aX on the X chromosome, within the pseudoautosomal region, but relatively close to the non-recombining region. Two CsFT2aY genes located 0.3 Mb apart, CsFT2aY_1.K.b and CsFT2aY_2.K.b, were found on chromosome Y in ‘Kompolti’ (Fig. 2d). The additional CsFT2bY and CsFT2cY genes were found centrally within the non-recombining region of chromosome Y in both ‘Kompolti’ and ‘Boone County’. For ‘FINOLA’ there is no well-supported chromosome location for CsFT2. Both the male and the female ‘FINOLA’ individuals had a coverage of ~ 1 for CsFT2a, while coverage for CsFT2b to CsFT2c was ~ 0.5 in the male individual and 0 in the female individual, indicating that those genes are located on the Y chromosome, similar to the other accessions with phased chromosome assemblies (Table S5).
Additionally, CsCEN1 and CsFT3 both appear to be found only on chromosome X (Fig. 2b, c). In ‘FINOLA’, the coverage for these genes is ~ 0.5 and ~ 1–1.5 in male and female individuals, respectively (Table S5), supporting the notion that they are located on the X chromosome. Accordingly, two alleles of CsCEN1 and CsFT3 (one per chromosome X haplotype) were identified in the monoecious and female genomes ‘Santhica 27’ and ‘White Widow’, while the genes are hemizygous in the male individuals of the accessions ‘Boone County’ and ‘Kompolti’ (Figure S3).
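The reasoning used to place genes on the sex chromosomes from male and female coverage can be summarised as a simple decision rule. The sketch below is an illustration only: the tolerance of 0.25 and the category names are assumptions, not values from the analysis.

```python
def infer_location(male_cov: float, female_cov: float, tol: float = 0.25) -> str:
    """Suggest a chromosomal location from normalised coverage in each sex.

    Expected patterns, following the text:
      Y-specific:  male ~0.5, female ~0
      X-specific:  male ~0.5, female ~1 or higher
      autosomal or pseudoautosomal: both ~1
    `tol` is an assumed tolerance around each expected value.
    """
    def near(value: float, target: float) -> bool:
        return abs(value - target) <= tol

    if near(male_cov, 0.5) and near(female_cov, 0.0):
        return "Y-specific"
    if near(male_cov, 0.5) and female_cov >= 1.0 - tol:
        return "X-specific"
    if near(male_cov, 1.0) and near(female_cov, 1.0):
        return "autosomal or pseudoautosomal"
    return "ambiguous"


if __name__ == "__main__":
    # Coverage patterns of the kind described in the text.
    print(infer_location(0.5, 0.0))  # Y-specific
    print(infer_location(0.5, 1.2))  # X-specific
    print(infer_location(1.0, 1.0))  # autosomal or pseudoautosomal
```

A rule like this only proposes a location; genes that fall into the "ambiguous" category, or that sit near the pseudoautosomal boundary, still need a phased assembly to be placed with confidence.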
Furthermore, only two MFT-like genes are found in all genomes except ‘CBDRx’, supporting the notion that the putative MFT duplication in ‘CBDRx’ (CsMFT1 and CsMFT2) is cultivar-specific (Figure S3).
Key amino acids denote likely PEBP function in C. sativa
PEBP-like proteins have characteristic amino acid residues determining their function in floral promotion or inhibition [20]. Five amino acids, in particular, are conserved in FT-like proteins compared to TFL1/CEN-like and MFT-like proteins (Fig. 3): Y85, E109, Y134, W138 and Q140 [20, 29, 59, 60]. A comprehensive list of PEBP-like proteins was compiled from the literature for comparison with C. sativa PEBP proteins (Table S6).
Fig. 3 Key PEBP amino acid positions denote gene function. An alignment of PEBP-like proteins from A. thaliana, P. persica, H. lupulus and C. sativa shows three main groups: FT, MFT and TFL (a). Several amino acids are candidates for PEBP function (Table S6) but for concision, only five positions are highlighted here (position refers to AtFT): Y85, E109, Y134, W138 and Q140 [29]. Web logos summarise the variation at the key amino acids in C. sativa FT-like (b), TFL1/CEN-like (c), and MFT-like (d) proteins. HlFT3 was not included in (a) as it is only a partial sequence.
Out of these five key amino acid positions, three were identical between CsFT1 to CsFT4 and FT. At two positions changes compared to the canonical amino acids were observed: Y134F in CsFT4 and Q140H in CsFT2 (Fig. 3). Y134F is also found in HlFT4 from H. lupulus and in the anti-florigen TFL1 from A. thaliana. Q140H is found in GmFT5a (from Glycine max) which phylogenetically clusters with CsFT2 and is a demonstrated flowering promoter (Figure S4) [61, 62].
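Substitutions such as Y134F are named relative to positions in AtFT, so reading them from an alignment requires mapping reference positions to alignment columns. The sketch below shows one way to do this for a pair of aligned sequences; the function names are illustrative and the sequences in the example are short toy strings, not real PEBP proteins.

```python
from typing import Iterable, List


def column_of(ref_aligned: str, ref_pos: int) -> int:
    """Return the 0-based alignment column holding the ref_pos-th (1-based)
    non-gap residue of the aligned reference sequence."""
    seen = 0
    for col, residue in enumerate(ref_aligned):
        if residue != "-":
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def substitutions(ref_aligned: str, query_aligned: str, positions: Iterable[int]) -> List[str]:
    """List differences at the given reference positions, e.g. ['Y134F']."""
    changes = []
    for pos in sorted(positions):
        col = column_of(ref_aligned, pos)
        ref_res, query_res = ref_aligned[col], query_aligned[col]
        if query_res != ref_res:
            changes.append(f"{ref_res}{pos}{query_res}")
    return changes


if __name__ == "__main__":
    # Toy alignment: the gap in the reference shifts later columns by one.
    reference = "MK-YEW"
    query = "MKAFEW"
    print(substitutions(reference, query, [3, 4, 5]))
```

With a real alignment, the reference row would be AtFT and the positions would be 85, 109, 134, 138 and 140.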
Copy number variation of CsFT1
Previous research suggested that the CsFT1 duplicates in ‘FINOLA’ differed from one another in sequence (both protein and genomic) [8]. To investigate the variation at this locus between different C. sativa cultivars, the CsFT1 sequences from a number of genomes were assessed in greater detail (Fig. 4). In the phylogenetic tree and protein sequence alignment, three broad groups of CsFT1 sequences exist: those similar to CsFT1 (‘CBDRx’ and the photoperiod-sensitive cultivar ‘Felina 32’) [8], those similar to CsFT1a (photoperiod-insensitive cultivar ‘FINOLA’) and those similar to CsFT1b (‘FINOLA’) (Fig. 4a). In total, eighteen amino acids vary when comparing all CsFT1 protein sequences (Fig. 4a).
Fig. 4 CsFT1 copy number variation in various C. sativa genomes. An alignment of the CsFT1 protein sequences from the analysed C. sativa genomes and AtFT (a). Black stars denote amino acid differences between CsFT1 sequences. Numbered amino acid positions are with respect to AtFT. Highlighted AA columns are potentially key for the function/designation of the PEBP subfamily (Table S6). The horizontal black line denotes the four segments encoded by the fourth exon [63]. The phylogenetic tree is according to Figure S3. Dot plots illustrating sequence similarity of CsFT1 locus in various C. sativa genomes (b). Schematic of the CsFT1 gene structure in various genomes, with sequences ordered according to the accompanying phylogenetic tree (c). Blue boxes denote the coding sequences. Gene labels include gene name, accession (C = ‘CBDRx’, F = ‘FINOLA’, W = ‘White Widow’, S = ‘Santhica 27’, K = ‘Kompolti’, B = ‘Boone County’), and haplotype (a or b).
One of the amino acid differences between ‘CBDRx’ and ‘FINOLA’ CsFT1b is S76N, a position potentially important for function (Table S6). Three amino acid differences exist between all CsFT1a and CsFT1b alleles (amino acid position according to FT): H54L, N76S and N177T.
In ‘FINOLA’, ‘Kompolti’, ‘Boone County’ and ‘White Widow’ a CsFT1 protein start codon could not be predicted, potentially because the start codon is located on an upstream exon. In CsFT1 of the ‘Santhica 27’ haplotype B, a 7-nucleotide insertion is predicted to result in an altered splicing pattern (Fig. 4c) and a 10-amino-acid deletion in the protein sequence (Fig. 4a).
Given that the protein sequences are relatively similar, the genomic CsFT1 sequences were investigated. Dot plot comparisons of the ~ 10–20 kb genomic fragments containing the CsFT1 genes were analysed (Fig. 4b). This confirmed that the locus covering CsFT1 in ‘CBDRx’ vs. CsFT1a and CsFT1b in ‘FINOLA’ is quite diverged (Fig. 4b). The dot plots further revealed that ‘FINOLA’, ‘Boone County’ and ‘Kompolti b’ all carry the CsFT1a/1b duplication. Similarly, the ‘CBDRx’ CsFT1 locus resembles that of ‘Kompolti a’, ‘Santhica 27’ and ‘White Widow’. The genomic sequence upstream of the ‘White Widow a’ CsFT1 gene is quite different from ‘CBDRx’ and ‘FINOLA’ and so does not appear on the dot plot (Fig. 4b). However, these sequence deviations are upstream of the CsFT1 coding sequence (Fig. 4b).
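The idea behind a dot plot can be shown with a minimal word-match sketch. Note that this is a swapped-in simplification: D-GENIES builds its plots from sequence alignments, whereas the code below only records exact shared k-mers. The function name, the default word size and the toy sequences are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_matches(seq_x: str, seq_y: str, k: int = 12) -> List[Tuple[int, int]]:
    """Return (x, y) start coordinates of every exact k-mer shared by two sequences.

    Plotting these points gives a simple dot plot: conserved regions appear as
    diagonal runs, and diverged or inserted sequence leaves gaps in the diagonal.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)
    points = []
    for j in range(len(seq_y) - k + 1):
        for i in index.get(seq_y[j:j + k], ()):
            points.append((i, j))
    return points


if __name__ == "__main__":
    # Toy sequences; a repeat in the sequence produces off-diagonal points.
    for point in kmer_matches("ACGTACGT", "ACGTACGT", k=4):
        print(point)
```

Only forward-strand matches are found here; an inverted segment would need the reverse complement of one sequence to be indexed as well.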
Intronic variation was observed in the CsFT1 duplicates with a putative transposable element found in the large intron of CsFT1a [8]. The same helitron-like sequence was found in CsFT1a of ‘Boone County’ and ‘Kompolti’ (Fig. 4c). The intron length of CsFT1a-like alleles is equal across accessions, whereas the large intron of CsFT1b in ‘Boone County’ and ‘Kompolti’ varies in length (Fig. 4c).
Copy number variation in CsFT2 in male C. sativa
All CsFT2 protein sequences appear to be very similar, with only five amino acids differing between the various CsFT2 sequences (Fig. 5a).
Fig. 5 CsFT2 copy number variation in various C. sativa genomes. An alignment of the CsFT2 protein sequences from the analysed C. sativa genomes and AtFT (a). Black stars denote amino acid differences between CsFT2 alleles. Highlighted AA columns are potentially key for the function/designation of the PEBP subfamily (Table S6). The horizontal black line denotes the fourth exon of the protein, split into four segments [63]. The phylogenetic tree is cropped from Fig. 3. Dot plots illustrating the genomic landscape of the CsFT2 locus in various genomes (b). Schematic of the CsFT2 gene structure in various genomes analysed, with sequences ordered according to the accompanying phylogenetic tree (c). Blue boxes denote the coding sequences. Gene labels include gene name (-X or -Y if located on sex chromosomes), accession (C = ‘CBDRx’, F = ‘FINOLA’, W = ‘White Widow’, S = ‘Santhica 27’, K = ‘Kompolti’, B = ‘Boone County’), and haplotype (a or b). Schematics demonstrating the allele combinations present in female (‘White Widow’/‘Santhica 27’) and male (‘Boone County’/‘Kompolti’) genomes (d). Grey, light blue and dark blue boxes denote the pseudoautosomal, chrX-specific and chrY-specific chromosomal regions, respectively. Vertical black lines pinpoint the genomic location of CsFT2 alleles. *Expressed the genomic location of Kompolti chrX CsFT2a to ensure the same chromosomal direction as other assemblies.
Three of those amino acid differences are at positions potentially important for function between CsFT2a and CsFT2bY/CsFT2cY (Fig. 5a, Table S6, amino acid position according to FT): N34I in CsFT2bY, T144S in CsFT2cY and A159V in both CsFT2bY and CsFT2cY. Furthermore, CsFT2cY found in both ‘Boone County’ and ‘Kompolti’ appears to be shorter at the N-terminal end (Fig. 5a, b, c).
Dot plot analysis was conducted to compare the genomic fragments containing CsFT2 duplicates (Fig. 5b). ‘FINOLA’ was not included in the in-depth CsFT2 analysis as it lacks a chromosome Y assembly, and thus ‘Kompolti’ was used for genomic comparison.
In chromosome X haplotypes (‘CBDRx’, ‘Kompolti A’, ‘Boone County A’, ‘Santhica 27’ and ‘White Widow’) only CsFT2aX was present (Fig. 5b). In chromosome Y haplotypes (‘Kompolti B’ and ‘Boone County B’), CsFT2aY, CsFT2bY and CsFT2cY were found. In ‘Boone County’ and ‘Kompolti’ CsFT2cY is shorter than the ‘CBDRx’ query (Fig. 5a, b). Overall, the dot plots provide a complex picture, with different levels of similarity between CsFT2 loci. For example, the CsFT2aX locus appears to be very similar between ‘CBDRx’ and ‘White Widow A’, but different between ‘CBDRx’ and ‘Kompolti A’ (Fig. 5b).
The gene structures of CsFT2 sequences were visualised, and the largest variation was observed in the introns of CsFT2bY and the lack of exon 1 in CsFT2cY (Fig. 5c).
In summary, the CsFT2 gene number appears to be dynamic on Y chromosomes in C. sativa (Fig. 5).
Expression analysis of PEBP genes at vegetative, floral transition and flowering suggests functional conservation
To assess whether the identified genes are expressed and potentially functional during the floral transition, the expression of all PEBP genes in C. sativa was analysed in vegetative and flowering tissues from the dioecious photoperiod insensitive cultivar ‘FINOLA’, the monoecious photoperiod sensitive cultivar ‘Felina 32’, and F2 individuals from a ‘Felina 32’ × ‘FINOLA’ cross described previously (Fig. 6) [8].
Fig. 6 PEBP gene expression in C. sativa. (a) Heatmap of RNA-seq data from flowering and vegetative tissue samples from ‘FINOLA’, ‘Felina 32’ and F2 individuals, mapped to ‘Kompolti’ haplotype A and B. Each row is an average TPM of the biological replicates. Each sample type has 3–5 biological replicates. Growth stages refer to the number of true leaf pairs [55], while ‘days’ refers to days post sowing [8]. Note that, although all transcripts were mapped to both Kompolti haplotypes, the female and monoecious plants have no Y chromosome. Transcripts from female or monoecious plants mapping to Y chromosomal genes probably originated from X chromosomal FT genes. Likewise, ‘Felina 32’ encodes CsFT1 and ‘FINOLA’ encodes both CsFT1a and CsFT1b. (b) A comparison of the CsFT2 expression in female (stage 2, L2F) and male (stage 2, L2M) shoot apical meristem samples (cv. ‘FINOLA’) before flowering. RNA-seq data was mapped to both Kompolti haplotypes separately for the male transcripts. Although this does not allow distinguishing whether the transcripts originated from the X or Y chromosome, it is clear from the statistics that CsFT2a transcripts are more highly expressed in males than in females. Kruskal–Wallis with Conover-Iman post-hoc test was used, α = 0.05 with letters denoting significance levels.
The male ‘Kompolti’ genome was used as a reference for RNA-seq analysis and as ‘Kompolti’ is a phased assembly, both haplotypes are available. ‘Kompolti A’ includes one CsFT2 gene on chromosome X and one CsFT1 gene on chromosome 8, whereas ‘Kompolti B’ possesses the chromosome Y CsFT2 genes and the CsFT1 gene duplication on chromosome 8 (Figs. 2c, d, 6). This allowed the expression of the duplicated PEBP genes to be visualised using an appropriate reference genome.
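The heatmap values described in the Fig. 6 caption are average TPM values across biological replicates. As a minimal sketch only, and not the pipeline used in this study, the standard TPM definition and the per-group averaging could be written as follows. The gene names, read counts, transcript lengths and function names are hypothetical placeholders chosen for illustration.

```python
from statistics import mean


def tpm(counts, lengths_kb):
    """Convert raw read counts of one sample into TPM (standard definition)."""
    # Reads per kilobase of transcript for every gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Scale so that the values of one sample sum to one million.
    scale = sum(rates.values()) / 1e6
    return {gene: rate / scale for gene, rate in rates.items()}


def mean_tpm(replicates):
    """Average TPM per gene over the biological replicates of one sample type."""
    genes = replicates[0].keys()
    return {gene: mean(rep[gene] for rep in replicates) for gene in genes}


# Hypothetical toy data: two genes, two biological replicates.
lengths_kb = {"geneA": 1.0, "geneB": 3.0}
rep1 = tpm({"geneA": 100, "geneB": 300}, lengths_kb)
rep2 = tpm({"geneA": 300, "geneB": 300}, lengths_kb)
print(mean_tpm([rep1, rep2]))
```

Because TPM is normalised within each sample, the values of every replicate sum to one million before averaging, which makes the averaged rows comparable between sample types.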
As outlined previously [8], in ‘Felina 32’ CsFT1 expression was observed only at flowering under short-day conditions (Fig. 6a). In ‘FINOLA’, expression of either CsFT1a or CsFT1b was found at various developmental stages and in various tissue types, but primarily in flowering samples, with CsFT1a typically being more highly expressed than CsFT1b (Fig. 6a).
CsFT2a is most highly expressed in leaves at flowering, though some expression is found in vegetative leaves and shoot apical meristems, and in the stem at the flowering stage (Fig. 6a). CsFT2b and CsFT2c have similar expression patterns, mostly restricted to male and female leaves at flowering and to the male vegetative shoot apical meristem (Fig. 6a). In vegetative shoot apical meristem samples, CsFT2a expression is significantly higher in males than in females (Fig. 6b; P = 0.002 and P = 0.0038 for CsFT2aX.K.a and CsFT2aY_1.K.b in male samples, respectively).
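The male versus female comparison relies on a Kruskal–Wallis test, which ranks all observations jointly and asks whether the rank sums differ between groups. The following is a minimal, standard-library sketch of the tie-corrected H statistic, assuming hypothetical TPM values; it is not the analysis code of this study and it omits the Conover–Iman post-hoc step. In practice a library routine such as `scipy.stats.kruskal` would normally be used, which also returns the P value.

```python
from itertools import chain


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    rank_of = {}   # value -> average rank (ties share the mean of their ranks)
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical TPM values for three sample groups (not data from this study).
female = [1.0, 2.0, 3.0]
male_hap_a = [4.0, 5.0, 6.0]
male_hap_b = [7.0, 8.0, 9.0]
h, df = kruskal_wallis_h([female, male_hap_a, male_hap_b])
print(f"H = {h:.2f}, df = {df}")
```

Under the null hypothesis H is approximately chi-square distributed with `df` degrees of freedom, although with only 3–5 replicates per group this approximation is coarse.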
The expression patterns of CsFT3 and CsFT4 are similar and highest in leaf samples at flowering. In ‘Felina 32’ both CsFT3 and CsFT4 expression were detected in stems and leaves at flowering induced by short days, while no or low expression was found in vegetative long day-treated ‘Felina 32’ (Fig. 6a). CsFT3 and CsFT4 were both expressed in ‘FINOLA’ leaves but not stems when grown under long days (Fig. 6a).
Expression of CsTFL1 is highest in stems and the shoot apical meristem, with only trace expression found in leaf tissues (Fig. 6a). CsCEN1 is highly expressed in all samples (Fig. 6a). CsCEN2 expression is highest at flowering, in leaves and the shoot apical meristem (Fig. 6a). CsCEN3 is expressed in all tissues and is highest in leaves, both in vegetative samples and at flowering (Fig. 6a). CsCEN4 expression was not detected in any sample (Fig. 6a). CsMFT3 is more highly expressed than CsMFT1 in almost all samples, with only trace expression of the latter found in the shoot apical meristem and stem (Fig. 6a).
Discussion
Clade-specific expansion of the PEBP gene family in the Cannabaceae
PEBP genes have crucial roles in developmental processes, especially the floral transition [20]. Previous research identified PEBP genes as candidates for flowering in C. sativa [8], and the growing availability of genomes of various accessions allowed an exhaustive search of this gene family to be performed in C. sativa and the closely related species H. lupulus.
Our phylogenetic analysis reveals that in C. sativa and H. lupulus, both the FT and BFT clades have expanded compared to P. persica and A. thaliana (Fig. 1). The expansion is most obvious in the FT clade. The current topology suggests a duplication leading to an FT2 vs. FT1/3/4 clade. However, the FT2 clade contains genes from A. thaliana (AtTSF and AtFT), P. persica (PpFT), C. sativa (CsFT2) and H. lupulus (HlFT2a, HlFT2b) whereas the FT1/3/4 clade only contains H. lupulus and C. sativa genes. If the current topology is correct, and we reconcile the gene and species tree [64] this would indicate that the FT2-FT1/3/4 duplication occurred prior to the separation of A. thaliana, P. persica, C. sativa and H. lupulus into distinct lineages, and that the FT1/3/4 genes were lost in A. thaliana and P. persica. However, the topology of the FT clade is likely not fully resolved given the low support values for some of the branches. Therefore, instead of gene losses in the lineages leading to A. thaliana and P. persica it is also possible that there were several rounds of FT duplications in the lineage leading to the Cannabaceae after A. thaliana and P. persica branched off.
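The reconciliation argument above can be illustrated with a small sketch. It uses the common lowest-common-ancestor (LCA) mapping approach, in which a gene-tree node is called a duplication when it maps to the same species-tree clade as one of its children; this is an assumption for illustration and not necessarily the procedure of [64]. The species tree places P. persica with the Cannabaceae (all Rosales) and A. thaliana outside. The arrangement of genes within each clade is simplified, and the labels `copy1` and `copy2` are hypothetical placeholders for FT1/3/4 clade members.

```python
def clades(tree):
    """List the leaf sets of every node in a nested-tuple tree (post-order)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]  # the last entry is the child's own leaf set
    found.append(leaves)
    return found


def lca(species_clades, species):
    """Smallest species-tree clade that contains all of `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def duplications(gene_tree, species_clades):
    """Return [(genes_below_node, mapped_species_clade)] for duplication nodes."""
    dups = []

    def visit(node):
        if isinstance(node, str):  # leaf labels have the form "species:gene"
            species = frozenset([node.split(":")[0]])
            return frozenset([node]), lca(species_clades, species)
        genes, species, child_maps = frozenset(), frozenset(), []
        for child in node:
            child_genes, child_map = visit(child)
            genes, species = genes | child_genes, species | child_map
            child_maps.append(child_map)
        mapped = lca(species_clades, species)
        if mapped in child_maps:  # same clade as a child: duplication
            dups.append((genes, mapped))
        return genes, mapped

    visit(gene_tree)
    return dups


SPECIES_TREE = ("At", ("Pp", ("Cs", "Hl")))
# Simplified FT2 clade with genes from all four species.
FT2 = (("At:AtFT", "At:AtTSF"),
       ("Pp:PpFT", ("Cs:CsFT2", ("Hl:HlFT2a", "Hl:HlFT2b"))))
# Simplified FT1/3/4 clade with Cannabaceae genes only (placeholder names).
FT134 = (("Cs:copy1", "Hl:copy1"), ("Cs:copy2", "Hl:copy2"))
GENE_TREE = (FT2, FT134)

for genes, clade in duplications(GENE_TREE, clades(SPECIES_TREE)):
    print(sorted(clade), "<-", len(genes), "genes")
```

In this toy topology the root of the FT clade maps to the common ancestor of all four species, which is why the topology, taken at face value, implies an early duplication followed by loss of FT1/3/4 genes in A. thaliana and P. persica. Regrafting the FT1/3/4 clade inside the Cannabaceae part of the FT2 clade would instead place the duplication after those lineages branched off.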
For most PEBP genes in C. sativa and H. lupulus, a 1-to-1 orthology is observed, indicating that most gene duplications occurred before H. lupulus and C. sativa separated. Exceptions are HlFT2 and HlFT4, which appear to be duplicated specifically in H. lupulus.
It is interesting to note the location of PEBP genes in common syntenic blocks between the C. sativa and H. lupulus genomes (Fig. 2). This further supports the view that the expansion of the PEBP gene family occurred before the speciation event leading to C. sativa and H. lupulus 20 million years ago [3, 65].
The origin of the duplications in the FT and BFT clades is unclear. Many increases in PEBP gene number in the eudicots are whole genome duplication (WGD)-mediated (e.g. in Malus domestica and Glycine max) [24, 25]. However, the recent C. sativa pangenome study suggests that C. sativa has not had a recent WGD event [66], indicating that the PEBP gene duplications described here result from small-scale duplication events.
Sequence divergence in duplicated PEBP genes may indicate functional divergence
The retention of an expanded PEBP gene family in the Cannabaceae suggests that the duplicated PEBPs may have adaptive significance.
For all identified C. sativa and H. lupulus PEBP genes the key amino acids denoting flowering promoter/repressor function were analysed (Table S6). In the majority of cases, the phylogenetic location of the analysed PEBP genes matches the expected amino acid residues that indicate flowering-promoting or repressive activity (Fig. 3, Table S6). Sequence divergence may be indicative of functional divergence, and in two cases sequence divergence in key amino acids was observed: CsFT2 and CsFT4 (Fig. 3). In several species such as O. sativa, Populus spp., and G. max duplicated PEBP genes have undergone sub- and neo-functionalisation to regulate flowering as well as other developmental processes under particular environmental conditions [20, 67].
CsFT2 has four of the five residues expected in a floral promoter, the exception being Q140H (Fig. 3). Single amino acid substitutions have been reported to change the ability of FT-like proteins to promote or repress flowering [59, 68]. It is possible that gene duplication and subsequent amino acid substitutions in CsFT2 may have played a role in fine-tuning photoperiodic flowering in C. sativa. Additionally, the male-specific duplicates of CsFT2, CsFT2bY and CsFT2cY, could be responsible for sexually dimorphic flowering in C. sativa, given the expression of these genes just before and at flowering (Fig. 6a, b).
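Substitution labels such as Q140H combine the reference residue, its position in the reference protein and the residue found in the query. A minimal sketch of how such labels can be read off a pairwise alignment is given below. The function name, the toy sequences and the key positions are hypothetical and much shorter than real PEBP proteins; the actual residues analysed in this study are those listed in Table S6.

```python
def substitutions_at_key_sites(ref_aln, query_aln, key_positions):
    """Report substitutions at 1-based reference positions of an aligned pair.

    Positions are counted along the ungapped reference sequence, so gaps
    ("-") in the aligned reference do not advance the position counter.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(key_positions)
    found = []
    ref_pos = 0
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion in the query relative to the reference
        ref_pos += 1
        if ref_pos in wanted and query_res != ref_res:
            found.append(f"{ref_res}{ref_pos}{query_res}")
    return found


# Hypothetical toy alignment: the reference has a gap at the third column.
reference = "MK-YQA"
query = "MKTYHA"
print(substitutions_at_key_sites(reference, query, key_positions=[3, 4]))
```

Counting positions along the ungapped reference keeps the labels comparable between proteins of different length, which matters when homologs carry insertions or deletions upstream of the key residues.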
CsFT3 and CsFT4 seem to be co-expressed in most sampled tissues which may indicate redundant or complementary roles in promoting flowering. The best hit for FT3 in H. lupulus is only a partial sequence and thus may be a pseudogene (Figs. 1, 2).
PEBP expression patterns indicate conservation as well as functional divergence
In C. sativa and H. lupulus both angiosperm MFT clades are retained whereas one MFT clade appears to be lost in P. persica and A. thaliana (Fig. 1). The more distantly related species Cucumis sativus also possess two MFT clades, suggesting that the two clades found in the Cannabaceae indeed represent the ancestral angiosperm duplication of MFT (Figure S1) [33]. Genes of the MFT clade typically function in seed dormancy [20, 22, 23].
Differential expression patterns in the C. sativa CEN/TFL1 clade suggest non-redundant functions in repressing flowering. The spatial pattern of CsTFL1 expression suggests a conserved function in determining plant architecture as in other species, given the higher expression observed in the shoot apical meristem and stems [20, 69]. No expression was detected for CsCEN4, which could indicate that this is a pseudogene or that it is expressed only under specific conditions not analysed here. The remaining three CsCEN-like genes are phylogenetically closer to one another, albeit with low branch support. CsCEN1, though located on chromosome X, is highly expressed in males and females, in all tissues and at all developmental stages (Fig. 6). The high expression of CsCEN1 in male individuals could be evidence for dosage compensation, a phenomenon previously described for X chromosomal genes of C. sativa [12, 70].
CsCEN2 expression mainly occurs in pre-flowering and flowering individuals, while CsCEN3 expression is found in both vegetative and flowering samples, suggesting that CsCEN3 functions at an earlier developmental stage than CsCEN2. Overall, these expression patterns suggest that CsCEN1, CsCEN2 and CsCEN3 may have distinct but important roles in repressing flowering at different developmental stages and in different tissues. In A. thaliana BFT is associated with delayed flowering under high salinity, while in Populus spp. and Actinidia chinensis (kiwifruit), CEN/TFL1-like genes regulate seasonal flowering and dormancy release [71,72,73]. Thus, members of the CEN/TFL1 clade may be promising candidates for investigating differences in life history strategy between the annual C. sativa and the perennial H. lupulus in the Cannabaceae.
Sex chromosome gene duplications may explain sexually dimorphic flowering in C. sativa
Sexually dimorphic flowering time has been previously reported in C. sativa [8, 74, 75] and H. lupulus [76, 77]. However, the genetic underpinnings of sexually dimorphic flowering are unknown. In previous research, we analysed a C. sativa population segregating for two loci associated with photoperiod-insensitive flowering, Autoflower1 and Autoflower2 [8]. When cultivated in non-inductive long-day conditions, some individuals lacking the photoperiod-insensitive alleles at both loci still flowered. All of those plants were male. This suggested the existence of loci related to flowering on the C. sativa Y chromosome.
On chromosome Y several CsFT2 copies are found, in ‘Kompolti’ termed CsFT2aY_1, CsFT2aY_2, CsFT2bY and CsFT2cY (Figs. 2, 3, 5d). Repetitive elements make it difficult to generate a high-quality Y chromosome assembly. Thus, the exact number of CsFT2 duplicates on chromosome Y may be overestimated in the current assemblies. However, at least two full-length CsFT2 copies are likely located on chromosome Y (Fig. 5). Furthermore, CsFT2 genes appear to be overexpressed in males vs. females at early developmental stages, before the onset of male flowering (Fig. 6b). We therefore speculate that the earlier flowering observed in C. sativa males may be associated with the chromosome Y-specific CsFT2 duplicates.
It would be interesting to delve into the evolutionary history of these duplicates and whether they also precede the speciation event 20 million years ago leading to C. sativa and H. lupulus [12].
As mentioned above, HlFT2 is an ortholog of CsFT2 and is also located on chromosome X in H. lupulus. A recent genome assembly of Ficus hispida of the Moraceae family (one of the families most closely related to the Cannabaceae and predominantly dioecious) also identified male-specific FT-like genes [78]. FT orthologs have also been detected in the male-specific region in Amaranthus tuberculatus and Amaranthus palmeri, where they are postulated to contribute to male fitness [79, 80]. In Ambrosia artemisiifolia, earlier male flowering increases reproductive success [81]. Therefore, it will be interesting to assess the connection between reproductive success and male flowering variation in C. sativa, in addition to investigating the evolutionary history of FT duplications on sex chromosomes in species closely related to C. sativa. However, it is also important to note that flowering regulators other than PEBP genes are located on the sex chromosomes and thus may have a role in sexually dimorphic flowering in C. sativa. For instance, a homolog of FLOWERING LOCUS D was recently found to be located on chromosome Y and overexpressed in male individuals [82].
Uncovering the genetic control of sexually dimorphic flowering in C. sativa would have several conceivable applications. Future breeding efforts may seek to substantially delay male flowering until the cannabinoid-producing female flowers have matured and are ready for harvest. Alternatively, it may be more feasible to breed extremely early flowering males that complete their life cycle faster and thus do not compete with females for resources (light, space, nutrients) in the field. Finally, synchronising male and female flowering may be a target for fibre hemp: the lignification that coincides with flowering reduces fibre quality [83, 84], and thus synchronising the flowering time of both sexes would ensure the largest possible yield with the highest fibre quality.
Photoperiod insensitivity in C. sativa may be a stepwise system involving CsFT1 duplications
Previous research has demonstrated a duplication of CsFT1 in a QTL for photoperiod-insensitive flowering in ‘FINOLA’ [8]. In this study, we showed that this duplication also exists in the two publicly available genomes of ‘Kompolti’ and ‘Boone County’ (Figs. 2, 3). At the protein sequence level, there are three main groups of alleles: those resembling CsFT1 (‘CBDRx’-like), those resembling CsFT1a (‘FINOLA’) and those resembling CsFT1b (also ‘FINOLA’) (Fig. 4a). Interestingly, the sequenced ‘Kompolti’ individual is heterozygous for the duplication and thus possesses all three allele types (Fig. 4a, b, c). ‘Boone County’ is homozygous for the duplication and carries CsFT1a and CsFT1b (Fig. 4a, b, c).
Several substitutions between different CsFT1 alleles were identified, including N76S. This substitution differentiates AtFT and CsFT1a (S76) from the CsFT1 and CsFT1b (N76) allele types (Fig. 4a). In A. thaliana S76 may be important for the efficient accumulation of FT in the shoot apical meristem, affecting flowering time control [68]. It is possible that the presence of serine at position 76 in CsFT1a contributes to the fine-tuning of flowering time control in C. sativa.
At the genomic level, dot plots confirmed the similarity between the CsFT1 duplications of ‘FINOLA’, ‘Boone County’ and ‘Kompolti’-B (Fig. 4b). We previously hypothesised that a putative helitron-like transposable element in the large intron of CsFT1a may have a regulatory role in ‘FINOLA’ [8]. This putative transposable element was also found in the large intron of CsFT1a in ‘Boone County’ and ‘Kompolti’ (Fig. 4c). Furthermore, considerable intron differences exist between CsFT1b of ‘FINOLA’, ‘Kompolti’ and ‘Boone County’, with an increase in the length of the second intron in the latter two genomes (Fig. 4c), which might have functional importance in gene expression regulation.
The terminal flowering time (here defined as when clusters of flowers fully open at the terminal inflorescence) of ‘Boone County’ is reported as 73 and 89 days for male and female individuals, respectively (Table S1). Previously, ‘Kompolti’ was shown to be photoperiod-sensitive [75, 85]. However, common accession names are not necessarily indicative of relatedness in C. sativa, as previous work demonstrated genetic and phenotypic variation within accessions of the same name [8, 86]. Nonetheless, it is unclear whether the CsFT1 duplication can be consistently associated with photoperiod insensitivity in ‘Kompolti’ and ‘Boone County’, as has been suggested in ‘FINOLA’ [8]. It may be that the CsFT1 (Autoflower2) locus interacts with another flowering locus to promote photoperiod insensitivity. In P. sativum and H. vulgare, interactions among several members of the FT family determine the photoperiodic flowering response [87, 88].
Conclusion
In conclusion, our data indicate that PEBP genes in C. sativa are of crucial importance for reproductive transition and development and thus demonstrate significant promise to be harnessed in future breeding programmes for C. sativa, for example, to expand the geographic range of this crop. Based on our data, we speculate that the expansion of the PEBP gene family has likely been crucial in the latitudinal adaptation and evolution of sexually dimorphic flowering in C. sativa. This characterisation of the PEBP family will provide the basis for the development of molecular markers for crop improvement, as well as targets for understanding the sub-functionalisation of duplicated genes.
In future work, the development of a C. sativa pangenome in addition to a chromosome Y assembly in H. lupulus will be important to address questions on the presence/absence of gene duplications and their associations with flowering phenotypes.
Data availability
All genome sequences are publicly available and accessible via the URLs listed in the methods. The RNA-seq data used are available at NCBI under BioProjects PRJNA956491 and PRJNA1126191.
References
Melzer R, McCabe PF, Schilling S. Evolution, genetics and biochemistry of plant cannabinoid synthesis: a challenge for biotechnology in the years ahead. Curr Opin Biotechnol. 2022;75: 102684. https://doi.org/10.1016/j.copbio.2022.102684.
Padgitt-Cobb LK, Pitra NJ, Matthews PD, Henning JA, Hendrix DA. An improved assembly of the “Cascade” hop (Humulus lupulus) genome uncovers signatures of molecular evolution and refines time of divergence estimates for the Cannabaceae family. Hortic Res. 2023;10:uhac281. https://doi.org/10.1093/hr/uhac281.
Schilling S, Dowling CA, Shi J, Hunt D, O’Reilly E, Perry AS, Kinnane O, McCabe PF, Melzer R. The cream of the crop: biology, breeding, and applications of Cannabis sativa. Annu Plant Rev Online. 2021;4:471–528. https://doi.org/10.1002/9781119312994.
Spitzer-Rimon B, Duchin S, Bernstein N, Kamenetsky R. Architecture and florogenesis in female Cannabis sativa plants. Front Plant Sci. 2019;10:350. https://doi.org/10.3389/FPLS.2019.00350.
Toth JA, Stack GM, Carlson CH, Smart LB. Identification and mapping of major-effect flowering time loci Autoflower1 and Early1 in Cannabis sativa L. Front Plant Sci. 2022. https://doi.org/10.3389/fpls.2022.991680.
Leckie KM, Sawler J, Kapos P, Mackenzie JO, Giles I, Baynes K, Lo J, Celedon JM, Baute GJ. Loss of daylength sensitivity by splice site mutation in Cannabis. bioRxiv 2023:2023.03.10.532103. https://doi.org/10.1101/2023.03.10.532103.
Garfinkel AR, Wilkerson DG, Chen H, Smart LB, Rojas BM, Getty BA, Michael TP, Crawford S. Genetic mapping of SNP markers and candidate genes associated with day-neutral flowering in Cannabis sativa L. bioRxiv 2023:2023.04.17.537043. https://doi.org/10.1101/2023.04.17.537043.
Dowling CA, Shi J, Toth JA, Quade MA, Smart LB, McCabe PF, Schilling S, Melzer R. A flowering locus T ortholog is associated with photoperiod-insensitive flowering in hemp (Cannabis sativa L.). Plant J. 2024;119:383–403. https://doi.org/10.1111/tpj.16769.
Steel L, Welling M, Ristevski N, Johnson K, Gendall A. Comparative genomics of flowering behavior in Cannabis sativa. Front Plant Sci. 2023;14:1–22.
Dowling CA, Melzer R, Schilling S. Timing is everything: the genetics of flowering time in Cannabis sativa. Biochemist. 2021;43:34–8. https://doi.org/10.1042/bio_2021_138.
Thomas GG, Schwabe WW. Factors controlling flowering in the hop (Humulus lupulus L.). Ann Bot. 1969;33:781–93. https://doi.org/10.1093/oxfordjournals.aob.a084324.
Prentout D, Stajner N, Cerenak A, Tricou T, Brochier-Armanet C, Jakse J, Käfer J, Marais GAB. Plant genera Cannabis and Humulus share the same pair of well-differentiated sex chromosomes. New Phytol. 2021;231:1599–611. https://doi.org/10.1111/nph.17456.
Fu XG, Liu SY, van Velzen R, Stull GW, Tian Q, Li YX, Folk RA, Guralnick RP, Kates HR, Jin JJ, Li ZH, Soltis DE, Soltis PS, Yi TS. Phylogenomic analysis of the hemp family (Cannabaceae) reveals deep cyto-nuclear discordance and provides new insights into generic relationships. J Syst Evol. 2023;61:806–26. https://doi.org/10.1111/jse.12920.
Heslop-Harrison J. The experimental modification of sex expression in flowering plants. Biol Rev. 1957;32:38–90. https://doi.org/10.1111/j.1469-185X.1957.tb01576.x.
Kobayashi Y, Weigel D. Move on up, it’s time for change—mobile signals controlling photoperiod-dependent flowering. Genes Dev. 2007;21:2371–84. https://doi.org/10.1101/gad.1589007.
Corbesier L, Coupland G. The quest for florigen: a review of recent progress. J Exp Bot. 2006;57:3395–403. https://doi.org/10.1093/jxb/erl095.
Putterill J, Varkonyi-Gasic E. FT and florigen long-distance flowering control in plants. Curr Opin Plant Biol. 2016;33:77–82. https://doi.org/10.1016/j.pbi.2016.06.008.
Corbesier L, Vincent C, Jang S, Fornara F, Fan Q, Searle I, Giakountis A, Farrona S, Gissot L, Turnbull C, Coupland G. FT protein movement contributes to long-distance signaling in floral induction of Arabidopsis. Science. 2007;316:1030–3. https://doi.org/10.1126/science.1141752.
Golembeski GS, Imaizumi T. Photoperiodic regulation of florigen function in Arabidopsis thaliana. Arabidopsis Book. 2015;13: e0178. https://doi.org/10.1199/tab.0178.
Wickland DP, Hanzawa Y. The FLOWERING LOCUS T/TERMINAL FLOWER 1 gene family: functional evolution and molecular mechanisms. Mol Plant. 2015;8:983–97. https://doi.org/10.1016/j.molp.2015.01.007.
Gaston A, Potier A, Alonso M, Sabbadini S, Delmas F, Tenreira T, Cochetel N, Labadie M, Prévost P, Folta KM, Mezzetti B, Hernould M, Rothan C, Denoyes B. The FveFT2 florigen/FveTFL1 antiflorigen balance is critical for the control of seasonal flowering in strawberry while FveFT3 modulates axillary meristem fate and yield. New Phytol. 2021;232:372–87. https://doi.org/10.1111/nph.17557.
Xi W, Liu C, Hou X, Yu H. Mother of FT and TFL1 regulates seed germination through a negative feedback loop modulating ABA signaling in Arabidopsis. Plant Cell. 2010;22:1733–48. https://doi.org/10.1105/tpc.109.073072.
Nakamura S, Abe F, Kawahigashi H, Nakazono K, Tagiri A, Matsumoto T, Utsugi S, Ogawa T, Handa H, Ishida H, Mori M, Kawaura K, Ogihara Y, Miura H. A wheat homolog of MOTHER OF FT AND TFL1 acts in the regulation of germination. Plant Cell. 2011;23:3215–29. https://doi.org/10.1105/tpc.111.088492.
Wang Z, Zhou Z, Liu Y, Liu T, Li Q, Ji Y, Li C, Fang C, Wang M, Wu M, Shen Y, Tang T, Ma J, Tian Z. Functional evolution of phosphatidylethanolamine binding proteins in soybean and Arabidopsis. Plant Cell. 2015;27:323–36. https://doi.org/10.1105/tpc.114.135103.
Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH. The role of recently derived FT paralogs in sunflower domestication. Curr Biol. 2010;20:629–35. https://doi.org/10.1016/j.cub.2010.01.059.
Zhang X, Wang C, Pang C, Wei H, Wang H, Song M, Fan S, Yu S. Characterization and functional analysis of PEBP family genes in upland cotton (Gossypium hirsutum L.). PLoS One. 2016;11:e0161080. https://doi.org/10.1371/journal.pone.0161080.
Nelson MN, Książkiewicz M, Rychel S, Besharat N, Taylor CM, Wyrwa K, Jost R, Erskine W, Cowling WA, Berger JD, Batley J, Weller JL, Naganowska B, Wolko B. The loss of vernalization requirement in narrow-leafed lupin is associated with a deletion in the promoter and de-repressed expression of a Flowering Locus T (FT) homologue. New Phytol. 2017;213:220–32. https://doi.org/10.1111/nph.14094.
Soyk S, Müller NA, Park SJ, Schmalenbach I, Jiang K, Hayama R, Zhang L, Van Eck J, Jiménez-Gómez JM, Lippman ZB. Variation in the flowering gene SELF PRUNING 5G promotes day-neutrality and early yield in tomato. Nat Genet. 2017;49:162–8. https://doi.org/10.1038/ng.3733.
Wu F, Sedivy EJ, Price WB, Haider W, Hanzawa Y. Evolutionary trajectories of duplicated FT homologues and their roles in soybean domestication. Plant J. 2017;90:941–53. https://doi.org/10.1111/tpj.13521.
Zhang M, Li P, Yan X, Wang J, Cheng T, Zhang Q. Genome-wide characterization of PEBP family genes in nine Rosaceae tree species and their expression analysis in P. mume. BMC Ecol Evol. 2021;21:32. https://doi.org/10.1186/s12862-021-01762-4.
Venail J, Da Silva Santos PH, Manechini JR, Alves LC, Scarpari M, Falcão T, Romanel E, Brito M, Vicentini R, Pinto L, Jackson SD. Analysis of the PEBP gene family and identification of a novel FLOWERING LOCUS T orthologue in sugarcane. J Exp Bot. 2022;73:2035–49. https://doi.org/10.1093/jxb/erab539.
Li Y, Xiao L, Zhao Z, Zhao H, Du D. Identification, evolution and expression analyses of the whole genome-wide PEBP gene family in Brassica napus L. BMC Genom Data. 2023;24:27. https://doi.org/10.1186/s12863-023-01127-4.
Bennett T, Dixon LE. Asymmetric expansions of FT and TFL1 lineages characterize differential evolution of the EuPEBP family in the major angiosperm lineages. BMC Biol. 2021;19:181. https://doi.org/10.1186/s12915-021-01128-8.
Chardon F, Damerval C. Phylogenomic analysis of the PEBP gene family in cereals. J Mol Evol. 2005;61:579–90. https://doi.org/10.1007/s00239-004-0179-4.
Liu YY, Yang KZ, Wei XX, Wang XQ. Revisiting the phosphatidylethanolamine-binding protein (PEBP) gene family reveals cryptic FLOWERING LOCUS T gene homologs in gymnosperms and sheds new light on functional evolution. New Phytol. 2016;212:730–44. https://doi.org/10.1111/nph.14066.
Wolabu TW, Zhang F, Niu L, Kalve S, Bhatnagar-Mathur P, Muszynski MG, Tadege M. Three FLOWERING LOCUS T-like genes function as potential florigens and mediate photoperiod response in sorghum. New Phytol. 2016;210:946–59. https://doi.org/10.1111/nph.13834.
Pieper R, Tomé F, Pankin A, Von Korff M. FLOWERING LOCUS T4 delays flowering and decreases floret fertility in barley. J Exp Bot. 2021;72:107–21. https://doi.org/10.1093/jxb/eraa466.
Grassa CJ, Weiblen GD, Wenger JP, Dabney C, Poplawski SG, Timothy Motley S, Michael TP, Schwartz CJ. A new Cannabis genome assembly associates elevated cannabidiol (CBD) with hemp introgressed into marijuana. New Phytol. 2021;230:1665–79. https://doi.org/10.1111/nph.17243.
Laverty KU, Stout JM, Sullivan MJ, Shah H, Gill N, Deikus G, Sebra R, Hughes TR, Page JE, van Bakel H. A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci. Genome Res. 2019;29:146–56. https://doi.org/10.1101/gr.242594.118.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–9. https://doi.org/10.1093/nar/gkl200.
Solovyev V. Statistical approaches in Eukaryotic gene prediction. In: Balding D, Cannings C, Bishop M, editors. Handbook of Statistical genetics. 3rd ed. Hoboken, New Jersey, USA: Wiley-Interscience; 2007. p. 1616.
Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Marchler GH, Song JS, Thanki N, Yamashita RA, Yang M, Zhang D, Zheng C, Lanczycki CJ, Marchler-Bauer A. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020;48:D265–8. https://doi.org/10.1093/nar/gkz991.
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–91. https://doi.org/10.1093/bioinformatics/btp033.
Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 2017. https://doi.org/10.1093/bib/bbx108.
Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–95. https://doi.org/10.1093/bioinformatics/btp698.
Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Grüning BA, Guerler A, Hillman-Jackson J, Hiltemann S, Jalili V, Rasche H, Soranzo N, Goecks J, Taylor J, Nekrutenko A, Blankenberg D. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 2018;46:W537–44. https://doi.org/10.1093/nar/gky379.
Verde I, Jenkins J, Dondini L, Micali S, Pagliarani G, Vendramin E, Paris R, Aramini V, Gazza L, Rossini L, Bassi D, Troggio M, Shu S, Grimwood J, Tartarini S, Dettori MT, Schmutz J. The peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity. BMC Genomics. 2017;18: 225. https://doi.org/10.1186/s12864-017-3606-9.
One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants. Nature. 2019;574:679–85. https://doi.org/10.1038/s41586-019-1693-2.
Kong F, Liu B, Xia Z, Sato S, Kim BM, Watanabe S, Yamada T, Tabata S, Kanazawa A, Harada K, Abe J. Two coordinately regulated homologs of FLOWERING LOCUS T are involved in the control of photoperiodic flowering in soybean. Plant Physiol. 2010;154:1220–31. https://doi.org/10.1104/pp.110.160796.
Wong T, Kalyaanamoorthy S, Meusemann K, Yeates D, Misof B, Jermiin L. AliStat version 1.3 2014.
Trifinopoulos J, Nguyen L-T, von Haeseler A, Minh BQ. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016;44:W232–5. https://doi.org/10.1093/nar/gkw256.
Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8:28–36. https://doi.org/10.1111/2041-210X.12628.
Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, Lee T, Jin H, Marler B, Guo H, Kissinger JC, Paterson AH. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40: e49. https://doi.org/10.1093/nar/gkr1293.
Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, Xia R. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant. 2020;13:1194–202. https://doi.org/10.1016/j.molp.2020.06.009.
Shi J, Schilling S, Melzer R. Morphological and genetic analysis of inflorescence and flower development in hemp (Cannabis sativa L.). bioRxiv 2024:2024.01.25.577276. https://doi.org/10.1101/2024.01.25.577276.
Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60. https://doi.org/10.1038/nmeth.3317.
Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–5. https://doi.org/10.1093/bioinformatics/bts356.
Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. Stringtie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5. https://doi.org/10.1038/nbt.3122.
Hanzawa Y, Money T, Bradley D. A single amino acid converts a repressor to an activator of flowering. Proc Natl Acad Sci. 2005;102:7748–53. https://doi.org/10.1073/pnas.0500932102.
Ho WWH, Weigel D. Structural features determining flower-promoting activity of Arabidopsis FLOWERING LOCUS T. Plant Cell. 2014;26:552–64. https://doi.org/10.1105/tpc.113.115220.
Nan H, Cao D, Zhang D, Li Y, Lu S, Tang L, Yuan X, Liu B, Kong F. GmFT2a and GmFT5a redundantly and differentially regulate flowering through interaction with and upregulation of the bZIP transcription factor GmFDL19 in soybean. PLoS One. 2014. https://doi.org/10.1371/journal.pone.0097669.
Cai Y, Wang L, Chen L, Wu T, Liu L, Sun S, Wu C, Yao W, Jiang B, Yuan S, Han T, Hou W. Mutagenesis of GmFT2a and GmFT5a mediated by CRISPR/Cas9 contributes for expanding the regional adaptability of soybean. Plant Biotechnol J. 2020;18:298–309. https://doi.org/10.1111/pbi.13199.
Ahn JH, Miller D, Winter VJ, Banfield MJ, Lee JH, Yoo SY, Henz SR, Brady RL, Weigel D. A divergent external loop confers antagonistic activity on floral regulators FT and TFL1. EMBO J. 2006;25:605–14. https://doi.org/10.1038/sj.emboj.7600950.
Page RDM, Charleston MA. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol. 1997;7:231–40. https://doi.org/10.1006/mpev.1996.0390.
Jin J, Yang M, Fritsch PW, Velzen R, Li D, Yi T. Born migrators: Historical biogeography of the cosmopolitan family Cannabaceae. J Syst Evol. 2020;58:461–73. https://doi.org/10.1111/jse.12552.
Lynch RC, Padgitt-Cobb LK, Garfinkel AR, Knaus BJ, Hartwick NT, Allsing N, Aylward A, Bentz PC, Carey SB, Mamerto A, Kitony JK, Colt K, Murray ER, Duong T, Chen HI, Trippe A, Harkess A, Crawford S, Vining K, Michael TP. Domesticated cannabinoid synthases amid a wild mosaic cannabis pangenome. Nature. 2025;643:1001–10. https://doi.org/10.1038/s41586-025-09065-0.
Jin S, Nasim Z, Susila H, Ahn JH. Evolution and functional diversification of FLOWERING LOCUS T/TERMINAL FLOWER 1 family genes in plants. Semin Cell Dev Biol. 2021;109:20–30. https://doi.org/10.1016/j.semcdb.2020.05.007.
Endo M, Yoshida M, Sasaki Y, Negishi K, Horikawa K, Daimon Y, Kurotani KI, Notaguchi M, Abe M, Araki T. Re-evaluation of florigen transport kinetics with separation of functions by mutations that uncouple flowering initiation and long-distance transport. Plant Cell Physiol. 2018;59:1621–9. https://doi.org/10.1093/pcp/pcy063.
Baumann K, Venail J, Berbel A, Domenech MJ, Money T, Conti L, Hanzawa Y, Madueno F, Bradley D. Changing the spatial pattern of TFL1 expression reveals its key role in the shoot meristem in controlling Arabidopsis flowering architecture. J Exp Bot. 2015;66:4769–80. https://doi.org/10.1093/jxb/erv247.
Prentout D, Razumova O, Rhoné B, Badouin H, Henri H, Feng C, Käfer J, Karlov G, Marais GAB. An efficient RNA-seq-based segregation analysis identifies the sex chromosomes of Cannabis sativa. Genome Res. 2020. https://doi.org/10.1101/gr.251207.119.
Mohamed R, Wang C-T, Ma C, Shevchenko O, Dye SJ, Puzey JR, Etherington E, Sheng X, Meilan R, Strauss SH, Brunner AM. Populus CEN/TFL1 regulates first onset of flowering, axillary meristem identity and dormancy release in Populus. Plant J. 2010;62:674–88. https://doi.org/10.1111/j.1365-313X.2010.04185.x.
Ryu JY, Lee HJ, Seo PJ, Jung JH, Ahn JH, Park CM. The Arabidopsis floral repressor BFT delays flowering by competing with FT for FD binding under high salinity. Mol Plant. 2014;7:377–87. https://doi.org/10.1093/mp/sst114.
Voogd C, Brian LA, Wang T, Allan AC, Varkonyi-Gasic E. Three FT and multiple CEN and BFT genes regulate maturity, flowering, and vegetative phenology in kiwifruit. J Exp Bot. 2017;68:1539–53. https://doi.org/10.1093/jxb/erx044.
Mediavilla V, Jonquera M, Schmid-Slembrouck I, Soldati A. Decimal code for growth stages of hemp (Cannabis sativa L.). J Int Hemp Assoc. 1998;5:68–74.
Schilling S, Melzer R, Dowling CA, Shi J, Muldoon S, McCabe PF. A protocol for rapid generation cycling (speed breeding) of hemp (Cannabis sativa) for research and agriculture. Plant J. 2023;113:437–45. https://doi.org/10.1111/tpj.16051.
Lloyd DG, Webb CJ. Secondary sex characters in plants. Bot Rev. 1977;43:177–216. https://doi.org/10.1007/BF02860717.
Shephard HL, Parker JS, Darby P, Ainsworth CC. Sexual development and sex chromosomes in hop. New Phytol. 2000;148:397–411. https://doi.org/10.1046/j.1469-8137.2000.00771.x.
Liao Z, Zhang T, Lei W, Wang Y, Yu J, Wang Y, Chai K, Wang G, Zhang H, Zhang X. A telomere-to-telomere reference genome of ficus (Ficus hispida) provides new insights into sex-determination. Hortic Res. 2023. https://doi.org/10.1093/hr/uhad257.
Raiyemo DA, Bobadilla LK, Tranel PJ. Genomic profiling of dioecious Amaranthus species provides novel insights into species relatedness and sex genes. BMC Biol. 2023;21:37. https://doi.org/10.1186/s12915-023-01539-9.
Montgomery JS, Giacomini DA, Weigel D, Tranel PJ. Male-specific Y-chromosomal regions in waterhemp (Amaranthus tuberculatus) and Palmer amaranth (Amaranthus palmeri). New Phytol. 2021;229:3522–33. https://doi.org/10.1111/nph.17108.
Aljiboury AA, Friedman J. Mating and fitness consequences of variation in male allocation in a wind-pollinated plant. Evolution. 2022;76:1762–75. https://doi.org/10.1111/evo.14544.
Shi J, Toscani M, Dowling CA, Schilling S, Melzer R. Identification of genes associated with sex expression and sex determination in hemp (Cannabis sativa L.). J Exp Bot. 2025;76:175–90. https://doi.org/10.1093/jxb/erae429.
Liu M, Fernando D, Daniel G, Madsen B, Meyer AS, Ale MT, Thygesen A. Effect of harvest time and field retting duration on the chemical composition, morphology and mechanical properties of hemp fibers. Ind Crops Prod. 2015;69:29–39. https://doi.org/10.1016/j.indcrop.2015.02.010.
Salentijn EMJ, Petit J, Trindade LM. The complex interactions between flowering behavior and fiber quality in hemp. Front Plant Sci. 2019. https://doi.org/10.3389/fpls.2019.00614.
Trubanová N, Pender G, McCabe PF, Melzer R, Schilling S. Exploring phenotypic and genetic variability in hemp (Cannabis sativa). bioRxiv. 2023. https://doi.org/10.1101/2023.11.01.565084.
Zhang M, Anderson SL, Brym ZT, Pearson BJ. Photoperiodic flowering response of essential oil, grain, and fiber hemp (Cannabis sativa L.) cultivars. Front Plant Sci. 2021. https://doi.org/10.3389/fpls.2021.694153.
Kikuchi R, Kawahigashi H, Ando T, Tonooka T, Handa H. Molecular and functional characterization of PEBP genes in barley reveal the diversification of their roles in flowering. Plant Physiol. 2009;149:1341–53. https://doi.org/10.1104/pp.108.132134.
Hecht V, Laurie RE, Vander Schoor JK, Ridge S, Knowles CL, Liew LC, Sussmilch FC, Murfet IC, Macknight RC, Weller JL. The pea GIGAS gene is a FLOWERING LOCUS T homolog necessary for graft-transmissible specification of flowering but not for responsiveness to photoperiod. Plant Cell. 2011;23:147–61. https://doi.org/10.1105/tpc.110.081042.
Acknowledgements
CAD would like to thank Louise Ryan for helpful discussions at the commencement of this project.
Funding
CAD is supported by an Irish Research Council–Environmental Protection Agency Government of Ireland Postgraduate Scholarship (grant no. GOIPG/2019/1987).
Author information
Contributions
CAD, RM, SS, PFM, and TPM designed and managed the project. CAD performed data analyses. CAD, RM, and SS wrote the manuscript. All authors revised, read, and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dowling, C.A., Michael, T.P., McCabe, P.F. et al. FT-like genes in Cannabis and hops: sex specific expression and copy-number variation may explain flowering time variation. BMC Genomics 26, 930 (2025). https://doi.org/10.1186/s12864-025-11975-2
DOI: https://doi.org/10.1186/s12864-025-11975-2