Abstract
Background
Capsicum (Solanaceae) is a globally important vegetable crop and is also used therapeutically in traditional medicine systems. However, little is known of the genetic variation within the commonly grown cultivars, and the evolutionary relationships and differences in the chloroplast (cp.) genomes between Capsicum species remain unclear.
Results
The cp. genomes of 32 Capsicum varieties in three species from 6 countries were investigated. The cp. genome of Capsicum was found to be ~ 156 kb in length and to contain 113 unique genes, of which 79 encoded proteins, 30 encoded transfer RNAs (tRNAs), and 4 encoded ribosomal RNAs. The 32 varieties that we chose for study represented 13 genotypes, containing a total of 608 indels, 83 SNPs, 47 SSRs and 281–306 repeat sequences. We then included several previously sequenced Capsicum cp. genomes, and found that the nine investigated species showed a number of differences in the characteristics of the four IR boundaries, and that the non-coding regions contained the most variable regions. We conducted a phylogenetic reconstruction using the cp. genomes of 43 representative species of Solanaceae, and the resulting phylogeny generally reflected the currently accepted classification, with the species of the pungent group being closely related to one another.
Conclusions
This study provides a comprehensive analysis of Capsicum chloroplast genomes, revealing significant variations in IR boundaries and other genomic features. These findings enhance our understanding of Capsicum evolution and genetic diversity.
Introduction
Chilis and sweet (bell) peppers or paprika are members of the genus Capsicum (Solanaceae), which is thought to contain at least 35 species and which originated in Central and South America [1]. Certain plants of this genus have global importance as vegetables, medicines, and ornamentals. In particular, C. chinense Jacq., C. baccatum var. pendulum L., C. pubescens Ruiz & Pav., C. annuum L. and C. frutescens L. have high economic value and are widely cultivated and traded globally [2, 3]. Archeological microfossil evidence has indicated that peppers were domesticated in the Americas and have been consumed in this region for more than 7,000 years [4]. Pepper was introduced into China during the late 16th century [5]. Due to the complex geographical environment and climatic conditions, abundant germplasm resources of pepper have evolved or been shaped in China.
The investigation of the genetic diversity present in a group can help to reconstruct the origins and evolution of the species within that group, and can be invaluable in genetic breeding programs [6]. Because the Capsicum group is so large, understanding its genetic diversity is key to exploiting its genetic resources fully. The genetic diversity, population structure and phylogenetic relationships within C. annuum have been investigated in previous studies using SSR markers [7,8,9], morphological, physiological and yield traits [10, 11], and SNPs [12, 13]. Other studies have focused on the genetic variation within C. frutescens [2, 14,15,16], C. pubescens [18] and C. chinense [17,18,19] using morphological, physiological and molecular markers. However, the use of different markers for each species makes comparison of genetic diversity across species impossible. Moreover, most of the molecular markers used in these studies were derived from nuclear genes, and to date, the chloroplast (cp.) genome has been only rarely used to study Capsicum genetics.
The cp. genome, which is inherited from the maternal parent, is a circular, double-stranded DNA molecule [20, 21]. It is small, with a low molecular weight, and ranges in size between 120 and 160 kb [31]. The structure of the cp. genome is relatively simple and quadripartite, with large (LSC) and small (SSC) single-copy regions separated by two inverted repeat (IR) regions [22,23,24,25]. In angiosperms, between 110 and 130 genes are normally present on the cp. genome [26].
The cp. genome encodes genes related to biosynthesis, transcription/translation and photosynthesis [27]. Although the sequence and gene content of the plant cp. genome are highly conserved [25], gene loss, mutation, and pseudogenization have often occurred during its evolutionary history [28]. Cp. genomes, which have been proposed as “DNA super barcodes” by some previous studies [20], are often used in species identification and analyses of genetic diversity [27, 29, 30], and have been widely used in phylogenetic, taxonomic and evolutionary studies [31]. Since the first assembly of the cp. genome of Nicotiana tabacum [32], and following the rise of sequencing technology, the cp. genomes of many plant species have been sequenced. The cp. genomes of several Capsicum species, including C. annuum var. annuum [33], Capsicum annuum var. glabriusculum [34], C. frutescens [35], C. chinense [36], C. baccatum [37], and C. eximium [38, 43], have also been sequenced in previous studies. However, to date, there has been no comparison of genetic diversity among the different Capsicum species based on the cp. genome. In this study, we re-sequenced and assembled the cp. genomes of 32 samples of Capsicum landraces and varieties. We then analyzed these genomes for GC content, number of genes, number of repeat sequences, codon usage bias and number of simple sequence repeats (SSRs). The gene sequences of the IR regions and gene differentiation within different Capsicum species were compared. The sequences generated in this study were then combined with those of other Solanaceous species downloaded from NCBI, and the evolutionary relationships between Capsicum and other species of Solanaceae, as inferred from the cp. genomes, were discussed.
In summary, this study aimed to: (a) characterize the cp. genomes of diverse Capsicum varieties; (b) explore differentiation in the cp. genomes of representative Capsicum varieties; and (c) reveal the phylogenetic relationships among Capsicum varieties and representative species of Solanaceae based on the complete cp. genome. These results will further our knowledge of the evolutionary origins of the Capsicum genus, and will enable future researchers to more fully utilize the germplasm resources available for this diverse and globally important group.
Materials and methods
Plant materials, DNA extraction and whole genome resequencing
We used a total of 32 pepper samples from three Capsicum species (C. annuum, C. chinense, and C. frutescens) originating from six countries. Samples had been collected by our project team on previous expeditions (Table 1). Seeds from these samples were planted in a greenhouse and fresh leaves were harvested from the young plants when they were 20 days old. A voucher specimen of C. annuum has been deposited in the herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN 1395981).
The harvested fresh leaves were stored in a deep freeze at -80 °C until needed. Total DNA was then extracted from the leaf material using a CTAB method as previously described [39]. The DNA was checked by agarose gel electrophoresis (Omega Bio-Tek, Norcross, GA, United States), and the DNA in each sample was quantified and assessed for quality using a fluorometer (Qubit3.0, Thermo Fisher Scientific, Waltham, MA, United States). High-quality samples were then standardized to 10 µl and 200 µg DNA.
Genomic DNA was fragmented (insert sizes ~ 450 base-pairs (bp)) and used to construct libraries, which were sequenced on an Illumina NovaSeq6000 platform, resulting in the generation of 150 bp paired-end reads. Low-quality reads (50% or more of the bases with a quality score < 10) were filtered out of the raw sequencing data using fastp 0.21.0. The average depth of coverage was 5 ×. The clean data were used in the subsequent analyses.
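The read-level filtering rule stated above (discard a read when 50% or more of its bases have a quality score below 10) can be illustrated with a minimal Python sketch. This is not the fastp implementation; the function names and the assumption of Phred+33 quality encoding are ours, introduced only for illustration.

```python
def is_low_quality(quality_string, min_q=10, max_fraction=0.5, phred_offset=33):
    """Return True when at least `max_fraction` of the bases have Phred quality < `min_q`.

    Assumes Phred+33 encoded quality strings (an assumption, not stated in the text).
    """
    if not quality_string:
        return True  # treat empty reads as unusable
    low = sum(1 for ch in quality_string if ord(ch) - phred_offset < min_q)
    return low / len(quality_string) >= max_fraction


def filter_reads(records):
    """Keep (name, sequence, quality) tuples that pass the quality rule."""
    return [rec for rec in records if not is_low_quality(rec[2])]


if __name__ == "__main__":
    reads = [
        ("read1", "ACGT", "IIII"),  # 'I' encodes Q40: kept
        ("read2", "ACGT", "##II"),  # '#' encodes Q2: half the bases are low, discarded
    ]
    print([name for name, _, _ in filter_reads(reads)])
```

In practice the same thresholds would be passed to the filtering tool itself; the sketch only makes the decision rule explicit.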
Chloroplast genome assembly, gene annotation and sequence analysis
The GetOrganelle pipeline (https://github.com/Kinggerm/GetOrganelle) was used for assembly of the clean reads, and the contigs were checked using a C. annuum reference genome (NC028007) in BLAST (https://blast.ncbi.nlm.nih.gov/). The reference genome was then used to position and align the contigs. We checked for possible misassembled regions by mapping the raw reads onto the final contig and observing the coverage patterns. We then used the CpGAVAS pipeline [40] to automatically annotate the genome, and Geneious 8.1 [41] to identify the start/stop codons and intron/exon boundaries. tRNAscan-SE v2.0 [42] was then employed to identify regions encoding tRNAs, and we used OGDraw v1.2 (http://ogdraw.mpimp-golm.mpg.de/) [43] to draw a physical map of the cp. genome.
Analysis of the features of the chloroplast genome
Repetitive sequences in the Capsicum cp. genome longer than or equal to 16 bp, including palindromic repeats, forward repeats, complement repeats and reverse repeats, were identified using REPuter [44], with a minimum alignment score and maximum period size of 500. SSR markers were identified using Phobos v3.3.12 [45] and SSRHunter [46], which can identify multinucleotide repeats with four or more copies and with motif lengths of between 2 and 6 bp. Codon usage analysis and calculation of relative synonymous codon usage (RSCU) were conducted using the MEGA v11 software [47].
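The SSR criterion described above (repeat units of 2 to 6 bp present in four or more tandem copies) can be sketched with a regular expression. This is only an illustration of the criterion, not the algorithm used by Phobos or SSRHunter; the function name `find_ssrs` and the exclusion of single-base motifs such as "AA" are our assumptions.

```python
import re

# A 2-6 bp motif (shortest motif tried first) followed by at least three more
# copies of itself, i.e. four or more tandem copies in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(sequence):
    """Return (start, motif, copies) tuples for perfect SSRs in `sequence`.

    Motifs made of a single repeated base (e.g. "AA") are skipped, because
    they are homopolymer runs rather than multinucleotide repeats.
    """
    seq = sequence.upper()
    hits = []
    pos = 0
    while pos < len(seq):
        match = SSR_PATTERN.match(seq, pos)
        if match and len(set(match.group(1))) > 1:
            motif = match.group(1)
            hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
            pos = match.end()  # continue after the repeat
        else:
            pos += 1
    return hits


if __name__ == "__main__":
    example = "GGC" + "AT" * 5 + "CCT" + "AAG" * 4 + "T"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

Dedicated tools additionally handle imperfect repeats and report coordinates per genome region, which this sketch does not attempt.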
Chloroplast genome genetic diversity analyses of Capsicum individuals
An alignment of cp. genome sequence data that could be used in DNAsp analysis was produced in MAFFT V7.471 (Kazutaka Katoh, Japan) [48]. We then used DNAsp to identify the single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) present [49]. This analysis generated genotype data files and calculated the haplotype diversity (Hd). We then conducted a sliding window analysis using DNAsp [49] with a window length of 100 bp and a step size of 25 bp. Genetic relationships between the different genotypes present in our analyses were investigated through a genotype network analysis in NETWORK v10200 [50]. All the above analyses included all the indels identified in the aligned sequences.
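The sliding window analysis described above (window length 100 bp, step size 25 bp) can be sketched as follows. This is a simplified illustration rather than the DNAsp procedure: nucleotide diversity is computed here as the mean number of pairwise differences per site, alignment columns containing gaps are simply skipped, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site, ignoring columns that contain gaps."""
    columns = [col for col in zip(*aligned_seqs) if "-" not in col]
    if len(aligned_seqs) < 2 or not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned_seqs)), 2))
    differences = sum(1 for col in columns for i, j in pairs if col[i] != col[j])
    return differences / (len(pairs) * len(columns))


def sliding_window_pi(aligned_seqs, window=100, step=25):
    """Return (window midpoint, pi) pairs along an alignment of equal-length sequences."""
    length = len(aligned_seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [seq[start:start + window] for seq in aligned_seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Toy alignment: three 200 bp sequences, one with a single substitution.
    a = "A" * 200
    b = "A" * 10 + "C" + "A" * 189
    for midpoint, pi in sliding_window_pi([a, b, a]):
        print(midpoint, round(pi, 5))
```

Peaks in the resulting profile mark the windows with the highest diversity, which is how hypervariable regions are located along the genome.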
Comparison of the genome within the genus Capsicum
The borders between the IR and the SSC and LSC regions may change over evolutionary time, which may also result in size differences between the cp. genomes of different species or varieties [57,58,59]. We used IRscope [51] to investigate the IR boundaries in the cp. genomes of the three species sampled in our study (C. annuum, C. frutescens and C. chinense) together with those of six other Capsicum species downloaded from NCBI (C. lycianthoides, C. chacoense, C. eximium, C. galapagoense, C. pubescens and C. tovarii). We used the online software mVISTA [52] with the Shuffle-LAGAN alignment model [53], and with C. annuum as a reference genome, to investigate differences among the cp. genomes of these nine Capsicum species.
Phylogenetic reconstruction and population structure analysis
We downloaded cp. genome sequences of 40 species in the Solanaceae as well as that of an outgroup (Helianthus annuus, Asteraceae) from GenBank. We then aligned these sequences using MAFFT [48] together with those of the three species generated in our study. Using this alignment, we reconstructed a maximum likelihood (ML) phylogenetic tree in MEGA v11 [47], with 1000 bootstrap replicates, in order to investigate the phylogenetic placement of Capsicum within the Solanaceae. The nucleotide substitution model GTR + G + I was selected using MEGA v11. Starting trees for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a pairwise distance matrix estimated with the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value.
Results
Genetic diversity analyses of the Capsicum chloroplast genome based on 32 varieties
In this study, we sequenced and assembled the cp. genomes of 32 Capsicum varieties. A total of 13 genotypes were resolved in the 32 varieties, and each genotype was submitted to GenBank (Table 2). A total of 608 InDels and 83 SNPs were found in the genomes of the 32 Capsicum samples. Of these 83 SNPs, 8 were singleton variable sites and 75 were parsimony-informative sites, and the nucleotide diversity was 0.424. Most (67) of the 83 SNPs separated the pungent group from the non-pungent group, with only 8 and 11 polymorphic sites found within the pungent group and the non-pungent group, respectively (Table 3). There was thus considerable genetic variation between the genomes of C. chinense and C. frutescens (the pungent varieties) and those of C. annuum (the non-pungent varieties). In the pungent group, we found 4 genotypes and 8 polymorphic sites, and the nucleotide diversity was 0.070. Genotype 10 was the most common genotype in this group and contained seven varieties (Genotype 10: CXJ4, CXJ6, CXJ7, CXJ8, CXJ74, CZM8, CZM9). In the non-pungent group, we found 9 genotypes and 11 polymorphic sites, and the nucleotide diversity was 0.042. Genotype 7 was the most common and contained seven varieties (Genotype 7: CS13, CS14, CS17, LS5, TJ05, TJ06, TJ08) (Table 2). Overall, only 13 genotypes were found in the 32 varieties, and these 13 genotypes (Genotype 1–13) were divided into two lineages. We chose representative varieties (CXJ4 for C. frutescens and CS13 for C. annuum) for later characterization of the cp. genome.
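How identical sequences collapse into genotypes, and how haplotype diversity (Hd) is then obtained, can be sketched in a few lines of Python. The sketch uses the standard estimator Hd = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of genotype i among n sequences; the sample names and toy sequences below are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict


def collapse_genotypes(named_sequences):
    """Group sample names by identical aligned sequence (one group per genotype)."""
    groups = defaultdict(list)
    for name, seq in named_sequences.items():
        groups[seq].append(name)
    return list(groups.values())


def haplotype_diversity(sequences):
    """Hd = n/(n-1) * (1 - sum(p_i^2)) over genotype frequencies p_i."""
    n = len(sequences)
    if n < 2:
        return 0.0
    counts = Counter(sequences)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


if __name__ == "__main__":
    # Hypothetical toy alignment: four samples, three distinct genotypes.
    toy = {"s1": "AAC", "s2": "AAC", "s3": "ATC", "s4": "GTC"}
    print(collapse_genotypes(toy))
    print(round(haplotype_diversity(list(toy.values())), 4))
```

Applied to a full alignment, the same grouping step is what reduces a set of varieties to a smaller number of distinct cp. genotypes.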
We constructed an ML tree using 13 varieties, each of which represented one of the genotypes identified above. We found that different varieties of C. annuum, C. chinense and C. frutescens clustered together with others of the same species, forming monophyletic groups for each species. The genetic relationship between the two pungent species, C. chinense and C. frutescens, was close (Fig. 1A). Similar relationships were revealed in the Network analyses. We identified nine genotypes (G1-G9) from C. annuum, three genotypes (G10-G12) from C. frutescens, and one (G13) from C. chinense (Fig. 1B).
Features of the Capsicum chloroplast genome
The 13 representative Capsicum cp. genomes identified in this study ranged in length from 156,729 (CS13) to 156,950 bp (CXJ4), which was very similar to the lengths of most previously published cp. genomes (Table 4). We found that each of the 13 representative Capsicum cp. genomes formed a single, circular DNA sequence (Fig. 2). All showed a classical quadripartite structure, and contained two copies of the IR, between 25,748 (CS13) and 25,847 bp (CXJ4) in length. The LSC was between 87,344 (CXJ4) and 87,380 bp (CS13) long, and the SSC between 17,853 (CS13) and 17,912 bp (CXJ4) (Fig. 2; Table 1). The 13 representative Capsicum cp. genomes analyzed all had a GC content of 37.7% (Table 4). In each of the analyzed Capsicum cp. genomes, we found 132 genes, of which 8 encoded ribosomal RNAs (rRNAs), 37 encoded transfer RNAs (tRNAs), and 87 were protein-coding genes (PCGs) (Fig. 2). We were able to assign each gene to one of four functional categories: photosynthesis (47 genes), self-replication (73), biosynthesis (5) or unknown function (6) (Fig. 2; Table 5). Nineteen genes in the IR were found to be either partially or fully duplicated. These included eight PCGs (ndhB, rps7, rps12, rpl2, rpl23, ycf1, ycf2 and ycf15), seven genes encoding tRNAs (trnA-TGC, trnI-CAT, trnI-GAT, trnL-CAA, trnN-GTT, trnR-ACG and trnV-GAC), and the four rRNA genes (4.5S, 5S, 16S and 23S) (Table 5). In total, 113 unique genes were found in the Capsicum cp. genome (79 PCGs, 30 tRNA genes and 4 rRNA genes). The structure of the cp. genome is likely to be highly conserved in Capsicum, because the structural elements that we observed were identical in all of the 13 Capsicum varieties analyzed.
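The reported lengths and gene counts can be cross-checked with simple arithmetic, since a quadripartite cp. genome has a total length of LSC + SSC + 2 × IR, and each gene duplicated in the IR is counted twice in the gene total. The short sketch below, with helper names of our own choosing, uses only figures reported in the text; for CXJ4 it derives the SSC length implied by the reported total, LSC and IR lengths.

```python
def total_length(lsc, ssc, ir):
    """Total length of a quadripartite genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def implied_ssc(total, lsc, ir):
    """SSC length implied by the total, LSC and IR lengths."""
    return total - lsc - 2 * ir


# CS13: region lengths as reported in the text.
cs13_total = total_length(lsc=87_380, ssc=17_853, ir=25_748)

# CXJ4: SSC length implied by the reported total, LSC and IR lengths.
cxj4_ssc = implied_ssc(total=156_950, lsc=87_344, ir=25_847)

# Gene counts: genes duplicated in the IR are counted once among unique genes.
GENES_TOTAL = {"PCG": 87, "tRNA": 37, "rRNA": 8}
GENES_DUPLICATED_IN_IR = {"PCG": 8, "tRNA": 7, "rRNA": 4}
unique_genes = {
    kind: GENES_TOTAL[kind] - GENES_DUPLICATED_IN_IR[kind] for kind in GENES_TOTAL
}

if __name__ == "__main__":
    print(cs13_total)                                  # total length of CS13 (bp)
    print(cxj4_ssc)                                    # implied SSC of CXJ4 (bp)
    print(unique_genes, sum(unique_genes.values()))    # unique genes per class and in total
```

The CS13 figures sum to the reported 156,729 bp, and the gene arithmetic reproduces the 113 unique genes (79 PCGs, 30 tRNA genes and 4 rRNA genes).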
Analysis of SSRs, repeat sequences and codon usage bias in the Capsicum chloroplast genome
The two most common genotypes among the Capsicum individuals sequenced in this study were represented by CS13 and CXJ4. We therefore selected these two genotypes for the following analyses of SSRs, repeats and codon usage bias. CS13 and CXJ4 were found to be similar in the number and composition of SSRs, with each containing only 47 identified SSRs (Fig. 3A). We found 16 SSRs of type AT, and 10 of type TA. Most of the SSRs were dinucleotides (91.48% of the total SSRs), trinucleotides (0.64%) or tetranucleotides (0.21%). Twenty-three SSRs were found in the LSC region (Fig. 3A), ten in the IR regions, and four in the SSC. There was an A/T nucleotide bias in the Capsicum cp. SSRs, with A/T repeats making up 61.7%.
We then investigated the presence of repetitive sequences in the Capsicum cp. genome. The varieties with the CS13 genotype contained 281 repetitive sequences, all between 16 and 76 bp in length. 109 forward repeats, 80 palindromic repeats, 61 reverse repeats and 31 complement repeats were found in this genotype. The CXJ4 varieties were found to contain 306 repetitive sequences, ranging in length between 16 and 76 bp. 122 forward repeats, 93 palindromic repeats, 63 reverse repeats and 28 complement repeats were found in the CXJ4 genotype. The numbers of the different types of repeats are given in Fig. 3B.
We next analyzed codon usage in the protein-coding genes. A total of 40 codons were found with an RSCU > 1.0, with AUU (4.14%), AAA (3.96%), GGA (3.84%), AAU (3.76%) and UUU (3.63%) being the most commonly used codons. Leu (L) was found to be the most common amino acid in the cp. genome (3234 times), followed by Ser (S) and Ile (I), both of which were found > 2000 times. Trp (W) and Met (M) were the least commonly used amino acids, and occurred 485 and 620 times, respectively (Table 6). Codon preference analysis showed that codons ending in A or T at the 3’ position were preferred, with RSCU values consistently higher than 1.
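The RSCU values discussed above follow a simple definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a minimal illustration of that definition, not the MEGA implementation. It assumes the standard codon assignments, writes codons with T rather than U, and leaves out stop codons and amino acids that do not occur in the input; the function name `rscu` is ours.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_sequences):
    """Return {codon: RSCU} for the codons of every amino acid seen in the input.

    RSCU = observed count * (number of synonymous codons) / (total count for
    that amino acid). Values > 1 indicate codons used more often than expected.
    """
    counts = Counter()
    for cds in cds_sequences:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        for codon in codons:
            values[codon] = counts[codon] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy coding sequence: AAA AAA AAG TTT
    for codon, value in sorted(rscu(["AAAAAAAAGTTT"]).items()):
        print(codon, round(value, 3))
```

In the toy input, AAA is used twice and AAG once for Lys, so AAA receives an RSCU above 1 and AAG below 1, which is the same kind of contrast that underlies the A/T-ending preference reported above.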
IR region expansion and contraction in the Capsicum cp. genome
The IR boundaries of the cp. genomes of nine Capsicum species (C. annuum, C. chacoense, C. chinense, C. eximium, C. frutescens, C. galapagoense, C. lycianthoides, C. pubescens, C. tovarii) were then compared. Of these species, C. pubescens has the longest complete cp. genome (157,390 bp), and C. lycianthoides has the shortest (155,583 bp). All nine Capsicum species studied had a cp. genome structure typical of the angiosperms: quadripartite, with large (LSC) and small (SSC) single-copy regions separated by two inverted repeat (IR) regions (Fig. 4). The regions spanning the IR/LSC and IR/SSC junctions were compared in our nine representative Capsicum species. We found that the IR regions varied in length between 25,624 bp in C. lycianthoides and 25,887 bp in C. pubescens. Similarly, the LSC ranged from 86,813 bp in C. lycianthoides to 87,688 bp in C. pubescens, and the SSC also varied in length among species (Fig. 4). Thus, variation in the size of Capsicum cp. genomes appears to result from variation in the lengths of the IR, SSC and LSC regions, rather than from IR size variation alone, as is usual in most species.
Comparison of border distance between adjacent genes and junctions of the LSC, SSC and two IR regions among the chloroplast genomes of seven Capsicum species. Boxes above or below the main line indicate the adjacent border genes. The figure is not to scale with respect to sequence length, and only shows relative changes at or near the IR/SC borders
Our nine study species varied slightly at the IR/LSC and IR/SSC junctions. The genes rps19, rpl2, ycf1 and trnH were found at either the IR/LSC or the IR/SSC boundary. The IRb-LSC boundary was similar in all the tested Capsicum species, as was the IRa-SSC boundary, and these were located in the genes rps19 and ycf1, respectively. The IRa-LSC boundary fell between rpl2 and trnH (Fig. 4). The IRb-SSC boundary was a little different. In seven of our study species, this boundary fell in the ycf1 gene, whereas that of C. lycianthoides was located in the intergenic region between the ycf1 and ndhF genes. Our results therefore suggest that in Capsicum species, the IR/LSC and IR/SSC junction regions are highly conserved.
mVISTA comparison of chloroplast genomes in Capsicum
We used mVISTA to construct multiple alignments of the nine study Capsicum cp. genomes. C. lycianthoides was used as a reference genome (Fig. 5). We found that overall, the cp. genomes in Capsicum species were highly conserved. As expected, the coding regions were found to be less divergent than the non-coding regions, and the IR regions were more highly conserved than either the LSC or the SSC region. The intron-containing genes were highly variable in the Capsicum cp. genomes studied, and intergenic spacers (trnL-trnF, rps12-rpl20, rpl32-ndhF, trnV-rps7, rps16-trnQ, petA-psbL, and trnK-rps16) were the most highly divergent sequences in the nine chosen Capsicum cp. genomes. The trnK, rpl20, ycf1 and ycf2 sequences appear to evolve rapidly in Capsicum, as these were the coding regions with the highest divergence, and are therefore potentially of use as markers for the taxonomic classification and phylogenetic reconstruction of Capsicum species.
Comparison of four cp. genomes using the mVISTA alignment program. The x-axis represents the coordinates in the cp. genome. The y-axis indicates the average percent identity of sequence similarity in the aligned regions, ranging between 50% and 100%. Purple bars represent exons, blue bars represent untranslated regions (UTRs), pink bars represent conserved noncoding sequences (CNS), gray bars represent mRNA, and white bars represent genomic differences
To assess how efficiently these regions discriminate among Capsicum species, we extracted the divergent rpl20 and trnK gene sequences from the cp. genomes of the nine species, aligned them into data matrices, and performed a haplotype analysis in DNAsp. We found 7 genotypes of the rpl20 gene among the nine species (Fig. 6A): C. frutescens and C. tovarii shared one genotype, and C. chinense and C. eximium shared another, giving a discriminatory efficiency of 77.8%. The trnK gene produced 6 genotypes (Fig. 6B), giving a discriminatory efficiency of 66.7%: C. frutescens and C. tovarii shared one genotype, and C. eximium, C. chinense and C. galapagoense shared another. The discriminatory efficiency of these cp. genome regions is therefore high.
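The discriminatory efficiency values above correspond to the number of distinct genotypes divided by the number of species compared. The following sketch reproduces that arithmetic; the genotype labels (H1, H2, ...) are hypothetical placeholders, and only the pattern of shared genotypes is taken from the text.

```python
def discriminatory_efficiency(genotype_by_species):
    """Percentage of distinct genotypes among the species compared."""
    n_species = len(genotype_by_species)
    n_genotypes = len(set(genotype_by_species.values()))
    return 100.0 * n_genotypes / n_species


# Genotype labels are hypothetical; only the sharing pattern follows the text.
RPL20 = {
    "C. annuum": "H1", "C. chacoense": "H2", "C. chinense": "H3",
    "C. eximium": "H3", "C. frutescens": "H4", "C. tovarii": "H4",
    "C. galapagoense": "H5", "C. lycianthoides": "H6", "C. pubescens": "H7",
}
TRNK = {
    "C. annuum": "H1", "C. chacoense": "H2", "C. chinense": "H3",
    "C. eximium": "H3", "C. galapagoense": "H3", "C. frutescens": "H4",
    "C. tovarii": "H4", "C. lycianthoides": "H5", "C. pubescens": "H6",
}

if __name__ == "__main__":
    print(round(discriminatory_efficiency(RPL20), 1))  # 7 genotypes / 9 species
    print(round(discriminatory_efficiency(TRNK), 1))   # 6 genotypes / 9 species
```

With seven and six distinct genotypes among nine species, the function returns the 77.8% and 66.7% reported for rpl20 and trnK, respectively.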
Levels of sequence divergence and nucleotide variability (π) were then examined using DNAsp within our nine aligned Capsicum cp. genome sequences. Interestingly, we found that despite the relatively close relationships between our study species, the genomes were nevertheless divergent, with nucleotide variability (π) being 0.0025. We found a total of 1,408 mutations, of which 1,216 were SNPs and 184 were parsimony informative. Most of this variation occurred in the LSC region, while the IR regions were relatively conserved. The peaks of diversity in the sliding window analysis (Fig. 7) fell in intergenic regions such as trnL-trnF, rps12-rpl20, trnV-rps7, rpl32-ndhF and rps7-rrn16S, which were highly variable; this result is consistent with the mVISTA analysis. The cp. genome is therefore potentially informative for Capsicum phylogenetic reconstruction at the species level, with the LSC being useful in both phylogenetic and genetic diversity analyses.
Phylogenetic analysis of Solanaceae species to reveal the phylogenetic relationship of Capsicum species
We constructed an alignment of cp. genome sequences, including those of the three species sequenced in this study as well as 40 further species in the Solanaceae downloaded from GenBank (Fig. 8). The best nucleotide substitution model was estimated by MEGA to be GTR + G + I, and the whole cp. genomes were used in the reconstruction of the ML trees. The results of this analysis were consistent with previous phylogenetic reconstructions in this group, and also with the traditional classification of the Solanaceae. The different genera, including Lycium, Physalis, Nicotiana, Petunia and others, could be distinguished, and the Capsicum species clustered closely together in a monophyletic group. We found that C. annuum and C. tovarii were sister groups with a bootstrap value of 100%, and that C. chinense, C. eximium and C. frutescens were also sister groups with a bootstrap value of 87%. The whole cp. genome is therefore appropriate for the phylogenetic reconstruction of evolutionary relationships within the Solanaceae.
Discussion
Characteristics of the Capsicum chloroplast genome
Chloroplast genomes can be useful in the investigation of evolutionary relationships in and among plant species [54]. The cp. genomes of certain Capsicum species have been previously reported [35,36,37,38]. However, despite the global importance of Capsicum as a crop plant, systematic research and in-depth analyses of the evolutionary relationships in Capsicum are lacking. We sequenced the cp. genomes of 32 varieties of three Capsicum species (C. annuum, C. chinense and C. frutescens) from six countries. In all three species studied, the cp. genome showed a conserved quadripartite structure ranging from 156,729 to 156,950 bp, which is similar to the cp. genomes of most terrestrial plants [55]. A total of 132 genes were found in all three species, which is consistent with results from other studies, including those on C. annuum var. annuum [33], Capsicum annuum var. glabriusculum [56], C. frutescens [35], C. chinense [36], C. baccatum [37], and C. eximium [38]. Our functional analysis of the Capsicum cp. genome also gave results similar to those reported for other species in this genus [38]. We were able to divide the genes present into three major functional categories, including genes encoding components in the photosynthetic system, the genetic system, and open reading frame and other genes. Analyses of the IR region demonstrated relatively high levels of conservation in the IR/LSC and IR/SSC junction regions between different Capsicum species. We found that the Capsicum cp. genome was AT-rich (62.3%), which is consistent with previous reports from Capsicum [35, 56], and indeed from most higher plants [57]. This might also explain why, in our analysis of codon preference, most codons ending in A or T at the 3’ position had an RSCU > 1 and were preferred.
Genetic variation and genotype in Capsicum varieties
Capsicum is one of the most important spice crop genera worldwide. The genus is thought to have originated and been domesticated in Mexico, and to have secondary centers of origin in Guatemala and Bulgaria [58]. Columbus is credited with bringing Capsicum crops to Europe in the 15th century, from where they spread to Africa and Asia, including India, China and Japan, along the spice routes [59]. These crops therefore have a long history of cultivation in different areas of the world, and as a consequence, there is significant genetic variation in the different areas [60]. Knowledge of the germplasms available is necessary for the scientific breeding of new varieties with improved resistances against disease and adverse conditions. Landraces and cultivars of crops from different areas offer diverse germplasm resources that can be exploited in these attempts [61]. However, the landraces and cultivars that have been bred in Capsicum species have not, to date, been scientifically applied to crop improvement, and require further and extensive genetic diversity studies.
In this study, the cp. genomes of 32 varieties from three species of Capsicum were sequenced, and the genetic diversity in this genus was investigated. We found only 13 cp. genotypes in the 32 varieties sampled, suggesting that the cp. genome is conserved among Capsicum varieties, and that the diverse phenotypes observed in different varieties may be controlled by the nuclear genome. However, the nucleotide variability and the sliding window analysis still indicated a certain amount of divergence between the cp. genomes, with most variation occurring in the LSC region and the non-coding sequences, and with intron-containing genes showing higher levels of variability. This is similar to the patterns reported for the cp. genomes of other species [62, 63].
DNA barcoding has been used to identify species in diverse samples [64]. Many studies have considered the choice of locus for DNA barcoding, and many single-copy loci and combinations of loci have been proposed as DNA barcoding sequences [65]. In the cp. genome, the loci trnK, trnH-psbA, matK and rbcL are universally recognized as barcodes for species identification [66]. However, in our study, the rpl20 gene appeared to have the highest discriminatory efficiency in Capsicum identification.
Comparison and phylogenetic analyses of the chloroplast genomes of Capsicum species provide comprehensive insights into the genetic relationships of Capsicum
About 3,000 species have been described from the Solanaceae to date, and the family contains several species of economically important plants, such as tomato, eggplant, potato and tobacco [67]. In our study of the phylogenetic relationships within the genus Capsicum and between Capsicum and other species in the Solanaceae, we reconstructed a phylogenetic tree using the cp. genomes of three species of Capsicum and 40 further species from the Solanaceae. Capsicum formed a monophyletic group with high statistical support. Liu et al. [60] investigated domestication and population differentiation in peppers, and also reconstructed the phylogenetic relationships within this genus using resequencing data. In the study of Liu et al. [60], C. pubescens was found to be sister to the other six species, with C. baccatum var. baccatum, C. baccatum var. pendulum and C. chacoense forming one clade (the baccatum clade), and with a second clade (the annuum clade) comprising C. annuum var. annuum, C. annuum var. glabriusculum, C. chinense, C. frutescens and C. galapagoense. Although the species that we used in our study differed slightly from these, we found that the phylogenetic relationships were similar, and also demonstrated the close relationship of the species in the pungent group (C. chinense and C. frutescens). Our phylogeny also revealed the positions of C. chacoense, C. tovarii, and C. eximium for the first time. However, within the Solanaceae, we also found certain conflicts between the cp. genome tree generated in our ML analysis and the classical classification system [68,69,70]; for example, neither Dunalia nor Solanum formed a monophyletic group in our analysis. Further investigation is necessary to determine whether some groups should be renamed.
The structure of the cp. genome is conserved, although enough sites show variation that the cp. genome is nevertheless informative in analyses of the phylogenetic relationships within the genus Capsicum. However, phylogenies constructed using cp. genomes may have limitations following extensive interspecific or intergeneric hybridization of the study species, or because of introgression or incomplete lineage sorting [71]. The inclusion of further Capsicum species and varieties in the analyses, together with a study of the morphology, biogeography and history of domestication of this group, should result in a more robust phylogeny. It is our belief that this analysis of cp. genomes in Capsicum will provide a theoretical background for further research in this important genus. Larger-scale comparative analyses with more Capsicum varieties are also a possible direction for future research.
Conclusion
The cp. genomes of 32 Capsicum varieties were newly sequenced in this study, revealing the genetic diversity and relationships of these varieties. The results showed that the structure of the cp. genome was relatively conserved among the Capsicum varieties. The 32 varieties represented 13 genotypes. A total of 608 indels, 83 SNPs and 47 SSRs were identified; these, together with highly variable regions such as the rpl20 and trnK genes, can be used as molecular markers in future studies of Capsicum diversity. The phylogenetic reconstruction based on the cp. genomes of the Solanaceae generally reflected the currently accepted classification, with the species of the pungent group closely related to one another. Our results enrich the cp. genome data available for this important vegetable genus and should aid the molecular identification and phylogenetic reconstruction of Capsicum species. The genetic diversity between varieties may guide future research on the adaptive evolution of Capsicum species, and the chloroplast genome data generated in this study may support Capsicum breeding programs and the development of new varieties with enhanced traits.
Data availability
The cp. genome data for the 13 new Capsicum genotypes have been deposited in NCBI under accession numbers PP894789–PP894801.
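For readers who want the individual accessions rather than the range, the short sketch below expands it. It assumes the range is contiguous and inclusive and that both endpoints share the same letter prefix; the helper name `expand_accession_range` is hypothetical.

```python
import re


def expand_accession_range(first, last):
    """List every accession from first to last, inclusive (shared letter prefix assumed)."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share a letter prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, stop = int(m1.group(2)), int(m2.group(2))
    # Keep the original zero-padded width of the numeric part.
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


accessions = expand_accession_range("PP894789", "PP894801")
print(len(accessions))  # 13, one per genotype
```

Under these assumptions the range yields 13 identifiers, consistent with the 13 genotypes reported.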
References
Pickersgill B. Relationships between weedy and cultivated forms in some species of Chili peppers (Genus capsicum). Evolution. 1971;25:683–91.
Arumingtyas EL, Ahyar AN. Genetic diversity of Chili pepper mutant (Capsicum frutescens L.) resulted from gamma-ray radiation. IOP Conf Ser: Earth Environ Sci. 2022;1097:012059.
Olatunji TL, Afolayan AJ. Variability in seed germination characteristics of Capsicum annuum L. and Capsicum frutescens L. Pak J Bot. 2019;51:561–5.
Pickersgill B. The archaeological record of Chili peppers (Capsicum spp.) and the sequence of plant domestication in Peru. Am Antiq. 1969;34:54–61.
Zheng N. Thought and discussion on the introduction of Capsicum. Agricultural Archaeol (in Chinese). 2006;4:177–84.
Meffe GK, Carroll CR. Principles of conservation biology. Sunderland, Massachusetts; 1994.
Ana LPM, Eguiarte L, Mercer K, Ainsworth NEM, McHale LK, Knaap Evd, Barbolla LJ. Genetic diversity, gene flow and differentiation among wild, semiwild and landrace Chile pepper (Capsicum annuum) populations in Oaxaca, Mexico. Am J Bot. 2022;109:1157–76.
Gu X-z, Cao Y-c, Zhang Z-h, Zhang B-x, Zhao H, Zhang X-m, Wang H-p, Li X-x, Wang L-h. Genetic diversity and population structure analysis of Capsicum germplasm accessions. J Integr Agric. 2019;18:1312–20.
Dantas AP, Quemel dS, Romero JS, Santos JO, Lopes AD. Genetic diversity among pepper (Capsicum spp.) accessions estimated by microsatellite markers. Genet Mol Res. 2022;21:gmr18953.
Ahmed I, Nawab NN, Kabir R, Muhammad F, Intikhab A, Rehman AU, Zakariya Farid M, Jellani G, Nadeem S, Quresh W, et al. Genetic diversity for production traits in hot Chilli (Capsicum annuum L). Pak J Bot. 2022;54:2157–66.
Karim K, Rafii MY, Misran A, Ismail M, Harun AR, Ridzuan R, Chowdhury MFN, Hosen M, Yusuff O, Haque MA. Genetic diversity analysis among Capsicum annuum mutants based on morpho-physiological and yield traits. Agronomy. 2022;12:2346.
Pereira-Dias L, Vilanova S, Fita A, Prohens J, Rodríguez-Burruezo A. Genetic diversity, population structure, and relationships in a collection of pepper (Capsicum spp.) landraces from the Spanish centre of diversity revealed by genotyping-by-sequencing (GBS). Hortic Res. 2019;6:54.
Cheng J, Qin C, Tang X, Zhou H, Hu Y, Zhao Z, Cui J, Li B, Wu Z, Yu J, et al. Development of a SNP array and its application to genetic mapping and diversity assessment in pepper (Capsicum spp). Sci Rep. 2016;6:33293.
Zhong Y, Cheng Y, Ruan M, Ye Q, Wang R, Yao Z, Zhou G, Liu J, Yu J, Wan H. High-throughput SSR marker development and the analysis of genetic diversity in Capsicum frutescens. Horticulturae. 2021;7:187.
Bhoomika HR, Kumar BMD, Sreelakshmi S. Genetic variability of bird’s eye Chilli (Capsicum frutescens L.) in hill zones of Karnataka, India. Bangl J Bot. 2022;51:9–16.
Arumingtyas EL, Atiaturrochmah A, Kusnadi J. Confirmation of mutation and genetic stability of the M4 generation of Chili pepper’s (Capsicum frutescens L.) Ethyl Methane Sulfonate (EMS) mutant based on morphological, physiological and molecular characters. Biodiversitas J Biol Divers. 2023;24:531–8.
Chiquini-Medina RA, Castillo-Aguilar CdlC, Chiquini-Medina RA. Genetic improvement and its effect on the genetic diversity of Habanero Chili (Capsicum chinense Jacq). Agro Productividad. 2022;15:103–9.
Jamir M. Genetic diversity analysis using SSR marker in Naga King Chilli (Capsicum chinense Jacq.) Genotypes. Annals Plant Soil Res. 2022;24:618–29.
Chhapekar SS, Brahma V, Rawoof A, Kumar N, Gaur R, Jaiswal V, Kumar A, Yadava SK, Kumar R, Sharma V, et al. Transcriptome profiling, simple sequence repeat markers development and genetic diversity analysis of potential industrial crops Capsicum chinense and C. frutescens of Northeast India. Ind Crops Prod. 2020;154:112687.
Huang S, Ge X, Cano A, Salazar B, Deng Y. Comparative analysis of chloroplast genomes for five Dicliptera species (Acanthaceae): molecular structure, phylogenetic relationships, and adaptive evolution. PeerJ. 2020;8:e8450.
Palmer JD, Stein DB. Conservation of chloroplast genome structure among vascular plants. Curr Genet. 1986;10:823–33.
Song Y, Chen Y, Lv J, Xu J, Zhu S, Li M, Chen N. Development of chloroplast genomic resources for Oryza species discrimination. Front Plant Sci. 2017;8:1854.
Gao X, Zhang X, Meng H, Li J, Zhang D, Liu C. Comparative chloroplast genomes of Paris Sect. Marmorata: insights into repeat regions and evolutionary implications. BMC Genomics. 2018;19:878.
Wang M, Wang X, Sun J, Wang Y, Ge Y, Dong W, Yuan Q, Huang L. Phylogenomic and evolutionary dynamics of inverted repeats across Angelica Plastomes. BMC Plant Biol. 2021;21:26.
Parks M, Cronn R, Liston A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol. 2009;7:84.
Favier A, Gans P, Boeri Erba E, Signor L, Muthukumar SS, Pfannschmidt T, Blanvillain R, Cobessi D. The plastid-encoded RNA polymerase-associated protein PAP9 is a superoxide dismutase with unusual structural features. Front Plant Sci. 2021;12:668897.
Qiao J, Cai M, Yan G, Wang N, Li F, Chen B, Gao G, Xu K, Li J, Wu X. High-throughput multiplex cpDNA resequencing clarifies the genetic diversity and genetic relationships among Brassica napus, Brassica rapa and Brassica oleracea. Plant Biotechnol J. 2016;14:409–18.
Henriquez CL, Abdullah AI, Carlsen MM, Mckain MR. Evolutionary dynamics of chloroplast genomes in subfamily Aroideae (Araceae). Genomics. 2020;112:2349–60.
Jiao J, Yin Y. A strategy for developing high-resolution DNA barcodes for species discrimination of wood specimens using the complete chloroplast genome of three Pterocarpus species. Planta. 2019;250:95–104.
Liu ZF, Ma H, Ci XQ, Li L, Li J. Can plastid genome sequencing be used for species identification in Lauraceae? Bot J Linn Soc. 2021;197:1–14.
Jheng C, Chen F, Lin TC, Chang JY. The comparative chloroplast genomic analysis of photosynthetic orchids and developing DNA markers to distinguish Phalaenopsis orchids. Plant Sci. 2012;190:62–73.
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi TR. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 1986;5:2043–9.
Bie M, Han C, Wang X, Xiao W, Song K. Characterization and phylogenetic relationships analysis of the complete chloroplast genome of Capsicum annuum (Solanaceae). Mitochondrial DNA Part B. 2020;5:570–1.
Zeng FC, Gao CW, Gao LZ. The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum). Mitochondrial DNA Part A. 2016;27:724–6.
Shim D, Raveendar S, Lee JR, Lee GA, Ro NY, Jeon YA, Cho GT, Lee HS, Ma KH, Chung JW. The complete chloroplast genome of Capsicum frutescens (Solanaceae). Appl Plant Sci. 2016;4:1600002.
Raveendar S, Lee KJ, Shin MJ, Cho GT, Chung JW. Complete chloroplast genome sequencing and genetic relationship analysis of Capsicum chinense Jacq. Plant Breed Biotechnol. 2017;5:261–8.
Kim TS, Lee JR, Raveendar S, Lee GA, Jeon YA, Lee HS, Ma KH, Lee SY, Chung JW. Complete chloroplast genome sequence of Capsicum baccatum var. baccatum. Mol Breed. 2016;36:110.
Sebastin R, Lee KJ, Cho G-T, Shin M-J, Kim S-H, Hyun DY, Lee J-R. The complete chloroplast genome sequence of a Bolivian wild Chili pepper, Capsicum eximium Hunz (Solanaceae). Mitochondrial DNA Part B. 2019;4:1634–5.
Porebski S, Bailey LG, Baum BR. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biology Report. 1997;15:8–15.
Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, Guan X. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics. 2012;13:715.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–9.
Lowe TM, Chan PP. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016;44:W54–7.
Lohse M, Drechsel O, Bock R. OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr Genet. 2007;52:267–74.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29:4633–42.
Leese F, Mayer C, Held C. Isolation of microsatellites from unknown genomes using known genomes as enrichment templates. Limnol Oceanogr Methods. 2008;6:412–26.
Li Q, Wan JM. SSRHunter: development of a local searching software for SSR sites. Hereditas. 2005;27:808.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19:2496–7.
Bandelt HJ, Forster P, Rohl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999;16:37–48.
Amiryousefi A, Hyvönen J, Poczai P. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018;34:3030–1.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I. VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000;16:1046.
Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Glocal alignment: finding rearrangements during alignment. Bioinformatics. 2003;19:54–62.
Chen C, Miao Y, Luo D, Li J, Wang Z, Luo M, Zhao T, Liu D. Sequence characteristics and phylogenetic analysis of the Artemisia argyi chloroplast genome. Front Plant Sci. 2022;13:906725.
He S, Xu B, Chen S, Li G, Zhang J, Xu J, Wu H, Li X, Yang Z. Sequence characteristics, genetic diversity and phylogenetic analysis of the Cucurbita ficifolia (Cucurbitaceae) chloroplasts genome. BMC Genomics. 2024;25:384.
Raveendar S, Na YW, Lee JR, Shim D, Ma KH, Lee SY, Chung JW. The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina Sequencing. Molecules. 2015;20:13080–8.
Massouh A, Schubert J, Yaneva-Roder L, Ulbricht-Jones ES, Zupok A, Johnson MTJ, Wright S, Pellizzer T, Sobanski J, Bock R. Spontaneous chloroplast mutants mostly occur by replication slippage and show a biased pattern in the plastome of Oenothera. Plant Cell. 2016;28:911–29.
Navhale VC, Dalvi VV, Wakode MM, Bhave SG, Devmore JP. Gene action of yield and yield contributing characters in Chilli (Capsicum annum L). Electron J Plant Breed. 2014;5:729–34.
Pradheep K, Veeraragavathatham D. Characterization of Capsicum spp. germplasm. Indian J Plant Genetic Resour. 2006;19:180–3.
Liu F, Zhao J, Sun H, Xiong C, Sun X, Wang X, Wang Z, Jarret R, Wang J, Tang B, et al. Genomes of cultivated and wild Capsicum species provide insights into pepper domestication and population differentiation. Nat Commun. 2023;14:5487.
Rao NK. Plant genetic resources: advancing conservation and use through biotechnology. Afr J Biotechnol. 2004;3:136–45.
He S, Yang Y, Li Z, Wang X, Guo Y, Wu H. Comparative analysis of four Zantedeschia chloroplast genomes: expansion and contraction of the IR region, phylogenetic analyses and SSR genetic diversity assessment. PeerJ. 2020;8:e9132.
Nanda S, Rout P, Ullah I, Nag SR, Reddy VV, Kumar G, Kumar R, He S, Wu H. Genome-wide identification and molecular characterization of CRK gene family in cucumber (Cucumis sativus L.) under cold stress and Sclerotium rolfsii infection. BMC Genomics. 2023;24:219.
Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE. 2010;5:e8613.
Chase MW, Cowan RS, Hollingsworth PM, Berg CVD, Wilkinson MJ. A proposal for a standardised protocol to barcode all land plants. Taxon. 2007;56:295–9.
Samantha JW, Kylie B, Erin P, Alex R, Jennifer C-S, Álvaro JP, Kevin SB. Ability of rbcL and matK DNA barcodes to discriminate between montane forest orchids. Plant Syst Evol. 2022;308:19.
Pratt RC, Francis DM, Meneses LSB. Genomics of tropical Solanaceous species: established and emerging crops. In: Moore PH, Ming R, editors. Genomics of Tropical Crop Plants. Plant Genetics and Genomics: Crops and Models. New York, NY: Springer; 2008.
Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM. A molecular phylogeny of the Solanaceae. Taxon. 2008;57:1159–81.
Orejuela A, Wahlert G, Orozco CI, Barboza G, Bohs L. Phylogeny of the tribes Juanulloeae and Solandreae (Solanaceae). Taxon. 2017;66:379–92.
Huang J, Xu W, Zhai J, Hu Y, Guo J, Zhang C, Zhao Y, Zhang L, Martine C, Ma H, et al. Nuclear phylogeny and insights into whole-genome duplications and reproductive development of Solanaceae plants. Plant Commun. 2023;4:100595.
Jia Y, Lucía V, Xiaodan C, Huimin L, Hao Z, Zhanlin L, Guifang Z. Development of chloroplast and nuclear DNA markers for Chinese oaks (Quercus Subgenus Quercus) and assessment of their utility as DNA barcodes. Front Plant Sci. 2017;8:816.
Funding
The project was supported by the Young Talents of Yunnan Xingdian project (XDYC-QNRC-2022-0233), the Yunnan Fundamental Research Projects (202201AU070179), the Key Program of Agriculture-Related Special Funds (202301BD070001-027) and the Major Science and Technology Project of Yunnan (202402AE090012, 202202AE090031).
Author information
Contributions
MD and JL developed the research concepts. SH, GL and YS directed most of the experimental and analytical work and wrote the manuscript. JL collected the leaf material and participated in the experimental work. KZ directed analytical work. DM, JL and SH acquired the funding. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The experiments did not involve endangered or protected species. The collection of plant data was carried out with the permission of the relevant institutions, and complied with national or international guidelines and legislation.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
He, S., Siman, Y., Li, G. et al. Chloroplast genome characteristic, comparative and phylogenetic analyses in Capsicum (Solanaceae). BMC Genomics 25, 1052 (2024). https://doi.org/10.1186/s12864-024-10980-1