Pangenome architecture and differentiation of agronomical traits in the Capsicum clade



WR-cap TU




Agriculture, Water, Food > Key Technologies LWV > Biotechnology and Breeding






Several studies have associated structural variations (SVs), such as inversions, translocations, presence/absence variations (PAVs) and copy number variations (CNVs), with important agronomic traits. To gain further insight into genetic diversity and to harness the potential of SVs in crop improvement, we will apply a Capsicum pangenome to support a genome-wide association study (GWAS). The importance of this approach was recently demonstrated by the identification of causal PAVs underlying disease resistance, (a)biotic stress tolerance, flowering time, silique length, and seed weight. Furthermore, several studies using GWAS based on large-scale resequencing identified new loci associated with important agronomic traits, providing direct targets for crop improvement. Within the Capsicum Genome Initiative (CGI) we previously profiled accessions with agronomic traits linked to yield and disease resistance that can be used to improve Capsicum crops. For example, wild C. annuum accessions and related wild species displayed different combinations of growth habit and flowering phenotype. The underlying genes represent an interesting resource that can potentially be used to breed C. annuum varieties with more concentrated fruit set and higher yield. In addition, chili anthracnose, caused by Colletotrichum species, is a serious constraint on pepper production. We have sequenced several accessions that exhibit anthracnose resistance, which might be used to introduce Colletotrichum species-specific anthracnose resistance into Capsicum crop accessions. Several virus-resistant (TMV, TSWV, CMV), fungus-resistant (Phytophthora capsici) and low-temperature-tolerant accessions have been characterized within the CGI project as well. In the current project we will also bring in accessions that are resistant to root-knot nematodes (RKN; Meloidogyne spp.). 
However, the genetic basis for the aforementioned traits is unclear. At present we have a unique combination of Capsicum genome sequences, bioinformatics and genetics, which permits the development of new insights and breeding tools that will drive innovation in Dutch plant breeding and horticulture. Here, we propose to construct a pangenome from representative C. annuum, C. baccatum, C. cardenasii, C. chinense, C. chacoense, C. frutescens, C. pubescens and C. tovarii accessions that were recently profiled within CGI, complemented with (i) resequencing and phenotyping data of 200 additional accessions. Their selection will be based on phylogenetic relationships, crossability within the Capsicum genus, phenotypic and genotypic characteristics, disease resistance, geographic distribution, and availability of seeds. Furthermore, (ii) we aim to develop new pangenome technology and to extend the functionality of existing tools like PANTOOLS by integrating annotation, metabolome and proteome layers as well as query and visualization functionality using graph database technology. This will support comparative analyses to further understand quantitative and qualitative traits, and subsequently to identify and delineate the causal variation of traits (flowering phenotype, growth habit, disease resistance, brix, capsaicin, vitamins, etc.) using (iii) a GWAS approach. Although in this project we describe a GWAS application for Capsicum focussing on specific traits, published results demonstrate that the proposed approach is generic and applies to many crops and to a broad range of traits.

Project objective

Contribution to call priorities
The project taps into the genetic diversity within the Capsicum genus, enabling the development of innovative crops with increased nutritional value, increased shelf life, and uniform fruit set and ripening, which facilitates more efficient harvesting and reduces waste (MMIP-A2, priority 27). It also targets improved disease resistance and (a)biotic stress tolerance, so that fewer pesticides are required to control diseases, thereby contributing to safe food, a lower environmental footprint and safe working conditions (MMIP-D4, priorities 30 and 31). The development of new innovative crops will strengthen the international position of the Dutch breeding industry and promote employment in the horticultural sector, thereby contributing to the internationalization of the Dutch horticulture sector (priority 47).

Contribution to the ‘Knowledge and Innovation Agenda Agriculture, Water and Food 2020-2023’
Pangenome technology is a key technology falling under MMIP ‘ST2-Biotechnology and Breeding (priority 44)’. Through its range of applications it cuts across and is linked to the missions Circular Agriculture (mission A), Climate neutral agriculture and food production (mission B), and Appreciated, healthy and safe food (mission D) set out in the Knowledge and Innovation Agenda (KIA). It addresses economic, environmental, social and health issues inherent to agricultural crop production. The project uses genomics, metabolomics, bioinformatics, and breeding expertise to explore and benefit from the potential of genetically diverse germplasm panels. This key technology aims to profile crops and crop related wild species (CRWs) at the genus level. It guides and anticipates the identification of optimal breeding material, putting breeding at a higher response level (MMIP S2, sub programmes 1-genome technology, 2-bioinformatics and big data, and 5-guiding breeding technologies), enabling the accelerated development of robust and innovative crops (MMIP A2, priority 4) that are more in line with economic and societal demands, thereby also contributing to applications of key technologies that promote transnational cooperation. The technology targets the use of genetic diversity to develop advanced crops for robust horticulture production systems with decreasing ecological footprints (Mission A, MMIP 1 and 2). Profiling a broad range of genetically diverse accessions using the pangenome-based comparative approach contributes to the development of new innovative crops aimed at a lower environmental footprint and improved sustainability, and to the international development of the topsector (Mission ‘Internationalization of topsectors and breeding industry’, priority 47). Our key technology will speed up precision breeding (e.g. for biotic and abiotic stress tolerance, disease resistance, drought tolerance, longer shelf-life, safer healthy crops and ‘precision designed food crops’), thereby contributing to MMIP D2 (priority 27) and D3 (priority 30). Here we aim to (1) construct pangenome maps with improved scalability and (2) prioritize genes and SVs with great potential for guiding the breeding of advanced crops, thereby contributing to the realization of MJP Breeding 2.0. We argue that our approach is generic and can be applied to reliably profile a wide variety of crop species (tomato, pepper, melon, cucumber, lettuce, rice, papaya, tulip, maize (corn), sugar beet, onion, etc.), greatly benefitting introgression hybridization and precision breeding.

Relation to mission (Motivation)

Incentive and urgency
The world population is expected to keep growing to 9.8 billion by 2050. The demand for water, energy, minerals and arable land is increasing and is reaching the capacity limits of Earth’s ecosystems, while at the same time the environmental footprints of today’s agricultural production systems are far from sustainable. Furthermore, changing environmental conditions, the demand for improved food quality (nutritional properties, increased shelf life), new consumer preferences and improved pathogen resistance are becoming increasingly important. The annual world-wide pepper production in 2018 amounted to 40,936,076 tonnes (FAOSTAT, 2018), making pepper an important food crop, which however is increasingly challenged by fungal, bacterial, and viral diseases as well as polyphagous root-knot nematodes and herbivorous insects. These pressures call for the development of adapted and advanced crops and of efficient agricultural production systems that are more in line with environmental, economic and social needs. They drive breeding toward non-perishable food crops with increased yield, sensory and nutritional value. It is therefore urgent to disclose the genetic diversity underlying important target crop traits and to use it in advanced introgression hybridization breeding approaches to meet the societal and economic challenges that agricultural production faces. This project addresses the innovation in comparative genome bioinformatics that is necessary for disclosing agronomically important traits, enabling breeding at a higher response level, and deploying crop innovation aimed at sustainability and increased productivity with fewer resources to secure future food production.
Definition, constraint and motivation
The vast amount of genome sequences that we previously generated in the Capsicum Genome Initiative (CGI) is an essential resource for disclosing the genetic information underlying economically important traits in cultivated crops and related wild species of pepper (Capsicum annuum). However, the complexity of the genomic sequence and structural variation complicates the identification and delineation of genetic variation causal to growth habit, yield, nutritional compounds, pathogen resistance, stress tolerance, etc. In large genome projects such as the 150 Tomato Genome Project, the 100 Melon Genome Project, the International Lettuce Genome Sequence project and CGI, we delivered a wealth of genome variations (SNPs, CNVs, PAVs and other miscellaneous variations). However, in the context of a complex genomic background, resequencing data on its own is insufficient for the identification and proper delineation of causal genome variations; this requires the application of pangenome technology. This follow-up project aims to disclose and generate added value from the vast amount of genetic information that was previously established in the CGI project. The pangenome and the extended functionality of tools like PANTOOLS (Sheikhizadeh et al., 2016; 2018) will support intergenomic comparisons to establish the Capsicum core genome and identify individual-specific genetic variation and (putative) gene functions. Gene structure and gene annotation of specific gene families, selected either for their known function or as new candidate genes from this study and here designated as ‘ortho-groups’, will be visualized using graph database technology. 
Comparison against functionally annotated reference genomes can give further insight into candidate variations that may be causal to target traits. These variations may then be converted into markers for marker-assisted breeding and for the selection of key genes for introgression into cultivated pepper for accelerated crop improvement. However, resources for the proposed innovative research are lacking at WUR, and the research requires additional support from TKI and the industry.

Planned actions

In this project the following output and innovative knowledge will be gained:
Data production and collection
1. Previously, we used full-length transcriptome data (ISOSEQ) and high-throughput Illumina short-read transcriptome data (RNASEQ) to build gene models (BRAKER, MAKER) and structurally annotate (genes, repeats) reference genomes (C. annuum, C. chacoense, C. galapagoense). Genes and genomic features in pangenomes (repeats, CNVs) will be further functionally annotated by (i) homology-based comparisons against public databases, (ii) an INTERPRO scan (GO, KEGG, REACTOME, PFAM), (iii) a BUSCO analysis and (iv) an NLR-gene (NBS-LRR resistance gene type) scan, and the results will be parsed into the pangenome data structure.

Pangenome construction and exploration
2. We will develop scalable and stable workflows to further facilitate downstream analytics and biological interpretation on high-performance computing (HPC) clusters and/or through professional applications such as ADAM and APACHE SPARK, with interactive data analysis via specific routines from data science tools like R. Scalable and interactive visualization of genomics data and features on top of the ADAM and APACHE SPARK processing engine is supported by MANGO.
3. Based on all-vs-all gene/protein homology, we will identify and build ortholog groups and, based on this homology grouping, construct species-level pangenomes for the C. annuum, C. chinense, C. chacoense and C. baccatum species groups.
4. To have a ‘saturated’ genetic repertoire of the genus, we will combine the species-level pangenomes into a genus-level pangenome.
5. Pangenomes, annotations and proteomes will be stored in a graph database (NEO4J or equivalent). The PANTOOLS database design will be adapted to store additional data layers, such as phenotyping data.
6. Querying functionality will be further developed with tools like NEO4J BROWSER and NEO4J BLOOM to visualize, explore and review graph data, determine genome sections and genes of interest, and detect relevant patterns. The extent to which higher-level graph views are integrated with PANTOOLS will be decided based on insights gained as the project progresses.
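
The homology grouping in step 3 can be illustrated with a minimal sketch. It assumes that the all-vs-all comparison has already been reduced to a list of pairwise hits with a percent identity. It uses simple single-linkage clustering (union-find) rather than the grouping method implemented in PANTOOLS, and the function name, the 70% threshold and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def build_homology_groups(hits, min_identity=70.0):
    """Cluster genes into homology groups from pairwise all-vs-all hits.

    hits: iterable of (gene_a, gene_b, percent_identity) tuples.
    Genes linked by any chain of hits at or above min_identity end up in
    the same group (single-linkage clustering via union-find).
    The threshold is an illustrative assumption, not a recommended value.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b, identity in hits:
        root_a, root_b = find(gene_a), find(gene_b)
        # Genes below the threshold stay registered but are not merged.
        if identity >= min_identity and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    # Sorted output keeps the result deterministic.
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Hypothetical hits between genes of three species.
example_hits = [
    ("annuum|g1", "chinense|g7", 95.2),
    ("chinense|g7", "baccatum|g3", 88.0),
    ("annuum|g2", "baccatum|g9", 42.0),
]
example_groups = build_homology_groups(example_hits)
```

In this toy input the first two hits chain three genes into one group, while the weak third hit leaves its two genes as singletons. In a graph database, each resulting group would typically become a node linked to its member genes.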

Pangenomic variation analysis
7. The pangenomes will be used to identify the Capsicum dispensable (accessory) and core genes.
8. Genes and genomic features will be selected from the dispensable genome based on their structural and functional annotation attributes (Pfam domains), genomic location, their comparison to functionally grouped genes and pathway information. The output of the selection will be a list of prioritized genes potentially underlying agronomically important traits.
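
Steps 7 and 8 can be sketched as follows. The sketch assumes that each homology group is described by the set of accessions in which it occurs and by a set of Pfam accessions. The strict definition of "core" (present in every accession), the scoring by domain overlap, and all identifiers and function names are illustrative assumptions rather than the project's actual selection criteria.

```python
def classify_groups(presence, n_accessions):
    """Split homology groups into core and dispensable lists.

    presence: dict mapping group id -> set of accessions carrying the group.
    A group counts as core only if all n_accessions accessions carry it.
    """
    core, dispensable = [], []
    for group_id in sorted(presence):
        if len(presence[group_id]) == n_accessions:
            core.append(group_id)
        else:
            dispensable.append(group_id)
    return core, dispensable


def prioritize(dispensable, annotations, target_domains):
    """Rank dispensable groups by the number of target Pfam domains they carry.

    annotations: dict mapping group id -> set of Pfam accessions.
    Groups without any target domain are dropped from the ranking.
    """
    scored = []
    for group_id in dispensable:
        shared = annotations.get(group_id, set()) & target_domains
        if shared:
            scored.append((len(shared), group_id))
    # Highest overlap first; ties are broken by group id for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [group_id for _, group_id in scored]


# Hypothetical presence/absence data for three accessions.
presence = {
    "HG1": {"acc1", "acc2", "acc3"},
    "HG2": {"acc1"},
    "HG3": {"acc2", "acc3"},
    "HG4": {"acc3"},
}
# Hypothetical annotations; PF00931 (NB-ARC) and PF13855 (LRR_8) serve as
# example resistance-related target domains, PF00069 is a protein kinase domain.
annotations = {
    "HG2": {"PF00931"},
    "HG3": {"PF00931", "PF13855"},
    "HG4": {"PF00069"},
}
core, dispensable = classify_groups(presence, n_accessions=3)
ranked = prioritize(dispensable, annotations, {"PF00931", "PF13855"})
```

In practice the ranking would also weigh genomic location and pathway information, as described in step 8; the domain overlap used here is only the simplest of those criteria.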

Project leader

Sander Peters