Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences
Language English Country Great Britain, England Medium print
Document type journal articles, reviews
Grant support
National Science Foundation (NSF)
2126334
Research Coordination Network (RCN)
PubMed
38079567
PubMed Central
PMC10712715
DOI
10.1093/database/baad088
PII: 7469248
- MeSH
- Big Data * MeSH
- Databases, Genetic * MeSH
- Phenotype MeSH
- Genotype MeSH
- Plant Breeding MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data, and to understand the needs and challenges of the plant genomic research community. During 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design, methods and sample collection. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Centro de Ciencias de la Complejidad Universidad Nacional Autónoma de México Ciudad de México México
Cold Spring Harbor Laboratory 1 Bungtown Rd Cold Spring Harbor New York NY 11724 USA
Department of Biochemistry Faculty of Science Palacky University Olomouc Czech Republic
Department of Botany and Plant Pathology Oregon State University Corvallis OR 97331 USA
Department of Ecology and Evolutionary Biology University of Connecticut Storrs CT USA
Federal University of Agriculture Zuru PMB 28 Zuru Kebbi 872101 Nigeria
Institute of Forest Science Madrid Spain
National Research Council Canada 110 Gymnasium Pl Saskatoon Saskatchewan S7N 0W9 Canada
Phoenix Bioinformatics 39899 Balentine Drive Suite 200 Newark CA 94560 USA
Scossa F., Alseekh S. and Fernie A.R. (2021) Integrating multi-omics data for crop improvement. J. Plant Physiol., 257, 153352. PubMed
Yang W., Feng H., Zhang X. et al. (2020) Crop phenomics and high-throughput phenotyping: past decades, current challenges, and future perspectives. Mol. Plant, 13, 187–214. PubMed
Borgman C.L. (2015) Big Data, Little Data, No Data: Scholarship in the Networked World. The MIT Press, Cambridge, MA.
Mosconi G., Li Q., Randall D. et al. (2019) Three gaps in opening science. Comput. Support Coop. Work (CSCW), 28, 749–789.
Federer L.M. (2019) Who, what, when, where, and why? Quantifying and understanding biomedical data reuse. University of Maryland.
Wallis J.C., Rolando E. and Borgman C.L. (2013) If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS One, 8, e67332. PubMed PMC
Pasquetto I.V., Randles B.M. and Borgman C.L. (2017) On the reuse of scientific data. Data Sci. J., 16, 1–9.
Culina A., Crowther T.W., Ramakers J.J.C. et al. (2018) How to do meta-analysis of open datasets. Nat. Ecol. Evol., 2, 1053–1056. PubMed
He L. and Nahar V. (2016) Reuse of scientific data in academic publications: an investigation of Dryad digital repository. J. Inf. Manag., 65, 478–494.
Pasquetto I.V., Borgman C.L. and Wofford M.F. (2019) Uses and reuses of scientific data: the data creators’ advantage. Harv. Data Sci. Rev., 1.
Rung J. and Brazma A. (2013) Reuse of public genome-wide gene expression data. Nat. Rev. Genet., 14, 89–99. PubMed
Karasti H. and Blomberg J. (2018) Studying infrastructuring ethnographically. Comput. Support. Coop. Work (CSCW), 27, 233–265.
Hanson B., Sugden A. and Alberts B. (2011) Making data maximally available. Science, 331, 649. PubMed
Leonelli S. (2013) Integrating data to acquire new knowledge: three modes of integration in plant science. Stud. Hist. Philos. Sci. Part C, 44, 503–514. PubMed
Kattge J., Bonisch G., Diaz S. et al. (2020) TRY plant trait database – enhanced coverage and open access. Glob. Chang. Biol., 26, 119–188. PubMed
Harper L., Campbell J., Cannon E.K.S. et al. (2018) AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database, 2018, bay088. PubMed PMC
Adam-Blondon A.F., Alaux M., Pommier C. et al. (2016) Towards an open grapevine information system. Hortic. Res., 3, 16056. PubMed PMC
Dempsey L. and Heery R. (1998) Metadata: a current view of practice and issues. J. Doc., 54, 145–172.
Mayernik M.S. and Acker A. (2018) Tracing the traces: the critical role of metadata within networked communications. J. Assoc. Inf. Sci. Technol., 69, 177–180.
Edwards D. (2016) The impact of genomics technology on adapting plants to climate change. In: Edwards D, Batley J (eds) Plant Genomics and Climate Change. Springer, New York, NY, pp. 173–178.
Hu T., Chitnis N., Monos D. et al. (2021) Next-generation sequencing technologies: an overview. Hum. Immunol., 82, 01–811. PubMed
Smith L.M., Fung S., Hunkapiller M.W. et al. (1985) The synthesis of oligonucleotides containing an aliphatic amino group at the 5ʹ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res., 13, 2399–2412. PubMed PMC
Sanger F., Nicklen S. and Coulson A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A., 74, 5463–5467. PubMed PMC
Crossley B.M., Bai J., Glaser A. et al. (2020) Guidelines for Sanger sequencing and molecular assay monitoring. J. Vet. Diagn. Invest., 32, 767–775. PubMed PMC
Mardis E.R. (2008) The impact of next-generation sequencing technology on genetics. Trends Genet., 24, 133–141. PubMed
van Dijk E.L., Auger H., Jaszczyszyn Y. et al. (2014) Ten years of next-generation sequencing technology. Trends Genet., 30, 418–426. PubMed
Buermans H.P. and den Dunnen J.T. (2014) Next generation sequencing technology: advances and applications. Biochim. Biophys. Acta, 1842, 1932–1941. PubMed
Slatko B.E., Gardner A.F. and Ausubel F.M. (2018) Overview of next-generation sequencing technologies. Curr. Protoc. Mol. Biol., 122, e59. PubMed PMC
Ekblom R. and Wolf J.B. (2014) A field guide to whole-genome sequencing, assembly and annotation. Evol. Appl., 7, 1026–1042. PubMed PMC
English A.C., Richards S., Han Y. et al. (2012) Mind the gap: upgrading genomes with pacific biosciences RS long-read sequencing technology. PLoS One, 7, e47768. PubMed PMC
Huddleston J., Ranade S., Malig M. et al. (2014) Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res., 24, 688–696. PubMed PMC
Wang Y., Zhao Y., Bollas A. et al. (2021) Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol., 39, 1348–1365. PubMed PMC
Marx V. (2023) Method of the year: long-read sequencing. Nat. Methods, 20, 6–11. PubMed
Chen P., Sun Z., Wang J. et al. (2023) Portable nanopore-sequencing technology: trends in development and applications. Front Microbiol., 14, 1043967. PubMed PMC
Wick R.R., Judd L.M. and Holt K.E. (2019) Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol., 20, 129. PubMed PMC
Grodzicker T., Williams J., Sharp P. et al. (1975) Physical mapping of temperature-sensitive mutations of adenoviruses. Cold Spring Harb. Symp. Quant. Biol., 39, 439–446. PubMed
Yang W., Kang X., Yang Q. et al. (2013) Review on the development of genotyping methods for assessing farm animal diversity. J. Anim. Sci. Biotechnol., 4, 2. PubMed PMC
Carvalho B., Bengtsson H., Speed T.P. et al. (2007) Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics, 8, 485–499. PubMed
Chagne D., Crowhurst R.N., Troggio M. et al. (2012) Genome-wide SNP detection, validation, and development of an 8K SNP array for apple. PLoS One, 7, e31745. PubMed PMC
Bayer M.M., Rapazote-Flores P., Ganal M. et al. (2017) Development and evaluation of a barley 50k iSelect SNP Array. Front. Plant Sci., 8, 1792. PubMed PMC
Verde I., Jenkins J., Dondini L. et al. (2017) The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity. BMC Genomics, 18, 225. PubMed PMC
Ganal M.W., Polley A., Graner E.M. et al. (2012) Large SNP arrays for genotyping in crop plants. J. Biosci., 37, 821–828. PubMed
McKain M.R., Johnson M.G., Uribe-Convers S. et al. (2018) Practical considerations for plant phylogenomics. Appl. Plant Sci., 6, e1038. PubMed PMC
Kumar P., Choudhary M., Jat B.S. et al. (2021) Skim sequencing: an advanced NGS technology for crop improvement. J. Genet., 100, 1–10. PubMed
Schmickl R., Liston A., Zeisek V. et al. (2016) Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae). Mol. Ecol. Resour., 16, 1124–1135. PubMed
Head S.R., Komori H.K., LaMere S.A. et al. (2014) Library construction for next-generation sequencing: overviews and challenges. Biotechniques, 56, 61–64, 66, 68, passim. PubMed PMC
Deschamps S., Llaca V. and May G.D. (2012) Genotyping-by-Sequencing in Plants. Biology, 1, 460–483. PubMed PMC
Elshire R.J., Glaubitz J.C., Sun Q. et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One, 6, e19379. PubMed PMC
Andrews K.R., Good J.M., Miller M.R. et al. (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nat. Rev. Genet., 17, 81–92. PubMed PMC
Miller M.R., Dunham J.P., Amores A. et al. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res., 17, 240–248. PubMed PMC
Danecek P., Auton A., Abecasis G. et al. (2011) The variant call format and VCFtools. Bioinformatics, 27, 2156–2158. PubMed PMC
Lyon M.S., Andrews S.J., Elsworth B. et al. (2021) The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol., 22, 32. PubMed PMC
Leinonen R., Sugawara H., Shumway M. et al. (2011) The sequence read archive. Nucleic Acids Res., 39, D19–21. PubMed PMC
Kodama Y., Shumway M., Leinonen R. et al. (2012) The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res., 40, D54–56. PubMed PMC
Edgar R., Domrachev M. and Lash A.E. (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30, 207–210. PubMed PMC
Barrett T. and Edgar R. (2006) Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol., 411, 352–369. PubMed PMC
Clough E. and Barrett T. (2016) The gene expression omnibus database. In: Statistical Genomics: Methods and Protocols, pp. 93–110. PubMed PMC
Tateno Y., Imanishi T., Miyazaki S. et al. (2002) DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res., 30, 27–30. PubMed PMC
Miyazaki S., Sugawara H., Ikeo K. et al. (2004) DDBJ in the stream of various biological data. Nucleic Acids Res., 32, D31–34. PubMed PMC
Ogasawara O., Kodama Y., Mashima J. et al. (2020) DDBJ Database updates and computational infrastructure enhancement. Nucleic Acids Res., 48, D45–D50. PubMed PMC
Cochrane G., Karsch-Mizrachi I., Nakamura Y. et al. (2011) The international nucleotide sequence database collaboration. Nucleic Acids Res., 39, D15–18. PubMed PMC
Cochrane G., Karsch-Mizrachi I., Takagi T. et al. (2016) The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res., 44, D48–50. PubMed PMC
(2020) Promoting best practice in nucleotide sequence data sharing. Sci. Data, 7, 152. PubMed PMC
Nordberg H., Cantor M., Dusheyko S. et al. (2014) The genome portal of the department of energy joint genome institute: 2014 updates. Nucleic Acids Res., 42, D26–31. PubMed PMC
Sreedasyam A., Plott C., Hossain M.S. et al. (2023) JGI Plant Gene Atlas: an updateable transcriptome resource to improve functional gene descriptions across the plant kingdom. Nucleic Acids Res., 51, 8383–8401. PubMed PMC
Goodstein D.M., Shu S., Howson R. et al. (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res., 40, D1178–1186. PubMed PMC
Members C.-N. and Partners (2021) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res., 49, D18–D28. PubMed PMC
Cezard T., Cunningham F., Hunt S.E. et al. (2022) The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res., 50, D1216–D1220. PubMed PMC
Song S., Tian D., Li C. et al. (2018) Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Res., 46, D944–D949. PubMed PMC
Chang Y., Song X., Zhang Q. et al. (2022) Robust CRISPR/Cas9 mediated gene editing of JrWOX11 manipulated adventitious rooting and vegetative growth in a nut tree species of walnut. Sci. Hortic., 303, 111199.
International Hapmap C. (2003) The International HapMap Project. Nature, 426, 789–796. PubMed
Jung S., Jesudurai C., Staton M. et al. (2004) GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research. BMC Bioinf., 5, 130. PubMed PMC
Jung S., Lee T., Cheng C.H. et al. (2019) 15 years of GDR: new data and functionality in the Genome Database for Rosaceae. Nucleic Acids Res., 47, D1137–D1145. PubMed PMC
Yu J., Jung S., Cheng C.H. et al. (2014) CottonGen: a genomics, genetics and breeding database for cotton research. Nucleic Acids Res., 42, D1229–1236. PubMed PMC
Yu J., Jung S., Cheng C.H. et al. (2021) CottonGen: the community database for cotton genomics, genetics, and breeding research. Plants, 10, 2805. PubMed PMC
Grant D., Nelson R.T., Cannon S.B. et al. (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res., 38, D843–846. PubMed PMC
Brown A.V., Conners S.I., Huang W. et al. (2021) A new decade and new data at SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res., 49, D1496–D1501. PubMed PMC
Gonzales M.D., Archuleta E., Farmer A. et al. (2005) The Legume Information System (LIS): an integrated information resource for comparative legume biology. Nucleic Acids Res., 33, D660–665. PubMed PMC
Dash S., Campbell J.D., Cannon E.K. et al. (2016) Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family. Nucleic Acids Res., 44, D1181–1188. PubMed PMC
Fernandez-Pozo N., Menda N., Edwards J.D. et al. (2015) The Sol Genomics Network (SGN)—from genotype to phenotype to breeding. Nucleic Acids Res., 43, D1036–1041. PubMed PMC
Foerster H., Bombarely A., Battey J.N.D. et al. (2018) SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases. Database (Oxford), 2018, bay035. PubMed PMC
Lawrence C.J. (2007) MaizeGDB. Methods Mol. Biol., 406, 331–345. PubMed
Portwood J.L. 2nd, Woodhouse M.R., Cannon E.K. et al. (2019) MaizeGDB 2018: the maize multi-genome genetics and genomics database. Nucleic Acids Res., 47, D1146–D1154. PubMed PMC
Wegrzyn J.L., Lee J.M., Tearse B.R. et al. (2008) TreeGenes: a forest tree genome database. Int. J. Plant Genomics, 2008, 412875. PubMed PMC
Falk T., Herndon N., Grau E. et al. (2019) Growing and cultivating the forest genomics database, TreeGenes. Database, 2019, bay084. PubMed PMC
Garcia-Hernandez M., Berardini T.Z., Chen G. et al. (2002) TAIR: a resource for integrated Arabidopsis data. Funct. Integr. Genomics, 2, 239–253. PubMed
Poole R.L. (2007) The TAIR database. Methods Mol. Biol., 406, 179–212. PubMed
Sanderson L.A., Caron C.T., Tan R. et al. (2019) KnowPulse: A web-resource focused on diversity data for pulse crop improvement. Front. Plant Sci., 10, 965. PubMed PMC
Smith R.N., Aleksic J., Butano D. et al. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics, 28, 3163–3165. PubMed PMC
Kalderimis A., Lyne R., Butano D. et al. (2014) InterMine: extensive web services for modern biology. Nucleic Acids Res., 42, W468–472. PubMed PMC
Tello-Ruiz M.K., Jaiswal P. and Ware D. (2022) Gramene: a resource for comparative analysis of plants genomes and pathways. Methods Mol. Biol., 2443, 101–131. PubMed
Ware D. (2007) Gramene. Methods Mol. Biol., 406, 315–329. PubMed
Ware D.H., Jaiswal P., Ni J. et al. (2002) Gramene, a tool for grass genomics. Plant Physiol., 130, 1606–1613. PubMed PMC
Gladman N., Olson A., Wei S. et al. (2022) SorghumBase: a web-based portal for sorghum genetic information and community advancement. Planta, 255, 35. PubMed PMC
Lyne R., Sullivan J., Butano D. et al. (2015) Cross-organism analysis using InterMine. Genesis, 53, 547–560. PubMed PMC
Paajanen P., Kettleborough G., Lopez-Girona E. et al. (2019) A critical comparison of technologies for a plant genome sequencing project. Gigascience, 8, giy163. PubMed PMC
Sun Y., Shang L., Zhu Q.H. et al. (2022) Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci., 27, 391–401. PubMed
Pucker B., Irisarri I., de Vries J. et al. (2022) Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. Quant. Plant Biol., 3, e5. PubMed PMC
Shi J., Tian Z., Lai J. et al. (2023) Plant pan-genomics and its applications. Mol. Plant, 16, 168–186. PubMed
Ho S.S., Urban A.E. and Mills R.E. (2020) Structural variation in the sequencing era. Nat. Rev. Genet., 21, 171–189. PubMed PMC
Quan C., Lu H., Lu Y. et al. (2022) Population-scale genotyping of structural variation in the era of long-read sequencing. Comput. Struct. Biotechnol. J., 20, 2639–2647. PubMed PMC
Sun S., Wang X., Wang K. et al. (2020) Dissection of complex traits of tomato in the post-genome era. Theor. Appl. Genet., 133, 1763–1776. PubMed
Lye Z.N. and Purugganan M.D. (2019) Copy number variation in domestication. Trends Plant Sci., 24, 352–365. PubMed
Hovhannisyan G., Harutyunyan T., Aroutiounian R. et al. (2019) DNA copy number variations as markers of mutagenic impact. Int. J. Mol. Sci., 20, 4723. PubMed PMC
Dolatabadian A., Patel D.A., Edwards D. et al. (2017) Copy number variation and disease resistance in plants. Theor. Appl. Genet., 130, 2479–2490. PubMed
Yuan Y., Bayer P.E., Batley J. et al. (2021) Current status of structural variation studies in plants. Plant Biotechnol. J., 19, 2153–2163. PubMed PMC
Alonge M., Wang X., Benoit M. et al. (2020) Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell, 182, 145–161 e123. PubMed PMC
Chawla H.S., Lee H., Gabur I. et al. (2021) Long-read sequencing reveals widespread intragenic structural variants in a recent allopolyploid crop plant. Plant Biotechnol. J., 19, 240–250. PubMed PMC
Li M., Xia L., Zhang Y. et al. (2019) Plant editosome database: a curated database of RNA editosome in plants. Nucleic Acids Res., 47, D170–D174. PubMed PMC
Thao N.P. and Tran L.S. (2016) Enhancement of plant productivity in the post-genomics era. Curr. Genomics, 17, 295–296. PubMed PMC
Pan Q., Wei J., Guo F. et al. (2019) Trait ontology analysis based on association mapping studies bridges the gap between crop genomics and Phenomics. BMC Genomics, 20, 443. PubMed PMC
Danecek P., Bonfield J.K., Liddle J. et al. (2021) Twelve years of SAMtools and BCFtools. Gigascience, 10, giab008. PubMed PMC
Brachi B., Morris G.P. and Borevitz J.O. (2011) Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol., 12, 232. PubMed PMC
Gali K.K., Sackville A., Tafesse E.G. et al. (2019) Genome-wide association mapping for agronomic and seed quality traits of field pea (Pisum sativum L.). Front. Plant Sci., 10, 1538. PubMed PMC
Khan S.U., Saeed S., Khan M.H.U. et al. (2021) Advances and challenges for QTL analysis and GWAS in the plant-breeding of high-yielding: a focus on rapeseed. Biomolecules, 11, 1516. PubMed PMC
Tibbs Cortes L., Zhang Z. and Yu J. (2021) Status and prospects of genome-wide association studies in plants. Plant Genome., 14, e20077. PubMed
Liu J., Hua W., Hu Z. et al. (2015) Natural variation in ARF18 gene simultaneously affects seed weight and silique length in polyploid rapeseed. Proc. Natl. Acad. Sci. U.S.A., 112, E5123–5132. PubMed PMC
Christeller J.T., McGhie T.K., Johnston J.W. et al. (2019) Quantitative trait loci influencing pentacyclic triterpene composition in apple fruit peel. Sci. Rep., 9, 18501. PubMed PMC
Chagné D., Ryan J., Saeed M. et al. (2019) A high density linkage map and quantitative trait loci for tree growth for New Zealand mānuka (Leptospermum scoparium). N. Z. J. Crop Hortic. Sci., 47, 261–272.
Budhlakoti N., Kushwaha A.K., Rai A. et al. (2022) Genomic selection: a tool for accelerating the efficiency of molecular breeding for development of climate-resilient crops. Front. Genet., 13, 832153. PubMed PMC
Bhat J.A., Ali S., Salgotra R.K. et al. (2016) Genomic selection in the era of next generation sequencing for complex traits in plant breeding. Front. Genet., 7, 221. PubMed PMC
Crossa J., Perez-Rodriguez P., Cuevas J. et al. (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci., 22, 961–975. PubMed
Fasoula D.A., Ioannides I.M. and Omirou M. (2019) Phenotyping and plant breeding: overcoming the barriers. Front. Plant Sci., 10, 1713. PubMed PMC
Akiyama K., Kurotani A., Iida K. et al. (2014) RARGE II: an integrated phenotype database of Arabidopsis mutant traits using a controlled vocabulary. Plant Cell Physiol., 55, e4. PubMed PMC
Miroslaw M. (2001) Officially Released Mutant Varieties – The FAO/IAEA Database. Plant Cell Tissue Organ. Cult., 65, 175–177.
Zheng Y., Zhang N., Martin G.B. et al. (2019) Plant Genome Editing Database (PGED): a call for submission of information about genome-edited plant Mutants. Mol. Plant, 12, 127–129. PubMed
Shikata M., Hoshikawa K., Ariizumi T. et al. (2016) TOMATOMA update: phenotypic and metabolite information in the micro-tom mutant resource. Plant Cell Physiol., 57, e11. PubMed
McGill B.J., Enquist B.J., Weiher E. et al. (2006) Rebuilding community ecology from functional traits. Trends Ecol. Evol., 21, 178–185. PubMed
Violle V., Navas M., Vile D. et al. (2007) Let the concept of trait be functional! Oikos, 116, 882–892.
Schneider F.D., Fichtmueller D., Gossner M.M. et al. (2019) Towards an ecological trait‐data standard. Meth. Ecol. Evolut, 10, 2006–2019.
Allan E., Manning P., Alt F. et al. (2015) Land use intensification alters ecosystem multifunctionality via loss of biodiversity and changes to functional composition. Ecol. Lett., 18, 834–843. PubMed PMC
Diaz S., Quetier F., Caceres D.M. et al. (2011) Linking functional diversity and social actor strategies in a framework for interdisciplinary analysis of nature’s benefits to society. Proc. Natl. Acad. Sci. U.S.A., 108, 895–902. PubMed PMC
Lavorel S. and Grigulis K. (2012) How fundamental plant functional trait relationships scale-up to trade-offs and synergies in ecosystem services. J. Ecol., 100, 128–140.
Ni J., Pujar A., Youens-Clark K. et al. (2009) Gramene QTL database: development, content and applications. Database (Oxford), 2009, bap005. PubMed PMC
Singh K., Batra R., Sharma S. et al. (2021) WheatQTLdb: a QTL database for wheat. Mol. Genet. Genomics, 296, 1051–1056. PubMed
Reich P.B., Wright I.J. and Lusk C.H. (2007) Predicting leaf physiology from simple plant and climate attributes: a global GLOPNET analysis. Ecol. Appl., 17, 1982–1988. PubMed
Kissling W.D., Walls R., Bowser A. et al. (2018) Towards global data products of Essential Biodiversity Variables on species traits. Nat. Ecol. Evol., 2, 1531–1540. PubMed
Peat H.J. and Fitter A.H. (1994) A comparative study of the distribution and density of stomata in the British flora. Biol. J. Linn. Soc. Lond., 52, 377–393. PubMed PMC
Poschlod P., Kleyer M., Jackel A.-K. et al. (2003) BIOPOP — A database of plant traits and internet application for nature conservation. Folia Geobot., 38, 263–271.
Garcia-Recio A., Santos-Gomez A., Soto D. et al. (2021) GRIN database: a unified and manually curated repertoire of GRIN variants. Hum. Mutat., 42, 8–18. PubMed
Kühn I., Durka W. and Klotz S. (2004) BiolFlor: a new plant-trait database as a tool for plant invasion ecology. Divers. Distrib., 10, 363–365.
Kleyer M., Bekker R.M., Knevel I.C. et al. (2008) The LEDA Traitbase: a database of life history traits of the Northwest European flora. J. Ecol., 96, 1266–1274.
Tavsanoglu C. and Pausas J.G. (2018) A functional trait database for Mediterranean Basin plants. Sci. Data, 5, 180135. PubMed PMC
Falster D., Gallagher R., Wenk E.H. et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Sci. Data, 8, 254. PubMed PMC
Houle D., Govindaraju D.R. and Omholt S. (2010) Phenomics: the next challenge. Nat. Rev. Genet., 11, 855–866. PubMed
Hati A.J. and Singh R.R. (2021) Artificial intelligence in smart farms: plant phenotyping for species recognition and health condition identification using deep learning. AI, 2, 274–289.
Saleem M.H., Potgieter J. and Mahmood Arif K. (2019) Plant disease detection and classification by deep learning. Plants, 8, 468. PubMed PMC
Zhang C., Zhou L., Xiao Q. et al. (2022) End-to-end fusion of hyperspectral and chlorophyll fluorescence imaging to identify rice stresses. Plant Phenomics, 2022, 9851096. PubMed PMC
Sandhu K.S., Mihalyov P.D., Lewien M.J. et al. (2021) Combining genomic and phenomic information for predicting grain protein content and grain yield in spring wheat. Front. Plant Sci., 12, 613300. PubMed PMC
Araus J.L., Kefauver S.C., Zaman-Allah M. et al. (2018) Translating high-throughput phenotyping into genetic gain. Trends Plant Sci., 23, 451–466. PubMed PMC
Steinbach D., Alaux M., Amselem J. et al. (2013) GnpIS: an information system to integrate genetic and genomic data from plants and fungi. Database, 2013, bat058. PubMed PMC
Pommier C., Michotey C., Cornut G. et al. (2019) Applying FAIR Principles to Plant Phenotypic Data Management in GnpIS. Plant Phenomics, 2019, 1671403. PubMed PMC
Brookes A.J. and Robinson P.N. (2015) Human genotype-phenotype databases: aims, challenges and opportunities. Nat. Rev. Genet., 16, 702–715. PubMed
Cobo-Simón I. (2022) Cartograplant: cyberinfrastructure to improve forest health and productivity in the context of a changing climate. In Plant and Animal Genome XXIX Conference, San Diego (CA)
Sansone S.A., McQuilton P., Rocca-Serra P. et al. (2019) FAIRsharing as a community approach to standards, repositories and policies. Nat. Biotechnol., 37, 358–367. PubMed PMC
Bulow L., Schindler M., Choi C. et al. (2004) PathoPlant: a database on plant-pathogen interactions. Silico. Biol., 4, 529–536. PubMed
Bulow L., Schindler M. and Hehl R. (2007) PathoPlant: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses. Nucleic Acids Res., 35, D841–845. PubMed PMC
Wu W., Wu Y., Hu D. et al. (2020) PncStress: a manually curated database of experimentally validated stress-responsive non-coding RNAs in plants. Database, 2020, baaa001. PubMed PMC
Global Burden Of Disease Cancer C., Fitzmaurice C., Abate D. et al. (2019) Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2017: a systematic analysis for the global burden of disease study. JAMA Oncol., 5, 1749–1768. PubMed PMC
Dhondt S., Wuyts N. and Inze D. (2013) Cell to whole-plant phenotyping: the best is yet to come. Trends Plant Sci., 18, 428–439. PubMed
Diaz B.P., Knowles B., Johns C.T. et al. (2021) Seasonal mixed layer depth shapes phytoplankton physiology, viral production, and accumulation in the North Atlantic. Nat. Commun., 12, 6634. PubMed PMC
Adak A., Murray S.C., Calderon C.I. et al. (2023) Genetic mapping and prediction for novel lesion mimic in maize demonstrates quantitative effects from genetic background, environment and epistasis. Theor. Appl. Genet., 136, 155. PubMed
Hill D.P., D’Eustachio P., Berardini T.Z. et al. (2016) Modeling biochemical pathways in the gene ontology. Database, 2016, baw126. PubMed PMC
Poux S. and Gaudet P. (2017) Best practices in manual annotation with the gene ontology. Methods Mol. Biol., 1446, 41–54. PubMed
Chibucos M.C. and Tyler B.M. (2009) Common themes in nutrient acquisition by plant symbiotic microbes, described by the Gene Ontology. BMC Microbiol., 9, S6. PubMed PMC
Fox S.E., Geniza M., Hanumappa M. et al. (2014) De novo transcriptome assembly and analyses of gene expression during photomorphogenesis in diploid wheat Triticum monococcum. PLoS One, 9, e96855. PubMed PMC
Vining K.J., Romanel E., Jones R.C. et al. (2015) The floral transcriptome of Eucalyptus grandis. New Phytol., 206, 1406–1422. PubMed
Fennell A.Y., Schlauch K.A., Gouthu S. et al. (2015) Short day transcriptomic programming during induction of dormancy in grapevine. Front. Plant Sci., 6, 834. PubMed PMC
Gupta P., Geniza M., Naithani S. et al. (2021) Chia (Salvia hispanica) gene expression atlas elucidates dynamic spatio-temporal changes associated with plant growth and development. Front. Plant Sci., 12, 667678. PubMed PMC
Godoy F., Kuhn N., Munoz M. et al. (2021) The role of auxin during early berry development in grapevine as revealed by transcript profiling from pollination to fruit set. Hortic. Res., 8, 140. PubMed PMC
Perez-Riverol Y., Xu Q.W., Wang R. et al. (2016) PRIDE Inspector Toolsuite: moving toward a universal visualization tool for proteomics data standard formats and quality assessment of ProteomeXchange datasets. Mol. Cell. Proteomics, 15, 305–317. PubMed PMC
Kosova K., Vitamvas P., Urban M.O. et al. (2018) Plant abiotic stress proteomics: the major factors determining alterations in cellular proteome. Front. Plant Sci., 9, 122. PubMed PMC
Jarnuczak A.F. and Vizcaino J.A. (2017) Using the PRIDE Database and ProteomeXchange for submitting and accessing public proteomics datasets. Curr. Protoc. Bioinfor., 59, 13 31 11–13 31 12. PubMed
Okuda S., Watanabe Y., Moriya Y. et al. (2017) jPOSTrepo: an international standard data repository for proteomes. Nucleic Acids Res., 45, D1107–D1111. PubMed PMC
Moriya Y., Kawano S., Okuda S. et al. (2019) The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res., 47, D1218–D1224. PubMed PMC
Chen T., Ma J., Liu Y. et al. (2022) iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res., 50, D1522–D1527. PubMed PMC
Ma J., Chen T., Wu S. et al. (2019) iProX: an integrated proteome resource. Nucleic Acids Res., 47, D1211–D1217. PubMed PMC
Sharma V., Eckels J., Taylor G.K. et al. (2014) Panorama: a targeted proteomics knowledge base. J. Proteome Res., 13, 4205–4210.
Desiere F., Deutsch E.W., King N.L. et al. (2006) The PeptideAtlas project. Nucleic Acids Res., 34, D655–D658.
Deutsch E.W. (2010) The PeptideAtlas Project. Methods Mol. Biol., 604, 285–296.
Tsugawa H., Rai A., Saito K. et al. (2021) Metabolomics and complementary techniques to investigate the plant phytochemical cosmos. Nat. Prod. Rep., 38, 1729–1759.
MSI Board Members, Sansone S.A., Fan T. et al. (2007) The metabolomics standards initiative. Nat. Biotechnol., 25, 846–848.
Sumner L.W., Amberg A., Barrett D. et al. (2007) Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics, 3, 211–221.
Vinaixa M., Schymanski E.L., Neumann S. et al. (2016) Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects. TrAC, 78, 23–35.
Salek R.M., Neumann S., Schober D. et al. (2015) COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access. Metabolomics, 11, 1587–1597.
Steinbeck C., Conesa P., Haug K. et al. (2012) MetaboLights: towards a new COSMOS of metabolomics data management. Metabolomics, 8, 757–760.
Considine E.C. and Salek R.M. (2019) A tool to encourage minimum reporting guideline uptake for data analysis in metabolomics. Metabolites, 9, 43.
Schorn M.A., Verhoeven S., Ridder L. et al. (2021) A community resource for paired genomic and metabolomic data mining. Nat. Chem. Biol., 17, 363–368.
Cooper L. and Jaiswal P. (2016) The Plant Ontology: a tool for plant genomics. Methods Mol. Biol., 1374, 89–114.
Cooper L., Walls R.L., Elser J. et al. (2013) The plant ontology as a tool for comparative plant anatomy and genomic analyses. Plant Cell Physiol., 54, e1.
Avraham S., Tung C.W., Ilic K. et al. (2008) The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res., 36, D449–454.
Warman C., Sullivan C.M., Preece J. et al. (2021) A cost-effective maize ear phenotyping platform enables rapid categorization and quantification of kernels. Plant J., 106, 566–579.
Oellrich A., Walls R.L., Cannon E.K. et al. (2015) An ontology approach to comparative phenomics in plants. Plant Methods, 11, 10.
Cooper L., Meier A., Laporte M.A. et al. (2018) The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res., 46, D1168–D1180.
Tello-Ruiz M.K., Naithani S., Gupta P. et al. (2021) Gramene 2021: harnessing the power of comparative genomics and pathways for plant research. Nucleic Acids Res., 49, D1452–D1463.
Naithani S. and Jaiswal P. (2017) Pathway analysis and omics data visualization using pathway genome databases: FragariaCyc, a case study. Methods Mol. Biol., 1533, 241–256.
Naithani S., Raja R., Waddell E.N. et al. (2014) VitisCyc: a metabolic pathway knowledgebase for grapevine (Vitis vinifera). Front. Plant Sci., 5, 644.
Gupta P., Naithani S., Preece J. et al. (2022) Plant reactome and PubChem: the plant pathway and (Bio)Chemical Entity Knowledgebases. Methods Mol. Biol., 2443, 511–525.
Naithani S., Gupta P., Preece J. et al. (2020) Plant Reactome: a knowledgebase and resource for comparative pathway analysis. Nucleic Acids Res., 48, D1093–D1103.
Jaiswal P. and Usadel B. (2016) Plant Pathway Databases. Methods Mol. Biol., 1374, 71–87.
Kattge J., Ogle K., Bönisch G. et al. (2011) A generic structure for plant trait databases. Meth. Ecol. Evolut., 2, 202–213.
van Kleunen M., Pysek P., Dawson W. et al. (2019) The Global Naturalized Alien Flora (GloNAF) database. Ecology, 100, e02542.
Manolio T.A., Collins F.S., Cox N.J. et al. (2009) Finding the missing heritability of complex diseases. Nature, 461, 747–753.
Visscher P.M., Brown M.A., McCarthy M.I. et al. (2012) Five years of GWAS discovery. Am. J. Hum. Genet., 90, 7–24.
Visscher P.M., Wray N.R., Zhang Q. et al. (2017) 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet., 101, 5–22.
Uffelmann E., Huang Q.Q., Munung N.S. et al. (2021) Genome-wide association studies. Nat. Rev. Methods Primers, 1, 1–21.
Falconer D.S. and Mackay T.F.C. (1996) Introduction to Quantitative Genetics. Longmans Green, Harlow, Essex, UK.
Kearsey M.J. (1998) The principles of QTL analysis (a minimal mathematics approach). J. Exp. Bot., 49, 1619–1623.
Lynch M. and Walsh B.V. (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Sallam A., Eltaher S., Alqudah A.M. et al. (2022) Combined GWAS and QTL mapping revealed candidate genes and SNP network controlling recovery and tolerance traits associated with drought tolerance in seedling winter wheat. Genomics, 114, 110358.
Hayes B.J., Gjuvsland A. and Omholt S. (2006) Power of QTL mapping experiments in commercial Atlantic salmon populations, exploiting linkage and linkage disequilibrium and effect of limited recombination in males. Heredity, 97, 19–26.
Joiret M., Mahachie John J.M., Gusareva E.S. et al. (2019) Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min., 12, 11.
Hartl D.L. and Clark A.G. (1997) Principles of Population Genetics. Sinauer Associates, Sunderland, MA.
Lee Y.H. (2015) Meta-analysis of genetic association studies. Ann. Lab. Med., 35, 283–287.
Dehghan A. (2018) Genome-wide association studies. Methods Mol. Biol., 1793, 37–49.
Buniello A., MacArthur J.A.L., Cerezo M. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.
Togninalli M., Seren U., Freudenthal J.A. et al. (2020) AraPheno and the AraGWAS Catalog 2020: a major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucleic Acids Res., 48, D1063–D1068.
Zeggini E. and Ioannidis J.P.A. (2009) Meta-analysis in genome-wide association studies. Pharmacogenomics, 10, 191–201.
Soriano J.M., Colasuonno P., Marcotuli I. et al. (2021) Meta-QTL analysis and identification of candidate genes for quality, abiotic and biotic stress in durum wheat. Sci. Rep., 11, 11877.
Kraft P., Zeggini E. and Ioannidis J.P. (2009) Replication in genome-wide association studies. Stat. Sci., 24, 561–573.
Li P., Zhang Y., Yin S. et al. (2018) QTL-by-environment interaction in the response of maize root and shoot traits to different water regimes. Front. Plant Sci., 9, 229.
Lowry D.B., Lovell J.T., Zhang L. et al. (2019) QTL × environment interactions underlie adaptive divergence in switchgrass across a large latitudinal gradient. Proc. Natl. Acad. Sci., 116, 12933–12941.
Pinu F.R., Beale D.J., Paten A.M. et al. (2019) Systems biology and multi-omics integration: viewpoints from the metabolomics research community. Metabolites, 9, 76.
Pacheco A.R., Pauvert C., Kishore D. et al. (2022) Toward FAIR Representations of Microbial Interactions. mSystems, 7, e0065922.
Sumner L.W., Styczynski M., McLean J. et al. (2015) Introducing the USA plant, algae and microbial metabolomics research coordination network (PAMM-NET). Metabolomics, 11, 3–5.
Kodra D., Pousinis P., Vorkas P.A. et al. (2022) Is current practice adhering to guidelines proposed for metabolite identification in LC-MS untargeted metabolomics? A meta-analysis of the literature. J. Proteome Res., 21, 590–598.
Schroeder M., Meyer S.W., Heyman H.M. et al. (2019) Generation of a collision cross section library for multi-dimensional plant metabolomics using UHPLC-Trapped Ion Mobility-MS/MS. Metabolites, 10, 13.
Wilkinson M.D., Dumontier M., Aalbersberg I.J. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, 3, 160018.
Jeliazkova N., Apostolova M.D., Andreoli C. et al. (2021) Towards FAIR nanosafety data. Nat. Nanotechnol., 16, 644–654.
Iturbide M., Fernandez J., Gutierrez J.M. et al. (2022) Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository. Sci. Data, 9, 629.
Mons B., Neylon C., Velterop J. et al. (2017) Cloudy, increasingly FAIR; revisiting the FAIR data guiding principles for the European open science cloud. Inform. Serv. Use, 37, 49–56.