How to learn about gene function: text-mining or ontologies
TG. Soldatos, N. Perdigão, NP. Brown, KS. Sabir, SI. O'Donoghue
Language English
Country United States
Document type Journal Article, Review
- MeSH
- Data Mining: methods, trends
- Databases, Genetic *: trends
- Gene Ontology *: trends
- Humans
- Animals
- Check Tag
- Humans
- Animals
- Publication type
- Journal Article
- Review
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can analyze gene lists quickly and effectively and identify aggregated annotations through computerized resources. In this article we survey some of the many tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We review current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for.
In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options; and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
CEITEC, Masaryk University, Brno, Czech Republic
CSIRO Computational Informatics, Sydney, Australia
Garvan Institute of Medical Research, Sydney, Australia
- 000
- 00000naa a2200000 a 4500
- 001
- bmc16000677
- 003
- CZ-PrNML
- 005
- 20160126115335.0
- 007
- ta
- 008
- 160108s2015 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.ymeth.2014.07.004 $2 doi
- 035 __
- $a (PubMed)25088781
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Soldatos, Theodoros G $u MolecularHealth GmbH, Heidelberg, Germany.
- 245 10
- $a How to learn about gene function: text-mining or ontologies / $c TG. Soldatos, N. Perdigão, NP. Brown, KS. Sabir, SI. O'Donoghue
- 650 _2
- $a animals $7 D000818
- 650 _2
- $a data mining $x methods $x trends $7 D057225
- 650 12
- $a databases, genetic $x trends $7 D030541
- 650 12
- $a gene ontology $x trends $7 D063990
- 650 _2
- $a humans $7 D006801
- 655 _2
- $a journal article $7 D016428
- 655 _2
- $a review $7 D016454
- 700 1_
- $a Perdigão, Nelson $u Instituto Superior Técnico, Universidade de Lisboa, Portugal.
- 700 1_
- $a Brown, Nigel P $u CEITEC, Masaryk University, Brno, Czech Republic.
- 700 1_
- $a Sabir, Kenneth S $u Garvan Institute of Medical Research, Sydney, Australia.
- 700 1_
- $a O'Donoghue, Seán I $u Garvan Institute of Medical Research, Sydney, Australia; CSIRO Computational Informatics, Sydney, Australia.
- 773 0_
- $w MED00005029 $t Methods (San Diego, Calif.) $x 1095-9130 $g Vol. 74, No. - (2015), pp. 3-15
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/25088781 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20160108 $b ABA008
- 991 __
- $a 20160126115458 $b ABA008
- 999 __
- $a ok $b bmc $g 1102958 $s 924883
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2015 $b 74 $c - $d 3-15 $e 20140801 $i 1095-9130 $m Methods $n Methods $x MED00005029
- LZP __
- $a Pubmed-20160108