Comparing assignment-based approaches to breed identification within a large set of horses
Language: English; Country: Great Britain, England; Medium: print-electronic
Document type: journal article
Grant support
QH92277
Národní Agentura pro Zemědělský Výzkum
LO1210
Ministerstvo Školství, Mládeže a Tělovýchovy
2108
Mendelova Univerzita v Brně
PubMed
30963515
DOI
10.1007/s13353-019-00495-x
PII: 10.1007/s13353-019-00495-x
- Keywords
- Assignment success, Genetic differentiation, Horse breeds, Machine learning, Microsatellite variability
- MeSH
- Alleles MeSH
- Algorithms MeSH
- Breeding * MeSH
- Species Specificity MeSH
- Gene Frequency MeSH
- Genetic Variation MeSH
- Genomics * MeSH
- Genotype MeSH
- Heterozygote MeSH
- Horses / classification / genetics MeSH
- Microsatellite Repeats / genetics MeSH
- Software MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
Given its extensive data sets and statistical techniques, animal breeding is a field in which machine learning has a steadily growing impact. This study examines the potential of machine learning and data mining for breed identification within a large set of horses and breeds. Individual assignment methods, and the factors influencing their success rate, are compared at the scale of the Czech population. The per-locus fixation index values ranged from 0.057 (HMS1) to 0.144 (HTG6), and the overall genetic differentiation among the breeds amounted to 8.9%. The highest genetic divergence (FST = 0.378) was found between the Friesian and Equus przewalskii; the highest degree of gene migration was found between the Czech and Bavarian Warmblood (Nm = 14,302); and the overall heterozygote deficit across the populations was 10.4%. Eight standard methods (Bayesian, frequency, and distance) implemented in GeneClass software and almost all mainstream classification algorithms (Bayes Net, Naive Bayes, IB1, IB5, KStar, JRip, J48, Random Forest, Random Tree, PART, MLP, and SVM) from the WEKA machine learning workbench were compared using 314,874 real allelic data sets. The Bayesian method (GeneClass, 89.9%) and the Bayesian network algorithm (WEKA, 84.8%) outperformed the other techniques. Breed genomic prediction accuracy was highest in the cold-blooded horses. The overall proportion of individuals correctly assigned to a population depended mainly on the number of breeds and on their genetic divergence. These statistical tools could be used to assess breed traceability systems, and they have the potential to assist managers in decisions regarding breeding and registration.
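To make the idea of frequency-based individual assignment more concrete, the following is a minimal sketch in Python, not the GeneClass or WEKA implementation used in the study. It assumes Hardy-Weinberg proportions within each breed, independent loci, and a small floor frequency for alleles absent from a reference sample. The breed labels, allele codes, the `floor` value, and all function names are hypothetical; only the locus names HMS1 and HTG6 come from the abstract.

```python
from collections import Counter, defaultdict
from math import log


def allele_frequencies(genotypes):
    """Estimate per-locus allele frequencies from diploid genotypes.

    genotypes: list of dicts mapping locus name -> (allele_a, allele_b).
    """
    counts = defaultdict(Counter)
    for genotype in genotypes:
        for locus, alleles in genotype.items():
            counts[locus].update(alleles)
    return {
        locus: {allele: n / sum(c.values()) for allele, n in c.items()}
        for locus, c in counts.items()
    }


def log_likelihood(genotype, freqs, floor=0.01):
    """Log-likelihood of a genotype under Hardy-Weinberg proportions.

    Alleles missing from the reference sample get the (assumed) floor
    frequency so that the likelihood never collapses to zero.
    """
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = freqs.get(locus, {}).get(a, floor)
        q = freqs.get(locus, {}).get(b, floor)
        total += log(p * p if a == b else 2 * p * q)
    return total


def assign(genotype, reference):
    """Return the reference breed under which the genotype is most likely."""
    freqs = {breed: allele_frequencies(gs) for breed, gs in reference.items()}
    return max(freqs, key=lambda breed: log_likelihood(genotype, freqs[breed]))


# Toy reference data: hypothetical breeds and allele codes.
reference = {
    "breed_A": [
        {"HMS1": (1, 1), "HTG6": (3, 3)},
        {"HMS1": (1, 2), "HTG6": (3, 4)},
    ],
    "breed_B": [
        {"HMS1": (2, 2), "HTG6": (4, 4)},
        {"HMS1": (2, 2), "HTG6": (4, 5)},
    ],
}

query = {"HMS1": (1, 1), "HTG6": (3, 4)}
print(assign(query, reference))
```

In practice the individual being tested would be excluded from its own reference sample before the frequencies are estimated (a leave-one-out scheme), which this sketch omits for brevity.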