This record comes from PubMed.

Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks

Syst Biol. 2019 Nov 1;68(6):876-895.

Language: English. Country: England. Media: print.

Document type: Evaluation Study, Journal Article, Research Support, Non-U.S. Gov't.

Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 images or fewer per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts.
For the first task, our approach reached >92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and >96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
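The pipeline the abstract describes (frozen pretrained-CNN features fed to a linear support vector machine) can be sketched as follows. This is a minimal illustration, not the authors' code: the VGG16 feature extractor is stubbed with a toy global-average-pooling function over synthetic arrays, so the image data, the `extract_features` helper, and the class structure are all assumptions; only the features-to-`LinearSVC` step reflects the method named in the abstract. In a real run one would replace the stub with pooled activations from a pretrained network (e.g. `keras.applications.VGG16(weights="imagenet", include_top=False)`).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_features(images):
    """Stand-in for pooled VGG16 convolutional activations:
    global-average-pool each (H, W, C) array to a C-dim vector."""
    return np.stack([img.mean(axis=(0, 1)) for img in images])

# Synthetic stand-in data: two toy "taxa" whose images differ
# in mean channel intensity, 50 specimens each.
images, labels = [], []
for taxon in range(2):
    for _ in range(50):
        images.append(rng.normal(loc=taxon, scale=0.5, size=(8, 8, 512)))
        labels.append(taxon)

X = extract_features(images)            # (100, 512) feature matrix
y = np.array(labels)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Linear SVM trained on the frozen features (the CNN is never fine-tuned).
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

The design point this sketch makes is the one the abstract emphasizes: because only the lightweight linear classifier is trained, the approach needs neither large training sets nor heavy computation.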

See more in PubMed

Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., Corrado G.S., Davis A., Dean J., Devin M., Ghemawat S., Goodfellow I., Harp A., Irving G., Isard M., Jia Y., Jozefowicz R., Kaiser L., Kudlur M., Levenberg J., Mane D., Monga R., Moore S., Murray D., Olah C., Schuster M., Shlens J., Steiner B., Sutskever I., Talwar K., Tucker P., Vanhoucke V., Vasudevan V., Viegas F., Vinyals O., Warden P., Wattenberg M., Wicke M., Yu Y., Zheng X.. 2016. TensorFlow: large-scale machine learning on heterogeneous distributed systems. Available from arXiv:1603.04467.

Arandjelović R., Zisserman A.. 2013. All about VLAD. 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE.

Arbuckle T., Schröder S., Steinhage V., Wittmann D.. 2001. Biodiversity informatics in action: identification and monitoring of bee species using ABIS. Proc. 15th Int. Symp. Informatics for Environmental Protection, Zürich, Switzerland: ETH; vol. 1 p. 425–430.

Austen G.E., Bindemann M., Griffiths R.A., Roberts D.L.. 2016. Species identification by experts and non-experts: comparing images from field guides. Sci. Rep. 6:33634. PubMed PMC

Azizpour H., Razavian A.S., Sullivan J., Maki A., Carlsson S.. 2016. Factors of transferability for a generic ConvNet representation. IEEE Trans. Pattern Anal. Mach. Intell. 38:1790–1802. PubMed

Baraud J. 1992. Coléoptères Scarabaeoidea d’Europe. Fédération Française des Sociétés de Sciences Naturelles & Société Linnéenne de Lyon, Faune de France, 78:1–856.

Barré P., Stöver B.C., Müller K.F., Steinhage V.. 2017. LeafNet: a computer vision system for automatic plant species identification. Ecol. Inform. 40:50–56.

Barker E., Barker W., Burr W., Polk W., Smid M.. 2012. Recommendation for key management part 1: general (revision 3). NIST Spec. Pub. 800:1–147.

Bengio Y. 2011. Deep learning of representations for unsupervised and transfer learning. Proceedings of ICML Workshop on Unsupervised and Transfer Learning. Bellevue: IEEE; p. 17–36.

Bengio Y., LeCun Y.. 2007. Scaling learning algorithms towards AI. In: Bottou L, Chapelle D, DeCoste D, Weston J, editors. Large-scale kernel machines. Cambridge MA: MIT Press; vol. 34 p. 1–41.

Brehm G., Strutzenberger P., Fiedler K.. 2013. Phylogenetic diversity of geometrid moths decreases with elevation in the tropical Andes. Ecography 36:1247–1253.

Breiman L. 2001. Random forests. Mach. Learn. 45:5–32.

Carranza-Rojas J., Goeau H., Bonnet P., Mata-Montero E., Joly A.. 2017. Going deeper in the automated identification of herbarium specimens. BMC Evol. Biol. 17:181. PubMed PMC

Caruana R. 1995. Learning many related tasks at the same time with backpropagation. In: Tesauro G, Touretzky DS and Leen TK, editors. Advances in neural information processing systems 7. Cambridge MA: MIT Press; p. 657–664.

Chollet F. 2015. Keras. GitHub. https://github.com/fchollet/keras.

Chollet F. 2016. Xception: deep learning with depthwise separable convolutions. Available from: arXiv:1610.02357.

Cireşan D., Meier U., Masci J., Schmidhuber J.. 2011. A committee of neural networks for traffic sign classification. The 2011 International Joint Conference on Neural Networks. San Jose: IEEE; p. 1918–1921.

Cortes C., Vapnik V.. 1995. Support-vector networks. Mach. Learn. 20:273–297.

Csurka G. 2017. Domain adaptation for visual applications: a comprehensive survey. Available from: arXiv:1702.05374.

Culverhouse P. F. 2007. Natural object categorization: man versus machine. In: MacLeod, N., editor. Automated taxon identification in systematics: theory, approaches and applications. Boca Raton, Florida: CRC Press; p. 25–45.

Culverhouse P. F., Macleod N., Williams R., Benfield M.C., Lopes R.M., Picheral M.. 2014. An empirical assessment of the consistency of taxonomic identifications. Mar. Biol. Res. 10:73–84.

Donahue J., Hendricks L.A., Guadarrama S,. Rohrbach M., Venugopalan S., Saenko K., Darrell T.. 2015. Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE; p. 2625–2634. PubMed

Donahue J., Jia Y., Vinyals O., Hoffman J., Zhang N., Tzeng E., Darrell T.. 2014. DeCAF: a deep convolutional activation feature for generic visual recognition. International Conference on Machine Learning. Columbus: IEEE; p. 647–655.

Everingham M., Van Gool L., Williams C. K.I., Winn J., Zisserman A.. 2010. The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88:303–338.

Food and Agriculture Organization of the United Nations. Plant pests and diseases. http://www.fao.org/emergencies/emergency-types/plant-pests-and-diseases/en/. Accessed October 12, 2017.

Fei-Fei L., Fergus R., Perona P.. 2006. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28:594–611. PubMed

Feng L., Bhanu B., Heraty J.. 2016. A software system for automated identification and retrieval of moth images based on wing attributes. Pattern Recognit. 51:225–241.

Francoy T.M., Wittmann D., Drauschke M., Müller S., Steinhage V., Bezerra-Laure M. A.F., De Jong D., Gonçalves L.S.. 2008. Identification of africanized honey bees through wing morphometrics: two fast and efficient procedures. Apidologie 39:488–494.

Fukushima K. 1979. Neural network model for a mechanism of pattern recognition unaffected by shift in position—neocognitron. (in Japanese with English abstract). Trans. IECE Jpn. (A)62-A(10):658–665. PubMed

Fukushima K. 1980. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36:193–202. PubMed

Fukushima K., Miyake S., Ito T.. 1983. Neocognitron: a neural network model for a mechanism of visual pattern recognition. IEEE Trans. Syst. Man Cybern. SMC 13:826–834.

Gauld I.D., O'Neill M.A., Gaston K.J., et al. 2000. Driving Miss Daisy: the performance of an automated insect identification system. In: Austin AD, Dowton M, editors. Hymenoptera: evolution, biodiversity and biological control. Collingwood, VIC, Australia: CSIRO; p. 303–312.

Global Invasive Species Database. 2017. Available from: http://www.iucngisd.org/gisd/100_worst.php. Accessed October 12, 2017.

Gonçalves A.B., Souza J.S., da Silva G.G., Cereda M.P., Pott A., Naka M.H., Pistori H.. 2016. Feature extraction and machine learning for the classification of Brazilian Savannah pollen grains. PLoS One 11:e0157044. PubMed PMC

Griffin G., Holub A., Perona P.. 2007. Caltech-256 object category dataset. Pasadena (CA): California Institute of Technology.

He H., Garcia E.A.. 2009. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21:1263–1284.

He K., Zhang X., Ren S., Sun, J.. 2016. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE; p. 770–778.

Joly A., Goëau H., Bonnet P., Bakić V., Barbe J., Selmi S., Yahiaoui I., Carré J., Mouysset E., Molino J.-F., Boujemaa N., Barthélémy D.. 2014. Interactive plant identification based on social image data. Ecol. Inform. 23:22–34.

Kadir A. 2014. A model of plant identification system using GLCM, lacunarity and shen features. Available from: arXiv:1410.0969.

Karpathy A., Fei-Fei L.. 2015. Deep visual-semantic alignments for generating image descriptions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE; p. 3128–3137. PubMed

Khan R., van de Weijer J., Khan F.S., Muselet D., Ducottet C., Barat C.. 2013. Discriminative color descriptors. 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE; p. 2866–2873.

Kolbert E. 2014. The sixth extinction: an unnatural history. New York: A&C Black; 319 pp.

Krizhevsky A., Sutskever I., Hinton G.E.. 2012. ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C. J.C., Bottou, L., Weinberger, K.Q., editors. Advances in neural information processing systems 25. Lake Tahoe: IEEE; p. 1097–1105.

Kulkarni A.H., Rai H.M., Jahagirdar K.A., Upparamani P.S.. 2013. A leaf recognition technique for plant classification using RBPNN and Zernike moments. Int. J. Adv. Res. Comput. Commun. Eng. 2:984–988.

Larios N., Deng H., Zhang W., Sarpola M., Yuen J., Paasch R., Moldenke A., Lytle D.A., Correa S.R., Mortensen E.N., Shapiro L.G., Dietterich T.G.. 2008. Automated insect identification through concatenated histograms of local appearance features: feature vector generation and region detection for deformable objects. Mach. Vis. Appl. 19:105–123.

Lam M., Mahasseni B., Todorovic S.. 2017. Fine-grained recognition as HSnet search for informative image parts. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE.

LeCun Y., Bengio Y., Hinton G.. 2015. Deep learning. Nature 521:436–444. PubMed

Li Z., Hoiem D.. 2016. Learning without forgetting. Available from: arXiv:1606.09282. PubMed

Lin M., Chen Q., Yan S.. 2013. Network in network. Available from: arXiv:1312.4400.

Lin T.-Y., RoyChowdhury A., Maji S.. 2015. Bilinear CNNs for fine-grained visual recognition. Proceedings of the IEEE International Conference on Computer Vision. Boston: IEEE; p. 1449–1457.

Liu N., Kan J.-M.. 2016. Plant leaf identification based on the multi-feature fusion and deep belief networks method. J. Beijing For. Univ. 38:110–119

Lowe D.G. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60:91–110.

Lytle D.A., Martínez-Muñoz G., Zhang W., Larios N., Shapiro L., Paasch R., Moldenke A., Mortensen E.N., Todorovic S., Dietterich T.G.. 2010. Automated processing and identification of benthic invertebrate samples. J. North Am. Benthol. Soc. 29:867–874.

Maaten L. van der, Hinton G.. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9:2579–2605.

MacLeod N., Benfield M., Culverhouse P.. 2010. Time to automate identification. Nature 467:154–155. PubMed

Martineau M., Conte D., Raveaux R., Arnault I., Munier D., Venturini G.. 2017. A survey on image-based insects classification. Pattern Recognit. 65:273–284.

Mata-Montero E., Carranza-Rojas J.. 2015. A texture and curvature bimodal leaf recognition model for identification of Costa Rican plant species. 2015 Latin American Computing Conference (CLEI). Arequipa, Peru: IEEE; p. 1–12.

Mikšić R. 1982. Monographie der Cetoniinae der Paläarktischen und Orientalischen Region (Coleoptera, Lamellicornia). Band 3. Sarajevo: Forstinstitut; 530 pp.

Murray N., Perronnin F.. 2014. Generalized max pooling. Available from: arXiv:1406.0312.

Nilsback M.E., Zisserman A.. 2008. Automated flower classification over a large number of classes. 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India: IEEE; p. 722–729.

O'Neill M.A. 2007. DAISY: a practical tool for semi-automated species identification. In: MacLeod N, editor. Automated taxon identification in systematics: theory, approaches, and applications. Boca Raton, FL: CRC Press/Taylor & Francis Group; p. 101–114.

Oquab M., Bottou L., Laptev I., Sivic J.. 2014. Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE; p. 1717–1724.

Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay É. 2011. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12:2825–2830.

Plotly Technologies Inc. 2015. Collaborative data science. Montréal, QC: Plotly Technologies Inc.

Qian Q., Jin R., Zhu S., Lin Y.. 2014. Fine-Grained visual categorization via multi-stage metric learning. Available from: arXiv:1402.0453.

Rabinovich A., Vedaldi A., Belongie S.J.. 2007. Does image segmentation improve object categorization? San Diego: UCSD CSE Department; Tech. Rep. CS2007-090 p. 1–9

Razavian A. S., Azizpour H., Sullivan J., Carlsson, S. 2014. CNN features off-the-shelf: an astounding baseline for recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus: IEEE; p. 806–813.

Rodner E., Simon M., Brehm G., Pietsch S., Wägele J.W., Denzler J.. 2015. Fine-grained recognition datasets for biodiversity analysis. Available from: arXiv:1507.00913.

Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., Berg A.C., Fei-Fei L.. 2015. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115:211–252.

Sabatinelli G. 1981. Le Oxythyrea Muls. del Mediterraneo: studi morfologici sistematici (Coleoptera, Scarabaeoidae). Fragm. Entomol. 16:45–60.

Schröder S., Drescher W., Steinhage V., Kastenholz B.. 1995. An automated method for the identification of bee species (Hymenoptera: Apoidea). Proc. Intern. Symp. on Conserving Europe’s Bees. London (UK): Int. Bee Research Ass. & Linnean Society, London; p. 6–7.

Scriven J.J., Woodall L.C., Tinsley M.C., Knight M.E., Williams P.H., Carolan J.C., Brown M.J.F., Goulson D.. 2015. Revealing the hidden niches of cryptic bumblebees in Great Britain: implications for conservation. Biol. Conserv. 182:126–133.

Simonyan K., Zisserman A.. 2014. Very deep convolutional networks for large-scale image recognition. Available from: arXiv:1409.1556.

Schmidhuber J. 2015. Deep learning in neural networks: an overview. Neural Netw. 61:85–117. PubMed

Steinhage V., Schröder S., Cremers A.B., Lampe K.-H.. 2007. Automated extraction and analysis of morphological features for species identification. In: MacLeod N, editor. Automated taxon identification in systematics: theory, approaches and applications. Boca Raton, Florida: CRC Press; p. 115–129.

Stallkamp J., Schlipsing M., Salmen J., Igel C.. 2011. The German Traffic Sign Recognition Benchmark: a multi-class classification competition. The 2011 International Joint Conference on Neural Networks. San Jose: IEEE; p. 1453–1460.

Sun Y., Liu Y., Wang G., Zhang H.. 2017. Deep learning for plant identification in natural environment. Comput. Intell. Neurosci. 2017:7361042. PubMed PMC

Szegedy C., Ioffe S., Vanhoucke V., Alemi A.. 2016a. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Available from: arXiv:1602.07261.

Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.. 2016b. Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE; p. 2818–2826.

Tofilski A. 2004. DrawWing, a program for numerical description of insect wings. J. Insect Sci. 4:17. PubMed PMC

Tofilski A. 2007. Automatic measurement of honeybee wings. In: MacLeod N, editor. Automated taxon identification in systematics: theory, approaches and applications. Boca Raton, Florida: CRC Press; p. 277–288.

Van Horn G., Branson S., Farrell R., Haber S., Barry J., Ipeirotis P., Perona P., Belongie S.. 2015. Building a bird recognition App and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE; p. 595–604.

Vapnik V.N. 1999. An overview of statistical learning theory. IEEE Trans. Neural Netw. 10:988–999. PubMed

Vapnik V., Chapelle O.. 2000. Bounds on error expectation for support vector machines. Neural Comput. 12:2013–2036. PubMed

Vondráček D. 2011. Population structure of flower chafer Oxythyrea funesta (Poda, 1761) and phylogeny of the genus Oxythyrea Mulsant, 1842. (Diploma thesis). Available from https://dspace.cuni.cz/handle/20.500.11956/41477.

Vondráček D., Hadjiconstantis M., Šipek P.. 2017. Phylogeny of the genus Oxythyrea using molecular, ecological and morphological data from adults and larvae (Coleoptera: Scarabaeidae: Cetoniinae). pp. 857–858. In: Seidel M., Arriaga-Varela E., Vondráček D., editors. Abstracts of the Immature Beetles Meeting 2017, October 5–6th, Prague, Czech Republic. Acta Entomol. Mus. Natl. Pragae 57: 835–859.

Wah C., Branson S., Welinder P., Perona P., Belongie S.. 2011. The Caltech-UCSD birds-200-2011 dataset. Pasadena (CA): California Institute of Technology.

Watson A.T., O’Neill M.A., Kitching I.J.. 2003. Automated identification of live moths (macrolepidoptera) using digital automated identification system (DAISY). Syst. Biodivers. 1:287–300.

Weeks P., Gauld I.D., Gaston K.J., O’Neill M.A.. 1997. Automating the identification of insects: a new solution to an old problem. Bull. Entomol. Res. 87:203–211.

Weeks P. J.D., O’Neill M.A., Gaston K.J., Gauld I.D.. 1999a. Species–identification of wasps using principal component associative memories. Image Vis. Comput. 17:861–866.

Weeks P.J.D., O'Neill M.A., Gaston K.J., Gauld I.D.. 1999b. Automating insect identification: exploring the limitations of a prototype system. J. Appl. Entomol. 123:1–8.

Wei X.-S., Luo J.-H., Wu J., Zhou Z.-H.. 2017. Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Trans. Image Process. 26:2868–2881. PubMed

Wilf P., Zhang S., Chikkerur S., Little S.A., Wing S.L., Serre T.. 2016. Computer vision cracks the leaf code. Proc. Natl. Acad. Sci. USA 113:3305–3310. PubMed PMC

World Health Organization. 2014. A global brief on vector-borne diseases. WHO/DCO/WHD/2014.1.

Wu S.G., Bao F.S., Xu E.Y., Wang Y.X., Chang Y.F., Xiang Q.L.. 2007. A leaf recognition algorithm for plant classification using probabilistic neural network. 2007 IEEE International Symposium on Signal Processing and Information Technology. Cairo, Egypt: IEEE; p. 11–16.

Xu K., Ba J., Kiros R., Cho K., Courville A., Salakhutdinov R., Zemel R., Bengio Y.. 2015. Show, attend and tell: neural image caption generation with visual attention. International Conference on Machine Learning. Lille, France: International Machine Learning Society (IMLS); p. 2048–2057.

Yang J., Jiang Y.-G., Hauptmann A.G., Ngo C.-W.. 2007. Evaluating bag-of-visual-words representations in scene classification. Proceedings of the International Workshop on Multimedia Information Retrieval. Augsburg, Germany: IEEE; p. 197–206.

Yang H.-P., Ma C.-S., Wen H., Zhan Q.-B., Wang X.-L.. 2015. A tool for developing an automatic insect identification system based on wing outlines. Sci. Rep. 5:12786. PubMed PMC

Yosinski J., Clune J., Bengio Y., Lipson H.. 2014. How transferable are features in deep neural networks? In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, editors. Advances in neural information processing systems 27. Curran Associates, Inc.; p. 3320–3328.

Zbontar J., LeCun Y.. 2016. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17:2.

Zeiler M.D., Fergus R.. 2014. Visualizing and understanding convolutional networks. Comput. Vis. ECCV 2014; p. 818–833.

Zhang Z.-Q. 2011. Animal biodiversity: an outline of higher-level classification and survey of taxonomic richness. Zootaxa 3148:1–237. PubMed

Zhang W., Yan J., Shi W., Feng T., Deng D.. 2017. Refining deep convolutional features for improving fine-grained image recognition. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2017:27.

Zheng L., Zhao Y., Wang S., Wang J., Tian Q.. 2016. Good practice in CNN feature transfer. Available from: arXiv:1604.00133.

Data archived in Dryad: 10.5061/dryad.20ch6p5
