Fully Automated DCNN-Based Thermal Images Annotation Using Neural Network Pretrained on RGB Data
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic
Document type: journal articles
PubMed: 33672344
PubMed Central: PMC7926581
DOI: 10.3390/s21041552
PII: s21041552
- Keywords: IR, RGB, YOLO, data annotation, deep convolutional neural networks, object detector, thermal, transfer learning
- Publication type: journal articles (MeSH)
One of the biggest challenges in training deep neural networks is the need for massive data annotation. Training a neural network for object detection requires millions of annotated training images. However, there are currently no large-scale thermal image datasets that could be used to train state-of-the-art neural networks, while voluminous RGB image datasets are available. This paper presents a method for creating hundreds of thousands of annotated thermal images using an object detector pre-trained on RGB data. A dataset created in this way can be used to train object detectors with improved performance. The main contribution of this work is a novel method for fully automatic thermal image labeling. The proposed system uses an RGB camera, a thermal camera, a 3D LiDAR, and a neural network pre-trained to detect objects in the RGB domain. With this setup, a fully automated process can annotate the thermal images and build an automatically annotated thermal training dataset. As a result, we created a dataset containing hundreds of thousands of annotated objects. This approach allows deep learning models to be trained with performance similar to that of common human-annotation-based methods. The paper also proposes several improvements for fine-tuning the results with minimal human intervention. Finally, the evaluation of the proposed solution shows that the method gives significantly better results than training the neural network on standard small-scale hand-annotated thermal image datasets.
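For illustration, the sketch below shows the core geometric step such a pipeline relies on: bounding boxes produced by an RGB-pretrained detector (e.g., a COCO-trained YOLO model) are transferred into the thermal image by projecting the same LiDAR points into both cameras. This is a minimal sketch under assumed inputs; the calibration matrices (K_RGB, K_THERM, T_LIDAR_TO_RGB, T_LIDAR_TO_THERM), the box format, and the point-count threshold are hypothetical placeholders and do not reproduce the authors' actual calibration or Atlas Fusion code.

```python
# Minimal sketch: transfer RGB detections to the thermal frame via LiDAR depth.
# All calibration values below are placeholders, not the authors' calibration.
import numpy as np

# Hypothetical intrinsics (3x3) and LiDAR-to-camera extrinsics (3x4).
K_RGB = np.array([[1200.0, 0.0, 960.0], [0.0, 1200.0, 600.0], [0.0, 0.0, 1.0]])
K_THERM = np.array([[735.0, 0.0, 320.0], [0.0, 735.0, 256.0], [0.0, 0.0, 1.0]])
T_LIDAR_TO_RGB = np.eye(4)[:3, :]      # placeholder rigid transform
T_LIDAR_TO_THERM = np.eye(4)[:3, :]    # placeholder rigid transform


def project(points_lidar, K, T):
    """Project Nx3 LiDAR points into a camera; return (N,2) pixels and (N,) depths."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    cam = (T @ pts_h.T).T                              # LiDAR frame -> camera frame
    depth = cam[:, 2]
    pix = (K @ cam.T).T
    pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)
    return pix, depth


def rgb_boxes_to_thermal(boxes_rgb, points_lidar):
    """Transfer RGB boxes (x1, y1, x2, y2, class_id) to thermal-image boxes."""
    pix_rgb, d_rgb = project(points_lidar, K_RGB, T_LIDAR_TO_RGB)
    pix_th, d_th = project(points_lidar, K_THERM, T_LIDAR_TO_THERM)
    valid = (d_rgb > 0.1) & (d_th > 0.1)   # keep points in front of both cameras

    thermal_boxes = []
    for x1, y1, x2, y2, cls in boxes_rgb:
        # LiDAR points whose RGB projection falls inside the detected box
        inside = (valid &
                  (pix_rgb[:, 0] >= x1) & (pix_rgb[:, 0] <= x2) &
                  (pix_rgb[:, 1] >= y1) & (pix_rgb[:, 1] <= y2))
        if inside.sum() < 5:               # too few LiDAR hits to localize the object
            continue
        p = pix_th[inside]
        thermal_boxes.append((p[:, 0].min(), p[:, 1].min(),
                              p[:, 0].max(), p[:, 1].max(), cls))
    return thermal_boxes
```

The transferred boxes can then be written out in a standard annotation format (e.g., YOLO text files) and used to train a thermal-domain detector without manual labeling.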