Fully Automated DCNN-Based Thermal Images Annotation Using Neural Network Pretrained on RGB Data

2021 Feb 23;21(4). [epub] 2021 Feb 23

Status: PubMed-not-MEDLINE | Language: English | Country: Switzerland | Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid33672344

One of the biggest challenges in training deep neural networks is the need for massive data annotation. Training a neural network for object detection requires millions of annotated training images. However, there are currently no large-scale thermal image datasets that could be used to train state-of-the-art neural networks, while voluminous RGB image datasets are available. This paper presents a method for creating hundreds of thousands of annotated thermal images using an object detector pre-trained on RGB data. A dataset created in this way can be used to train object detectors with improved performance. The main contribution of this work is a novel method for fully automatic thermal image labeling. The proposed system uses an RGB camera, a thermal camera, a 3D LiDAR, and a pre-trained neural network that detects objects in the RGB domain. With this setup, it is possible to run a fully automated process that annotates the thermal images and creates an automatically annotated thermal training dataset. As a result, we created a dataset containing hundreds of thousands of annotated objects. This approach allows deep learning models to be trained with performance similar to that achieved by common human-annotation-based methods. The paper also proposes several improvements for fine-tuning the results with minimal human intervention. Finally, the evaluation of the proposed solution shows that the method gives significantly better results than training a neural network on standard small-scale hand-annotated thermal image datasets.
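To make the cross-modal annotation transfer concrete, the sketch below shows one plausible way to carry an RGB-domain detection into the thermal image using LiDAR geometry: LiDAR points that project inside the RGB bounding box are treated as the detected object, re-projected through the thermal camera model, and their extent defines the transferred box. This is a minimal illustration under assumed pinhole camera models with known intrinsics and LiDAR-to-camera extrinsics; the function names and matrices are hypothetical, it is not the authors' Atlas Fusion implementation, and it omits the depth filtering needed to discard background LiDAR points falling inside the box.

```python
import numpy as np

def project_points(points_lidar, K, T_lidar_to_cam):
    """Project Nx3 LiDAR points into a camera image.

    K               -- 3x3 camera intrinsic matrix
    T_lidar_to_cam  -- 4x4 LiDAR-to-camera extrinsic transform
    Returns pixel coordinates of the points in front of the camera
    and a boolean mask selecting those points in the input array.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1                 # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                  # perspective division
    return uv, in_front

def transfer_box_to_thermal(box_rgb, points_lidar, K_rgb, T_rgb, K_thermal, T_thermal):
    """Transfer one RGB detection (x1, y1, x2, y2) into the thermal image.

    LiDAR points that project inside the RGB box are treated as the detected
    object; their re-projection into the thermal camera defines the new box.
    """
    uv_rgb, mask_rgb = project_points(points_lidar, K_rgb, T_rgb)
    visible_pts = points_lidar[mask_rgb]
    x1, y1, x2, y2 = box_rgb
    inside = ((uv_rgb[:, 0] >= x1) & (uv_rgb[:, 0] <= x2) &
              (uv_rgb[:, 1] >= y1) & (uv_rgb[:, 1] <= y2))
    object_pts = visible_pts[inside]
    if len(object_pts) == 0:
        return None                                # no LiDAR support -> skip this detection
    uv_t, _ = project_points(object_pts, K_thermal, T_thermal)
    if len(uv_t) == 0:
        return None
    return (float(uv_t[:, 0].min()), float(uv_t[:, 1].min()),
            float(uv_t[:, 0].max()), float(uv_t[:, 1].max()))
```

In a full pipeline, the RGB boxes would come from the pre-trained detector (e.g., a COCO-trained model), and the transferred boxes, after the refinement steps described in the paper, would be written out as the thermal training labels.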

See more in PubMed

Zalud L., Kocmanova P. Fusion of thermal imaging and CCD camera-based data for stereovision visual telepresence; Proceedings of the 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR); Linköping, Sweden. 21–26 October 2013; pp. 1–6.

Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012;25:1097–1105. doi: 10.1145/3065386. DOI

Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv. 2014. arXiv:1409.1556

Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A. Going deeper with convolutions; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Boston, MA, USA. 7–12 June 2015; pp. 1–9.

He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 770–778.

Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q. Densely connected convolutional networks; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA. 21–26 July 2017; pp. 4700–4708.

Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015;115:211–252. doi: 10.1007/s11263-015-0816-y. DOI

Ligocki A., Jelinek A., Zalud L. Brno Urban Dataset–The New Data for Self-Driving Agents and Mapping Tasks. arXiv. 2019. arXiv:1909.06897

Ligocki A., Jelinek A., Zalud L. Atlas Fusion–Modern Framework for Autonomous Agent Sensor Data Fusion. arXiv. 2020. arXiv:2010.11991

FLIR Systems, Inc. FREE FLIR Thermal Dataset for Algorithm Training. [(accessed on 1 June 2020)]; Available online: https://www.flir.com/oem/adas/adas-dataset-form/

FLIR Systems, Inc. Enhanced San Francisco Dataset. [(accessed on 1 June 2020)]; Available online: https://www.flir.eu/oem/adas/dataset/san-francisco-dataset/

FLIR Systems, Inc. FLIR European Regional Thermal Dataset for Algorithm Training. [(accessed on 1 June 2020)]; Available online: https://www.flir.eu/oem/adas/dataset/european-regional-thermal-dataset/

Esteva A., Kuprel B., Novoa R.A., Ko J., Swetter S.M., Blau H.M., Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–118. doi: 10.1038/nature21056. PubMed DOI PMC

Hwang S., Park J., Kim N., Choi Y., Kweon I.S. Multispectral Pedestrian Detection: Benchmark Dataset and Baselines; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Boston, MA, USA. 7–12 June 2015.

Torabi A., Massé G., Bilodeau G.A. An iterative integrated framework for thermal–visible image registration, sensor fusion, and people tracking for video surveillance applications. Comput. Vis. Image Underst. 2012;116:210–221. doi: 10.1016/j.cviu.2011.10.006. DOI

Khellal A., Ma H., Fei Q. International Conference on Intelligent Robotics and Applications. Springer; Berlin/Heidelberg, Germany: 2015. Pedestrian classification and detection in far infrared images; pp. 511–522.

Portmann J., Lynen S., Chli M., Siegwart R. People detection and tracking from aerial thermal views; Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA); Hong Kong, China. 31 May–7 June 2014; pp. 1794–1800.

Davis J.W., Sharma V. Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vis. Image Underst. 2007;106:162–182. doi: 10.1016/j.cviu.2006.06.010. DOI

Wu Z., Fuller N., Theriault D., Betke M. A thermal infrared video benchmark for visual analysis; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Columbus, OH, USA. 23–28 June 2014; pp. 201–208.

Geiger A., Lenz P., Stiller C., Urtasun R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013;32:1231–1237. doi: 10.1177/0278364913491297. DOI

Yu F., Xian W., Chen Y., Liu F., Liao M., Madhavan V., Darrell T. BDD100K: A diverse driving video database with scalable annotation tooling. arXiv. 2018. arXiv:1805.04687

Huang X., Wang P., Cheng X., Zhou D., Geng Q., Yang R. The apolloscape open dataset for autonomous driving and its application. IEEE Trans. Pattern Anal. Mach. Intell. 2019;42:2702–2719. doi: 10.1109/TPAMI.2019.2926463. PubMed DOI

Maddern W., Pascoe G., Linegar C., Newman P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017;36:3–15. doi: 10.1177/0278364916679498. DOI

Nyberg A. Transforming Thermal Images to Visible Spectrum Images Using Deep Learning. [(accessed on 20 February 2021)]; Available online: https://www.diva-portal.org/smash/get/diva2:1255342/FULLTEXT01.pdf.

Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative adversarial nets. arXiv. 2014. arXiv:1406.2661

Zhang L., Gonzalez-Garcia A., van de Weijer J., Danelljan M., Khan F.S. Synthetic data generation for end-to-end thermal infrared tracking. IEEE Trans. Image Process. 2018;28:1837–1850. doi: 10.1109/TIP.2018.2879249. PubMed DOI

Kniaz V.V., Knyaz V.A., Hladuvka J., Kropatsch W.G., Mizginov V. ThermalGAN: Multimodal color-to-thermal image translation for person re-identification in multispectral dataset; Proceedings of the European Conference on Computer Vision (ECCV); Munich, Germany. 8–14 September 2018.

Tumas P., Serackis A. Automated image annotation based on YOLOv3; Proceedings of the 2018 IEEE 6th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE); Vilnius, Lithuania. 8–10 November 2018; pp. 1–3.

Ivašić-Kos M., Krišto M., Pobar M. Human detection in thermal imaging using YOLO; Proceedings of the 2019 5th International Conference on Computer and Technology Applications; Istanbul, Turkey. 16–17 April 2019; pp. 20–24.

Gomez A., Conti F., Benini L. Thermal image-based CNN’s for ultra-low power people recognition; Proceedings of the 15th ACM International Conference on Computing Frontiers; Ischia, Italy. 8–10 May 2018; pp. 326–331.

Park J., Chen J., Cho Y.K., Kang D.Y., Son B.J. CNN-based person detection using infrared images for night-time intrusion warning systems. Sensors. 2020;20:34. doi: 10.3390/s20010034. PubMed DOI PMC

Ghose D., Desai S.M., Bhattacharya S., Chakraborty D., Fiterau M., Rahman T. Pedestrian Detection in Thermal Images using Saliency Maps; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Long Beach, CA, USA. 16–20 June 2019; pp. 11–17.

Ligocki A., Jelinek A., Zalud L. Brno Urban Dataset. [(accessed on 20 January 2021)]; Available online: https://github.com/Robotics-BUT/Brno-Urban-Dataset.

Ultralytics YOLOv5. [(accessed on 20 January 2021)]; Available online: https://github.com/ultralytics/yolov5.

Redmon J., Divvala S., Girshick R., Farhadi A. You only look once: Unified, real-time object detection; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 779–788.

Girshick R. Fast R-CNN; Proceedings of the IEEE International Conference on Computer Vision; Santiago, Chile. 7–13 December 2015; pp. 1440–1448.

Redmon J., Farhadi A. YOLO9000: Better, faster, stronger; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA. 21–26 July 2017; pp. 7263–7271.

Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L., et al. Advances in Neural Information Processing Systems 32. Curran Associates, Inc.; New York, NY, USA: 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library; pp. 8024–8035.

Ligocki A., Jelinek A., Zalud L. Atlas Fusion. [(accessed on 20 January 2021)]; Available online: https://github.com/Robotics-BUT/Atlas-Fusion.

Merriaux P., Dupuis Y., Boutteau R., Vasseur P., Savatier X. LiDAR point clouds correction acquired from a moving car based on CAN-bus data. arXiv. 2017. arXiv:1706.05886

Zhang B., Zhang X., Wei B., Qi C. A Point Cloud Distortion Removing and Mapping Algorithm based on Lidar and IMU UKF Fusion; Proceedings of the 2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM); Hong Kong, China. 8–12 July 2019; pp. 966–971.

Lin T.Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. European Conference on Computer Vision. Springer; Berlin/Heidelberg, Germany: 2014. Microsoft COCO: Common objects in context; pp. 740–755.

Jung A.B. imgaug. [(accessed on 30 October 2018)]; Available online: https://github.com/aleju/imgaug.

Zoph B., Cubuk E.D., Ghiasi G., Lin T.Y., Shlens J., Le Q.V. European Conference on Computer Vision. Springer; Berlin/Heidelberg, Germany: 2020. Learning data augmentation strategies for object detection; pp. 566–583.

Oksuz K., Cam B.C., Kalkan S., Akbas E. Imbalance problems in object detection: A review. arXiv. 2019. arXiv:1909.00169. doi: 10.1109/TPAMI.2020.2981890. PubMed DOI

Everingham M., Van Gool L., Williams C.K., Winn J., Zisserman A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010;88:303–338. doi: 10.1007/s11263-009-0275-4. DOI

