A comprehensive social media data processing and analytics architecture by using big data platforms: a case study of twitter flood-risk messages
Status PubMed-not-MEDLINE Language English Country Germany Medium print-electronic
Document type journal articles
PubMed: 33727982
PubMed Central: PMC7951942
DOI: 10.1007/s12145-021-00601-w
PII: 601
- Keywords
- Data extraction, Floods, Hadoop, Social network, Spark
- Publication type
- journal articles
The main objective of this article is to propose an advanced architecture and workflow based on the Apache Hadoop and Apache Spark big data platforms. The primary purpose of the presented architecture is to collect, store, process, and analyse high-volume data from social media streams. The paper shows how the proposed architecture and data workflow can be applied to analyse Tweets on a specific flood topic. A secondary objective is also addressed: describing the flood alert situation using only Tweet messages and exploring the informative potential of such data. A predictive machine learning approach based on Bayes' theorem was used to classify messages as flood or no-flood. For this study, approximately 100,000 Twitter messages were processed and analysed. The messages were related to the flooding domain and were collected over a period of 5 days (14 May to 18 May 2018). A Spark application was developed to run the data processing commands automatically and to generate the appropriate output data. The results confirmed the advantages of many well-known features of Spark and Hadoop in social media data processing. Such technologies are ready to deal with social media data streams, but there are still challenges that have to be taken into account. Based on the flood tweet analysis, Twitter messages are, with some caveats, informative enough to be used to estimate the general flood alert situation in particular regions. Text analysis techniques showed that Twitter messages contain valuable flood-related spatial information.
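The classification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that the approach is based on Bayes' theorem, so everything specific here is an assumption made for illustration: the multinomial naive Bayes variant with add-one smoothing, the `tokenize` helper, the class name `NaiveBayesTextClassifier`, the label names, and the six invented training messages. The study itself ran on Spark; this is plain Python and is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)       # messages per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for message, label in zip(messages, labels):
            tokens = tokenize(message)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, message):
        """Return the unnormalised log posterior of each class for a message."""
        total_messages = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        known_tokens = [t for t in tokenize(message) if t in self.vocabulary]
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_messages)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in known_tokens:
                # Smoothed likelihood P(token | class)
                likelihood = (self.word_counts[label][token] + 1) / (total_words + vocab_size)
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)


# Invented toy training data; the real study used roughly 100,000 Tweets.
training_messages = [
    "Flash flood warning issued for the county tonight",
    "Streets under water after heavy rain, flood everywhere",
    "River flooding closes the main road downtown",
    "My inbox is a flood of emails this morning",
    "Great concert tonight, flood of memories",
    "Sunny weather and a nice walk downtown",
]
training_labels = ["flood", "flood", "flood", "no_flood", "no_flood", "no_flood"]

model = NaiveBayesTextClassifier().fit(training_messages, training_labels)

print(model.predict("Flood warning: river rising near the road"))
print(model.predict("A flood of emails in my inbox"))
```

On this toy data the first message is assigned to `flood` and the second to `no_flood`, even though both contain the word "flood", because the surrounding words shift the class likelihoods. Words not seen during training are skipped, which is one simple design choice among several.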