Query: Apache Spark

2 records in PubMed

Article

A comprehensive social media data processing and analytics architecture by using big data platforms: a case study of twitter flood-risk messages

Author
Podhoranyi, Michal, IT4Innovations - VSB Technical University, 17. listopadu 15, 70833 Ostrava, Czech Republic

Earth science informatics. 2021 ; 14 (2) : 913-929. [epub] 20210311

Source
Earth Sci Inform, ISSN 1865-0473

The main objective of the article is to propose an advanced architecture and workflow based on the Apache Hadoop and Apache Spark big data platforms. The primary purpose of the presented architecture is collecting, storing, processing, and analysing intensive data from social media streams. This paper presents how the proposed architecture and data workflow can be applied to analyse Tweets with a specific flood topic. The secondary objective, describing the flood alert situation using only Tweet messages and exploring the informative potential of such data, is demonstrated as well. A predictive machine learning approach based on Bayes' theorem was used to classify flood and non-flood messages. For this study, approximately 100,000 Twitter messages were processed and analysed. The messages were related to the flooding domain and were collected over a period of 5 days (14 May - 18 May 2018). A Spark application was developed to run data processing commands automatically and to generate the appropriate output data. The results confirmed the advantages of many well-known features of Spark and Hadoop in social media data processing. It was noted that such technologies are prepared to deal with social media data streams, but there are still challenges that one has to take into account. Based on the flood tweet analysis, it was observed that Twitter messages, with some considerations, are informative enough to be used to estimate general flood alert situations in particular regions. Text analysis techniques showed that Twitter messages contain valuable flood-spatial information.
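
The abstract mentions a classifier based on Bayes' theorem for separating flood-related messages from the rest, but gives no implementation details. The following is only a minimal sketch of that general idea: a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python rather than Spark. The function names, the labels `flood` and `no_flood`, and the toy training sentences are all assumptions made for illustration; none of them come from the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would clean URLs, mentions, etc.)."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: share of training messages carrying this label.
        score = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch (not from the study).
TRAIN = [
    ("river burst its banks flooding the town", "flood"),
    ("flood warning issued heavy rain expected", "flood"),
    ("streets under water after flash flood", "flood"),
    ("my inbox is flooded with emails", "no_flood"),
    ("great concert tonight in the town", "no_flood"),
    ("coffee and emails all morning", "no_flood"),
]

model = train(TRAIN)
```

Calling `predict(model, "flood warning for the river")` then scores the message against both labels and returns the more probable one. At the scale described in the abstract, the same counting step would more plausibly be distributed across a cluster, but that is a guess about the design, not something the abstract states.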

Keywords
Data extraction, Floods, Hadoop, Social network, Spark
Publication type
Journal articles

Article

META-pipe cloud setup and execution

F1000Research. 2017 ; 6 () : . [epub] 20171129

Source
F1000Res, ISSN 2046-1402

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.

Keywords
AAI federation, Amazon Web Services, Apache Spark, EGI Federated Cloud, ELIXIR, META-pipe, OpenStack, Portability
Publication type
Journal articles
