CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

2025 Feb 26; 12(1): 338. [Epub 2025 Feb 26]

Status: PubMed-not-MEDLINE. Language: English. Country: United Kingdom, England. Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid40011509

Grant support
VJ02010024 Ministerstvo Vnitra České Republiky (Ministry of the Interior of the Czech Republic)
LM2023054 Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
ID:90254 Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
SGS23/207/OHK3/3T/18 České Vysoké Učení Technické v Praze (Czech Technical University in Prague)

Links

PubMed 40011509
PubMed Central PMC11865510
DOI 10.1038/s41597-025-04603-x
PII: 10.1038/s41597-025-04603-x
Knihovny.cz e-resources

Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network, an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.
