CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting
Status: PubMed-not-MEDLINE
Language: English
Country: Great Britain, England
Medium: electronic
Document type: Journal Article
Grant support
VJ02010024
Ministerstvo Vnitra České Republiky (Ministry of the Interior of the Czech Republic)
LM2023054
Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
ID:90254
Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
SGS23/207/OHK3/3T/18
České Vysoké Učení Technické v Praze (Czech Technical University in Prague)
PubMed
40011509
PubMed Central
PMC11865510
DOI
10.1038/s41597-025-04603-x
PII: 10.1038/s41597-025-04603-x
- Publication type
- Journal Article (MeSH)
Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for evaluating forecasting and anomaly detection techniques are missing. This gap can lead to overestimating the performance of anomaly detection algorithms and create an illusion of progress. This manuscript addresses the issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network, an ISP network serving approximately half a million customers daily. The dataset captures the behavior of diverse network entities and reflects the variability typical of an ISP environment. This variability provides a realistic and challenging setting for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. The dataset also offers valuable insights into the practical deployment of forecast-based anomaly detection approaches.
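The sketch below illustrates the general idea of forecast-based anomaly detection mentioned in the abstract. It is not the authors' method and makes no claim about the dataset's actual format. A value is forecast from recent history, and a point is flagged when its forecast error is unusually large. The moving-average forecaster, the window length of 24, the threshold `k = 3` and the helper name `detect_anomalies` are all illustrative assumptions. In practice, a series would be one per-entity traffic series, for example the traffic of a single IP address.

```python
from statistics import mean, stdev


def detect_anomalies(series, window=24, k=3.0):
    """Hypothetical forecast-based detector (illustrative sketch only).

    For each point t, the forecast is the mean of the previous `window`
    values. The point is flagged as anomalous when its absolute forecast
    error exceeds k times the standard deviation of that same window.
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    anomalies = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        forecast = mean(history)  # naive moving-average forecast (assumption)
        spread = stdev(history)   # sample standard deviation of the window
        # Guard against a zero spread on perfectly constant history.
        if abs(series[t] - forecast) > k * max(spread, 1e-9):
            anomalies.append(t)
    return anomalies


# Synthetic series with a daily (24-step) pattern and one injected spike.
traffic = [100 + (t % 24) for t in range(24 * 7)]
traffic[100] = 1000
print(detect_anomalies(traffic))  # [100]
```

A moving average is used only because it is the simplest possible forecaster. Evaluating stronger models (seasonal, statistical or learned) against the same residual-threshold rule on realistic ISP traffic is the kind of benchmarking the dataset is meant to support.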
CESNET, Generála Píky 430/26, 160 00 Prague 6, Czech Republic
Czech Technical University in Prague, Thákurova 9, 160 00 Prague 6, Czech Republic