Network traffic
Encryption of network traffic should guarantee anonymity and prevent potential interception of information. Encrypted virtual private networks (VPNs) are designed to create special data tunnels that allow reliable transmission between networks and/or end users. However, as has been shown in a number of scientific papers, encryption alone may not be sufficient to secure data transmissions in the sense that certain information may be exposed. Our team has constructed a large dataset that contains generated encrypted network traffic data. This dataset contains a general network traffic model consisting of different types of network traffic such as web, emailing, video conferencing, video streaming, and terminal services. For the same network traffic model, data are measured for different scenarios, i.e., for data traffic through different types of VPNs and without VPNs. Additionally, the dataset contains the initial handshake of the VPN connections. The dataset can be used by various data scientists dealing with the classification of encrypted network traffic and encrypted VPNs.
- Keywords
- IP flow, IPFIX, Machine Learning, Network traffic, SSTP, OpenVPN, WireGuard
- Publication type
- journal articles MeSH
In the past decade, Long-Range Wide-Area Network (LoRaWAN) has emerged as one of the most widely adopted Low Power Wide Area Network (LPWAN) standards. Significant efforts have been devoted to optimizing the operation of this network. However, research in this domain heavily relies on simulations and demands high-quality real-world traffic data. To address this need, we monitored and analyzed LoRaWAN traffic in four European cities, making the obtained data and post-processing scripts publicly available. For monitoring purposes, we developed an open-source sniffer capable of capturing all LoRaWAN communication within the EU868 band. Our analysis discovered significant issues in current LoRaWAN deployments, including violations of fundamental security principles, such as the use of default and exposed encryption keys, potential breaches of spectrum regulations including duty cycle violations, SyncWord issues, and misaligned Class-B beacons. This misalignment can render Class-B unusable, as the beacons cannot be validated. Furthermore, we enhanced Wireshark's LoRaWAN protocol dissector to accurately decode recorded traffic. Additionally, we proposed the passive reception of Class-B beacons as an alternative timebase source for devices operating within LoRaWAN coverage, under the assumption that the issue of misaligned beacons can be addressed or mitigated in the future. The identified issues and the published dataset can serve as valuable resources for researchers simulating real-world traffic and for the LoRaWAN Alliance to enhance the standard to facilitate more reliable Class-B communication.
- Keywords
- Class-B, IoT, LoRa, LoRaWAN, dataset, network sniffer, time synchronization, traffic monitoring
- Publication type
- journal articles MeSH
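The duty cycle violations mentioned in the LoRaWAN abstract above can be illustrated with a minimal sketch. It assumes that the airtime of each captured frame is already known, that the limit is 1% of airtime evaluated over a one-hour sliding window, and that all frames belong to one device on one sub-band. The function names and the input layout are hypothetical and are not taken from the published post-processing scripts.

```python
from collections import deque


def max_duty_cycle(frames, window_s=3600.0):
    """Return the highest fraction of airtime used in any sliding window.

    frames: iterable of (start_time_s, airtime_s) tuples for one device
            on one sub-band, sorted by start time (assumed input layout).
    """
    window = deque()          # frames currently inside the window
    airtime_in_window = 0.0   # sum of their airtimes
    worst = 0.0
    for start, airtime in frames:
        window.append((start, airtime))
        airtime_in_window += airtime
        # Drop frames that started at least window_s before this one.
        while window and window[0][0] <= start - window_s:
            airtime_in_window -= window.popleft()[1]
        worst = max(worst, airtime_in_window / window_s)
    return worst


def violates_duty_cycle(frames, limit=0.01, window_s=3600.0):
    """True if the observed airtime exceeds the assumed duty cycle limit."""
    return max_duty_cycle(frames, window_s) > limit
```

For example, four 10-second frames sent within five minutes use 40 s of a 3600 s window, which is above the assumed 1% budget of 36 s. A real check would also have to compute airtime from the spreading factor, bandwidth, and payload size, and apply the limit that is valid for each sub-band.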
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.
- Keywords
- Encrypted traffic, Network monitoring, QUIC, Traffic classification
- Publication type
- journal articles MeSH
Recently, the Internet has adopted the DNS over HTTPS (DoH) resolution mechanism for privacy-aware network applications. As DoH becomes more disseminated, it has also become a network monitoring research topic. For comprehensive evaluation and comparison of developed classifiers, real-world datasets are needed, motivating this contribution. We created a new large-scale collection of datasets consisting of two classes of traffic: i) DoH HTTPS communication and ii) non-DoH HTTPS connections. The DoH traffic is captured for multiple DoH providers and clients to include nuances of various DoH implementations and configurations. The non-DoH HTTPS connections complement the DoH communication aiming to include a wide range of existing network applications. The dataset collection consists of network traffic generated in a controlled environment and traffic captured from a real ISP network. The resulting datasets thus provide real-world network traffic data suitable for evaluating existing classifiers and the development of new methods.
- Keywords
- Computer, DNS, DNS over HTTPS, HTTPS, Monitoring, Network, Network traffic
- Publication type
- journal articles MeSH
Cybersecurity research relies on relevant datasets providing researchers a snapshot of network traffic generated by current users and modern applications and services. The lack of datasets coming from a realistic network environment leads to inefficiency of newly designed methods that are not useful in practice. This data article provides network traffic flows and event logs (Linux and Windows) from a two-day cyber defense exercise involving attackers, defenders, and fictitious users operating in a virtual exercise network. The data are stored as structured JSON, including data schemes and data dictionaries, ready for direct processing. Network topology of the exercise network in NetJSON format is also provided.
- Keywords
- Cyber defense exercise, Cybersecurity, Event log, KYPO, Network flow, Network traffic, Syslog
- Publication type
- journal articles MeSH
We present a dataset that captures seven days of monitoring data from eight servers hosting more than 800 sites across a large campus network. The dataset contains data from network monitoring and host-based monitoring. The first set of data are packet traces collected by a probe situated on the network link in front of the web servers. The traces contain encrypted HTTP over TLS 1.2 communication between clients and web servers. The second set of data is an event log captured directly on the web servers. The events are generated by the Internet Information Services (IIS) logging and include both the IIS default features and custom features, such as client port and transferred data volume. Anonymization of all features in the dataset has been carefully carried out to prevent private information leakage while preserving the information value of the dataset. The dataset is suitable mainly for training machine learning techniques for anomaly detection and the identification of relationships between network traffic and events on web servers. We also add tools, settings, and a guide to convert the packet traces to IP flows that are often preferred for network traffic analysis.
- Keywords
- Encrypted traffic analysis, Event-flow correlation, HTTPS dataset, Host-based data collection, Network data collection, TLS 1.2 encryption
- Publication type
- journal articles MeSH
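The conversion of packet traces to IP flows mentioned in the abstract above can be sketched as follows. This is a simplified illustration and not the tooling shipped with the dataset: it assumes packets are already parsed into dictionaries, groups them into unidirectional flows by the usual 5-tuple, and closes a flow after a hypothetical 60-second idle timeout.

```python
def packets_to_flows(packets, idle_timeout=60.0):
    """Aggregate packets into unidirectional flows keyed by the 5-tuple.

    packets: iterable of dicts with the (assumed) keys ts, src, dst,
             sport, dport, proto, and length, sorted by ts.
    """
    active = {}     # 5-tuple -> flow record that is still open
    finished = []   # flows closed by the idle timeout
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flow = active.get(key)
        # A long gap since the last packet ends the previous flow.
        if flow is not None and p["ts"] - flow["end"] > idle_timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "start": p["ts"], "end": p["ts"],
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p["ts"]
        flow["packets"] += 1
        flow["bytes"] += p["length"]
    # Flows still open at the end of the trace are exported as well.
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f["start"])
```

Production flow exporters typically add an active timeout, TCP flag handling, and bidirectional pairing; those are left out here to keep the aggregation step visible.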
Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network, an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.
- Publication type
- journal articles MeSH
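The forecast-based anomaly detection referred to in the abstract above can be illustrated with a very small baseline. The sketch below is an assumption made for illustration and not the method evaluated with the dataset: it forecasts each value as the value one season earlier (a seasonal naive forecast) and flags points whose forecast error is far from the typical error, measured by the median absolute deviation. The threshold `k` is hypothetical.

```python
from statistics import median


def seasonal_naive_anomalies(series, season, k=5.0):
    """Return indices whose seasonal-naive forecast error is unusually large.

    series: list of numeric measurements at a fixed interval.
    season: number of samples per seasonal cycle (for example, per day).
    k:      assumed threshold in units of the median absolute deviation.
    """
    # Forecast for index i is the value one season earlier.
    residuals = [series[i] - series[i - season]
                 for i in range(season, len(series))]
    if not residuals:
        return []
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = mad if mad > 0 else 1e-9  # avoid a zero threshold on flat data
    return [i + season for i, r in enumerate(residuals)
            if abs(r - center) > k * scale]
```

One known weakness of this baseline is visible immediately: a spike is flagged twice, once when it occurs and once a season later, when the spike itself serves as the forecast.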
The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.
- Publication type
- journal articles MeSH
AIM: The susceptibility of children to polluted air has been pointed out several times in the past. Generally, children suffer from higher exposure to air pollutants than adults because of their higher physical activity, higher metabolic rate and the resultant increase in minute ventilation. The aim of this study was to examine the exposure characteristics of public elementary schools in Prague (the capital of the Czech Republic). METHODS: The exposure was examined by two different methods: by the proximity of selected schools to major urban roads and by their location within the modeled urban PM10 concentration fields. We determined average daily traffic counts for all roads within 300 m of 251 elementary schools using the national road network database and a geographic information system (GIS), and used GIS tools to calculate the proximity of the schools to the roads. In the second method we overlaid the GIS layer of the predicted annual urban PM10 concentration field with that of geocoded school addresses. RESULTS: The results showed that 208 Prague schools (almost 80%) are situated in close proximity (<300 m) to roads exhibiting high traffic loads. Both methods showed good agreement in the proportion of highly exposed schools at risk; however, we found significant differences in the locations of schools at risk determined by the two methods. CONCLUSION: We argue that results of similar proximity studies should be treated with caution before they are used in risk-based decision-making processes, since different methods may provide different outcomes.
- Keywords
- GIS, health effects, particulate matter, schools, traffic density, traffic pollution
- MeSH
- child MeSH
- geographic information systems MeSH
- risk assessment MeSH
- humans MeSH
- urban population MeSH
- adolescent MeSH
- schools * MeSH
- vehicle emissions analysis MeSH
- environmental exposure analysis MeSH
- Check Tag
- child MeSH
- humans MeSH
- adolescent MeSH
- male MeSH
- female MeSH
- Publication type
- journal articles MeSH
- research support MeSH
- Geographic names
- Czech Republic MeSH
- Substance names
- vehicle emissions MeSH
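The first method in the Prague schools abstract above, proximity of schools to busy roads within 300 m, can be approximated without a full GIS. The sketch below is a simplification under stated assumptions: roads are represented by sampled points rather than line geometries, distances are great-circle distances, and the traffic threshold `min_daily_traffic` is a hypothetical value, since the abstract does not state what counts as a high traffic load.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def schools_near_busy_roads(schools, road_points,
                            radius_m=300.0, min_daily_traffic=10000):
    """Return names of schools within radius_m of a busy road point.

    schools:     list of (name, lat, lon) tuples.
    road_points: list of (lat, lon, daily_traffic) tuples sampled along roads.
    min_daily_traffic is an assumed cut-off, not a value from the study.
    """
    exposed = []
    for name, s_lat, s_lon in schools:
        for r_lat, r_lon, traffic in road_points:
            if (traffic >= min_daily_traffic
                    and haversine_m(s_lat, s_lon, r_lat, r_lon) <= radius_m):
                exposed.append(name)
                break  # one qualifying road is enough
    return exposed
```

Because the result depends on how roads are sampled and which threshold is chosen, this kind of proximity measure can select different schools than a modeled concentration field, which is the caution the abstract raises.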
We have developed a Recurrent Neural Network (RNN)-based phase picker for data obtained from a local seismic monitoring array specifically designated for induced seismicity analysis. The proposed algorithm was rigorously tested using real-world data from a network encompassing nine three-component stations. The algorithm is designed for multiple monitoring of repeated injection within the permanent array. For such an array, the RNN is initially trained on a foundational dataset, enabling the trained algorithm to accurately identify other induced events even if they occur in different regions of the array. Our RNN-based phase picker achieved an accuracy exceeding 80% for arrival time picking when compared to precise manual picking techniques. However, the event locations (based on the arrival picking) had to be further constrained to avoid false arrival picks. By utilizing these refined arrival times, we were able to locate seismic events and assess their magnitudes. The magnitudes of events processed automatically exhibited a discrepancy of up to 0.3 when juxtaposed with those derived from manual processing. Importantly, the efficacy of our results remains consistent irrespective of the specific training dataset employed, provided that the dataset originates from within the network.
- Keywords
- Recurrent Neural Network, automatic arrival time detection, hydraulic fracturing, induced seismicity, location, magnitude, traffic light system
- Publication type
- journal articles MeSH