Most cited article - PubMed ID 32577444
Traffic and log data captured during a cyber defense exercise
Research on using process mining in learning analytics of cybersecurity exercises relies on datasets that reflect the real behavior of trainees. Although modern cyber ranges, in which training sessions are organized, can collect behavioral data in the form of event logs, organizing such exercises is laborious. Moreover, the collected raw data has to be processed and transformed into the specific format required by process mining techniques. We present two datasets with slightly different characteristics. While the first exercise with 52 participants was not limited in time, the second, supervised exercise with 42 trainees lasted two hours. The cybersecurity tasks also differed slightly between the two exercises. A total of 11757 events were collected. Of these, 3597 were training progress events, 5669 were Bash commands, and 2491 were Metasploit commands. Joint CSV files distilled from the raw event data can be used as input for existing process mining tools.
- Keywords
- Education, Host-based data collection, Learning analytics, Puzzle-based gamification
- Publication type
- Journal Article
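As an illustration of how a joint CSV event log like the one described in the abstract above might feed a process mining step, the following minimal sketch counts directly-follows pairs per trainee. The column names (`trainee_id`, `timestamp`, `event`) and the sample rows are hypothetical and are not taken from the published dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample; column names and values are assumptions for illustration.
SAMPLE_CSV = """trainee_id,timestamp,event
t1,2020-01-01T10:00:00,level_started
t1,2020-01-01T10:05:00,hint_taken
t1,2020-01-01T10:09:00,level_completed
t2,2020-01-01T10:01:00,level_started
t2,2020-01-01T10:04:00,level_completed
"""


def directly_follows(csv_text):
    """Count how often event A is immediately followed by event B per trainee."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        traces[row["trainee_id"]].append((row["timestamp"], row["event"]))

    pairs = Counter()
    for events in traces.values():
        # ISO 8601 timestamps in the same format sort correctly as strings.
        events.sort()
        for (_, first), (_, second) in zip(events, events[1:]):
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    for (a, b), count in directly_follows(SAMPLE_CSV).items():
        print(f"{a} -> {b}: {count}")
```

Each trainee is treated as one case and each logged event as one activity, which is the usual mapping that process mining tools expect from an event log.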
We present a dataset that captures seven days of monitoring data from eight servers hosting more than 800 sites across a large campus network. The dataset contains data from network monitoring and host-based monitoring. The first set of data consists of packet traces collected by a probe situated on the network link in front of the web servers. The traces contain encrypted HTTP over TLS 1.2 communication between clients and web servers. The second set of data is an event log captured directly on the web servers. The events are generated by the Internet Information Services (IIS) logging and include both the IIS default features and custom features, such as client port and transferred data volume. All features in the dataset have been carefully anonymized to prevent private information leakage while preserving the information value of the dataset. The dataset is suitable mainly for training machine learning techniques for anomaly detection and for identifying relationships between network traffic and events on web servers. We also add tools, settings, and a guide to convert the packet traces to IP flows, which are often preferred for network traffic analysis.
- Keywords
- Encrypted traffic analysis, Event-flow correlation, HTTPS dataset, Host-based data collection, Network data collection, TLS 1.2 encryption
- Publication type
- Journal Article
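The idea of converting packet traces to IP flows, mentioned in the abstract above, can be sketched as grouping packets by their 5-tuple. This is a simplified illustration only: it is not the tooling shipped with the dataset, it ignores flow timeouts and TCP flags, and the packet field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `ts`, `length`) are assumptions.

```python
# Hypothetical, already-parsed packets; the field names are assumptions.
SAMPLE_PACKETS = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 50000,
     "dst_port": 443, "proto": "tcp", "ts": 1.0, "length": 100},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 443,
     "dst_port": 50000, "proto": "tcp", "ts": 1.2, "length": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 50000,
     "dst_port": 443, "proto": "tcp", "ts": 2.5, "length": 200},
]


def packets_to_flows(packets):
    """Aggregate packets into unidirectional flows keyed by the 5-tuple."""
    flows = {}
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flow = flows.get(key)
        if flow is None:
            # First packet of this flow: initialize the counters.
            flows[key] = {"packets": 1, "bytes": pkt["length"],
                          "start": pkt["ts"], "end": pkt["ts"]}
        else:
            flow["packets"] += 1
            flow["bytes"] += pkt["length"]
            flow["start"] = min(flow["start"], pkt["ts"])
            flow["end"] = max(flow["end"], pkt["ts"])
    return flows


if __name__ == "__main__":
    for key, stats in packets_to_flows(SAMPLE_PACKETS).items():
        print(key, stats)
```

Because the key keeps source and destination in order, the client-to-server and server-to-client directions end up as two separate flows. Production flow exporters typically also split flows on inactivity and active timeouts.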
We present a dataset of 13446 shell commands from 175 participants who attended cybersecurity training and solved assignments in the Linux terminal. Each acquired data record contains a command with its arguments and metadata, such as a timestamp, working directory, and host identification in the emulated training infrastructure. The commands were captured in Bash, ZSH, and Metasploit shells. The data are stored as JSON records, enabling vast possibilities for their further use in research and development. These include educational data mining, learning analytics, student modeling, and evaluating machine learning models for intrusion detection. The data were collected from 27 cybersecurity training sessions using an open-source logging toolset and two open-source interactive learning environments. Researchers and developers may use the dataset or deploy the learning environments with the logging toolset to generate their own data in the same format. Moreover, we provide a set of common analytical queries to facilitate the exploratory analysis of the dataset.
- Keywords
- Command-line history, Cybersecurity education, Cybersecurity exercise, Educational data mining, Host-based data collection, Learning analytics, Linux shell, Metasploit
- Publication type
- Journal Article
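To give a feel for the kind of exploratory query that JSON command records like those described in the abstract above allow, here is a minimal sketch that counts commands per shell and finds the most frequently used tools. The record layout (`timestamp`, `shell`, `cmd`, `wd`, `host`) and the sample values are hypothetical and may not match the actual schema of the dataset.

```python
import json
from collections import Counter

# Hypothetical records, one JSON object per line; field names are assumptions.
SAMPLE_RECORDS = """\
{"timestamp": "2021-03-01T10:00:00Z", "shell": "bash", "cmd": "nmap -sV 10.0.0.5", "wd": "/home/user", "host": "attacker"}
{"timestamp": "2021-03-01T10:01:10Z", "shell": "bash", "cmd": "ls -la", "wd": "/home/user", "host": "attacker"}
{"timestamp": "2021-03-01T10:02:30Z", "shell": "metasploit", "cmd": "search ssh", "wd": "/home/user", "host": "attacker"}
{"timestamp": "2021-03-01T10:04:00Z", "shell": "bash", "cmd": "nmap -p 22 10.0.0.5", "wd": "/home/user", "host": "attacker"}
"""


def load_records(text):
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def commands_per_shell(records):
    """Count how many commands were logged in each shell."""
    return Counter(rec["shell"] for rec in records)


def most_common_tools(records, n=3):
    """Rank tools by the first whitespace-separated token of each command."""
    tools = Counter(rec["cmd"].split()[0] for rec in records if rec["cmd"].strip())
    return tools.most_common(n)


if __name__ == "__main__":
    records = load_records(SAMPLE_RECORDS)
    print(commands_per_shell(records))
    print(most_common_tools(records, n=2))
```

Taking the first token as the tool name is a rough heuristic: it does not account for `sudo`, pipes, or shell built-ins, so a real analysis would likely need a more careful parser.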