Most cited article - PubMed ID 34621929
Dataset of shell commands used by participants of hands-on cybersecurity training
The research on using process mining in learning analytics of cybersecurity exercises relies on datasets that reflect the real behavior of trainees. Although modern cyber ranges, in which training sessions are organized, can collect behavioral data in the form of event logs, the organization of such exercises is laborious. Moreover, the collected raw data has to be processed and transformed into a specific format required by process mining techniques. We present two datasets with slightly different characteristics. While the first exercise with 52 participants was not limited in time, the second supervised exercise with 42 trainees lasted two hours. Also, the cybersecurity tasks were slightly different. A total of 11757 events were collected. Of these, 3597 were training progress events, 5669 were Bash commands, and 2491 were Metasploit commands. Joint CSV files distilled from the raw event data can be used as input for existing process mining tools.
- Keywords
- Education, Host-based data collection, Learning analytics, Puzzle-based gamification,
- Publication type
- Journal Article MeSH
Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment that enables completing sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment allows collecting data about trainees' interactions with the environment, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing them allows to assess the trainees and provide feedback to help them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable in capturing timing information and tool usage frequency. Clustering underlined that many trainees often face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
- Keywords
- Cybersecurity education, Data science, Educational data mining, Learning analytics, Security training,
- Publication type
- Journal Article MeSH
We present a dataset of 13446 shell commands from 175 participants who attended cybersecurity training and solved assignments in the Linux terminal. Each acquired data record contains a command with its arguments and metadata, such as a timestamp, working directory, and host identification in the emulated training infrastructure. The commands were captured in Bash, ZSH, and Metasploit shells. The data are stored as JSON records, enabling vast possibilities for their further use in research and development. These include educational data mining, learning analytics, student modeling, and evaluating machine learning models for intrusion detection. The data were collected from 27 cybersecurity training sessions using an open-source logging toolset and two open-source interactive learning environments. Researchers and developers may use the dataset or deploy the learning environments with the logging toolset to generate their own data in the same format. Moreover, we provide a set of common analytical queries to facilitate the exploratory analysis of the dataset.
- Keywords
- Command-line history, Cybersecurity education, Cybersecurity exercise, Educational data mining, Host-based data collection, Learning analytics, Linux shell, Metasploit,
- Publication type
- Journal Article MeSH