• This record comes from PubMed

Student assessment in cybersecurity training automated by pattern mining and clustering

2022; 27(7): 9231-9262. Epub 2022-03-30.

Status: PubMed-not-MEDLINE; Language: English; Country: Netherlands; Media: print-electronic

Document type: Journal Article

Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment in which trainees complete sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment collects data about trainees' interactions with it, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing these patterns makes it possible to assess the trainees and provide feedback that helps them learn. However, automated analysis of these data is challenging: the training tasks involve complex problem-solving, many different solution approaches are possible, and the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable for capturing timing information and tool usage frequency. Clustering showed that many trainees face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their own contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
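To make the two techniques named in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each trainee's log is an ordered list of tool names. Contiguous command pairs stand in for sequential pattern mining, and a plain k-means over tool-usage frequencies stands in for clustering. The toy logs, function names, and thresholds are all hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter

# Hypothetical toy logs (invented for illustration, NOT the paper's data):
# trainee id -> ordered list of command-line tools they ran.
LOGS = {
    "t1": ["nmap", "ssh", "ls", "cat"],
    "t2": ["nmap", "ssh", "ls", "cat", "cat"],
    "t3": ["nmap", "nmap", "nmap", "man", "nmap"],
    "t4": ["man", "nmap", "nmap", "man", "nmap"],
}


def frequent_sequences(logs, length=2, min_support=0.5):
    """Contiguous command sequences used by at least min_support of trainees."""
    support = Counter()
    for commands in logs.values():
        # A set counts each pattern once per trainee, so support is the
        # fraction of trainees who produced the pattern at least once.
        seen = {
            tuple(commands[i:i + length])
            for i in range(len(commands) - length + 1)
        }
        support.update(seen)
    n = len(logs)
    return {p: c / n for p, c in support.items() if c / n >= min_support}


def usage_vectors(logs):
    """Represent each trainee by the relative frequency of every tool."""
    tools = sorted({c for cmds in logs.values() for c in cmds})
    vectors = {}
    for trainee, cmds in logs.items():
        counts = Counter(cmds)
        vectors[trainee] = [counts[tool] / len(cmds) for tool in tools]
    return tools, vectors


def kmeans(vectors, seeds, iterations=10):
    """Plain Lloyd's k-means; seeds are trainee ids used as initial centroids."""
    centroids = [list(vectors[s]) for s in seeds]
    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for trainee, v in vectors.items():
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[trainee] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [vectors[t] for t, lab in labels.items() if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


patterns = frequent_sequences(LOGS)
tools, vectors = usage_vectors(LOGS)
labels = kmeans(vectors, seeds=["t1", "t3"])

for pattern, share in sorted(patterns.items()):
    print(" -> ".join(pattern), f"(support {share:.2f})")
for trainee, cluster in sorted(labels.items()):
    print(trainee, "cluster", cluster)
```

In this toy data, the frequent pairs suggest a shared solution path (scan, log in, inspect files), while the clusters separate trainees who progress from those who keep repeating the scanner and consulting the manual. That second group is the kind that could receive targeted scaffolding.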

