We present a dataset of 13446 shell commands from 175 participants who attended cybersecurity training and solved assignments in the Linux terminal. Each acquired data record contains a command with its arguments and metadata, such as a timestamp, working directory, and host identification in the emulated training infrastructure. The commands were captured in Bash, ZSH, and Metasploit shells. The data are stored as JSON records, enabling vast possibilities for their further use in research and development. These include educational data mining, learning analytics, student modeling, and evaluating machine learning models for intrusion detection. The data were collected from 27 cybersecurity training sessions using an open-source logging toolset and two open-source interactive learning environments. Researchers and developers may use the dataset or deploy the learning environments with the logging toolset to generate their own data in the same format. Moreover, we provide a set of common analytical queries to facilitate the exploratory analysis of the dataset.
- Keywords
- Command-line history, Cybersecurity education, Cybersecurity exercise, Educational data mining, Host-based data collection, Learning analytics, Linux shell, Metasploit
- Publication type
- Journal Article MeSH
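To illustrate the kind of exploratory query the shell-command dataset described above supports, here is a minimal Python sketch that parses JSON records and counts how often each tool was invoked. The field names (`timestamp`, `wd`, `host`, `cmd`) and the sample records are hypothetical placeholders chosen for illustration; the actual schema should be taken from the dataset's documentation.

```python
import json
from collections import Counter

# Hypothetical sample records, one JSON object per line.
# The field names are assumptions, not the dataset's documented schema.
SAMPLE_LINES = [
    '{"timestamp": "2021-03-01T10:00:00Z", "wd": "/home/user", "host": "attacker", "cmd": "nmap -sV 10.0.0.5"}',
    '{"timestamp": "2021-03-01T10:01:30Z", "wd": "/home/user", "host": "attacker", "cmd": "ls -la"}',
    '{"timestamp": "2021-03-01T10:02:10Z", "wd": "/tmp", "host": "attacker", "cmd": "nmap -p 22 10.0.0.5"}',
]


def load_records(lines):
    """Parse non-empty lines into dictionaries."""
    return [json.loads(line) for line in lines if line.strip()]


def tool_frequencies(records):
    """Count commands by their first token (the tool name)."""
    counts = Counter()
    for record in records:
        tokens = record["cmd"].split()
        if tokens:
            counts[tokens[0]] += 1
    return counts


records = load_records(SAMPLE_LINES)
frequencies = tool_frequencies(records)
print(frequencies.most_common())
```

Grouping by the first token is a deliberately simple choice; a real analysis would likely also need to handle pipes, `sudo` prefixes, and shell-specific syntax.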
Recent advances in Next-Generation Sequencing (NGS) make comparative analyses of the composition and diversity of whole microbial communities possible at a far greater depth than ever before. This brings new challenges, such as an increased dependence on computation to process these huge datasets. The demand on system resources usually requires migrating from Windows to Linux-based operating systems and prior familiarity with command-line interfaces. To overcome this barrier, we developed a fully automated and easy-to-install package as well as a complete, easy-to-follow pipeline for microbial metataxonomic analysis operating in the Windows Subsystem for Linux (WSL): Bioinformatics Through Windows (BTW). BTW combines several open-access tools for processing marker gene data, including 16S rRNA, bringing the user from raw sequencing reads to diversity-related conclusions. It includes data quality filtering, clustering, taxonomic assignment and further statistical analyses, directly in WSL, avoiding the need to migrate from Windows to Linux. BTW is expected to boost the use of NGS amplicon data by facilitating rapid access to a set of bioinformatics tools for Windows users. Moreover, several Linux command-line tools become more reachable, which will enhance bioinformatics accessibility to a wider range of researchers and practitioners in the life sciences and medicine. BTW is available on GitHub (https://github.com/vpylro/BTW). The package is freely available for noncommercial users.
- Keywords
- 16S rRNA, Marker gene, Metataxonomics, Microbiome, Windows
- Publication type
- Journal Article MeSH
SUMMARY: CrocoBLAST is a tool for dramatically speeding up BLAST+ execution on any computer. Alignments that would take days or weeks with NCBI BLAST+ can be run overnight with CrocoBLAST. Additionally, CrocoBLAST provides features critical for NGS data analysis, including: results identical to those of BLAST+; compatibility with any BLAST+ version; real-time information regarding calculation progress and remaining run time; access to partial alignment results; queueing, pausing, and resuming BLAST+ calculations without information loss. AVAILABILITY AND IMPLEMENTATION: CrocoBLAST is freely available online, with ample documentation (webchem.ncbr.muni.cz/Platform/App/CrocoBLAST). No installation or user registration is required. CrocoBLAST is implemented in C, while the graphical user interface is implemented in Java. CrocoBLAST is supported under Linux and Windows, and can be run under Mac OS X in a Linux virtual machine. CONTACT: jkoca@ceitec.cz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Cybersecurity research relies on relevant datasets providing researchers a snapshot of network traffic generated by current users and modern applications and services. The lack of datasets coming from a realistic network environment leads to inefficiency of newly designed methods that are not useful in practice. This data article provides network traffic flows and event logs (Linux and Windows) from a two-day cyber defense exercise involving attackers, defenders, and fictitious users operating in a virtual exercise network. The data are stored as structured JSON, including data schemes and data dictionaries, ready for direct processing. Network topology of the exercise network in NetJSON format is also provided.
- Keywords
- Cyber defense exercise, Cybersecurity, Event log, KYPO, Network flow, Network traffic, Syslog
- Publication type
- Journal Article MeSH
SUMMARY: Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. AVAILABILITY AND IMPLEMENTATION: KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- Publication type
- Journal Article MeSH
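The general idea of eliminating k-mers by cross-reference, as named in the KEC abstract above, can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not KEC's actual implementation: the function names, the in-memory set representation, and the merging of adjacent surviving k-mers into regions are all hypothetical choices.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target, non_targets, k):
    """K-mers of the target that occur in none of the non-target sequences."""
    remaining = kmers(target, k)
    for seq in non_targets:
        remaining -= kmers(seq, k)  # eliminate every k-mer shared with a non-target
    return remaining


def unique_regions(target, non_targets, k):
    """Merge runs of consecutive surviving k-mers into contiguous regions."""
    keep = unique_kmers(target, non_targets, k)
    regions = []
    start = None
    end = 0
    for i in range(len(target) - k + 1):
        if target[i:i + k] in keep:
            if start is None:
                start = i
            end = i + k  # region extends to the end of the latest unique k-mer
        elif start is not None:
            regions.append(target[start:end])
            start = None
    if start is not None:
        regions.append(target[start:end])
    return regions


# Hypothetical toy sequences for demonstration.
regions = unique_regions("ACGTACGGTT", ["ACGTAC"], 4)
print(regions)
```

Holding every k-mer in a Python set is only practical for small inputs; a tool meant for genome-sized data would need a far more memory-efficient representation.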
UNLABELLED: Expertomica Cells is a program for the creation and analysis of pedigree plots from time-lapse micrographs of cell monolayers. It enables recording the basic events in a cell cycle, cell neighbourhoods and spatial migration. The output is both numeric and graphical. The software helps to lower the main hurdles in the manual analysis of cell monolayer development to practical limits; it reduces the operator processing time of a typical experiment containing 5000 consecutive images from the usual 3 months to 3-10 h. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://www.expertomicacells.tk or http://www.expertomicacells.wu.cz. The source code is implemented in Java 6 and supported on Linux, Mac and MS Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique allowing a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret, biologically and in a user-friendly form, the quantitative changes in gene expression. The software is freely available and makes it possible to work with thousands of sequences using a standard personal computer or notebook running the Linux operating system.
- Keywords
- differentially expressed genes, functional annotation, transcriptome
- MeSH
- Molecular Sequence Annotation methods MeSH
- Humans MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Gene Expression Profiling methods MeSH
- Transcriptome * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
This article provides an innovative approach for verification by model checking of programs that undergo continuous changes. To tackle the problem of repeating the entire model checking for each new version of the program, our approach verifies programs incrementally. It reuses the computational history of the previous program version, namely function summaries. In particular, the summaries are over-approximations of the bounded program behaviors. Whenever summaries cannot be reused directly, our algorithm repairs them to maximize the chance that they can be reused in subsequent runs. We base our approach on satisfiability modulo theories (SMT) to take full advantage of a lightweight modeling approach and, at the same time, the ability to provide concise function summarization. Our approach leverages pre-computed function summaries in SMT to localize the checks of changed functions. Furthermore, to exploit the trade-off between precision and performance, our approach relies on the use of an SMT solver, not only for underlying reasoning, but also for program modeling and the adjustment of its precision. On a benchmark suite consisting primarily of Linux device driver versions, we demonstrate that our algorithm achieves an order of magnitude speedup compared to prior approaches.
- Keywords
- Craig interpolation, Incremental verification, Program changes, SMT solving, Symbolic model checking
- Publication type
- Journal Article MeSH
MOTIVATION: One of the objectives of protein engineering is to propose and construct modified proteins with improved activity for the substrate of interest. Systematic computational investigation of many protein variants requires the preparation and handling of a large number of data files. The type of data generated during the modelling of protein variants and the estimation of their activities lends itself to process automation. RESULTS: The graphical program TRITON has been developed for modelling protein mutants and assessing their activities. Protein mutants are modelled from the wild-type structure by homology modelling using the external program MODELLER. Chemical reactions taking place in the mutants' active sites are modelled using the semi-empirical quantum mechanical program MOPAC. Semi-quantitative predictions of the mutants' activities can be achieved by evaluating the changes in energies of the system and partial atomic charges of active site residues during the reaction. The program TRITON offers graphical tools for the preparation of the input data files, for calculation and for the analysis of the generated output data. AVAILABILITY: The program TRITON can run under the operating systems IRIX, Linux and NetBSD. The software is available at http://www.chemi.muni.cz/lbsd/triton.html.
- MeSH
- Models, Chemical MeSH
- Enzymes chemistry genetics MeSH
- Catalysis MeSH
- Mutation genetics MeSH
- Computer Simulation * MeSH
- Protein Engineering methods MeSH
- Sequence Homology, Amino Acid MeSH
- Software * MeSH
- User-Computer Interface MeSH
- Binding Sites genetics MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Enzymes MeSH
MOTIVATION: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this article, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. RESULTS: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately. AVAILABILITY AND IMPLEMENTATION: The challenge Web site (http://www.codesolorzano.com/celltrackingchallenge) provides access to the training and competition datasets, along with the ground truth of the training videos. It also provides access to Windows and Linux executable files of the evaluation software and most of the algorithms that competed in the challenge.