Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset

2025 Jun; 60: 111599. [epub] 2025-04-28

Status: PubMed-not-MEDLINE; Language: English; Country: Netherlands; Medium: electronic-ecollection

Document type: journal articles

Persistent link https://www.medvik.cz/link/pmid40475077
Links

PubMed 40475077
PubMed Central PMC12138947
DOI 10.1016/j.dib.2025.111599
PII: S2352-3409(25)00331-2

The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi's usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance.
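The down-sampling and feature-selection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the per-class (stratified) sampling strategy, the helper names, the toy data, the extra column spd_0, and the label encoding (0 for benign) are all assumptions made for illustration. Only the 10% fraction, the column names rcvTime, pos_0, pos_1, and the rename of attack_type to attacker_type come from the abstract. The synthesis-based balancing step is not covered here.

```python
import random
from collections import defaultdict

# Column names and the rename come from the abstract; the rest is hypothetical.
KEEP = ["rcvTime", "pos_0", "pos_1", "attack_type"]
RENAME = {"attack_type": "attacker_type"}


def downsample(rows, fraction=0.10, label="attack_type", seed=0):
    """Keep roughly `fraction` of the rows of each class (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label]].append(row)
    sample = []
    for members in by_class.values():
        k = max(1, round(len(members) * fraction))  # never drop a class entirely
        sample.extend(rng.sample(members, k))
    return sample


def select_features(rows, keep=KEEP, rename=RENAME):
    """Drop columns not listed in `keep` and apply the renames."""
    return [{rename.get(col, col): row[col] for col in keep} for row in rows]


def make_toy_rows(n=1000, seed=1):
    """Build synthetic stand-in records; real VeReMi messages have more fields."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.append({
            "rcvTime": float(i),
            "pos_0": rng.uniform(0, 5000),
            "pos_1": rng.uniform(0, 5000),
            "spd_0": rng.uniform(0, 30),        # hypothetical extra column, discarded below
            "attack_type": 0 if i % 5 else 1,   # assumed encoding: 0 = benign, 1 = an attack class
        })
    return rows


toy = make_toy_rows()
reduced = select_features(downsample(toy))
```

Sampling within each class, rather than over the whole dataset at once, is one way to keep the class proportions of the reduced set close to those of the original, which is in the spirit of the abstract's claim that the representativeness of malicious patterns is preserved.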


