A Robust Machine Learning Architecture for Accurate Malware Classification in Large-Scale Cyber Threat Dataset

Dinesh Rajasekharan

doi:10.63282/3050-922X.IJERET-V7I2P131

Authors

Dinesh Rajasekharan, Senior Director of Product Management, Vellore Institute of Technology, USA.

DOI:

https://doi.org/10.63282/3050-922X.IJERET-V7I2P131

Keywords:

Attack, Malware Detection, Internet of Things (IoT), Cybersecurity, Machine Learning, Malware Classification, EMBER Dataset

Abstract

Malware remains a significant threat to information systems and software security, and it continues to evolve as technology progresses. Quick and precise detection systems are therefore essential for dealing with potential threats. Because standard detection technologies struggle to keep up with the rapid evolution of malware, machine learning is a promising alternative. This work uses the EMBER dataset as the basis for a machine learning malware classification system. Data preprocessing and exploratory data analysis are applied to make the data more consistent and to improve the feature representation. Random Forest (RF) and Multi-Layer Perceptron (MLP) classifiers are trained and evaluated on accuracy, precision, recall, F1-score, and Area Under the ROC Curve (AUC). In the experiments, the MLP model achieves an accuracy of 97.63%, a recall of 95.43%, an F1-score of 95.44%, and an AUC of 99.13%, compared with 94.66% accuracy and 92.25% AUC for the RF model. A comparative study against other ML and DL approaches further supports the framework's predictive power and generalization ability. The proposed framework is intended to provide scalable and reliable malware classification that supports cybersecurity decision-making in an evolving threat landscape.
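The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = malware, 0 = benign) and a classifier that outputs a score per sample. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = malware and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1-score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (assumed for illustration only, not results from the paper).
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(labels, preds))
    print("auc:", roc_auc(labels, scores))
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds. This is why a model comparison such as the one reported here typically lists both kinds of metric.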

