2023 Vol. 19, Issue 4

Original Article

31 December 2023. pp. 924-935
Purpose: This study applies machine learning to classify Intrusion Detection System (IDS) data and identifies the variables that contribute most to classification performance.

Method: First, we classified 'R2L' (Remote to Local) and 'U2R' (User to Root) attacks in the NSL-KDD dataset, which are difficult to detect because of class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanations (SHAP) to the two models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to examine the importance of the variables that affect classification in each model.

Result: The most important variable was 'service' for RF and 'dst_host_srv_count' for LGBM. These variables are the key factors that can improve classification performance for the respective models.

Conclusion: This paper identifies RF and LGBM as the best-performing of the evaluated models for classifying 'R2L' and 'U2R' attacks, and clarifies the most important variables for each of them.
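The variable-importance step rests on Shapley values, which assign each variable its average marginal contribution over all subsets of the other variables. The following is a minimal sketch of that definition in plain Python, not the paper's pipeline: the scoring function `toy_score`, its weights, and the `duration` variable are hypothetical and chosen only for illustration, and the paper itself applies SHAP to trained RF and LGBM models rather than to a hand-written function.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function `value_fn` over `features`.

    Enumerates every subset, so it is only practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical model output when only the variables in `present` are known.

    The weights below are invented for illustration; they are not results
    from the paper.
    """
    score = 0.0
    if "service" in present:
        score += 0.4
    if "dst_host_srv_count" in present:
        score += 0.3
    if "service" in present and "dst_host_srv_count" in present:
        score += 0.2  # interaction term, shared between the two variables
    if "duration" in present:
        score += 0.1
    return score


features = ["service", "dst_host_srv_count", "duration"]
phi = shapley_values(toy_score, features)

# Rank variables by absolute contribution, largest first.
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:.3f}")
```

In this toy setting the interaction term is split evenly between 'service' and 'dst_host_srv_count', and the contributions sum to the difference between the score with all variables present and the score with none. That additivity is what allows per-variable importances to be compared within a model.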
  1. Bace, R.G., Mell, P. (2001). Intrusion Detection Systems. National Institute of Standards and Technology, NIST Special Publication on Intrusion Detection Systems, Gaithersburg, US. 10.6028/NIST.SP.800-31
  2. Jeong, M.K., Lee, S.H., Kim, C.S. (2020). "A study on the safety index service model by disaster sector using big data analysis." Journal of the Korea Society of Disaster Information, Vol. 16, No. 4, pp. 682-690.
  3. KDD CUP 1999 Data, The UCI KDD Archive.
  4. Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J. (2019). "Survey of intrusion detection systems: Techniques, datasets and challenges." Cybersecurity, Vol. 2, No. 1, pp. 1-22. 10.1186/s42400-019-0038-7
  5. Le, T.-T.-H., Kim, H., Kang, H., Kim, H. (2022). "Classification and explanation for intrusion detection system based on ensemble trees and SHAP method." Sensors, Vol. 22, No. 3, 1154. 10.3390/s22031154 (PMID: 35161899, PMCID: PMC8840013)
  6. NSL-KDD dataset, University of New Brunswick.
  7. Ribeiro, M.T., Singh, S., Guestrin, C. (2016). "'Why should I trust you?': Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135-1144. 10.1145/2939672.2939778
  8. Shapley, L.S. (1953). "A value for n-person games." Contributions to the Theory of Games, Vol. 2, Princeton University Press, Princeton, US. 10.1515/9781400881970-018
  9. So, B.G., Jeong, J.S. (2021). "Cyber risk management of SMEs to prevent personal information leakage accidents." Journal of the Korea Society of Disaster Information, Vol. 17, No. 2, pp. 375-390.
  10. Wali, S., Khan, I. (2021). "Explainable AI and random forest based reliable intrusion detection system." [online] Available: 10.36227/techrxiv.17169080
  11. Wang, M., Zheng, K., Yang, Y., Wang, X. (2020). "An explainable machine learning framework for intrusion detection systems." IEEE Access, Vol. 8, pp. 73127-73141. 10.1109/ACCESS.2020.2988359
  12. Wang, Y., Wang, P., Wang, Z., Cao, M. (2021). "An explainable intrusion detection system." 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), Haikou, Hainan, China, pp. 1657-1662. 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00244
  • Publisher: The Korean Society of Disaster Information
  • Publisher (Ko): 한국재난정보학회
  • Journal Title: Journal of the Society of Disaster Information
  • Journal Title (Ko): 한국재난정보학회논문집
  • Volume: 19
  • No: 4
  • Pages: 924-935