Forecasting Hardware Failures or Resource Bottlenecks Before They Occur

Authors

  • Nagireddy Karri, Senior IT Administrator Database, Sherwin-Williams, USA
  • Partha Sarathi Reddy Pedda Muntala, Software Developer, Cisco Systems, Inc., USA
  • Sandeep Kumar Jangam, Lead Consultant, Infosys Limited, USA

DOI:

https://doi.org/10.63282/3050-922X.IJERET-V3I2P111

Keywords:

Predictive Maintenance, Hardware Failure, Resource Bottlenecks, Machine Learning, Anomaly Detection, System Monitoring

Abstract

Hardware failures and resource bottlenecks are difficult to predict, yet anticipating them is crucial for maintaining high system availability, reliability, and performance. This paper presents predictive analytics, based on advanced machine learning algorithms and historical system data, as the key to timely prediction of hardware faults and excessive resource consumption. System logs, performance monitoring tools, and anomaly detectors are used to identify trends that may signal imminent failures. The methodology comprises data pre-processing, feature selection, model training, and real-time monitoring, yielding predictive models that warn administrators of critical issues before they manifest. The findings indicate that proactive prediction can considerably reduce downtime while improving resource utilization and operational costs. The study contributes to research on predictive maintenance and anticipatory resource management by providing a systematic process for failure prediction across various computing systems.
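
The real-time monitoring step can be illustrated with a minimal Python sketch. It is not the authors' model: a rolling z-score check stands in for the trained machine learning models, and the names `RollingZScoreDetector` and `scan`, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Flag readings that deviate strongly from a sliding window of recent history.

    Illustrative stand-in for a trained predictive model; all parameters are assumed.
    """

    def __init__(self, window=30, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold           # z-score above which we warn
        self.min_history = min_history       # readings required before judging

    def observe(self, value):
        """Return True if `value` is anomalous versus the window, then record it."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # A perfectly flat window (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.history.append(value)
        return anomalous


def scan(readings, **kwargs):
    """Return the indices of readings that the detector flags."""
    detector = RollingZScoreDetector(**kwargs)
    return [i for i, v in enumerate(readings) if detector.observe(v)]


if __name__ == "__main__":
    # Hypothetical CPU-utilization samples (percent) with one sudden spike.
    cpu = [50, 51, 49, 50, 52, 51, 50, 49, 95, 50]
    for index in scan(cpu):
        print(f"warning: reading {index} ({cpu[index]}%) deviates from recent history")
```

In a deployment along the lines the abstract describes, `observe` would presumably be fed by system logs or monitoring agents, and a flagged reading would trigger an alert to administrators rather than a printed line.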

Published

2022-06-30

Section

Articles

How to Cite

Karri N, Pedda Muntala PSR, Jangam SK. Forecasting Hardware Failures or Resource Bottlenecks Before They Occur. IJERET [Internet]. 2022 Jun. 30 [cited 2026 Jan. 27];3(2):99-109. Available from: https://ijeret.org/index.php/ijeret/article/view/314