Reliability at the Edge: SRE for Distributed Cloud and IoT Platforms
DOI:
https://doi.org/10.63282/3050-922X.IJERET-V6I2P106
Keywords:
Site Reliability Engineering (SRE), Edge Computing, Distributed Cloud, IoT, Resilience Engineering, Latency Optimization, Reliability Metrics, DevOps, Observability, Chaos Engineering, Infrastructure Automation
Abstract
As computing paradigms move to the edge, site reliability engineering (SRE) is being redefined to deliver resilience, scalability, and uptime in increasingly distributed environments. This work explores how SRE approaches are being adapted to distributed cloud and Internet of Things (IoT) platforms, which are marked by fragmented architectures, varied network conditions, and heterogeneous hardware. SRE applies software engineering approaches to operations with the aim of optimal availability and performance. Distributed clouds span several sites to provide flexibility and scalability, while edge computing places processing resources closer to the data source to reduce latency. IoT systems connect physical, data-generating devices that need fast responses. Taken together, these technologies create new reliability problems such as limited local observability, edge node failures, intermittent connectivity, and regional variability. This paper describes these challenges together with practical SRE solutions based on federated configuration management, autonomous remediation, localized alerting and metrics aggregation, and resilient rollout strategies. A central component is a case study of an edge-driven smart logistics platform in which adaptive load balancing and AI-driven predictive maintenance greatly reduced downtime and improved resource utilization. The case shows how SRE ideas such as blameless postmortems, error budgets, and Service Level Objectives (SLOs) can be adapted to edge-centric ecosystems. The discussion stays anchored in pragmatic language, treating reliability as a human and commercial issue rather than a merely technical one. Readers will come away with a clear understanding of how conventional SRE techniques are evolving to accommodate the complexity of edge operations, while their main goal remains to provide smooth and consistent digital experiences across a vast, intelligent infrastructure.