Engineering Scalable Messaging Applications Using Stream Processing, Microservices, and Latency-Aware Data Pipelines
DOI:
https://doi.org/10.63282/3050-922X.IJERET-V1I1P112
Keywords:
Stream Processing, Microservice Architecture, Event Processing, Latency Optimization, Distributed Systems, Scalable Messaging, Real Time Data Pipelines, Cloud Native Architecture
Abstract
Modern real-time applications, such as financial trading platforms, IoT telemetry systems, online gaming infrastructure, and large-scale analytics pipelines, demand ultra-low-latency, high-throughput messaging. Conventional monolithic and batch-based messaging architectures cannot meet these strict performance requirements under dynamic, concurrent loads. Existing distributed streaming solutions scale well, but they typically lack built-in latency-aware optimization, dynamic scaling, and unified fault tolerance tailored to cloud-based microservice environments. This study addresses the gap between high-throughput stream processing and latency-sensitive architecture design, particularly for systems that require exactly-once semantics, fast failure recovery, and elastic scalability across a distributed cluster. We propose a scalable, latency-aware, microservices-based messaging system that combines distributed event streaming with Apache Kafka, real-time stream processing with Apache Flink, and orchestration with Kubernetes. The framework incorporates adaptive partitioning, checkpoint-based fault tolerance, horizontal autoscaling, and backpressure-aware data flows. Compared with a conventional batch architecture, the experimental evaluation shows a 40-60 percent reduction in end-to-end latency, a near-linear increase in throughput as workload grows, fault recovery within a few seconds, and more efficient CPU use. The contributions are a latency-optimized architectural design, a comprehensive scalability and fault-tolerance design, and validated performance benchmarks that can support next-generation distributed real-time systems.
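The abstract does not specify how horizontal autoscaling decisions are made. As a minimal sketch only, the following Python function applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), to a hypothetical metric: Kafka consumer lag per replica. The function name, parameter names, default bounds, and the choice of lag as the scaling signal are assumptions made for illustration and are not taken from the paper.

```python
import math


def desired_replicas(current_replicas: int,
                     lag_per_replica: float,
                     target_lag_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 32,
                     tolerance: float = 0.1) -> int:
    """Return a replica count using the HPA-style ratio rule.

    All parameter names and defaults are hypothetical; the metric
    (consumer lag per replica) is an assumed scaling signal.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_lag_per_replica <= 0:
        raise ValueError("target_lag_per_replica must be positive")

    # Ratio of observed metric to target metric.
    ratio = lag_per_replica / target_lag_per_replica

    # Skip scaling when the ratio is close to 1 to avoid oscillation.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # desired = ceil(current * ratio), clamped to the allowed range.
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))
```

In a Kafka-based deployment it may make sense to set `max_replicas` to the topic's partition count, since consumers in a group beyond the number of partitions receive no partitions and sit idle.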