Optimizing Multi-Tenant Resource Allocation in Cloud-Based Distributed Systems for Large-Scale AI Model Training Using In-Memory Computing
DOI: https://doi.org/10.63282/3050-922X.IJERET-V2I1P105

Keywords: Multi-Tenant Systems, Cloud Computing, Distributed Systems, AI Model Training, In-Memory Computing, Resource Allocation, Scheduling, Load Balancing

Abstract
The number of AI-driven applications is growing exponentially, and with it the demand for scalable and efficient training mechanisms in cloud facilities. Allocating computational resources is a critical challenge in multi-tenant distributed systems, especially for training large-scale AI models. Conventional disk-based processing models cannot keep pace because of excessive latency and resource contention. This paper presents a new framework for optimizing resource allocation in multi-tenant cloud-based distributed systems using In-Memory Computing (IMC) techniques. To reduce training time and increase training throughput, we propose a resource-efficient scheduling algorithm that combines dynamic workload profiling, real-time data locality, and memory-aware execution planning. We evaluate our methodology against industrial and realistic workloads on multiple cloud platforms. Experimental results demonstrate substantial performance gains and resource savings compared with conventional methods. The paper also discusses how tenant-aware memory partitioning and predictive load balancing improve system scalability and fairness.
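To make the scheduling idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of a memory-aware, tenant-fair placement policy: each job carries a memory footprint predicted by workload profiling and a data-locality score, and the scheduler places it on the node that maximizes a weighted combination of locality and the tenant's remaining quota share. All class names, weights, and fields here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    memory_quota_gb: float   # tenant-aware memory partition size
    used_gb: float = 0.0


@dataclass
class TrainingJob:
    tenant: str
    predicted_mem_gb: float  # from dynamic workload profiling
    locality_score: float    # fraction of input data already resident in memory (0..1)


@dataclass
class Node:
    name: str
    free_mem_gb: float


def score(job: TrainingJob, node: Node, tenant: Tenant):
    """Higher is better; None means the job does not fit on this node.

    Rewards data locality and favors tenants that have consumed less of
    their memory quota, which approximates fairness across tenants.
    The 0.6/0.4 weights are arbitrary for this sketch.
    """
    if job.predicted_mem_gb > node.free_mem_gb:
        return None
    fairness = 1.0 - tenant.used_gb / tenant.memory_quota_gb
    return 0.6 * job.locality_score + 0.4 * max(fairness, 0.0)


def schedule(jobs, nodes, tenants):
    """Greedily place each job on its best-scoring node, updating
    node capacity and tenant usage as placements are made."""
    placements = []
    for job in jobs:
        t = tenants[job.tenant]
        best, best_s = None, -1.0
        for node in nodes:
            s = score(job, node, t)
            if s is not None and s > best_s:
                best, best_s = node, s
        if best is not None:
            best.free_mem_gb -= job.predicted_mem_gb
            t.used_gb += job.predicted_mem_gb
            placements.append((job, best.name))
    return placements
```

In a real system the locality score would be computed per (job, node) pair from the in-memory data layout, and the predicted footprint would come from an online profiler; this sketch fixes both per job to keep the placement logic readable.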