GPU Fleet FinOps: Scheduling, Right-Sizing, and Cost Governance for DGX, MIG, and Preemptible Capacity

Santosh Pashikanti

doi:10.63282/3050-922X.AECTIC-104

Authors

Santosh Pashikanti, Lead Cloud Architect, Independent Researcher, USA.

DOI:

https://doi.org/10.63282/3050-922X.AECTIC-104

Keywords:

GPU Fleet FinOps, Cost Governance, GPU Scheduling, Right-Sizing, NVIDIA DGX, Multi-Instance GPU (MIG), Preemptible Capacity, Kubernetes, Unit Economics

Abstract

As organizations scale Generative AI (GenAI) and Large Language Model (LLM) workloads, GPU-accelerated computing has become the dominant line item in cloud expenditure ¹, ². This paper presents a "GPU Fleet FinOps" blueprint: a unified operating model for the financial and operational optimization of large-scale GPU fleets, including NVIDIA DGX systems, MIG-partitioned GPUs, and preemptible (spot) capacity. We identify four critical, unaddressed challenges: chronic low utilization of premium hardware ³, ⁴, ⁵; "capacity island" fragmentation caused by Multi-Instance GPU (MIG) partitioning ⁶, ⁷, ⁸, ⁹; the high failure rate of workloads on preemptible instances ¹⁰, ¹¹, ¹²; and a lack of financial accountability. We propose an integrated solution built on three pillars: a FinOps-aware GPU scheduling layer ¹³, policy-driven right-sizing with quota management ¹⁴, ¹⁵, and robust, interruption-aware job design ¹⁶, ¹⁷. This framework connects low-level scheduling and governance decisions to business-centric unit economics, such as "cost per training run" and "cost per 1k inferences" ¹⁸, ¹⁹, providing a practical architecture for aligning high-performance GPU investments with measurable business value.
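The unit-economics metrics named in the abstract can be illustrated with a minimal sketch. It assumes a flat blended price per GPU-hour and a single utilization figure per workload; the `GpuUsage` record, its field names, and the helper functions are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuUsage:
    """Hypothetical usage record; every field name here is an assumption."""
    gpu_hours: float        # GPU-hours billed to the workload
    hourly_rate_usd: float  # assumed blended price per GPU-hour
    utilization: float      # fraction of billed time spent on useful work, 0..1


def total_cost(usage: GpuUsage) -> float:
    """Total spend for the billed GPU-hours."""
    return usage.gpu_hours * usage.hourly_rate_usd


def cost_per_training_run(usage: GpuUsage, completed_runs: int) -> float:
    """Spend divided by the number of runs that actually completed."""
    if completed_runs <= 0:
        raise ValueError("completed_runs must be positive")
    return total_cost(usage) / completed_runs


def cost_per_1k_inferences(usage: GpuUsage, inferences: int) -> float:
    """Spend normalized to 1,000 served inference requests."""
    if inferences <= 0:
        raise ValueError("inferences must be positive")
    return total_cost(usage) / inferences * 1000


def idle_cost(usage: GpuUsage) -> float:
    """Portion of spend attributable to billed but unused GPU time."""
    return total_cost(usage) * (1.0 - usage.utilization)


if __name__ == "__main__":
    training = GpuUsage(gpu_hours=512, hourly_rate_usd=2.0, utilization=0.25)
    print(cost_per_training_run(training, completed_runs=4))
    print(idle_cost(training))
```

Dividing by completed runs rather than launched runs means that work lost to preemption raises the unit cost, which is one way interruption-aware job design could surface in a financial metric.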

References

[1] Federated Learning-based Personalized Recommendation Systems: An Overview on Security and Privacy Challenges - CyberSecDome, accessed October 29, 2025, https://cybersecdome.eu/wp-content/uploads/2024/01/IEEE-Transactions-on-Consumer-Electronics-Federated-Learning-based-Personalized-Recommendati-2023.pdf

[2] (PDF) E-commerce Personalized Recommendations: a Deep Neural Collaborative Filtering Approach - ResearchGate, accessed October 29, 2025, https://www.researchgate.net/publication/377081826_E-commerce_Personalized_Recommendations_a_Deep_Neural_Collaborative_Filtering_Approach

[3] (PDF) Federated Learning on Recommender Systems - ResearchGate, accessed October 29, 2025, https://www.researchgate.net/publication/388088244_Federated_Learning_on_Recommender_Systems

[4] Federated Learning on Recommender Systems - IEEE Computer Society, accessed October 29, 2025, https://www.computer.org/csdl/proceedings-article/bigdata/2024/10825895/23yjUjOpcOY

[5] (PDF) A Survey on Federated Recommendation Systems, accessed October 29, 2025, https://www.researchgate.net/publication/366821201_A_Survey_on_Federated_Recommendation_Systems

[6] Recommendation Systems Using Federated Learning - Meegle, accessed October 29, 2025, https://www.meegle.com/en_us/topics/recommendation-algorithms/recommendation-systems-using-federated-learning

[7] Analysis of Privacy Preservation Enhancements in Federated Learning Frameworks - Shaping the Future of IoT with Edge Intelligence - NCBI, accessed October 29, 2025, https://www.ncbi.nlm.nih.gov/books/NBK602365/

[8] Digital Markets Act Summary: EU DMA Law Explained - Usercentrics, accessed October 29, 2025, https://usercentrics.com/knowledge-hub/digital-markets-act-dma-impacts-user-privacy-and-consent-management/

[9] The Digital Markets Act: Shaping Fair Competition in the Digital Age, accessed October 29, 2025, https://business.trustedshops.com/blog/digital-markets-act

[10] Federated Learning: The Decentralized Revolution Transforming AI While Preserving Privacy | by Nicolasseverino | Oct, 2025 | Medium, accessed October 29, 2025, https://medium.com/@nicolasseverino/federated-learning-the-decentralized-revolution-transforming-ai-while-preserving-privacy-2e0a0122d8b8

[11] How is federated learning used in personalized recommendations?, accessed October 29, 2025, https://milvus.io/ai-quick-reference/how-is-federated-learning-used-in-personalized-recommendations

[12] Federated Learning: A Privacy-Preserving Approach to ... - Netguru, accessed October 29, 2025, https://www.netguru.com/blog/federated-learning

[13] Low-Latency Collaborative Predictive Maintenance: Over-the-Air Federated Learning in Noisy Industrial Environments - MDPI, accessed October 29, 2025, https://www.mdpi.com/1424-8220/23/18/7840

[14] LoLaFL: Low-Latency Federated Learning via Forward-only ..., accessed October 29, 2025, https://arxiv.org/abs/2412.14668

[15] Privacy-Preserving Federated Learning - Hasso-Plattner-Institut, accessed October 29, 2025, https://hpi.de/arnrich/research-areas/privacy-preserving-federated-learning.html

[16] (PDF) Federated Learning Architectures for Privacy-Preserving Artificial Intelligence Applications on Edge Devices - ResearchGate, accessed October 29, 2025, https://www.researchgate.net/publication/392749199_Federated_Learning_Architectures_for_Privacy-Preserving_Artificial_Intelligence_Applications_on_Edge_Devices

[17] Federated Learning for Cybersecurity: A Privacy-Preserving Approach, accessed October 29, 2025, https://www.mdpi.com/2076-3417/15/12/6878

[18] State of Cloud Costs | Datadog, accessed November 18, 2025, https://www.datadoghq.com/state-of-cloud-costs/

[19] Optimizing GenAI Usage: A FinOps Perspective on Cost ..., accessed November 18, 2025, https://www.finops.org/wg/optimizing-genai-usage/

[20] How one company went from 28% GPU utilization to 73% with Run:ai, accessed November 18, 2025, https://pages.run.ai/hubfs/PDFs/Case-Study-from-28-to-73-percent-GPU-Utilization.pdf

[21] (PDF) Flex-MIG: Enabling Distributed Execution on MIG, accessed November 18, 2025, https://www.researchgate.net/publication/397556277_Flex-MIG_Enabling_Distributed_Execution_on_MIG

[22] Maximize GPU Efficiency: Smarter Fixes for Checkpointing Challenges - Clockwork.io, accessed November 18, 2025, https://clockwork.io/blog/maximize-gpu-efficiency-smarter-fixes-for-checkpointing-challenges/

[23] Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances - arXiv, accessed November 18, 2025, https://arxiv.org/html/2403.14097v1

[24] Batch Scheduling on Kubernetes: Comparing Apache YuniKorn ..., accessed November 18, 2025, https://www.infracloud.io/blogs/batch-scheduling-on-kubernetes/

[25] Compare Custom Schedulers for Kubernetes - Rafay Product ..., accessed November 18, 2025, https://docs.rafay.co/blog/2024/10/11/compare-custom-schedulers-for-kubernetes/

[26] A Slurm on Kubernetes Implementation for HPC and ... - CoreWeave, accessed November 18, 2025, https://www.coreweave.com/blog/sunk-slurm-on-kubernetes-implementations

[27] [PDF] Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances, accessed November 18, 2025, https://www.semanticscholar.org/paper/01f6de9e8670b613f4eccf0a699acb45673c19c5

[28] How Nvidia DGX Cloud Uses Kyverno to Enforce Kubernetes Pod Security Standards, accessed November 18, 2025, https://nirmata.com/2024/12/15/how-nvidia-dgx-cloud-uses-kyverno/

[29] Cloud Cost Management for AI/ML Workloads - emma, accessed November 18, 2025, https://www.emma.ms/cloud-solutions/ai-cost-management

[30] FinOps for AI: A Guide To Managing AI Cloud Costs - ProsperOps, accessed November 18, 2025, https://www.prosperops.com/blog/finops-for-ai/

[31] AI and ML perspective: Cost optimization | Cloud Architecture Center ..., accessed November 18, 2025, https://docs.cloud.google.com/architecture/framework/perspectives/ai-ml/cost-optimization

[32] Role of AI in cloud cost optimization and FinOps (Financial Operations) - World Journal of Advanced Engineering Technology and Sciences, accessed November 18, 2025, https://journalwjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0218.pdf

[33] FinOps For AI: How Crawl, Walk, Run Works For Managing AI Costs - CloudZero, accessed November 18, 2025, https://www.cloudzero.com/blog/finops-for-ai/

[34] DGX Platform: Built for Enterprise AI - NVIDIA, accessed November 18, 2025, https://www.nvidia.com/en-us/data-center/dgx-platform/

[35] Measure and Improve AI Workload Performance with NVIDIA DGX Cloud Benchmarking, accessed November 18, 2025, https://developer.nvidia.com/blog/measure-and-improve-ai-workload-performance-with-nvidia-dgx-cloud-benchmarking/

[36] Case Study: NVIDIA Boosts BMW Group's Production Efficiency with AI, accessed November 18, 2025, https://www.nvidia.com/en-us/customer-stories/bmw-optimizes-production-with-ai-and-dgx-systems/

[37] Scalable AI Infrastructure Accelerates Autonomous Vehicle Development, accessed November 18, 2025, http://images.nvidia.cn/content/dgx/dgx-1-zenuity-case-study-us-675763-r7-web.pdf

[38] NVIDIA A100 Tensor Core GPU Architecture, accessed November 18, 2025, https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf

[39] [2502.01909] A Multi-Objective Framework for Optimizing GPU-Enabled VM Placement in Cloud Data Centers with Multi-Instance GPU Technology - arXiv, accessed November 18, 2025, https://arxiv.org/abs/2502.01909

[40] More-efficient recovery from failures during large-ML-model training - Amazon Science, accessed November 18, 2025, https://www.amazon.science/blog/more-efficient-recovery-from-failures-during-large-ml-model-training

[41] Generative AI Cost Optimization Strategies | AWS Cloud Enterprise Strategy Blog, accessed November 18, 2025, https://aws.amazon.com/blogs/enterprise-strategy/generative-ai-cost-optimization-strategies/

[42] Monitoring GPU workloads on Amazon EKS using AWS managed open-source services, accessed November 18, 2025, https://aws.amazon.com/blogs/mt/monitoring-gpu-workloads-on-amazon-eks-using-aws-managed-open-source-services/

[43] Seamless Nvidia GPU Observability! | by Nitee Shah - Medium, accessed November 18, 2025, https://niteeshah95.medium.com/seamless-nvidia-gpu-observability-b8291e4fa2d1

[44] Best Practices for GPU Observability in Modern AI Infrastructure - Techstrong.ai, accessed November 18, 2025, https://techstrong.ai/social-facebook/best-practices-for-gpu-observability-in-modern-ai-infrastructure/

[45] Integrating observability stack into your Kubernetes cluster - Crusoe Cookbook, accessed November 18, 2025, https://cookbook.crusoe.ai/observability-kubernetes

[46] GPU observability in Azure Kubernetes Service (AKS) - Microsoft Learn, accessed November 18, 2025, https://learn.microsoft.com/en-us/azure/aks/monitor-gpu-metrics

[47] The FinOps playbook for AI: Optimizing costs and performance - Flexera, accessed November 18, 2025, https://www.flexera.com/blog/finops/the-finops-playbook-for-ai-optimizing-costs-and-performance/
