Conference paper

Azure Kubernetes Service Design Principles in Machine Learning Systems

Y. Bershchanskyi, H. Klym (Lviv Polytechnic National Univ., Ukraine)

The deployment of machine learning (ML) systems at scale necessitates a robust, flexible, and well-orchestrated infrastructure. Azure Kubernetes Service (AKS) has emerged as a key platform for managing ML workloads, offering scalability, automation, and integration with cloud-native AI services. This article explores the fundamental design principles for architecting ML systems on AKS, focusing on scalability, security, cost efficiency, and operational reliability. Key architectural considerations are analyzed, including cluster resource management, model training and deployment strategies, and observability practices. Furthermore, security and governance frameworks are examined to ensure compliance and data protection in ML workflows. Real-world case studies and best practices illustrate successful implementations of ML on AKS across various industries. Finally, emerging trends and challenges are discussed, emphasizing the continuous evolution of Kubernetes-based ML infrastructures and the need for adaptive design strategies in cloud-native AI ecosystems.

Receipt of papers:

March 15th, 2025

Notification of acceptance:

April 30th, 2025

Registration opening:

May 2nd, 2025

Final paper versions:

May 15th, 2025