Operationalizing MLOps with Databricks Pipelines: Scalable Machine Learning in Cloud Environments
DOI:
https://doi.org/10.32628/CSEIT25113573
Keywords:
Databricks Pipelines, MLOps, Machine Learning Deployment, Model Governance, MLflow, Delta Lake, Cloud Computing, Model Monitoring, Data Engineering, Scalable AI Systems
Abstract
The operationalization of machine learning models at scale remains a central challenge for data-driven enterprises due to complexities in deployment automation, governance, and post-deployment performance management. This article presents a structured MLOps framework leveraging Databricks pipelines to enable scalable, secure, and continuously monitored machine learning model deployment in cloud environments. The proposed approach integrates Delta Lake and Delta Live Tables for reliable data ingestion and transformation, MLflow for experiment tracking and model lifecycle management, and Databricks Model Serving for real-time and batch inference. The study demonstrates how automated pipelines support end-to-end orchestration of data preprocessing, distributed model training, hyperparameter optimization, deployment, and continuous performance monitoring. Built-in observability mechanisms and drift detection techniques are employed to ensure sustained model accuracy and reliability in production. By utilizing cloud-native infrastructure across AWS and Azure, the framework enhances scalability, fault tolerance, and operational efficiency while reducing manual intervention. The results highlight measurable improvements in deployment speed, model governance, and system reliability, underscoring the effectiveness of Databricks-based MLOps pipelines for enterprise-grade machine learning systems.
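The abstract refers to drift detection without naming a specific technique. As a purely illustrative sketch, the Population Stability Index (PSI) is one commonly used way to compare a feature's production distribution against its training baseline. The function name, bin count, and the 0.25 alert threshold below are assumptions chosen for illustration (0.25 is a widely quoted rule of thumb), not details taken from the article or from any Databricks API.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: PSI between a baseline sample and a current sample.

    Bins are equal-width and derived from the baseline range only, so the
    current sample is always measured against the training-time layout.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline must contain at least two distinct values")
    width = (hi - lo) / bins

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / total, eps) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Assumed alert threshold; tune per feature in practice.
DRIFT_THRESHOLD = 0.25

if __name__ == "__main__":
    training = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
    production = [0.5 + i / 200 for i in range(100)]  # mass shifted to [0.5, 1)
    psi = population_stability_index(training, production)
    print("drift detected" if psi > DRIFT_THRESHOLD else "stable")
```

In a scheduled monitoring job, a check like this could run per feature over a recent window of inference inputs, with a retraining or alerting step triggered when the score crosses the chosen threshold.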
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.