Master Data Management in Multi-Cloud Environments: A Survey with Operational Evidence from Banking and Insurance Deployments

Authors

  • Kuladeep Sandra, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-922X.IJERET-V6I3P119

Keywords:

Master Data Management, Entity Resolution, Multi-Cloud Architecture, Data Governance, Privacy Constraints, Data Residency, Record Linkage

Abstract

Master Data Management (MDM) is substantially harder in multi-cloud, multi-region, multi-domain environments than in the single-cloud, monolithic settings for which classical MDM patterns were designed. We survey the evolution of MDM patterns in light of operational evidence from a hybrid on-premises and multi-cloud deployment spanning 6 business units, 14 source systems, and roughly 4 million source identifiers that consolidate to approximately 2.5 million unique entities. We examine four architectural patterns (centralized hub, distributed registry, federated masters, and hybrid). We report that data residency constraints make a centralized hub operationally infeasible in our context, whereas federated masters with reconciliation contracts have proven viable. We report production entity resolution results: 92% precision and 45% recall for deterministic rules, 87% precision and 62% recall for probabilistic Fellegi-Sunter matching, and 89% precision and 76% recall for a learned Siamese model. Cross-region privacy constraints reduce matching accuracy by 3 to 5 percent. We also describe a compliance audit that discovered Personally Identifiable Information (PII) in 14 previously unknown analytical tables, which required deleting 40 TB of data and rebuilding the pseudonymization pipeline. Finally, we discuss governance structures that succeed where centralized governance creates bottlenecks. We also identify open research challenges in private entity resolution, cross-organizational governance, and validation of MDM correctness under GDPR and CCPA constraints.
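The abstract contrasts deterministic rules with probabilistic Fellegi-Sunter matching. The sketch below shows the core of Fellegi-Sunter scoring under stated assumptions. It is not the deployment's implementation. Each compared field has an m-probability, P(field agrees | records match), and a u-probability, P(field agrees | records do not match). Agreement adds log2(m/u) to the pair's score, and disagreement adds log2((1-m)/(1-u)). The summed score is then compared against an upper and a lower threshold to yield match, possible match (clerical review), or non-match. The field names (last_name, birth_date, postcode), the m/u values, the thresholds, and the helper names (FieldParams, match_score, classify) are all hypothetical illustrations. Comparison is also simplified to exact equality after case-folding.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    """Fellegi-Sunter parameters for one comparison field."""
    m: float  # P(field agrees | records refer to the same entity)
    u: float  # P(field agrees | records refer to different entities)

    def agreement_weight(self) -> float:
        return math.log2(self.m / self.u)

    def disagreement_weight(self) -> float:
        return math.log2((1 - self.m) / (1 - self.u))


# Hypothetical parameters for illustration only. In practice m and u are
# estimated from labeled pairs or with EM over unlabeled comparison vectors.
PARAMS = {
    "last_name": FieldParams(m=0.95, u=0.01),
    "birth_date": FieldParams(m=0.97, u=0.003),
    "postcode": FieldParams(m=0.90, u=0.05),
}


def normalize(value) -> str:
    """Simplified comparison key: trimmed, case-folded string."""
    return str(value).strip().casefold()


def match_score(rec_a: dict, rec_b: dict, params=PARAMS) -> float:
    """Composite weight: sum of per-field log2 likelihood ratios."""
    score = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value contributes no evidence
        if normalize(a) == normalize(b):
            score += p.agreement_weight()
        else:
            score += p.disagreement_weight()
    return score


def classify(score: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds; the band between them goes to human review."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"last_name": "Smith", "birth_date": "1980-04-02", "postcode": "10001"}
    b = {"last_name": "smith ", "birth_date": "1980-04-02", "postcode": "10002"}
    c = {"last_name": "Smyth", "birth_date": "1980-04-02", "postcode": "10001"}
    for other in (b, c):
        s = match_score(a, other)
        print(round(s, 2), classify(s))
```

With these illustrative parameters, the pair (a, b) agrees on name and birth date but not postcode. It scores about 11.66 and is classified as a match. The pair (a, c) differs only in a spelling of the surname. It scores about 8.20 and falls into the review band. The gap between the two thresholds is where the precision/recall trade-off is tuned: widening it raises the precision of automatic merges at the cost of more manual review.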

References

[1] P. Christen, Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Berlin, Germany: Springer, 2012. https://scholar.google.com/scholar?q=Data+Matching+Concepts+and+Techniques+for+Record+Linkage+Christen

[2] H. Köpcke, A. Thor, and E. Rahm, "Evaluation of entity resolution approaches on real-world match problems," Proc. VLDB Endowment, vol. 3, no. 1, pp. 484–493, 2010. https://scholar.google.com/scholar?q=Evaluation+of+entity+resolution+approaches+on+real-world+match+problems

[3] F. Provost, M. Allen, and S. Rogers, Practical Master Data Management. Sebastopol, CA: O'Reilly Media, 2018. https://scholar.google.com/scholar?q=Practical+Master+Data+Management

[4] E. A. Durham, M. Kantarcioglu, Y. Xue, C. Toth, M. Kuzu, and B. Malin, "Composite Bloom filters for secure record linkage," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2956–2968, 2014. https://scholar.google.com/scholar?q=Composite+Bloom+filters+for+secure+record+linkage

[5] J. Vaidya, C. Clifton, and M. Zhu, Privacy-Preserving Data Mining. Boston, MA: Springer, 2005. https://scholar.google.com/scholar?q=Privacy-Preserving+Data+Mining+Vaidya

[6] D. Loshin, Master Data Management. Burlington, MA: Morgan Kaufmann, 2010. https://scholar.google.com/scholar?q=Master+Data+Management+Loshin

[7] W. H. Inmon, B. O'Neil, and L. Fryman, Master Data Management and Enterprise Information Management. Basking Ridge, NJ: Technics Publications, 2008. https://scholar.google.com/scholar?q=Master+Data+Management+and+Enterprise+Information+Management+Inmon

[8] DAMA International, DAMA-DMBoK: Data Management Body of Knowledge, 2nd ed. Basking Ridge, NJ: Technics Publications, 2017. https://scholar.google.com/scholar?q=DAMA-DMBoK+Data+Management+Body+of+Knowledge

[9] P. P. Tallon, "Corporate governance of big data: Perspectives on value, risk, and cost," IEEE Computer, vol. 46, no. 6, pp. 32–38, 2013. https://scholar.google.com/scholar?q=Corporate+governance+of+big+data+Tallon

[10] I. P. Fellegi and A. B. Sunter, "A theory for record linkage," Journal of the American Statistical Association, vol. 64, no. 328, pp. 1183–1210, 1969. https://scholar.google.com/scholar?q=A+theory+for+record+linkage+Fellegi+Sunter

[11] Apache Software Foundation, "Apache Iceberg table format specification v2," Technical Documentation, 2024. [Online]. Available: https://iceberg.apache.org/spec/ https://scholar.google.com/scholar?q=Apache+Iceberg+table+format+specification

[12] S. Shankar, R. Garcia, J. M. Hellerstein, and A. G. Parameswaran, "Operationalizing machine learning: Challenges and best practices," IEEE Software, vol. 41, no. 2, pp. 42–51, 2024. https://scholar.google.com/scholar?q=Operationalizing+machine+learning+Challenges+and+best+practices

[13] M. Stonebraker and I. F. Ilyas, "Data integration: The current status and the way forward," IEEE Data Engineering Bulletin, vol. 41, no. 2, pp. 3–9, 2018. https://scholar.google.com/scholar?q=Data+integration+The+current+status+and+the+way+forward


Published

2025-09-22

Section

Articles

How to Cite

Sandra K. Master Data Management in Multi-Cloud Environments: A Survey with Operational Evidence from Banking and Insurance Deployments. IJERET [Internet]. 2025 Sep. 22 [cited 2026 May 19];6(3):152-7. Available from: https://ijeret.org/index.php/ijeret/article/view/564