Preventive Data Quality Enforcement at the Source: A Shift-Left Approach for FinTech and HealthTech
DOI:
https://doi.org/10.63282/3050-922X.ICRCEDA25-106

Keywords:
Compliance, Data Quality, Data Governance, FinTech, HealthTech, Preventive Validation, Shift Left, Source Data Control, Data Management, Data Stewardship, Metadata Management

Abstract
Data quality problems in enterprise datasets lead to significant costs, risks, and inefficiencies. This paper proposes a shift-left approach to data quality, enforcing validation and cleansing at the point of data entry (the source) rather than downstream. By preventing erroneous or incomplete data from entering systems, organizations, particularly in the financial technology (FinTech) and healthcare technology (HealthTech) sectors, can reduce costly downstream cleaning, comply with stringent regulations, and improve the reliability of analytics. We present a framework for preventive data quality enforcement, discuss supporting tools and technologies, and model the return on investment (ROI) of early data quality interventions. We also argue that early prevention, via the proposed shift-left mechanism, can save practitioners as much as 60% of their time and monetary budget, helping them drive superior outcomes in both operational efficiency and strategic decision-making.
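To make the point-of-entry idea concrete, the sketch below illustrates what preventive, source-side enforcement can look like in practice. It is a minimal, hypothetical Python example, not the paper's framework: the record fields, rules, and the validate_patient_record and ingest functions are illustrative assumptions. The key behavior it demonstrates is that records failing validation are rejected at the source and returned for correction, rather than being admitted and cleansed downstream.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical illustration of shift-left (preventive) data quality:
# each record is validated at the point of entry and rejected if it
# fails, so bad data never reaches downstream stores or analytics.

@dataclass
class ValidationResult:
    valid: bool
    errors: list = field(default_factory=list)

def validate_patient_record(record: dict) -> ValidationResult:
    """Apply source-side rules to a single intake record (illustrative)."""
    errors = []

    # Completeness: required fields must be present and non-empty.
    for required in ("patient_id", "date_of_birth", "insurance_code"):
        if not record.get(required):
            errors.append(f"missing required field: {required}")

    # Validity: identifiers must match the expected format.
    if record.get("patient_id") and not re.fullmatch(r"P\d{8}", record["patient_id"]):
        errors.append("patient_id must match pattern P########")

    # Plausibility: date of birth cannot be in the future.
    dob = record.get("date_of_birth")
    if isinstance(dob, date) and dob > date.today():
        errors.append("date_of_birth is in the future")

    return ValidationResult(valid=not errors, errors=errors)

def ingest(record: dict, store: list) -> bool:
    """Admit only records that pass validation at the source."""
    result = validate_patient_record(record)
    if not result.valid:
        # Rejected at entry: route back to the submitter for correction
        # instead of cleansing downstream.
        print(f"rejected: {result.errors}")
        return False
    store.append(record)
    return True

if __name__ == "__main__":
    clean_store: list = []
    ingest({"patient_id": "P00012345", "date_of_birth": date(1980, 5, 1),
            "insurance_code": "HMO-22"}, clean_store)   # accepted
    ingest({"patient_id": "12345", "date_of_birth": None,
            "insurance_code": ""}, clean_store)          # rejected
```

In a production setting, rules like these would typically live in a shared validation service or schema definition so that every entry channel (UI, API, batch upload) enforces them consistently at the source.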