Data Quality in the Age of Big Data: Challenges and Best Practices

Authors

  • Sonika Darshan, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-922X.IJERET-V2I3P105

Keywords:

Big Data, Data Quality, Data Governance, Metadata Management, Data Architecture, Data Integration

Abstract

In the era of big data, organizations increasingly rely on large, varied, and fast-moving datasets to perform analysis, build products, and make critical decisions. This reliance has compounded the problem of data quality, which determines whether meaningful and reliable statistics can be obtained from the data. Conventional quality dimensions such as accuracy, completeness, and consistency do not fully capture the conditions imposed by the four Vs of big data: volume, variety, velocity, and veracity. This paper discusses existing and emerging facets of data quality in big data, including heterogeneity, real-time validation of very large datasets, the absence of standardization, and missing metadata. It reviews previous studies and tools that assess data quality along multiple dimensions, such as availability, usability, reliability, relevance, and presentation quality. It then discusses data governance processes, automated validation, the importance of metadata management, and how emerging research applications can be used for anomaly detection and data quality prediction. The paper also addresses validity issues and the need to monitor, protect, and respect personal data in line with relevant legislation. Finally, it highlights the growing integration of quality assurance with data architecture and considers future trends, including semantic expansion, quality control at the edge, and accountability. This work is intended to serve as a reference for researchers, practitioners, and organizations seeking sustainable approaches to data quality management in big data systems.

Published

2021-10-31

Section

Articles

How to Cite

1. Darshan S. Data Quality in the Age of Big Data: Challenges and Best Practices. IJERET [Internet]. 2021 Oct. 31 [cited 2025 Sep. 22];2(3):43-52. Available from: https://ijeret.org/index.php/ijeret/article/view/111