Federated Learning in Practice: Building Collaborative Models While Preserving Privacy
DOI:
https://doi.org/10.63282/3050-922X.IJERET-V3I2P109
Keywords:
Federated Learning, Privacy-Preserving Machine Learning, Distributed Systems, Model Aggregation, Edge Computing, GDPR, Differential Privacy
Abstract
Federated Learning (FL) is a machine learning approach that trains models across decentralized devices holding local data samples, without transferring that data off the devices. This shift offers significant benefits for privacy protection, scalability, and the distribution of computation. This paper provides a detailed analysis of Federated Learning in practice, focusing on its architecture, protocols, privacy-preserving methods, and practical applications. The review begins with the motivation for FL, which is driven by growing concerns about data protection regulations such as GDPR and HIPAA. We discuss how FL can help mitigate centralized data breaches by allowing edge devices to jointly learn a shared prediction model while training remains on the devices themselves. An extensive survey of the literature is provided, covering the state of the art and systems that existed prior to 2022. The paper then turns to the methodology of FL, describing model aggregation techniques (FedAvg, FedProx), system design, and secure multi-party computation. We further present simulation and real-time experiment results on FL frameworks such as TensorFlow Federated and PySyft. Through a comparative study, the research establishes the potential of FL in sectors such as healthcare, finance, and intelligent devices. The final observations indicate that FL addresses an important privacy concern; however, open issues remain, including model drift, communication overhead, and heterogeneity. This paper serves as a reference for researchers and others seeking to understand and apply federated learning in privacy-conscious settings.
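As a rough illustration of the model aggregation step mentioned above, the following is a minimal sketch of FedAvg-style weighted averaging in plain Python. It is not the paper's implementation: the function name `fedavg`, the flat-list representation of model parameters, and the example numbers are all assumptions made for illustration. Each client's parameters are weighted by its share of the total training samples, which is the core idea of FedAvg as described by McMahan et al.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Aggregate client parameter vectors by a sample-count-weighted average.

    Hypothetical helper: parameters are modelled as flat lists of floats,
    which is a simplification of real model tensors.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of the same length")

    aggregated = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # this client's fraction of all samples
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


# Illustrative values only: three clients with different amounts of local data.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
global_weights = fedavg(updates, sizes)
print(global_weights)
```

In this sketch the server only ever sees parameter vectors and sample counts, never raw data. A real deployment would typically combine this step with protections such as secure aggregation or differential privacy, which the sketch leaves out.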