A MULTI-LAYER DELTA LAKEHOUSE FOR EPIDEMIOLOGICAL MONITORING AND FORECASTING UNDER EMERGENCIES
Abstract
Public health emergencies demand fast, dependable analytics that combine real-time signals with trustworthy historical data. Open, interoperable platforms that support streaming and batch workflows can shorten the time from detection to action while preserving data quality and auditability.

Aim: To design and justify an information system architecture for analyzing epidemic threats under emergency conditions that is scalable, reliable, and fit for integration with clinical and non-traditional data sources.

Methods: We conducted a structured review of three data analytics architectures (Lambda, Kappa, Delta) and mapped their strengths and limits to crisis surveillance needs. Based on functional and non-functional requirements, we specified a Delta Lake–based lakehouse with bronze-silver-gold tiers, unified batch/stream ingestion with Spark Structured Streaming, ACID tables with time travel and schema control, and an analytics layer that supports forecasting with MLOps for monitoring, drift checks, retraining, and lineage.

Results: The proposed architecture meets core emergency needs for timeliness, integrity, and reproducibility through ACID transactions, versioned datasets, and curated tiers; supports standards-based interoperability and the inclusion of wastewater, mobility, and other environmental feeds; provides a single code path for batch and streaming to reduce reconciliation burden; and sets operational guardrails for latency versus cost when running many near-real-time tables. We outline practical considerations for quality checks in the silver tier, promotion rules to gold, and model governance.

Conclusions: A Delta-based lakehouse offers a clear path to an emergency-ready surveillance platform that scales with data growth, integrates heterogeneous sources, and supports reliable forecasting. The next steps are a pilot deployment with public health partners, live latency and cost measurements, and prospective validation of forecasting and alerting in real-world settings.
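The bronze-silver-gold flow described in the abstract, with quality checks in the silver tier and promotion to gold, can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python lists and dictionaries stand in for Delta tables and Spark jobs, and every name here (`BronzeRecord`, `SilverRecord`, `to_silver`, `to_gold`), along with the specific validation rules, is a hypothetical choice made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class BronzeRecord:
    """Raw report as ingested; fields may be missing or malformed (hypothetical schema)."""
    report_id: str
    region: Optional[str]
    report_date: Optional[str]  # ISO date string as received
    cases: Optional[int]


@dataclass(frozen=True)
class SilverRecord:
    """Validated, typed, and normalized report (hypothetical schema)."""
    report_id: str
    region: str
    report_date: date
    cases: int


def to_silver(
    bronze: Iterable[BronzeRecord],
) -> Tuple[List[SilverRecord], List[Tuple[BronzeRecord, str]]]:
    """Apply quality checks and deduplicate; return (accepted, rejected with reason)."""
    accepted: Dict[str, SilverRecord] = {}
    rejected: List[Tuple[BronzeRecord, str]] = []
    for rec in bronze:
        try:
            region = (rec.region or "").strip().upper()
            if not region:
                raise ValueError("missing region")
            if rec.cases is None or rec.cases < 0:
                raise ValueError("missing or negative case count")
            parsed = date.fromisoformat(rec.report_date or "")
        except ValueError as err:
            # Rejected rows are kept with a reason so they can be audited later.
            rejected.append((rec, str(err)))
            continue
        # A later record with the same report_id replaces the earlier one,
        # a simple stand-in for an upsert into the silver table.
        accepted[rec.report_id] = SilverRecord(rec.report_id, region, parsed, rec.cases)
    return list(accepted.values()), rejected


def to_gold(silver: Iterable[SilverRecord]) -> Dict[Tuple[str, date], int]:
    """Promote validated records to a curated daily case count per region."""
    totals: Dict[Tuple[str, date], int] = defaultdict(int)
    for rec in silver:
        totals[(rec.region, rec.report_date)] += rec.cases
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    raw = [
        BronzeRecord("r1", "north", "2024-01-01", 5),
        BronzeRecord("r2", "north", "2024-01-01", 3),
        BronzeRecord("r2", "north", "2024-01-01", 4),  # correction of r2
        BronzeRecord("r3", None, "2024-01-01", 2),      # fails: no region
    ]
    silver, rejected = to_silver(raw)
    print(to_gold(silver))
    print(f"{len(rejected)} record(s) rejected")
```

In a deployment along the lines the abstract describes, each tier would presumably be a versioned ACID table rather than an in-memory collection, so that rejected rows, corrections, and promoted aggregates remain reproducible through time travel.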