Big data and precision public health are terms that are now integral to our daily lives, but what do they mean?
Big data is often described using the 4Vs – Volume, Velocity, Veracity and Variety.
What makes big data “big” is its sheer volume, which can reach at least 100 terabytes of data for prominent companies or organisations. Velocity is the speed at which incoming data arrives for processing, while veracity describes the data’s accuracy and trustworthiness. Finally, variety refers to the different types of information collected.
Precision public health is “an emerging practice to more granularly (with high level of detail) predict and understand public health risks and customise treatments for more specific and homogeneous subpopulations; often using new data, technologies, and methods.”
Basically, the goal is to deliver the right intervention to the right population at the right time. Big data has been successfully employed for surveillance and signal detection, predicting future risk, targeted interventions and understanding disease.
Big data in the public health clinic setting
The 4Vs of big data can be better understood in the context of public health clinics. Available patient health records represent the volume of data. Velocity refers to the new consultations taking place each day, while veracity is the accuracy of the data entries made by healthcare workers. Data variety can be divided into structured data (clinical and laboratory results) and unstructured data (electrocardiograms, radiographs, client notes and feedback).
The value of big data depends on whether information can be processed and analysed within a period that meets its objectives. In Malaysia, the bulk of records in many health clinics is in manual form. Therefore, the data can only be more meaningful to clinicians, public health professionals and policy makers once it is digitised to enable analysis.
Big data in the Malaysian healthcare system
There are several notable databases and disease registries in Malaysia. Non-communicable disease (NCD) registries include the National Diabetes Registry, National Cancer Registry, National Stroke Registry, National Renal Registry, National Cardiovascular Disease Database and National Eye Database. Databases related to NCD risk factors include the NCD Integrated System and the National Health and Morbidity Survey. There is also the Teleprimary Care Clinical Information System in many public health clinics. Cancer Research Malaysia, Subang Jaya Medical Centre and the University of Cambridge have recently collaborated to build the largest genetic and genomic database of Asian breast cancers.
These registries and databases are spearheaded by the respective fraternities from public, private and non-government organisations. The Malaysian Health Data Warehouse by the Ministry of Health (MOH) is the overarching body responsible for streamlining all the generated data and making sense of it using Big Data Analytics.
Existing databases need to be linked in order to scale up the data volume and maximise its value. One good example is the linkage of the Hong Kong Diabetes Register with healthcare facilities’ electronic medical records, which enabled the development and validation of risk equations to predict clinical outcomes. Big data concepts were used to monitor secular trends, identify unmet needs and implement interventions. As a result, all major complications among diabetic patients in Hong Kong fell by 30-60%.
Data integration and future directions
The linking of databases comes with its own set of challenges. Besides the demand for resources such as money, materials and technical expertise, there are real obstacles in the form of differences in database structures, ownership issues, data confidentiality and ethical or legal concerns.
However, all is not lost. Data integration (also known as data fusion, data matching and data merging) is an emerging field that enables researchers to pool data drawn from multiple sources. In a nutshell, it is the process of merging information from different data sets that share some common variables. The new, combined data set allows for more flexibility in the analysis than studying each data set separately.
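The idea of merging on common variables can be illustrated with a minimal sketch. The records, the field names (patient_id, hba1c, systolic_bp) and the helper merge_on_key below are hypothetical and chosen only for illustration; they do not describe any actual registry structure or the method used in any particular study.

```python
# Hypothetical records from two sources that share one common variable, patient_id.
registry = [
    {"patient_id": "A01", "hba1c": 8.2},
    {"patient_id": "A02", "hba1c": 6.9},
    {"patient_id": "A03", "hba1c": 9.4},
]
clinic = [
    {"patient_id": "A01", "systolic_bp": 142},
    {"patient_id": "A03", "systolic_bp": 128},
    {"patient_id": "A04", "systolic_bp": 135},
]


def merge_on_key(left, right, key):
    """Combine records that share the same value of `key` (an inner join)."""
    # Index the right-hand data set by the common variable for fast lookup.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Keep every field from both sources in one combined record.
            merged.append({**row, **match})
    return merged


combined = merge_on_key(registry, clinic, "patient_id")
for row in combined:
    print(row)
```

Only patients present in both sources appear in the combined data set. In practice, real linkage would also need to handle duplicate identifiers, missing values and identifiers that are recorded differently across sources.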
Our project merged six different data sets from the National Diabetes Registry to form a five-year longitudinal cohort data set. The resulting data were used to address several research objectives, including the trends in HbA1c, blood pressure and LDL-cholesterol, and the time to treatment intensification among diabetic patients with poorly controlled HbA1c. This improved our understanding of the quality of diabetes care in public health clinics and identified high-risk subpopulations that can benefit from targeted intervention. Clinical inertia in diabetes management was also quantified.
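As a rough illustration of how separate yearly data sets might be stacked into a longitudinal cohort, consider the sketch below. It is not the procedure used in our project: the yearly extracts, patient identifiers, HbA1c values and the function names build_cohort and yearly_means are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical yearly extracts; each maps patient_id -> HbA1c (%).
yearly = {
    2015: {"A01": 8.2, "A02": 7.0, "A03": 9.4},
    2016: {"A01": 7.9, "A02": 7.2},
    2017: {"A01": 7.5, "A02": 7.1, "A03": 8.8},
}


def build_cohort(yearly_data):
    """Stack yearly extracts into one trajectory per patient: {patient_id: {year: value}}."""
    cohort = defaultdict(dict)
    for year in sorted(yearly_data):
        for patient_id, value in yearly_data[year].items():
            cohort[patient_id][year] = value
    return dict(cohort)


def yearly_means(yearly_data):
    """Mean HbA1c per year, rounded to two decimals, as a simple trend summary."""
    return {
        year: round(mean(values.values()), 2)
        for year, values in sorted(yearly_data.items())
    }


cohort = build_cohort(yearly)
# Patients with a measurement in every year form the complete-follow-up group.
complete = [pid for pid, trajectory in cohort.items() if len(trajectory) == len(yearly)]

print(cohort)
print(yearly_means(yearly))
print(complete)
```

Keeping one trajectory per patient makes it possible to follow individuals over time, which a single cross-sectional data set cannot do. Patients with missing years, such as the hypothetical A03, need an explicit decision on whether to retain or exclude them.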
The data integration approach provides a feasible alternative while awaiting the implementation of Big Data Analytics in the Malaysian healthcare system. Healthcare professionals from all disciplines must be willing to share their data, subject to the relevant ethics approval. We should move beyond viewing data sharing purely from a research perspective and see it as a tool to aid our daily decision making.
Along with advances in artificial intelligence, robotics, the Internet of Things, 3D printing and more, Big Data is set to revolutionise healthcare in Malaysia.
References
- Dolley S. Big Data’s Role in Precision Public Health. Frontiers in Public Health. 2018;6(68).
- IBM Big data & analytic hub. The four V’s of big data. https://www.ibmbigdatahub.com/infographic/four-vs-big-data. Accessed 2021, February 19.
- Jason W. The 4 V’s of big data. Dummies, Wiley. https://www.dummies.com/careers/find-a-job/the-4-vs-of-big-data/#:~:text=In%20most%20big%20data%20circles,variety%2C%20velocity%2C%20and%20veracity. Accessed 2021, February 19.
- CodeBlue. Malaysian and Cambridge scientists build genetic database of Asian breast cancers. https://codeblue.galencentre.org/2021/01/27/malaysian-and-cambridge-scientists-build-genetic-database-of-asian-breast-cancers/. Published 2021, January 27. Accessed 2021, February 19.
- Ministry of Health Malaysia. Malaysian Health Data Warehouse (MyHDW). 2015-2016 Start up: Initiation. 2017. https://myhdw.moh.gov.my/public/pub
- Chan JCN, Lim LL, Luk AOY, et al. From Hong Kong Diabetes Register to JADE Program to RAMP-DM for Data-Driven Actions. Diabetes Care. 2019;42(11):2022-2031.
- Marcoulides KM, Grimm KJ. Data integration approaches to longitudinal growth modeling. Educ Psychol Meas. 2017;77(6):971-989.
- Wan KS, Moy FM, Mohd Yusof K, Mustapha FI, Mohd Ali Z, Hairi NN. Clinical inertia in type 2 diabetes management in a middle-income country: A retrospective cohort study. PLoS One. 2020;15(10):e0240531.
- Wan KS. Trends and Predictors of Glycosylated Haemoglobin A1C, Blood Pressure, and LDL-Cholesterol among Type 2 Diabetes Patients in Negeri Sembilan, Malaysia (unpublished academic exercise). University of Malaya; 2021.
Dr Wan Kim Sui, Professor Dr Moy Foong Ming and Professor Dr Noran Naqiah Hairi.
Centre for Epidemiology and Evidence-Based Practice, Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya.