
Data Mining

Data Mining/algorithm development for identification of individuals at risk

Work Package 1

The primary objective of this work is to identify patients at high risk of oesophageal cancer who are suitable for the Cytosponge-TFF3 test. Over the past decade, Professor Hippisley-Cox’s team have developed, validated, and implemented a novel set of risk prediction algorithms, collectively known as the QCancer algorithms, which predict the risk of different types of cancer using readily available information from routine electronic health data. They aim to update and validate the algorithms with a particular focus on oesophageal cancers. They will mine electronic health records, including prescription and endoscopy data, to derive and validate an algorithm for risk-stratifying patients and identifying those for whom early investigation will be beneficial. They will also determine patterns of proton pump inhibitor (PPI) use, including type, dosage, and duration, to evaluate the long-term consequences of PPI use in patients experiencing chronic reflux symptoms. They envisage that the updated and validated algorithm could help to improve early recognition of oesophageal cancers and reduce over-use of prescription acid-regulating therapies.


Dr Hall’s team will explore the regulatory and ethical challenges associated with accessing and processing patient data for risk stratification and personalised prevention. This work will consider the policy landscape for data mining, using either conventional methods or AI/ML, for risk stratification and personalised prevention; issues relating to the nature and quality of the data (to the extent that they bear on regulatory concerns such as bias and discrimination); the reasonable expectations of key stakeholders, including patients and health providers, for data processing; and the potential legal and regulatory challenges of complying with the GDPR, particularly if data mining is solely automated. This will include consideration of the requirements for information provision, transparency, and explanation (GDPR Articles 5, 13-15 and 22).


Professor Sasieni’s team will carry out a literature review of all published risk prediction models for oesophageal cancer, with the aim of informing the development of the algorithm to identify individuals suitable for referral for a Cytosponge test. Once the algorithm has been developed, they aim to validate it independently by testing it on an existing primary dataset of over 250,000 reflux patients.
