A new study shows the potential of machine learning in the early identification of people with inflammatory arthritis.

Ankylosing spondylitis

A study by Swansea University has shown how machine learning can help detect ankylosing spondylitis (AS), a form of inflammatory arthritis, at an early stage, and could transform how GPs identify and diagnose people with the condition.

Published in the open-access journal PLOS ONE, the study, funded by UCB Pharma and Health and Care Research Wales, has been carried out by data analysts and researchers from the National Centre for Population Health & Wellbeing Research (NCPHWR).

The team used machine learning methods to develop a profile of the characteristics of people likely to be diagnosed with AS, the second most common cause of inflammatory arthritis. 

Machine learning, a type of artificial intelligence, is a method of data analysis that automates model building to improve performance and accuracy. Its algorithms build a model based on sample data to make predictions or decisions without being explicitly programmed to do so.

Using the Secure Anonymised Information Linkage (SAIL) Databank, a national data repository based at Swansea University Medical School that allows anonymised person-based data to be linked across datasets, the team identified patients with AS and matched them with people who had no record of a diagnosis of the condition.

The data was analysed separately for men and women, with a model developed using feature/variable selection and principal component analysis to build decision trees.
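
As a rough illustration of the kind of workflow described above, the sketch below chains feature selection, principal component analysis and a decision tree, fitting one model per sex. It is not the study's code: the data is randomly generated, and the feature counts, tree depth, and the helper names `make_synthetic_cohort` and `fit_stratum` are assumptions made purely for the example. It uses the scikit-learn library.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_cohort(n=2000, seed=0):
    """Hypothetical stand-in for linked records; NOT real patient data."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)   # 1 = case, 0 = matched control (made up)
    X = rng.normal(size=(n, 12))     # 12 invented numeric features
    X[:, :3] += 1.5 * y[:, None]     # only the first three carry any signal
    return X, y


def fit_stratum(X, y, seed=0):
    """Fit selection -> PCA -> decision tree and return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = make_pipeline(
        SelectKBest(f_classif, k=6),   # keep the 6 most informative features
        PCA(n_components=3),           # compress them into 3 components
        DecisionTreeClassifier(max_depth=3, random_state=seed),
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# One model per sex, mirroring the separate analyses for men and women.
models = {}
for sex, seed in (("men", 1), ("women", 2)):
    X, y = make_synthetic_cohort(seed=seed)
    models[sex] = fit_stratum(X, y, seed=seed)
    print(sex, "held-out accuracy:", round(models[sex][1], 2))
```

Fitting the selection and PCA steps inside a single pipeline means they are learned from the training split only, so the held-out accuracy is not inflated by information from the test records.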

The findings revealed:

  • In men, lower back pain, uveitis (inflammation of the eye’s middle layer), and non-steroidal anti-inflammatory drug (NSAID) use under age 20 are associated with AS development.
  • Women presented with symptoms at an older age than men, with back pain and the use of multiple pain relief medications.
  • The test data had a good prediction rate of around 70%-80%; however, when applying the model to a general population, the team felt multiple models might be needed to narrow down the population over time to improve the predictive value and reduce the time to diagnose AS.
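
The last point can be made concrete with a short calculation of positive predictive value, the share of flagged people who truly have the condition. The sketch below is illustrative only: it assumes the reported prediction rate can be read as both sensitivity and specificity (taken here as 0.8), uses a hypothetical general-population prevalence of 0.5% that does not come from the study, and treats a second model as independent of the first, which real models rarely are.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(condition | flagged by the model)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def two_stage_ppv(sensitivity, specificity, prevalence):
    """PPV after a second model is applied only to people flagged by the first.

    Assumes the two models err independently (a simplifying assumption).
    """
    after_first = positive_predictive_value(sensitivity, specificity, prevalence)
    return positive_predictive_value(sensitivity, specificity, after_first)


SENS = SPEC = 0.8           # assumed reading of the reported prediction rate
MATCHED_PREVALENCE = 0.5    # balanced cases and controls, as in matched test data
GENERAL_PREVALENCE = 0.005  # hypothetical figure, not taken from the study

ppv_matched = positive_predictive_value(SENS, SPEC, MATCHED_PREVALENCE)
ppv_general = positive_predictive_value(SENS, SPEC, GENERAL_PREVALENCE)
ppv_two_stage = two_stage_ppv(SENS, SPEC, GENERAL_PREVALENCE)

print(f"matched test data:  {ppv_matched:.3f}")
print(f"general population: {ppv_general:.3f}")
print(f"after second model: {ppv_two_stage:.3f}")
```

Under these assumptions, a model that looks accurate on balanced test data flags mostly false positives in a population where the condition is rare, and each further model that narrows the flagged group raises the predictive value.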