Tutorial: Machine learning for complex medical temporal sequences

25.02.2021

Tutorial at AIME 2022, June 14-17, 2022.

Tutorialists: Panagiotis Papapetrou and Myra Spiliopoulou


Abstract

Advances in machine learning research and their application to medical data sources have received increasing attention recently and have demonstrated substantial benefits for patients and practitioners. The adoption of Electronic Health Records (EHRs), in combination with the penetration of smart technologies and the Internet of Things, gives a further boost to initiatives for patient self-management and empowerment: new forms of health-relevant data are becoming available, and they require new data acquisition and analytics workflows. Two particular challenges arise: data sparsity, including missing values, and the lack of model interpretability.

In this tutorial, we focus on sequential forms of health-related data: spatial trajectories, panel data from longitudinal studies, time series signals (such as ECGs), event sequences (such as sequences of EHR events) and mHealth data. We elaborate on the questions that medical researchers and clinicians pose on those data and on the instruments they use. We discuss which questions are asked with those instruments, which questions can actually be answered from those data, what ML advances have achieved on such data, and how to respond to the medical experts' questions about the derived models. Furthermore, we emphasize the need for interpretable and explainable models that can inspire trust and facilitate informed decision making. Towards this goal, we elaborate on actionable models and counterfactual explanations for sequential medical data, and discuss how to apply them to interpret black-box models such as deep learning architectures.

Motivation

The proliferation of applications for medical data has increased the need for extracting useful knowledge that healthcare experts can use effectively. This tutorial elaborates on the complexity of temporal medical data. While earlier tutorials at AIME and at venues like KDD and ECML PKDD have explored the potential of machine learning on medical data, less attention has been paid to the challenges of sequential/temporal medical data and to the need for trust by medical practitioners.

Former tutorials

Panagiotis Papapetrou and Myra Spiliopoulou have offered the following recent tutorials with the number of attendees ranging between 25 and 60:

“Learning from hospital data and learning from cohorts”, ECML/PKDD 2016
“Mining Cohorts & Patient Data: Challenges and Solutions for the Pre-Mining, the Mining and the Post-Mining Phases”, ICDM 2017
“Knowledge Discovery from Cohorts, Electronic Health Records and further Patient-related data”, KDD 2018
“Mining and model understanding on medical data”, KDD 2019
“Learning from complex medical data”, IEEE Big Data 2020

The proposed tutorial differs from all of the above tutorials in that it focuses on (1) explainable and actionable models for healthcare in the form of counterfactuals, (2) sequential and temporal data only, and (3) challenges of data quality, missingness, cleaning and gleaning, and their mitigation.

Format

Half-day tutorial (must be in the morning) with the following structure:

PART I: Introduction [Both]

What kinds of temporal sequences are there?

Spatial trajectories
Panel data from longitudinal studies
Time series of signals (EEG, ECG, …)
mHealth EMA (Ecological Momentary Assessment) data

Why are sequences short?

Short observation periods
gaps due to device errors
gaps due to non-adherence
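Gaps like the ones listed above show up in the raw data as missing entries. As a minimal illustrative sketch (not the presenters' method), the Python snippet below locates runs of missing values in a short sequence, represented here as `None`, and fills only short interior gaps by linear interpolation. The helper names `find_gaps` and `interpolate_short_gaps`, the `max_gap` threshold and the toy EMA ratings are hypothetical. Note that the sketch cannot tell a device error from non-adherence, which is one reason long gaps are left unfilled.

```python
from typing import List, Optional, Tuple


def find_gaps(series: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for every run of missing values (None)."""
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i                      # a gap opens
        elif value is not None and start is not None:
            gaps.append((start, i))        # the gap closes at the next observed value
            start = None
    if start is not None:                  # gap running until the end of the sequence
        gaps.append((start, len(series)))
    return gaps


def interpolate_short_gaps(series: List[Optional[float]],
                           max_gap: int = 2) -> List[Optional[float]]:
    """Linearly interpolate interior gaps of at most max_gap entries.

    Gaps at the start or end of the sequence, and gaps longer than max_gap,
    stay None because they lack an observed value on both sides or are
    too long to fill plausibly.
    """
    filled = list(series)
    for start, end in find_gaps(series):
        length = end - start
        if start == 0 or end == len(series) or length > max_gap:
            continue
        left, right = series[start - 1], series[end]
        for k in range(1, length + 1):
            filled[start + k - 1] = left + (right - left) * k / (length + 1)
    return filled


if __name__ == "__main__":
    # Hypothetical daily EMA ratings with missing days.
    ema = [5.0, None, 7.0, None, None, None, 3.0, None]
    print(find_gaps(ema))                          # [(1, 2), (3, 6), (7, 8)]
    print(interpolate_short_gaps(ema, max_gap=2))  # [5.0, 6.0, 7.0, None, None, None, 3.0, None]
```

Leaving long and edge gaps unfilled is a deliberate choice in this sketch: interpolating across a long period of non-adherence would invent observations the user never provided.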

PART II: Counterfactuals for medical sequences [Panos]

what are counterfactuals
LIME and SHAP
finding CFs in sequences of events
finding CFs in conventional (multivariate) time series
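To make the idea of a counterfactual for a time series concrete, here is a minimal Python sketch in the spirit of nearest-unlike-neighbour approaches: take the closest reference series that belongs to the desired class, then copy the shortest contiguous segment of it into the query until the classifier's prediction changes. The classifier `toy_classifier`, the toy data and all function names are hypothetical; the sketch assumes equal-length univariate series, and its brute-force search over all segments is only practical for short sequences. It is not the specific method presented in the tutorial.

```python
import math
from typing import Callable, List, Optional, Sequence

Series = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_unlike_neighbour(query: Series, references: List[Series],
                             labels: List[int], target: int) -> Series:
    """Closest reference series (Euclidean distance) whose label is the target class."""
    candidates = [r for r, y in zip(references, labels) if y == target]
    if not candidates:
        raise ValueError("no reference series of the target class")
    return min(candidates, key=lambda r: euclidean(query, r))


def segment_counterfactual(query: Series, references: List[Series], labels: List[int],
                           classify: Callable[[Series], int], target: int) -> Optional[Series]:
    """Replace the shortest contiguous segment of the query with the aligned segment
    of its nearest unlike neighbour so that the classifier predicts `target`.
    Returns None if even a full substitution does not change the prediction."""
    neighbour = nearest_unlike_neighbour(query, references, labels, target)
    n = len(query)
    for length in range(1, n + 1):            # try the smallest edits first
        for start in range(n - length + 1):
            candidate = list(query)
            candidate[start:start + length] = neighbour[start:start + length]
            if classify(candidate) == target:
                return candidate
    return None


def toy_classifier(series: Series) -> int:
    """Hypothetical model: class 1 ("high risk") if the mean value exceeds 5.0."""
    return int(sum(series) / len(series) > 5.0)


if __name__ == "__main__":
    references = [[9.0, 9.0, 9.0, 9.0], [3.0, 3.0, 3.0, 3.0], [6.0, 6.0, 8.0, 8.0]]
    labels = [1, 0, 1]
    query = [4.0, 4.0, 4.0, 4.0]               # predicted class 0
    print(segment_counterfactual(query, references, labels, toy_classifier, target=1))
    # [4.0, 6.0, 8.0, 4.0]
```

Preferring the shortest substituted segment keeps the counterfactual close to the original sequence, which is what makes it readable as "had these time steps looked like this, the prediction would have changed".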

PART III: Learning on medical sequences with gaps [Myra]

filling the gaps: imputation approaches for short sequences
idiographic vs nomothetic approaches for learning
predicting the next h values: recursive forecasting vs multi-output forecasting
learning-for-one and explaining-for-many
learning patterns on the sequences
learning from gaps and explaining the gaps
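The contrast between recursive and multi-output forecasting of the next h values can be sketched with the simplest possible model. In the Python snippet below (an illustrative assumption, not the presenters' implementation), every model is a least-squares line that predicts a future value from the last observed value. The recursive strategy fits one one-step-ahead model and feeds its own predictions back h times, so errors can compound; the multi-output strategy fits one output per horizon k = 1..h, each predicted directly from the last observation (for linear least squares, a multi-output model decomposes into one regression per output). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ a * x + b with a single input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var = sum((xi - mx) ** 2 for xi in x)
    if var == 0:
        raise ValueError("input values are constant; slope is undefined")
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = cov / var
    return a, my - a * mx


def recursive_forecast(series: List[float], h: int) -> List[float]:
    """Fit one one-step-ahead model and feed its own predictions back h times."""
    a, b = fit_line(series[:-1], series[1:])
    preds: List[float] = []
    last = series[-1]
    for _ in range(h):
        last = a * last + b              # the prediction becomes the next input
        preds.append(last)
    return preds


def multi_output_forecast(series: List[float], h: int) -> List[float]:
    """Fit one output per horizon k = 1..h, each predicted from the last observation."""
    if len(series) < h + 2:
        raise ValueError("need at least h + 2 observations")
    preds: List[float] = []
    for k in range(1, h + 1):
        a_k, b_k = fit_line(series[:-k], series[k:])   # training pairs (y_t, y_{t+k})
        preds.append(a_k * series[-1] + b_k)
    return preds


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(recursive_forecast(y, 3))      # [7.0, 8.0, 9.0]
    print(multi_output_forecast(y, 3))   # [7.0, 8.0, 9.0]
```

On short sequences the trade-off is visible in the training data itself: the horizon-k model in `multi_output_forecast` has only len(series) - k training pairs, whereas the recursive model reuses all len(series) - 1 one-step pairs but pays for it by compounding its own errors.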

PART IV: Conclusions and open issues [Both]

how to deal with mixed data (categorical + numerical)
how to exploit phenotypes and digital markers
how to visualize panel data / multivariate time series and the trends in them

Our Affiliations

Prof. Panagiotis Papapetrou

Prof. Myra Spiliopoulou

Data Science group (DS@SU)

Dept. of Computer and Systems Sciences

Stockholm University, Stockholm, Sweden

Email: panagiotis@dsv.su.se

https://papapetrou.blogs.dsv.su.se

Knowledge Management and Discovery lab (KMD),

Faculty of Computer Science

Otto-von-Guericke-University Magdeburg

PO Box 4120, D-39016 Magdeburg, Germany

Email: myra@ovgu.de

www.kmd.ovgu.de/Team/Academic+Staff/Myra+Spiliopoulou.html

Myra Spiliopoulou is Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research is on mining dynamic complex data. Her publications are on mining complex streams, mining evolving objects, adapting models to drift and building models that capture drift. Her research has been published in renowned international conferences and journals. She regularly presents tutorials on different aspects of complex data mining, recently with a focus on medical data mining. She serves as (senior) reviewer for major conferences on data mining and knowledge discovery. In 2016 and 2019, she served as PC Chair of the IEEE Int. Symposium on Computer-Based Medical Systems (CBMS). In 2021, she served as Special Sessions Chair of the IEEE DSAA (Data Science and Advanced Analytics) conference and as ECML PKDD 2021 Awards Chair. In 2022, she serves as Special Track Chair at IEEE CBMS 2022.

The KMD Lab focuses on two application areas: (a) business, including adaptive recommenders and opinionated streams, and (b) healthcare, including epidemiological mining, learning from clinical studies and clinical decision support.

KMD HIGHLIGHTS OF 2021

KMD Lab PhD student Uli Niemann successfully defended his PhD thesis on “Intelligent Assistance for Expert-Driven Subpopulation Discovery in High-Dimensional Timestamped Medical Data” and received the Faculty award for the Best PhD of 2021.
Bachelor student Anne Rother, a research assistant at the KMD Lab working under the supervision of Prof. Spiliopoulou on “Assessing the difficulty of annotating medical data in crowdworking with help of experiments”, published her results in PLOS ONE 16(7): e0254764. For this publication, she received the Faculty's ‘Rudolf-Kruse-Award’, which recognizes an excellent student publication and is typically granted to master's students.

CHOICE of 5 RECENT KMD PUBLICATIONS (in order of publication, latest first)

Prakash S, Unnikrishnan V, Pryss R, Kraft R, Schobel J, Hannemann R, Langguth B, Schlee W, Spiliopoulou M. Interactive System for Similarity-Based Inspection and Assessment of the Well-Being of mHealth Users. Entropy. 2021 Dec;23(12):1695.
Puga C, Niemann U, Unnikrishnan V, Schleicher M, Schlee W, Spiliopoulou M. Discovery of Patient Phenotypes through Multi-layer Network Analysis on the Example of Tinnitus. 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), 2021, pp. 1-10, doi: 10.1109/DSAA53316.2021.9564158.
Jamaludeen N, Unnikrishnan V, Pryss R, Schobel J, Schlee W, Spiliopoulou M. Circadian Conditional Granger Causalities on Ecological Momentary Assessment Data from an mHealth App. In: 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), 2021 Jun 7 (pp. 354-359). IEEE.
Unnikrishnan V, Schleicher M, Shah Y, Jamaludeen N, Pryss R, Schobel J, Kraft R, Schlee W, Spiliopoulou M. The Effect of Non-Personalised Tips on the Continued Use of Self-Monitoring mHealth Applications. Brain Sciences. 2020 Dec;10(12):924.
Niemann U, Brueggemann P, Boecking B, Mebus W, Rose M, Spiliopoulou M, Mazurek B. Phenotyping chronic tinnitus patients using self-report questionnaire data: Cluster analysis and visual comparison. Scientific Reports. 2020 Oct 2;10(1):1-0.

Panagiotis Papapetrou is a Professor at the Department of Computer and Systems Sciences of Stockholm University, Sweden. He is also an Adjunct Professor at the Computer Science Department of Aalto University, Finland. His area of expertise is algorithmic data mining, with particular focus on mining and indexing temporal data and healthcare data. Panagiotis received his PhD in Computer Science from Boston University in 2009 and his Master's degree from the same university in 2007. He was a postdoctoral researcher at Aalto University during 2009-2012, and a lecturer at Birkbeck, University of London, UK, during 2012-2013. He has participated in several national and international research projects, including a 4-year starting grant funded by the Swedish Research Council. He serves as Action Editor of the Data Mining and Knowledge Discovery journal and is a Board Member of the Swedish Artificial Intelligence Society. Panagiotis has been involved in organizing several workshops and tutorials at KDD, ICDM, and ECML/PKDD. Moreover, he has served as General Chair of IDA 2016, PhD Consortium Co-Chair at ICDM 2018, and Workshops Co-Chair at ICDM 2019.

DS@SU HIGHLIGHTS OF 2021

First-year PhD student Zhendong Wang won the best student paper award at the AI in Medicine (AIME) 2021 conference for “Counterfactual Explanations for Survival Prediction of Cardiovascular ICU Patients”.

CHOICE of 5 RECENT DS@SU PUBLICATIONS (in order of publication, latest first)

Jonathan Rebane, Isak Samsten, Leon Bornemann, and Panagiotis Papapetrou, “SMILE: A feature-based temporal abstraction framework for event-interval sequence classification”. In Data Mining and Knowledge Discovery 35(1): 372-399, 2021
Jonathan Rebane, Isak Samsten, and Panagiotis Papapetrou, “Exploiting Complex Medical Data with Interpretable Deep Learning for Adverse Drug Event Prediction”. In Artificial Intelligence in Medicine, 28(8): 1651-1659, 2021
Zhendong Wang, Isak Samsten, and Panagiotis Papapetrou, “Counterfactual Explanations for Survival Prediction of Cardiovascular ICU Patients”. In Artificial Intelligence in Medicine (AIME), 338-348, 2021 [best student paper award]
Zed Lee, Tony Lindgren, and Panagiotis Papapetrou, “Z-Miner: an efficient method for mining frequent arrangements of event intervals”. In ACM Knowledge Discovery and Data Mining (KDD), 524-534, 2020
Maria Bampa, Panagiotis Papapetrou, and Jaakko Hollmén, “A clustering framework for patient phenotyping with application to adverse drug events”. In IEEE Computer-Based Medical Systems (CBMS), 177-182, 2020