Tutorial KDD 2018



Data Science for Health

TUTORIAL - Knowledge Discovery from Cohorts, Electronic Health Records and further Patient-related data


KDD 2018, London, from August 19 to August 23. The tutorials are on August 19.


Tutors: Panagiotis Papapetrou (Stockholm) and Myra Spiliopoulou (Magdeburg)


Data mining is intensively used in medicine and healthcare. Electronic Health Records (EHRs) are perceived as big patient data. Scientists use them to predict patients' progress, to understand and predict response to therapy, to detect adverse drug effects, and to address many other learning tasks. Medical researchers are also interested in learning from cohorts of population-based studies and of experiments. Learning tasks include the identification of disease predictors that can lead to new diagnostic tests and the acquisition of insights on interventions.

In this tutorial, we elaborate on data sources, methods, and case studies in medical mining. Next to conventional data sources, we address the potential of data from mobile devices. We discuss the learning problems that can be solved with those data, present case studies, and investigate the methods needed to prepare and mine those data and to present the results to a medical expert.

Medical research is largely hypothesis-driven: data collection, analysis and acquisition of insights are embedded into workflows that differ from the ways used by data mining scholars for (medical) data analysis. While medical researchers are often willing to offer their data for data-driven learning, it is the task of data mining scholars to analyze the data in a way that can be understood and exploited by medical researchers. The knowledge and techniques that will be presented in this tutorial will also serve as guidelines for novices and experienced data mining researchers, so that their methods and results when mining medical data will be useful to the medical domain and healthcare experts.



PART 1: Introduction (BOTH) – 30 mins

  1. What are patient data? Electronic Health Records (EHRs), social data, data collected in cohort studies

  2. What is a cohort?

  3. Cohorts for clinical studies

  4. Cohorts for population-based studies

PART 2: Learning from EHR data (PANOS) – 30 mins

  1. Supervised learning from EHRs

  2. Unsupervised learning from EHRs

  3. Temporal data mining from EHRs

PART 3: Hypothesis-driven vs exploratory learning on patient data (MYRA) – 40 mins

  1. Cohort specification from EHR data

  2. Expert driven cohort refinement on EHR data

  3. Expert inputs for learning on EHR data

  4. Experiments on clinical cohorts

PART 4: Deep learning on EHR data (PANOS) – 40 mins

  1. Neural networks for EHR data

  2. Recurrent neural networks for diagnosis and treatment prediction

  3. Convolutional neural networks for medical image processing

PART 5: Exploratory learning on patient mobile data (MYRA) – 30 mins

  1. Learning from the data of mobile devices

  2. Monitoring the ecological momentary assessments of patients

PART 6: Conclusions and open challenges – 10 mins

  1. The challenge of finding the data

  2. The challenge of seeing with the expert's eyes

  3. The challenge of preparing the data

  4. Challenges of learning

  5. The challenge of explaining the results


Target audience and prerequisites

The tutorial is intended for all KDD participants, and especially for young researchers, who are interested in how data mining and machine learning can be of benefit to healthcare and to medicine.

Participants are expected to have basic knowledge within the areas of data mining, machine learning, and databases. The audience is expected to be familiar with standard concepts and methods, such as classification models, deep learning, density-based clustering, Hidden Markov Models, frequent pattern and rule mining. Such knowledge can be expected from KDD participants, including students.


Tutors' short bios and their expertise related to the tutorial

Myra Spiliopoulou is Professor of Business Information Systems at the Otto-von-Guericke-University Magdeburg. Her research is on mining dynamic complex data, with focus on healthcare and social data. She is action editor for DAMI and PC Chair of the Applied Data Science Track of KDD 2018. In the recent past, she was one of the four Journal Track Chairs for ECML PKDD 2017, Panel Chair of IEEE ICDM 2017 and PC Chair of the IEEE Symposium of Computer Based Medical Systems 2016. She has held tutorials on topics of data mining at KDD 2009 and 2015, PAKDD 2013 and 2016 and in many ECML PKDD conferences.

Panagiotis Papapetrou is Professor at the Department of Computer and Systems Sciences at Stockholm University and Adjunct Professor at the Computer Science Department at Aalto University. His area of expertise is algorithmic data mining with particular focus on mining and indexing temporal data and healthcare data. Panagiotis received his PhD in Computer Science from Boston University in 2009, was a post-doctoral researcher at Aalto University during 2009-2013, and lecturer at the University of London during 2012-2013. He has participated in several national and international research projects. He is a board member of the Swedish AI Society.

Corresponding authors
Prof. Panagiotis Papapetrou, panagiotis@dsv.su.se
Prof. Myra Spiliopoulou, myra@ovgu.de

A list of forums and their time and locations
This tutorial has not been held in this form in the past.
Earlier versions of the part on cohort data (by Myra Spiliopoulou) have been held before:
- "Medical Mining for Clinical Knowledge Discovery", ECML PKDD 2014 (Nancy, Sept. 2014). This was the first edition of the tutorial focusing on learning from cohort data and was attended by approximately 20 persons.
- "Medical Mining", KDD 2015 (Sydney, Aug. 2015). This was the second edition, which had an audience of about the same size. The proposed tutorial differs from this one substantially (see under the next bullet point).
- "Medical Mining", PAKDD 2016 (Auckland, April 2016). The third edition contained different examples and placed particular emphasis on the publication of medical mining research papers. The proposed version of the part on cohort data has elements in common with the second and third editions of the "Medical Mining" tutorial, namely the terminology and goals of epidemiological research and some of the cited papers. New is the emphasis on exploratory analysis and on the role of data exploration and clustering in medical research papers.
Moreover, both authors have offered the following recent tutorials:
- “Learning from hospital data and learning from cohorts”, ECML/PKDD 2016 (Riva Del Garda, Sept. 2016). The proposed tutorial contains some topics that overlap with the ECML/PKDD 2016 tutorial, but the presented material will be substantially updated to include new methods and results. In addition, the following parts contain completely new topics, such as methods for multi-dimensional time series classification, mining ordered rules, learning from timestamped epidemiological data, and learning from crowd-sensing data. These techniques are covered in parts 3.2, 4.2, 5.2, 6, and 7.
- “Learning from hospital data and learning from cohorts”, ECML/PKDD 2016 (Riva Del Garda, Sept. 2016). The proposed tutorial contains some topics that overlap with the ECML/PKDD 2016 tutorial, but the presented material will be substantially updated to include new methods and results. In addition, the following parts contain completely new topics, such as deep learning and learning from social patient data. These techniques are covered in parts 4 and 5.
- “Mining Cohorts & Patient Data: Challenges and Solutions for the Pre-Mining, the Mining and the Post-Mining Phases”, ICDM 2017 (New Orleans, Nov 2017). The proposed tutorial contains some topics that overlap with this ICDM 2017 tutorial, but the presented material also includes new methods and results, with emphasis on deep learning for EHRs (Part 4) and learning from social and mobile data (Part 5).

A list of the most important references that will be covered in the tutorial
Next to methods for the analysis of medicine/healthcare related data in different domains, we also cite basic technologies; these are marked [B].

PART 1: Learning on Cohorts
- Gunter TD, Terry NP. The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions. Journal of Medical Internet Research. 2005;7(1):e3. doi:10.2196/jmir.7.1.e3
- Hielscher T, Spiliopoulou M, Völzke H, Kühn JP. Identifying relevant features for a multi-factorial disorder with constraint-based subspace clustering. In 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS), 2016 Jun 20 (pp. 207-212). IEEE.
- Niemann U, Hielscher T, Spiliopoulou M, Völzke H, Kühn JP. Can we classify the participants of a longitudinal epidemiological study from their previous evolution?. In 2015 IEEE 28th International Symposium on Computer-Based Medical Systems (CBMS), 2015 Jun 22 (pp. 121-126). IEEE.
- Niemann U, Spiliopoulou M, Preim B, Ittermann T, Völzke H. Combining Subgroup Discovery and Clustering to Identify Diverse Subpopulations in Cohort Study Data. In 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), June 22, 2017.
- Thew S, Sutcliffe A, Procter R, De Bruijn O, McNaught J, Venters CC, Buchan I. Requirements engineering for e-Science: experiences in epidemiology. IEEE Software. 2009 Jan;26(1).
PART 2: Machine Learning from EHR
- Henelius A, Puolamäki K, Asker L, Boström H, Papapetrou P: A peek into the black box: exploring classifiers by randomization. Data Min. Knowl. Discov. 28(5-6): 1503-1529 (2014)
- Henelius A, Puolamäki K, Karlsson I, Zhao J, Asker L, Boström H, Papapetrou P: GoldenEye++: A Closer Look into the Black Box. SLDS 2015: 96-105
- Karlsson I, Papapetrou P, Boström H: Generalized random shapelet forests. Data Min. Knowl. Discov. 30(5): 1053-1085 (2016)
- Karlsson I, Papapetrou P, Asker L, Boström H, Persson HE: Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records. PETRA 2017: 394-398
- Mogensen UB, Ishwaran H, Gerds TA. Evaluating Random Forests for Survival Analysis using Prediction Error Curves. Journal of statistical software. 2012;50(11):1-23.
- Moskovitch R, Wang F, Shahar Y, Hripcsak G: Temporal data analytics. Journal of Biomedical Informatics 62: 276-277 (2016)
- Zhao J: Temporal weighting of clinical events in electronic health records for pharmaco-vigilance. BIBM 2015: 375-381.
PART 3: Expert-driven Learning from Clinical Cohorts
- Castaneda C, Nalley K, Mannion C, Bhattacharyya P, Blake P, Pecora A, Goy A A, Suh K: Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J. Clinical Bioinformatics 5: 4 (2015)
- Deschamps K, Matricali GA, Roosen P, Desloovere K, Bruyninck H, Spaepen P, Nobels F, Tits J, Flour M, Staes F. Classification of forefoot plantar pressure distribution in persons with diabetes: a novel perspective for the mechanical management of diabetic foot?. PloS one. 2013 Nov 22;8(11):e79924.
- Holzinger A. Interactive machine learning for health informatics: when do we need the human-in-the-loop?. Brain Informatics. 2016 Jun 1;3(2):119-31.
- Niemann U, Spiliopoulou M, Szczepanski T, Samland F, Grützner J, Senk D, Ming A, Kellersmann J, Malanowski J, Klose S, Mertens PR. Comparative Clustering of Plantar Pressure Distributions in Diabetics with Polyneuropathy May Be Applied to Reveal Inappropriate Biomechanical Stress. PloS one. 2016 Aug 16;11(8):e0161326.
- Zhang Z, Gotz D, Perer A. Iterative cohort analysis and exploration. Information Visualization. 2015 Oct;14(4):289-307.
PART 4: Deep Learning from EHR
- J. Cong and B. Xiao, “Minimizing Computation in Convolutional Neural Networks,” 2014. [B]
- Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to Diagnose with LSTM Recurrent Neural Networks,” arXiv, 2016.
- P. Nguyen, T. Tran, N. Wickramasinghe, and S. Venkatesh, “Deepr: A Convolutional Net for Medical Records,” arXiv, pp. 1–9, 2016.
- P. Nickerson, P. Tighe, B. Shickel, and P. Rashidi: Deep neural network architectures for forecasting analgesic response. EMBC 2016: 2966-2969.
- B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi: Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE Journal of Biomedical and Health Informatics. 2017, 1(1): 99.
PART 5: Learning from mobile data
- Clifford, Gari D., and David Clifton. "Wireless technology in disease management and medicine." Annual review of medicine 63 (2012): 479-492
- Guo, Bin, Zhu Wang, Zhiwen Yu, Yu Wang, Neil Y. Yen, Runhe Huang, and Xingshe Zhou. "Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm." ACM Computing Surveys (CSUR) 48, no. 1 (2015): 7 [B]
- Ham, Nathaniel, Amir Dirin, and Teemu H. Laine. "Machine learning and dynamic user interfaces in a context aware nurse application environment." Journal of Ambient Intelligence and Humanized Computing 8, no. 2 (2017): 259-271
- Jiménez-Serrano, Santiago, Salvador Tortajada, and Juan Miguel García-Gómez. "A mobile health application to predict postpartum depression based on machine learning." Telemedicine and e-Health 21, no. 7 (2015): 567-574.
- Jones, Valerie M., RJ Mendes Batista, Richard GA Bults, Harm op den Akker, I. A. Widya, Hermanus J. Hermens, Thijs Tönis, T. Tonis, and Miriam Marie Rosé Vollenbroek-Hutten. "Interpreting streaming biosignals: in search of best approaches to augmenting mobile health monitoring with machine learning for adaptive clinical decision support." In Workshop on Learning from Medical Data Streams, LEMEDS 2011. 2011
- Kumar, Santosh, Wendy J. Nilsen, Amy Abernethy, Audie Atienza, Kevin Patrick, Misha Pavel, William T. Riley et al. "Mobile health technology evaluation: the mHealth evidence workshop." American journal of preventive medicine 45, no. 2 (2013): 228-236
- Mohr, David C., Mi Zhang, and Stephen M. Schueller. "Personal sensing: understanding mental health using ubiquitous sensors and machine learning." Annual review of clinical psychology 13 (2017): 23-47.
- Onnela, Jukka-Pekka, and Scott L. Rauch. "Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health." Neuropsychopharmacology 41, no. 7 (2016): 1691
- Probst T, Pryss RC, Langguth B, Spiliopoulou M, Landgrebe M, Vesala M, Harrison S, Schobel J, Reichert M, Stach M, Schlee W. "Outpatient Tinnitus Clinic, Self-Help Web Platform, or Mobile Application to Recruit Tinnitus Study Samples?". Frontiers in aging neuroscience. 2017;9. 

Contact info of the tutors

Prof. Myra Spiliopoulou

Research Group on Knowledge Management and Discovery (KMD),

Faculty of Computer Science, Otto-von-Guericke-University Magdeburg,

PO Box 4120, 39016 Magdeburg, Germany

Email: myra@ovgu.de

URL: http://www.kmd.ovgu.de/Team/Academic+Staff/Myra+Spiliopoulou.html


Prof. Panagiotis Papapetrou

Data Science group

Department of Computer and Systems Sciences

PO Box 7003, 164 07, Stockholm, Sweden

Email: panagiotis@dsv.su.se

URL: http://people.dsv.su.se/~panagiotis/



Last Modification: 30.05.2018
