Tutorial KDD 2018
Data Science for Health
TUTORIAL - Knowledge Discovery from Cohorts, Electronic Health Records and further Patient-related data
KDD 2018, London, from August 19 to August 23. The tutorials are on August 19.
Tutorialists: Panagiotis Papapetrou (Stockholm) and Myra Spiliopoulou (Magdeburg)
Data mining is intensively used in medicine and healthcare. Electronic Health Records (EHRs) are perceived as big patient data. Using them, scientists strive to predict patients' progress, to understand and predict response to therapy, to detect adverse drug effects, and to solve many other learning tasks. Medical researchers are also interested in learning from cohorts of population-based studies and of experiments. Learning tasks include the identification of disease predictors that can lead to new diagnostic tests and the acquisition of insights on interventions.
In this tutorial, we elaborate on data sources, methods, and case studies in medical mining. Next to conventional data sources, we address the potential of data from mobile devices. We discuss the learning problems that can be solved with those data, present case studies, and investigate the methods needed to prepare and mine those data and to present the results to a medical expert.
Medical research is largely hypothesis-driven: data collection, analysis and acquisition of insights are embedded into workflows that differ from the ways used by data mining scholars for (medical) data analysis. While medical researchers are often willing to offer their data for data-driven learning, it is the task of data mining scholars to analyze the data in a way that can be understood and exploited by medical researchers. The knowledge and techniques presented in this tutorial will also serve as guidelines for both novice and experienced data mining researchers, so that their methods and results when mining medical data are useful to medical and healthcare experts.
PART 1: Introduction
PART 2: Learning from EHR data (PANOS)
Supervised learning from EHRs
Unsupervised learning from EHRs
Temporal data mining from EHRs
PART 3a: Learning from Cohorts - Population-based Studies (MYRA)
What is a cohort?
Cohorts for Population-based studies
Learning and the challenge of time
PART 3b: Learning from Cohorts - Clinical data (MYRA)
1. Cohort specification
2. Expert-driven cohort construction and refinement
3. Involving the expert for labeling
4. Experiments on clinical cohorts (skipped, due to lack of time)
5. The validation issue
PART 4: Deep learning on EHR data (PANOS)
Neural networks for EHR data
Recurrent neural networks for diagnosis and treatment prediction
Convolutional neural networks for medical image processing
PART 5: Learning from mobile data (MYRA)
Learning from the data of mobile devices
Monitoring the ecological momentary assessments of patients
Slides here (figures removed)
PART 6: Conclusions and open challenges – 10 mins
The challenge of finding the data
The challenge of seeing with the expert's eyes
The challenge of preparing the data
Challenges of learning
The challenge of explaining the results
Target audience and prerequisites
The tutorial is intended for all KDD participants, and especially for young researchers, who are interested in how data mining and machine learning can benefit healthcare and medicine.
Participants are expected to have basic knowledge within the areas of data mining, machine learning, and databases. The audience is expected to be familiar with standard concepts and methods, such as classification models, deep learning, density-based clustering, Hidden Markov Models, frequent pattern and rule mining. Such knowledge can be expected from KDD participants, including students.
Tutors' short bios and their expertise related to the tutorial
Myra Spiliopoulou is Professor of Business Information Systems at the Otto-von-Guericke-University Magdeburg. Her research is on mining dynamic complex data, with a focus on healthcare and social data. She is an action editor for DAMI and PC Chair of the Applied Data Science Track of KDD 2018. In the recent past, she was one of the four Journal Track Chairs for ECML PKDD 2017, Panel Chair of IEEE ICDM 2017 and PC Chair of the IEEE Symposium of Computer Based Medical Systems 2016. She has held tutorials on topics of data mining at KDD 2009 and 2015, PAKDD 2013 and 2016 and in many ECML PKDD conferences.
Panagiotis Papapetrou is Professor at the Department of Computer and Systems Sciences at Stockholm University and Adjunct Professor at the Computer Science Department at Aalto University. His area of expertise is algorithmic data mining, with a particular focus on mining and indexing temporal data and healthcare data. Panagiotis received his PhD in Computer Science at Boston University in 2009, was a post-doctoral researcher at Aalto University during 2009-2013, and lecturer at the University of London during 2012-2013. He has participated in several national and international research projects. He is a board member of the Swedish AI Society.
Prof. Panagiotis Papapetrou, firstname.lastname@example.org
Prof. Myra Spiliopoulou, email@example.com
Learning on Cohorts
- Gunter TD, Terry NP. The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions. Journal of Medical Internet Research. 2005;7(1):e3. doi:10.2196/jmir.7.1.e3
- Hielscher T, Spiliopoulou M, Völzke H, Kühn JP. Identifying relevant features for a multi-factorial disorder with constraint-based subspace clustering. In 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS), 2016 Jun 20 (pp. 207-212). IEEE.
- Niemann U, Hielscher T, Spiliopoulou M, Völzke H, Kühn JP. Can we classify the participants of a longitudinal epidemiological study from their previous evolution?. In 2015 IEEE 28th International Symposium on Computer-Based Medical Systems (CBMS), 2015 Jun 22 (pp. 121-126). IEEE.
- Niemann U, Spiliopoulou M, Preim B, Ittermann T, Völzke H. Combining Subgroup Discovery and Clustering to Identify Diverse Subpopulations in Cohort Study Data. In 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), June 22, 2017.
- Thew S, Sutcliffe A, Procter R, De Bruijn O, McNaught J, Venters CC, Buchan I. Requirements engineering for e-Science: experiences in epidemiology. IEEE Software. 2009 Jan;26(1).
- Castaneda C, Nalley K, Mannion C, Bhattacharyya P, Blake P, Pecora A, Goy A, Suh K: Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J. Clinical Bioinformatics 5: 4 (2015)
- Deschamps K, Matricali GA, Roosen P, Desloovere K, Bruyninckx H, Spaepen P, Nobels F, Tits J, Flour M, Staes F. Classification of forefoot plantar pressure distribution in persons with diabetes: a novel perspective for the mechanical management of diabetic foot?. PloS one. 2013 Nov 22;8(11):e79924.
- Holzinger A. Interactive machine learning for health informatics: when do we need the human-in-the-loop?. Brain Informatics. 2016 Jun 1;3(2):119-31.
- Niemann U, Spiliopoulou M, Szczepanski T, Samland F, Grützner J, Senk D, Ming A, Kellersmann J, Malanowski J, Klose S, Mertens PR. Comparative Clustering of Plantar Pressure Distributions in Diabetics with Polyneuropathy May Be Applied to Reveal Inappropriate Biomechanical Stress. PloS one. 2016 Aug 16;11(8):e0161326.
- Zhang Z, Gotz D, Perer A. Iterative cohort analysis and exploration. Information Visualization. 2015 Oct;14(4):289-307.
Machine Learning from EHR
- Henelius A, Puolamäki K, Asker L, Boström H, Papapetrou P: A peek into the black box: exploring classifiers by randomization. Data Min. Knowl. Discov. 28(5-6): 1503-1529 (2014)
- Henelius A, Puolamäki K, Karlsson I, Zhao J, Asker L, Boström H, Papapetrou P: GoldenEye++: A Closer Look into the Black Box. SLDS 2015: 96-105
- Karlsson I, Papapetrou P, Boström H: Generalized random shapelet forests. Data Min. Knowl. Discov. 30(5): 1053-1085 (2016)
- Karlsson I, Papapetrou P, Asker L, Boström H, Persson HE: Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records. PETRA 2017: 394-398
- Mogensen UB, Ishwaran H, Gerds TA. Evaluating Random Forests for Survival Analysis using Prediction Error Curves. Journal of statistical software. 2012;50(11):1-23.
- Moskovitch R, Wang F, Shahar Y, Hripcsak G: Temporal data analytics. Journal of Biomedical Informatics 62: 276-277 (2016)
- Zhao J: Temporal weighting of clinical events in electronic health records for pharmaco-vigilance. BIBM 2015: 375-381.
Deep Learning from EHR
- J. Cong and B. Xiao, “Minimizing Computation in Convolutional Neural Networks,” 2014. [B]
- Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to Diagnose with LSTM Recurrent Neural Networks,” arXiv, 2016.
- P. Nguyen, T. Tran, N. Wickramasinghe, and S. Venkatesh, “Deepr: A Convolutional Net for Medical Records,” arXiv, pp. 1–9, 2016.
- P. Nickerson, Patrick Tighe, Benjamin Shickel, Parisa Rashidi: Deep neural network architectures for forecasting analgesic response. EMBC 2016: 2966-2969.
- B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi, “Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis,” IEEE Journal of Biomedical and Health Informatics. 2017, 1(1): 99.
Learning from mobile data
- Clifford, Gari D., and David Clifton. "Wireless technology in disease management and medicine." Annual review of medicine 63 (2012): 479-492
- Guo, Bin, Zhu Wang, Zhiwen Yu, Yu Wang, Neil Y. Yen, Runhe Huang, and Xingshe Zhou. "Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm." ACM Computing Surveys (CSUR) 48, no. 1 (2015): 7 [B]
- Ham, Nathaniel, Amir Dirin, and Teemu H. Laine. "Machine learning and dynamic user interfaces in a context aware nurse application environment." Journal of Ambient Intelligence and Humanized Computing 8, no. 2 (2017): 259-271
- Jiménez-Serrano, Santiago, Salvador Tortajada, and Juan Miguel García-Gómez. "A mobile health application to predict postpartum depression based on machine learning." Telemedicine and e-Health 21, no. 7 (2015): 567-574.
- Jones, Valerie M., RJ Mendes Batista, Richard GA Bults, Harm op den Akker, I. A. Widya, Hermanus J. Hermens, Thijs Tönis, T. Tonis, and Miriam Marie Rosé Vollenbroek-Hutten. "Interpreting streaming biosignals: in search of best approaches to augmenting mobile health monitoring with machine learning for adaptive clinical decision support." In Workshop on Learning from Medical Data Streams, LEMEDS 2011. 2011
- Kumar, Santosh, Wendy J. Nilsen, Amy Abernethy, Audie Atienza, Kevin Patrick, Misha Pavel, William T. Riley et al. "Mobile health technology evaluation: the mHealth evidence workshop." American journal of preventive medicine 45, no. 2 (2013): 228-236
- Mohr, David C., Mi Zhang, and Stephen M. Schueller. "Personal sensing: understanding mental health using ubiquitous sensors and machine learning." Annual review of clinical psychology 13 (2017): 23-47.
- Onnela, Jukka-Pekka, and Scott L. Rauch. "Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health." Neuropsychopharmacology 41, no. 7 (2016): 1691
- Probst T, Pryss RC, Langguth B, Spiliopoulou M, Landgrebe M, Vesala M, Harrison S, Schobel J, Reichert M, Stach M, Schlee W. "Outpatient Tinnitus Clinic, Self-Help Web Platform, or Mobile Application to Recruit Tinnitus Study Samples?". Frontiers in aging neuroscience. 2017;9.
Contact info of the tutors
Prof. Myra Spiliopoulou
Research Group on Knowledge Management and Discovery (KMD),
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg,
PO Box 4120, 39016 Magdeburg, Germany
Prof. Panagiotis Papapetrou
Data Science group
Department of Computer and Systems Sciences
PO Box 7003, 164 07, Stockholm, Sweden