PAKDD 2016 - Medical Mining Tutorial
Tutorial Title: "Medical Mining"
By: Myra Spiliopoulou, Pedro Pereira Rodrigues, Ernestina Menasalvas
We are seeing increasing interest in mining technologies in medical research and in healthcare. Scientific publications at mining conferences show advanced mining methods being applied to medical data. However, scientific publications in medical research show a rather different picture: mining methods are rarely used, and those chosen are among the simplest algorithms. Why this discrepancy? One reason is that medical scholars and practitioners work differently from mining researchers; this affects how they deal with data, how they interpret models and what they deduce from a model. To promote better exploitation of state-of-the-art mining methods, mining researchers should learn how medical researchers and practitioners work. The purpose of this tutorial is to contribute to this learning process.
We address three fields of medicine and healthcare: (1) mining in epidemiology, (2) mining in hospitals, and (3) mining EHR.
Target audience and prerequisites.
All conference participants who want to contribute their skills to the progress of medicine and healthcare.
Importance of topic and benefit for the KDD participants
There is a proliferation of medical data and of applications in which mining is needed. Medical researchers are willing to offer their data, but mining scholars are often called upon to formulate the learning problem and to deliver a solution that non-mining experts can understand. This tutorial provides insights into how mining is done in medical research, into the pitfalls of mining in hospital research and decision support, and into curating data for mining.
Outline of the tutorial
PART 1. Mining in Epidemiology (by Myra Spiliopoulou). This part of the tutorial starts by explaining what epidemiologists study, and introduces some basic terminology on the different kinds of studies in epidemiology. We explain that epidemiology researchers do not only study the spread of epidemics, but are also interested in non-contagious illnesses (like Alzheimer's), in disorders and impairments (like traumatic brain injury) and in healthy living and ageing. In this part, we will see what a cohort is and what the modifier of an outcome is, how a population-based study differs from a clinical trial, how a longitudinal study differs from a cross-sectional one, and why clustering methods must always take the target variable (!) into account. We will discuss how basic and elaborate data mining methods can be framed to be useful in epidemiological research, and give several examples.
PART 2. Mining Hospital Data (by Pedro Pereira Rodrigues). This part of the tutorial deals with knowledge discovery and decision support in the hospital. It starts by explaining Electronic Health Records (EHR) and lists the most prominent dangers faced by a mining scholar who wants to analyze them. We will see the processes in which EHR are used, filled or modified, the knowledge discovery tasks in which these records must be analyzed, and the challenges of such an analysis. Data mining in the hospital should ideally flow into clinical decision support (CDS). This part of the tutorial contains several cases of CDS, highlighting the importance of adhering to the hospital protocols for data processing and model evaluation, and the importance of integrating CDS into the hospital processes.
PART 3. Mining non-structured EHR information (by Ernestina Menasalvas). This part of the tutorial continues with the analysis of the EHR and focuses on the non-structured information it contains: text and images. We will see the challenges of preparing such data for analysis. In particular, we will deal with the following challenges:
- Natural language processing: Lots of information (clinical notes, papers, social network input) is free text and contains valuable knowledge. However, techniques for language processing are required to extract it. These techniques should take into account the acronyms and abbreviations of the medical field, negation detection and multilingual issues.
- Standardized Medical Annotation Framework: A standardized medical text processing and understanding framework supports technical integration of annotation technologies; this incorporates the definition of data formats (output and exchange formats) and information delivered from semantic annotation systems. Such a framework enables a standardized integration of software provided for semantic annotations and, at the same time, supports clinical IT departments in their data and system integration tasks. Available annotation frameworks such as UIMA will be reviewed.
- Image understanding algorithms (partially available techniques): Image processing algorithms for the automated detection of anatomical structures and abnormal structures (including automated measuring).
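To make the natural language processing challenge above more concrete, the following is a minimal sketch of abbreviation expansion followed by rule-based negation detection on a clinical note. It is an illustration only and not the method presented in the tutorial: the lookup tables, the trigger list, the window size and the function names are all hypothetical, and real systems rely on curated clinical lexicons and far richer rules.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
ABBREVIATIONS = {"sob": "shortness of breath", "htn": "hypertension", "pt": "patient"}
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
WINDOW = 5  # assumed number of words after a trigger that count as negated


def expand_abbreviations(text):
    """Lower-case the text and replace known abbreviations by their long form."""
    tokens = re.findall(r"[a-z]+|[.,;]", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def is_negated(sentence, concept):
    """Return True if `concept` appears within WINDOW words after a trigger.

    Punctuation ends the window, so "no fever, cough" negates only "fever".
    """
    text = expand_abbreviations(sentence)
    following_words = r"((?:\s+[a-z]+){0,%d})" % WINDOW
    concept_pattern = r"\b" + re.escape(concept) + r"\b"
    for trigger in NEGATION_TRIGGERS:
        pattern = r"\b" + re.escape(trigger) + r"\b" + following_words
        for match in re.finditer(pattern, text):
            if re.search(concept_pattern, match.group(1)):
                return True
    return False
```

With these toy rules, `is_negated("Pt denies SOB.", "shortness of breath")` is true, whereas `is_negated("Pt has HTN, no fever.", "hypertension")` is false because the negation trigger occurs after the concept and in a different clause.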
Tutors' short bio
Myra Spiliopoulou is Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research is on mining dynamic complex data. Her publications are on mining complex streams, mining evolving objects, adapting models to drift and building models that capture drift. She focusses on two application areas: business (including opinion stream mining and adaptive recommenders) and medical research (including epidemiological mining and learning from clinical studies). She served as PC Co-Chair of ECML PKDD 2006, NLDB 2008 and of the 36th Annual Conference of the German Classification Society (GfKl 2012, Hildesheim, August 2012). She is involved in the organizing committees of several conferences. She is PC Co-Chair for CBMS 2016. She was Tutorials Co-Chair at ICDM 2010, Workshops Co-Chair at ICDM 2011 and Demo Track Co-Chair of ECML PKDD 2014 and 2015, and is a senior PC member of recent conferences such as ECML PKDD 2014 and 2015 and SIAM Data Mining 2015. She has held tutorials on data mining topics at KDD 2009 and 2015, at PAKDD 2013 and at most ECML PKDD conferences in recent years.
Prof. Myra Spiliopoulou
Research Group on Knowledge Management and Discovery (KMD),
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg,
PO Box 4120, 39016 Magdeburg, Germany
Pedro Pereira Rodrigues is Professor at the Department of Health Information and Decision Sciences, Faculty of Medicine of the University of Porto, and a researcher at the Biostatistics and Intelligent Data Analysis group of the Center for Health Technologies and Services Research. His main research area is machine learning, currently devoted to applications of Bayesian networks to clinical research and decision support. He has edited 4 conference proceedings, and published articles in indexed peer-reviewed journals and conference proceedings. He has helped organize events as general chair (CBMS 2013) and as PC chair (ECMLPKDD 2015, CBMS 2014-15, and several thematic tracks and workshops since 2007), is a member of the steering committee of CBMS, and was a member of the program committee for more than 20 editions of international conferences (e.g. IJCAI, ECMLPKDD, ICML, CBMS). He has also co-organized tutorials at IBERAMIA 2012 and ECMLPKDD 2014.
Prof. Pedro Pereira Rodrigues
CINTESIS & LIAAD, Health Information and Decision Sciences Department,
Faculty of Medicine of the University of Porto, Alameda Prof. Hernani Monteiro, 4200-319 Porto, Portugal
Ernestina Menasalvas is Professor at the Department of Computer Systems Languages and Software Engineering, Faculty of Computer Science of Universidad Politecnica de Madrid (UPM), and a member of MIDAS, the Data Mining and Data Simulation group, at the Center of Biotechnology at UPM. Her subject area is data mining, most recently applied to medical data. She has also participated in a range of projects related to data integration and mining on mobile devices. She has published three international books on web mining (published by Springer in 2003, 2004 and 2009 respectively) as well as articles in several key international journals.
Prof. Ernestina Menasalvas
Centro de Tecnologia Biomedica, Universidad Politecnica de Madrid,
Campus de Montegancedo, Pozuelo de Alarcon, Spain