Opinion Stream Classification with Ensembles and Active leaRners - OSCAR
With the rise of Web 2.0, many people use social media to post opinions on almost any subject - events, products, topics. Opinion mining is used to draw conclusions about people's attitudes towards each subject; such insights are essential for product design and advertisement, for event planning, for political campaigns etc. As opinions accumulate, though, changes occur and invalidate the models from which these conclusions are drawn. Changes concern the general sentiment towards a subject and towards specific facets of this subject, as well as the words used to express sentiment. The subjects themselves also change over time. In OSCAR, we will develop opinion stream mining methods that deal with change and adapt the learned models continuously.
The first part of OSCAR leverages stream mining methods to deal with vocabulary changes. In text mining, the words of the vocabulary constitute the feature space. A change in the feature space means that the model built upon the old words must be updated. It is impractical to perform such an update whenever a new word appears or a word falls out of use. In OSCAR, we will instead accumulate information on the usage and sentiment of each word to highlight the long-term interplay between word polarity and document polarity. On this basis, we will design methods that assess the importance of a word for model adaptation, update the vocabulary by using only words that remain important for some time, and adapt models gradually.
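As an illustration only, the idea of admitting a word into the vocabulary once it has remained important for some time can be sketched as follows. The class `WordTracker`, its parameters (`decay`, `min_score`, `patience`) and the importance score (the absolute difference between decayed counts in positive and negative documents) are assumptions made for this sketch; they are not the methods to be developed in OSCAR.

```python
from collections import defaultdict


class WordTracker:
    """Hypothetical tracker of per-word usage and polarity over a stream."""

    def __init__(self, decay=0.9, min_score=0.5, patience=3):
        self.decay = decay          # forgetting factor, applied once per batch
        self.min_score = min_score  # importance threshold (assumed value)
        self.patience = patience    # batches a word must stay important
        self.pos = defaultdict(float)   # decayed count in positive documents
        self.neg = defaultdict(float)   # decayed count in negative documents
        self.streak = defaultdict(int)  # consecutive important batches
        self.vocabulary = set()

    def importance(self, word):
        # Assumed score: polarity skew of the decayed usage counts.
        return abs(self.pos[word] - self.neg[word])

    def update(self, batch):
        """batch: iterable of (tokens, label) pairs with label +1 or -1."""
        # Gradually forget old evidence.
        for counts in (self.pos, self.neg):
            for word in counts:
                counts[word] *= self.decay
        # Accumulate document-level polarity per word.
        for tokens, label in batch:
            target = self.pos if label > 0 else self.neg
            for word in set(tokens):
                target[word] += 1.0
        # Admit a word only after it stayed important for `patience` batches;
        # drop it as soon as it stops being important.
        for word in set(self.pos) | set(self.neg):
            if self.importance(word) >= self.min_score:
                self.streak[word] += 1
            else:
                self.streak[word] = 0
            if self.streak[word] >= self.patience:
                self.vocabulary.add(word)
            elif self.streak[word] == 0:
                self.vocabulary.discard(word)
        return self.vocabulary
```

In this sketch a word such as "movie", which occurs equally often in positive and negative documents, never enters the vocabulary, whereas a consistently polar word enters it after `patience` batches.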
Second, we will work on reducing the need for labeled documents. In stream classification, it is assumed that an expert is available at any time to label the arriving data instances. This assumption is waived in active learning, where only a few instances are chosen for labeling - those expected to improve the model the most. However, active learning methods assume a fixed feature space. In OSCAR, we will develop active stream learning methods that learn and adapt polarity models on an evolving feature space.
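To make the setting concrete, a minimal sketch of budgeted uncertainty sampling on a stream is given below. It is one simple, assumed instantiation, not the active stream learning methods to be developed in OSCAR: the class `ActiveStreamLearner`, the logistic model with one weight per word, and the parameters `lr`, `threshold` and `budget` are all hypothetical. Keeping the weights in a dictionary lets previously unseen words enter the feature space with weight zero.

```python
import math
from collections import defaultdict


class ActiveStreamLearner:
    """Hypothetical logistic model over a growing bag-of-words vocabulary."""

    def __init__(self, lr=0.5, threshold=0.2, budget=0.3):
        self.w = defaultdict(float)  # one weight per word; new words start at 0
        self.lr = lr                 # learning rate (assumed value)
        self.threshold = threshold   # query the expert if |p - 0.5| is below this
        self.budget = budget         # maximum fraction of instances to label
        self.seen = 0
        self.queried = 0

    def predict_proba(self, tokens):
        z = sum(self.w.get(t, 0.0) for t in tokens)
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def process(self, tokens, oracle):
        """Predict; ask oracle(tokens) -> 0 or 1 only if uncertain and in budget."""
        self.seen += 1
        p = self.predict_proba(tokens)
        uncertain = abs(p - 0.5) < self.threshold
        within_budget = self.queried < self.budget * self.seen
        if uncertain and within_budget:
            y = oracle(tokens)
            self.queried += 1
            for t in tokens:
                # Gradient step of the logistic loss for a binary bag of words.
                self.w[t] += self.lr * (y - p)
        return p
```

Here `oracle` stands for the human expert; the learner stops asking for labels once its predictions on the arriving documents become confident, or once the labeling budget is used up.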
Third, we will work on dealing with different types of change simultaneously. For this purpose, we will use ensembles. We will dedicate some ensemble members to the identification of topic trends, others to changes in the vocabulary and others to temporal changes, including periodic ones. We will investigate ways of coordinating the ensemble members to ensure a smooth adaptation of the final ensemble model at any time. The output of OSCAR will be a complete framework, encompassing active ensemble learning methods that deal with different forms of change and learn with limited expert involvement. The framework will also encompass coordinating components that weigh the contribution of individual models to the final one, and regulate the exchange of information between ensemble members and active learners.
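One simple way in which a coordinating component could weigh the contribution of individual models is a weighted-majority style update, sketched below. This is an assumed, generic scheme chosen for illustration, not the coordination mechanism of OSCAR; the class `WeightedEnsemble` and the penalty factor `beta` are hypothetical, and the members are stand-in functions that return +1 or -1.

```python
class WeightedEnsemble:
    """Hypothetical coordinator that reweights members by their mistakes."""

    def __init__(self, members, beta=0.5):
        self.members = list(members)  # callables: tokens -> +1 or -1
        self.beta = beta              # penalty factor for a wrong vote, 0 < beta < 1
        self.weights = [1.0] * len(self.members)

    def predict(self, tokens):
        # Weighted vote of all members.
        score = sum(w * m(tokens) for w, m in zip(self.weights, self.members))
        return 1 if score >= 0 else -1

    def update(self, tokens, label):
        # Shrink the weight of every member that voted wrongly, then normalize.
        for i, member in enumerate(self.members):
            if member(tokens) != label:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

Because weights shrink multiplicatively rather than being reset, the influence of a member that has become outdated fades gradually, which is in the spirit of a smooth adaptation of the final model.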
We will test OSCAR on real data, mainly from Twitter: we will study how the vocabulary changes and how topics emerge and fade in streams of tweets for specific subject areas, and how these changes influence the learned model.