Jakob, Niklas (2011)
Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
|
PDF
Diss.pdf Copyright Information: CC BY-NC-ND 2.5 Generic - Creative Commons, Attribution, NonCommercial, NoDerivs . Download (2MB) | Preview |
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems | ||||
Language: | English | ||||
Referees: | Gurevych, Prof. Dr. Iryna ; Heyer, Prof. Dr. Gerhard | ||||
Date: | 24 May 2011 | ||||
Place of Publication: | Darmstadt | ||||
Date of oral examination: | 18 May 2011 | ||||
Abstract: | With the growing popularity of online shopping, most e-commerce websites nowadays offer their customers to leave feedback about their purchases. This form of customer or user interaction is also very popular among Web 2.0 websites. Online databases, e.g. of movies, offer their users incentives to participate in the content creation by giving them the opportunity to rate films and write reviews about them. Complete websites, e.g. rateitall.com, have emerged, which allow their users to rate and review virtually anything they care about. As more and more content is created and aggregated on these websites, a strong demand for automatic approaches which are capable of extracting structured information from mostly unstructured text has emerged. An automatic extraction of the opinions uttered in the thousands of user-generated texts can provide interesting data for several other tasks such as question answering, information retrieval and summarization. All of these tasks require an opinion mining system, which analyzes the individual elements of an opinion on a sentence level, i.e. the terms which express the opinion, their polarity, and what the opinion is about. In this thesis, we present a comprehensive study of the automatic extraction of opinions with a focus on opinion targets, which is an essential step in order to enable other tasks, e.g. information retrieval or question answering on opinionated content. We analyze the state-of-the-art in opinion mining and divide it into three subtasks, one of which is the extraction of opinion targets. We perform a comparative evaluation of two unsupervised algorithms in the task of opinion target extraction on datasets of customer reviews and blog postings which span the following four different domains: digital cameras, cars, movies and web-services. We show how the identification of opinion expressions influences the opinion target extraction performance of each algorithm. We also show that a simple word distance-based heuristic significantly outperforms both unsupervised algorithms, which make their relevance decision by analyzing word frequencies in the corpus. The word distance-based heuristic reaches an F-Measure between 0.372 and 0.491 on the four datasets. We furthermore evaluate a state-of-the-art supervised algorithm in the task of opinion target extraction and present a new approach which is based on Conditional Random Fields (CRF). Our approach outperforms the state-of-the-art baseline significantly on all four datasets reaching an F-Measure between 0.497 and 0.702. We also evaluate both algorithms in a cross-domain opinion target extraction task, since a common problem with supervised algorithms is the domain dependence of the learned model. In this setting, our CRF-based approach also outperforms the baseline on all four datasets and it outperforms the best unsupervised approach, which is by design not prone to domain dependence, on three of the four datasets mentioned above. In the cross-domain opinion target extraction task, the CRF-based approach reaches an F-Measure between 0.360 and 0.518 on the four datasets. The extraction of opinion targets, which are referenced by anaphoric expressions, is a challenge which is frequently encountered in opinion mining at the phrase level. For the first time, we integrate anaphora resolution algorithms in a supervised opinion mining system. We perform a comparative evaluation of two algorithms, in which we require them to extract the correct antecedent of anaphoric targets. Our results indicate that one of the algorithms, which was designed for high-precision anaphora resolution, is better suited in the opinion mining setting. By extending the algorithm, which yields the best results in the off-the-shelf configuration, we yield significant improvements regarding the extraction of opinion targets on three of the four datasets. Finally, we show how an opinion mining system can be successfully employed to improve another application. Recommendation systems are nowadays widely used in online platforms and desktop applications in order to suggest goods or pieces of art to users, which they do not know yet, but are likely to enjoy. The recommendations for a user U1 are determined by first profiling the taste and interests of all users of the recommendation system. Then the algorithm identifies other users U2 ... Un which have a similar taste as user U1, and then recommends items to U1 which the users who have a similar taste enjoyed. A user's taste and interests are typically profiled by giving him the option to rate entities, which he has consumed. As mentioned above, website operators have also given users the opportunity to leave their ratings not only on a numerical scale, but also via a free-text review. We hypothesize that these free-text reviews contain a lot of information, expressed in the users' opinions, which would allow us to model his taste and preferences on a very fine granularity. We show that, by integrating our opinion mining system as a feature provider to a state-of-the-art recommendation system, we can significantly improve the accuracy of the recommendations, which we evaluate on a dataset of movie ratings and reviews. |
||||
Alternative Abstract: |
|
||||
Alternative keywords: |
|
||||
URN: | urn:nbn:de:tuda-tuprints-26090 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science 20 Department of Computer Science > Ubiquitous Knowledge Processing |
||||
Date Deposited: | 22 Jun 2011 07:06 | ||||
Last Modified: | 08 Jul 2020 23:55 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/2609 | ||||
PPN: | 267353316 | ||||
Export: |
View Item |