Heuristic Rule Learning.
Technische Universität, Darmstadt
[Ph.D. Thesis], (2012)
Available under Creative Commons Attribution Non-commercial No Derivatives, 2.5.
Download (2MB) | Preview
|Item Type:||Ph.D. Thesis|
|Title:||Heuristic Rule Learning|
The primary goal of the research reported in this thesis is to identify what criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm both in classification and regression. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics for classification. In order to avoid biasing our study by known functional families, we also investigate the potential of using metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are not only practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics, but we also gain theoretical insights into factors that are responsible for a good performance. Additionally, we evaluate the spectrum of different search strategies to see whether separate-and-conquer rule learning algorithms are able to gain performance in terms of predictive accuracy or theory size by using more powerful search strategies like beam search or exhaustive search. Unlike previous results that demonstrated that rule learning algorithms suffer from oversearching, our work pays particular attention to the interaction between the search heuristic and the search strategy. Our results show that exhaustive search has primarily the effect of finding longer, but nevertheless more general rules than hill-climbing search.
A second objective is the design of a regression rule learning algorithm. To do so, a novel parametrized regression heuristic is introduced and its parameter is tuned in the same way as before. A new splitpoint generation method is introduced for the efficient handling of numerical attributes. We show that this metric-based algorithm performs comparable to several other regression algorithms. Furthermore, we propose a novel approach for learning regression rules by transforming the regression problem into a classification problem. The key idea is to dynamically define a region around the target value predicted by the rule, and considering all examples within that region as positive and all examples outside that region as negative. In this way, conventional rule learning heuristics may be used for inducing regression rules. Our results show that our heuristic algorithm outperforms approaches that use a static discretization of the target variable, and performs en par with other comparable rule-based approaches, albeit without reaching the performance of statistical approaches.
In the end, two case studies on real world problems are presented. The first one deals with the problem of predicting skin cancer and the second one is about deciding whether or not students have to be invited to a counseling session. For reasons of interpretability, rules were perfectly suited to work with in both case studies. The results show that the derived rule-based algorithms are able to find rules that are very diverse, proved to be interesting and are also sufficiently accurate. All experiments were performed in the SECO-framework, a new versatile framework for heuristic rule learning, which allows for an easy configuration of a wide range of different components rule learners consist of.
|Place of Publication:||Darmstadt|
|Uncontrolled Keywords:||heuristics, regression, meta-learning, parameter tuning, coverage space, search algorithms, oversearching, reduction|
|Classification DDC:||000 Allgemeines, Informatik, Informationswissenschaft > 004 Informatik|
|Divisions:||20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering
|Date Deposited:||26 Nov 2012 13:13|
|Last Modified:||07 Dec 2012 12:05|
|Referees:||Fürnkranz, Prof. Dr. Johannes and Lavesson, PhD, Prof. Niklas|
|Refereed:||10 October 2012|