Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge

Jäger, Jonas (2019)
Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge.
Technische Universität Darmstadt
Bachelor Thesis, Primary publication

TUDthesis_complete.pdf - Submitted Version
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.

Item Type: Bachelor Thesis
Type of entry: Primary publication
Title: Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge
Language: English
Referees: Fürnkranz, Prof. Dr. Johannes; Loza Mencía, Dr. Eneldo
Date: 4 June 2019
Place of Publication: Darmstadt
Date of oral examination: 23 May 2019

Deep learning, i.e., the training of deep neural networks, is nowadays indispensable not only in computer science and information technology but also in countless areas of daily life. It is one of the key technologies in the development of artificial intelligence and will remain highly important in the future, e.g., for autonomous driving. The data available for training such (deep) neural networks is limited, so a network cannot be prepared for every input it will have to handle in real-life situations; a solid generalization capability is therefore necessary. Generalization is the ability to acquire a general concept from the training data, so that the task associated with the data is properly understood instead of the training data simply being memorized. An essential component behind such a generalization capability is regularization.

A regularization procedure causes a neural network to generalize (better) when learning from data. Various widely used regularization procedures exist; they, for example, constrain the weights of the neural network (the parameters whose adaptation constitutes learning) or temporarily change its structure, and thereby implicitly aim for the network to make better predictions on as yet unseen data.
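The two mechanisms named above, limiting the weights and temporarily changing the network's structure, are exemplified by max-norm regularization and dropout, the two standard methods the abstract later names as combinable with SIR. The following is a minimal NumPy sketch of both, assuming a dense layer computed as `x @ W`, so that each column of `W` holds the incoming weights of one unit. The function names and defaults (`c=3.0`, `p=0.5`) are illustrative choices, not taken from the thesis.

```python
import numpy as np

def max_norm(W, c=3.0):
    """Max-norm constraint: rescale each column of W (one unit's incoming
    weights) so that its L2 norm is at most c; smaller columns are unchanged."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))  # avoid division by zero
    return W * scale

def dropout(h, p=0.5, rng=None, training=True):
    """Inverted dropout: during training, zero each activation with probability p
    and rescale the survivors by 1/(1-p) so the expected activation is unchanged.
    At test time the full network is used unchanged."""
    if not training or p == 0.0:
        return h
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(h.shape) >= p  # True with probability 1 - p
    return h * keep / (1.0 - p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(scale=5.0, size=(4, 3))
    print(np.linalg.norm(max_norm(W), axis=0))  # every column norm is <= 3.0
    print(dropout(np.ones((2, 6)), p=0.5, rng=rng))  # entries are 0.0 or 2.0
```

In practice the max-norm projection is typically applied to the weights after each gradient step, while dropout is applied to activations in the forward pass during training only.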

This thesis presents Self-Imitation Regularization (SIR), an easy-to-implement regularization procedure which, in contrast to the established standard regularization procedures, explicitly addresses the actual objective, namely the formation of predictions, and only implicitly influences the weights/parameters of the neural network. The existing (dark) knowledge of the learning neural network is explicitly incorporated into its learning objective (i.e., the error function to be minimized). Since the network is thus given access to its own knowledge during learning, this can be seen as a form of self-imitation. For a given data example, the (dark) knowledge contains, on the one hand, information about the similarities to other classes (in a classification problem, a neural network predicts classes and thereby classifies the data). On the other hand, it quantifies (relative to other data examples) the confidence of the neural network in its prediction for this data example. Intuitively, through self-imitation, this information leads the network to question the correctness of the labels given in the training data and deepens its understanding of correlations and similarities between the classes.
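The abstract does not state SIR's error function, so the following is only a sketch under an explicit assumption: it borrows the knowledge-distillation recipe of mixing the usual cross entropy on the training labels with a cross entropy against temperature-softened soft targets, where the soft targets come from the same network (e.g. a stored earlier copy of it) rather than from a separate teacher. The names `sir_style_loss`, `own_logits`, `alpha` and `T` are hypothetical, and the thesis's actual formulation may differ. The softened distribution exposes the dark knowledge described above: relative mass on the wrong classes reflects class similarity, and the peak probability reflects confidence.

```python
import numpy as np

def log_softmax(z, T=1.0):
    """Log-softmax over the last axis with temperature T (T > 1 softens it)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(z, T=1.0):
    return np.exp(log_softmax(z, T))

def sir_style_loss(logits, labels, own_logits, alpha=0.5, T=2.0):
    """Hypothetical self-imitation loss (an assumption, not the thesis's formula).

    logits:     current network outputs, shape (n, k)
    labels:     integer class labels from the training data, shape (n,)
    own_logits: outputs of the same network for the same inputs, e.g. from a
                stored earlier copy; used as constant targets (no gradient)
    alpha:      weight of the self-imitation term (assumed hyperparameter)
    T:          softmax temperature that exposes the dark knowledge (assumed)
    """
    logits = np.asarray(logits, dtype=float)
    n = logits.shape[0]
    # (a) usual cross entropy against the one-hot training labels
    hard_ce = -log_softmax(logits)[np.arange(n), labels].mean()
    # (b) cross entropy against the network's own softened predictions
    soft_targets = softmax(own_logits, T)
    soft_ce = -(soft_targets * log_softmax(logits, T)).sum(axis=-1).mean()
    # T**2 keeps the gradient scale of (b) comparable across temperatures,
    # as is common in knowledge distillation
    return (1.0 - alpha) * hard_ce + alpha * (T ** 2) * soft_ce

if __name__ == "__main__":
    # Classes: 0 = cat, 1 = dog, 2 = car; one example labelled "cat"
    logits = np.array([[5.0, 3.0, -2.0]])
    q = softmax(logits, T=4.0)
    print(q)  # dog receives far more mass than car: cat resembles dog
    print(softmax(logits).max(), q.max())  # confidence: sharp vs. softened
    earlier = np.array([[4.0, 3.5, -1.0]])  # hypothetical earlier snapshot
    print(sir_style_loss(logits, np.array([0]), own_logits=earlier))
```

In an autodiff framework, `own_logits` would be detached from the computation graph (e.g. via a stop-gradient) so that only the current outputs are trained toward the soft targets.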

Besides its regularizing effect, which proved reliable in the experiments (in part with statistical significance), SIR stabilizes the training, increases data efficiency and is robust to erroneous data labels, as demonstrated in various experiments. It is also applicable to very deep neural network architectures and can be combined with some standard regularization methods (namely dropout and maxnorm regularization). The implementation effort and the additional computational cost are very low, and hyperparameter tuning is simple.

In this thesis, the experimental results on the use of SIR are analyzed and evaluated with several methods, and the trained neural networks are examined closely in order to explain the regularization behavior and its accompanying properties.

URN: urn:nbn:de:tuda-tuprints-87175
Divisions: 20 Department of Computer Science > Knowledge Engineering
Date Deposited: 14 Jun 2019 14:41
Last Modified: 14 Jun 2019 14:41
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/8717
PPN: 449933113