Statistical and Model-Based
Learning Methods for the
Extension of Unification Grammars

1996, German, 100+ pages, postscript, .ps.gz, code (.tar.gz), abstract (German, English).
Diplom thesis at the University of Erlangen-Nuremberg, Germany.

Systems that process natural language often use grammars to model syntax. Writing these grammars, however, is very time-consuming. It even seems impossible ever to write a grammar that covers a natural language completely: there are always special cases and exceptions that were not considered before. Hence, new grammar rules have to be added after each testing cycle.

Miles Osborne introduced a system for English in which additional rules are added to a given base grammar by learning. Rule construction uses the charts of training sentences that could not be parsed successfully. The system consists of a linguistically motivated, model-based component and a statistically oriented, data-driven component, which together construct new rules and evaluate them.
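The interplay of the two components can be illustrated with a minimal sketch. It is not Osborne's algorithm and not the implementation from this thesis: it assumes atomic category labels instead of unification-grammar feature structures, and the names `propose_rules`, `learn_rules`, `is_plausible`, and `min_count`, as well as the toy charts, are hypothetical. Candidate rules are read off sequences of adjacent chart edges, a model-based check discards implausible candidates, and a simple frequency count stands in for the statistical evaluation.

```python
from collections import Counter
from typing import Callable, Iterable

# Assumption: a chart edge is (start, end, category) for a complete constituent.
Edge = tuple[int, int, str]
# Assumption: a rule is (left-hand side, right-hand side categories).
Rule = tuple[str, tuple[str, ...]]


def propose_rules(chart: Iterable[Edge], lhs: str, max_rhs: int = 3) -> set[Rule]:
    """Propose rules lhs -> c1 ... ck (2 <= k <= max_rhs) from adjacent edges."""
    by_start: dict[int, list[Edge]] = {}
    for edge in chart:
        by_start.setdefault(edge[0], []).append(edge)

    rules: set[Rule] = set()

    def extend(pos: int, cats: tuple[str, ...]) -> None:
        # Record every sequence of at least two adjacent categories.
        if len(cats) >= 2:
            rules.add((lhs, cats))
        if len(cats) == max_rhs:
            return
        for _, end, cat in by_start.get(pos, []):
            extend(end, cats + (cat,))

    for edges in by_start.values():
        for _, end, cat in edges:
            extend(end, (cat,))
    return rules


def learn_rules(
    charts: Iterable[Iterable[Edge]],
    lhs: str,
    is_plausible: Callable[[Rule], bool],
    min_count: int = 2,
) -> list[Rule]:
    """Keep plausible candidates that occur in at least min_count charts."""
    counts: Counter = Counter()
    for chart in charts:
        for rule in propose_rules(chart, lhs):
            if is_plausible(rule):  # model-based filter
                counts[rule] += 1   # data-driven evidence
    return [rule for rule, n in counts.most_common() if n >= min_count]


# Toy charts of two sentences that the base grammar failed to parse.
charts = [
    [(0, 1, "det"), (1, 2, "adj"), (2, 3, "n")],
    [(0, 1, "det"), (1, 2, "adj"), (2, 3, "n"), (3, 4, "v")],
]
# Toy model constraint: a noun phrase must end in a noun.
learned = learn_rules(charts, "np", lambda rule: rule[1][-1] == "n")
```

In this toy setting the filter removes candidates such as `np -> det adj`, and the count threshold removes candidates seen in only one chart. A realistic system would replace both with a proper grammar model and a corpus-based score.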

The objective of this thesis was to adapt this system to German. Because German differs significantly from English, the model-based component in particular could not be used in its original form. A new grammar formalism and a new grammar model had to be developed. In addition, the generation of new rules is more tightly restricted in order to preserve the linguistic plausibility of the resulting grammar. The system was implemented in Prolog, C, and Perl.

The experiments demonstrated that the system can successfully learn new rules, and that these rules made it possible to parse the majority of the test sentences.