Unlike Western languages, which have small alphabets, Chinese has a much larger character set consisting of several thousand Chinese characters (Han Zi). To input Chinese text with an ordinary keyboard originally designed for English, a coding scheme that maps a sequence of Latin characters to Chinese text is required. An input method is a piece of software that sits between the normal user input interface and the low-level system I/O routines, transparently mapping key sequences to Chinese text.
There are several widely used coding schemes for inputting Chinese text. The Pin Yin method is the most popular input method among Mandarin speakers. It maps a phonetic (Pin Yin) code to one or more Chinese characters. For instance, the following sample illustrates inputting a sentence using the Pin Yin method:
zhong hua ren min gong he guo  ---(Pin Yin decoding)--->  The People's Republic of China
One major deficiency of the Pin Yin method is translation ambiguity: one phonetic code can map to as many as 100 different Chinese characters. This is not surprising, since the coding scheme needs to represent thousands of different Chinese characters with only 417 different phonetic codes. Hence the user must select the intended character while typing, which greatly slows down input. A good input method should be able to resolve translation ambiguity in some intelligent way and minimize the need for human intervention.
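To make the ambiguity concrete, here is a minimal Python sketch that counts how many character sequences are consistent with the sample input above. The `CANDIDATES` table and the `count_readings` helper are made up for illustration; the table is a tiny hand-picked subset, so a real dictionary would list far more characters per syllable.

```python
# Toy candidate table: each Pin Yin syllable maps to several characters.
# This is a small illustrative subset, not a real dictionary.
CANDIDATES = {
    "zhong": ["中", "种", "重", "钟", "终"],
    "hua": ["华", "话", "花", "化", "画"],
    "ren": ["人", "任", "认", "仁"],
    "min": ["民", "敏", "闽"],
    "gong": ["共", "工", "公", "功"],
    "he": ["和", "合", "河", "何"],
    "guo": ["国", "过", "果"],
}


def count_readings(syllables):
    """Return how many character sequences match the syllable sequence."""
    total = 1
    for syllable in syllables:
        total *= len(CANDIDATES[syllable])
    return total


syllables = "zhong hua ren min gong he guo".split()
print(count_readings(syllables))  # 5 * 5 * 4 * 3 * 4 * 4 * 3 = 14400
```

Even with this tiny table, only one of the 14400 combinations (中华人民共和国) is the sentence the user meant, which is why character-by-character selection is so slow.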
A good Chinese input method should have the following property:
Typically, the process of phonetic-to-Chinese conversion consists of two stages:
Currently, a trigram SLM (statistical language model) is built to select the
best path from the lattice. The adaptive part of the system is implemented
with a memory-based learner that aims to adjust the model's parameters
on-line according to the user's preferences. Both Pin Yin and Wu Bi are
supported in whole-sentence input mode. The input method conforms to the
XIM protocol and works as a standalone XIM server under the X Window System
(Linux and FreeBSD). This software is still at an early stage and no code
is available yet. However, you can view some fancy screenshots:
Wu Bi input method in action
Pin Yin input method in action
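Since no code from the project is available, the following is only a rough Python sketch of how a trigram model can pick the best path through a candidate lattice, using a Viterbi search whose state is the last two characters. Everything here is an assumption made for illustration: the `TRIGRAM_LOGP` table, the flat `UNSEEN_LOGP` penalty, and the function names are hypothetical, and the lattice holds one character per syllable, whereas a real lattice would also contain multi-character words and the model would be trained and smoothed.

```python
import math

# Hypothetical toy model: log-probabilities for a handful of trigrams.
# Unseen trigrams get a flat penalty instead of proper smoothing.
TRIGRAM_LOGP = {
    ("<s>", "<s>", "中"): math.log(0.5),
    ("<s>", "中", "华"): math.log(0.6),
    ("中", "华", "人"): math.log(0.7),
    ("华", "人", "民"): math.log(0.8),
}
UNSEEN_LOGP = math.log(1e-4)


def trigram_logp(w2, w1, w):
    """Log-probability of w given the two preceding characters."""
    return TRIGRAM_LOGP.get((w2, w1, w), UNSEEN_LOGP)


def best_path(lattice):
    """Viterbi search over a lattice given as one candidate list per syllable."""
    # Map each state (last two characters) to its best (score, path) so far.
    states = {("<s>", "<s>"): (0.0, [])}
    for candidates in lattice:
        next_states = {}
        for (w2, w1), (score, path) in states.items():
            for w in candidates:
                new_score = score + trigram_logp(w2, w1, w)
                key = (w1, w)
                # Keep only the best-scoring path that ends in this state.
                if key not in next_states or new_score > next_states[key][0]:
                    next_states[key] = (new_score, path + [w])
        states = next_states
    return max(states.values(), key=lambda entry: entry[0])[1]


# Candidates for "zhong hua ren min"
lattice = [["中", "种", "重"], ["华", "话", "花"], ["人", "任"], ["民", "敏"]]
print("".join(best_path(lattice)))  # 中华人民
```

Keeping one best path per two-character state keeps the search cost proportional to the sentence length rather than to the number of possible sentences.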
This project has been suspended since 2004 and will probably not be developed for a long time. The major reason is that I have become an experienced Wu Bi user and am satisfied with my current input speed: 35 - 60 characters per minute. I have therefore lost interest in developing a Pin Yin solution that is actually much slower than Wu Bi (at least for me). If you are a Pin Yin user and have not used Wu Bi before, I recommend you give it a try; you will be well rewarded in the end.