This Demo: This is the LingPipe demo for Chinese word segmentation, also known as tokenization. It wraps each Chinese word in an XML element. The notion of a word is derived from the corpus prepared by Academia Sinica.
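To make "wraps words in XML elements" concrete, here is a minimal sketch of that kind of output. It does not perform segmentation and does not use LingPipe: the word list is supplied by hand, and the element name `word` is a hypothetical placeholder, since the demo's actual element name is not stated here.

```python
from xml.sax.saxutils import escape


def wrap_words(words, tag="word"):
    """Wrap each already-segmented word in an XML element.

    The tag name is a hypothetical placeholder, not the demo's real markup.
    """
    # escape() protects &, < and > so the result stays well-formed.
    return "".join(f"<{tag}>{escape(w)}</{tag}>" for w in words)


# A hand-supplied segmentation, used only for illustration.
segmented = ["中文", "分词"]
marked_up = wrap_words(segmented)
print(marked_up)
```

The segmenter decides where the word boundaries fall; the wrapping step shown here only marks the boundaries it is given.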
General Web Demo Instructions
Set the web browser's character encoding to match the encoding of the text to be submitted (in the browser, open the View menu and choose the Encoding submenu).
Set the input character encoding to match the actual encoding of the input bytes.
Set the input type selection to match the type of the input: plain text, HTML, or XML.
Set the output character encoding to any value; it need not match the input character encoding or the browser's encoding.
Paste or type text in the specified input character encoding.
To analyze a file, first switch to the file input form, the link for which is next to the submit button.
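The encoding settings above can be illustrated with a short sketch. It is not the demo's code; it only assumes that input bytes are decoded with the declared input encoding and that the result is re-encoded in the chosen output encoding. The function name `transcode` and the choice of GB2312 as the input encoding are assumptions made for this example.

```python
def transcode(data, input_encoding, output_encoding):
    """Decode bytes with the declared input encoding, then re-encode them.

    Hypothetical helper for illustration; not part of the demo.
    """
    return data.decode(input_encoding).encode(output_encoding)


text = "中文分词"

# Assume the submitted bytes are GB2312 (an assumption for this sketch).
input_bytes = text.encode("gb2312")

# Input encoding matches the bytes; the output encoding may differ freely.
output_bytes = transcode(input_bytes, "gb2312", "utf-8")
round_trip = output_bytes.decode("utf-8")

# Declaring the wrong input encoding yields garbled text, not the original.
garbled = input_bytes.decode("utf-8", errors="replace")

print(round_trip == text)
print(garbled == text)
```

Only the input encoding has to agree with the actual bytes; once the text is decoded correctly, any output encoding that can represent the characters will do.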