Abstract and Keywords
This chapter deals with the fundamental and challenging issue of identifying wordhood in Chinese from both theoretical and computational perspectives. We follow the Lexical Markup Framework definition of a word as a lexical entry, that is, a unique form-meaning pair. This in turn leads to the discovery that the most robust orthographically relevant level in Chinese is semantics, since the language allows non-Chinese phonemes to be borrowed through the limited use of mixed orthography. Based on this understanding of the semantically based nature of Chinese words, we introduce different approaches to the automatic identification of Chinese words (i.e., word segmentation). The chapter focuses on the two approaches that are currently more successful: character position tagging and word boundary decision.
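The two approaches differ mainly in what a classifier is asked to label. The following minimal sketch illustrates this by deriving both kinds of labels from an already segmented sentence. Character position tagging assigns one tag per character, and word boundary decision assigns one binary decision per gap between adjacent characters. The chapter's own tag sets are not given here, so the common four-tag B/M/E/S scheme (Begin, Middle, End, Single) is an assumption made for illustration. The function names are hypothetical. A real segmenter would predict these labels with a trained model, and that model is omitted from the sketch.

```python
from typing import List


def to_position_tags(words: List[str]) -> List[str]:
    """Character position tagging (assumed B/M/E/S scheme): one tag per character."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def to_boundary_decisions(words: List[str]) -> List[int]:
    """Word boundary decision: for each gap between adjacent characters, 1 = boundary, 0 = no boundary."""
    decisions: List[int] = []
    for i, w in enumerate(words):
        decisions.extend([0] * (len(w) - 1))  # gaps inside the word
        if i < len(words) - 1:
            decisions.append(1)  # gap after the word (not after the last one)
    return decisions


def from_position_tags(chars: str, tags: List[str]) -> List[str]:
    """Recover words from a character sequence and its (predicted) position tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed tag sequence that never closes a word
        words.append(current)
    return words


if __name__ == "__main__":
    words = ["我", "喜欢", "自然语言"]
    print(to_position_tags(words))       # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
    print(to_boundary_decisions(words))  # [1, 0, 1, 0, 0, 0]
    print(from_position_tags("".join(words), to_position_tags(words)))
```

In this framing, a seven-character sentence yields seven position tags but only six boundary decisions. This difference is why the two approaches are usually described as labeling characters and labeling gaps, respectively.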