About Dialects of China data entries.
The Dialects on Computer (DOC) project file can be found at http://www.lang.cityu.edu.hk/chinese/DOCMAS9.TXT
The file docmas9.txt does not display IPA correctly in its text-based format. I have converted the pronunciation section into IPA, included tone contours as well, and made use of the powerful ability of web browsers to display Chinese and IPA characters simultaneously, in colour, in the form of an HTML file. After a little work I have reduced the final version to 5105 kb (from 9218 kb) in size. The Chinese characters still require Big5 fonts. The new version is distributed as the file doc-ipa.zip (455 kb). Unzip it first to give the HTML source file doc-ipa.htm, which is 5062 kb in size and can then be opened in a web browser.
One major advantage of this HTML version is that you can select data items and copy them from your browser into Unicode-enabled editors while still retaining the Chinese character text and the correct IPA values. In the docmas9.txt file, by contrast, the character codes for the IPA depend on the docipa.ttf font, which uses ordinary ASCII characters to encode the IPA values.
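To illustrate what converting font-dependent ASCII codes into real Unicode IPA involves, here is a minimal Python sketch. The mapping table in it is purely a placeholder invented for illustration: it is not the actual layout of the docipa.ttf font, which would have to be worked out from the font itself. The names `FONT_TO_IPA` and `to_ipa` are likewise hypothetical.

```python
# HYPOTHETICAL mapping from ASCII characters to Unicode IPA symbols.
# These three entries are placeholders only; they are NOT taken from
# docipa.ttf and must be replaced with the font's real assignments.
FONT_TO_IPA = str.maketrans({
    "N": "\u014b",  # ŋ (velar nasal)
    "S": "\u0283",  # ʃ (voiceless postalveolar fricative)
    "@": "\u0259",  # ə (schwa)
})


def to_ipa(font_coded: str) -> str:
    """Replace each mapped ASCII character with its Unicode IPA symbol.

    Characters that are not in the table are passed through unchanged.
    """
    return font_coded.translate(FONT_TO_IPA)


if __name__ == "__main__":
    print(to_ipa("aN"))  # the 'N' is replaced, the 'a' is left alone
```

Once the text is stored as Unicode in this way, it no longer depends on any particular font to show the right symbols.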
This database, the docmas9.txt file, separates all the individual data into columns, which allows easy manipulation if one is able to do computer programming. You can write a program to convert the text as I have done, or manipulate the database to output only the things you require, for instance only a single dialect, only the characters of a single tone class, etc.
The copyright of the data given here remains, as always, with the DOC compilers. All I have done is change it into a more reader-friendly format.
© Dylan W.H. Sung
This page was created on Saturday 17th January 2004
and recently updated on Wednesday 21st January 2004.