Unicode solution
Unicode: the wide-character set (from Windows Core Programming)
Unicode is a standard established by Apple and Xerox in 1988. In 1991, a consortium was formed to develop and promote Unicode. Its members include Apple, Compaq, IBM, Microsoft, Oracle, Silicon Graphics, Inc., Sybase, Unisys, Xerox, and other companies. The consortium is responsible for maintaining the Unicode standard.
Unicode provides a simple and consistent way to represent strings. Every character in a Unicode string is 16 bits (2 bytes) wide. There are no special bytes that indicate whether the next byte belongs to the same character or starts a new one. This means you can walk through the characters of a string simply by incrementing or decrementing a pointer; you no longer need to call a function such as CharNext. Because Unicode represents each character with a 16-bit value, more than 65,000 characters are available, enough to encode all the characters of the world's written languages and far more than the 256 characters of a single-byte character set.
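The fixed-width traversal described above can be sketched in Python. This is only an illustration: the helper names `utf16_units` and `unit_at` are made up for this example, and it assumes the text stays within the 16-bit range the passage describes, stored as little-endian 16-bit values.

```python
import struct

def utf16_units(text):
    """Return the 16-bit values of text, reading two bytes at a time."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    units = []
    offset = 0
    while offset < len(data):
        # Every step is exactly 2 bytes; no lead-byte test is needed.
        (unit,) = struct.unpack_from("<H", data, offset)
        units.append(unit)
        offset += 2
    return units

def unit_at(text, index):
    """Random access: the value at position index starts at byte 2 * index."""
    data = text.encode("utf-16-le")
    (unit,) = struct.unpack_from("<H", data, 2 * index)
    return unit

sample = "Aβ中"
print([hex(u) for u in utf16_units(sample)])  # ['0x41', '0x3b2', '0x4e2d']
print(hex(unit_at(sample, 2)))                # 0x4e2d
```

Because every step has the same size, moving backward is just as easy as moving forward, which is the point the paragraph makes about incrementing or decrementing a pointer.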
The basic problem we face is that the world's written languages cannot be represented with only 256 8-bit codes. Earlier solutions, including code pages and DBCS, have proved inadequate and clumsy. So what is the real solution?
As programmers, we have met this kind of problem before. If there are too many things to represent with 8-bit values, we try wider values, such as 16-bit values. Interestingly enough, that is exactly the idea behind Unicode. Unlike the confusing 256-character code mappings, and unlike double-byte character sets that mix 1-byte codes with 2-byte codes, Unicode is a uniform 16-bit system that allows 65,536 characters to be represented. This is enough for all the characters and ideographs of the world's written languages, including collections of mathematical symbols, other symbols, and currency signs.
Understanding the difference between Unicode and DBCS is very important. Unicode is said to use "wide characters," particularly in the context of the C programming language. Each Unicode character is 16 bits wide rather than 8 bits wide. In Unicode, a lone 8-bit value has no meaning. In a double-byte character set, by contrast, we are still dealing with 8-bit values: some bytes define a character by themselves, while others indicate that a second byte is needed to complete the character.
Working with DBCS strings is very messy, but working with Unicode text is much like working with ordinary text. You may be pleased to learn that the first 128 Unicode characters (16-bit codes 0x0000 through 0x007F) are the ASCII characters, and the next 128 Unicode characters (codes 0x0080 through 0x00FF) are the ISO 8859-1 extensions to ASCII. Other parts of Unicode are likewise based on existing standards, which makes conversion easier. The Greek alphabet uses codes 0x0370 through 0x03FF, Cyrillic uses codes 0x0400 through 0x04FF, Armenian uses codes 0x0530 through 0x058F, and Hebrew uses codes 0x0590 through 0x05FF. The ideographs of Chinese, Japanese, and Korean (collectively called CJK) occupy codes 0x3000 through 0x9FFF.
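The code ranges listed above can be turned into a small lookup table. The sketch below uses only the ranges given in the text; the names `BLOCKS` and `block_of` are hypothetical, and any code outside those ranges is simply reported as "other".

```python
# (first code, last code, name), taken from the ranges quoted in the text.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x00FF, "ISO 8859-1 extensions"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0530, 0x058F, "Armenian"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x3000, 0x9FFF, "CJK"),
]

def block_of(ch):
    """Return the name of the listed range that contains character ch."""
    code = ord(ch)
    for first, last, name in BLOCKS:
        if first <= code <= last:
            return name
    return "other"

def latin1_matches_unicode():
    """Check that each ISO 8859-1 byte value equals its Unicode code."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

for ch in ["A", "é", "λ", "ж", "\u0561", "\u05d0", "中"]:
    print(hex(ord(ch)), block_of(ch))
print(latin1_matches_unicode())
```

The second helper illustrates why conversion is easy for the first 256 codes: an ASCII or ISO 8859-1 byte becomes a Unicode character by widening it to 16 bits, with no table lookup.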
The greatest benefit of Unicode is that there is only one character set, so there is no ambiguity. Unicode is the result of cooperation among almost every important company in the personal computer industry, and it is code-for-code identical with the ISO 10646-1 standard. The essential reference for Unicode is The Unicode Standard, Version 2.0 (Addison-Wesley Publishing, 1996). It is an extraordinary book that shows the richness and diversity of the world's written languages in a way few other documents do. The book also gives the rationale and details behind the development of Unicode.
Does Unicode have drawbacks? Of course. A Unicode string occupies twice as much memory as an ASCII string. (File compression, however, greatly reduces the disk space that files take up.) But perhaps the worst drawback is that people are still relatively unaccustomed to using Unicode. As programmers, this is our job.
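The memory cost mentioned above is easy to measure. The following sketch compares the size of the same ASCII-only text stored as 8-bit and as 16-bit values, then compresses both with zlib. The sample sentence is arbitrary, and the exact compressed sizes depend on the data and the zlib version, so none are quoted here.

```python
import zlib

# Arbitrary ASCII-only sample text, repeated to give the compressor
# something to work with.
text = "The quick brown fox jumps over the lazy dog. " * 100

as_ascii = text.encode("ascii")      # one byte per character
as_utf16 = text.encode("utf-16-le")  # two bytes per character

print(len(as_ascii), len(as_utf16))  # 4500 9000

# Compression removes most of the redundancy in either form.
packed_ascii = zlib.compress(as_ascii)
packed_utf16 = zlib.compress(as_utf16)
print(len(packed_ascii), len(packed_utf16))
```

For ASCII-only text every second byte of the 16-bit form is zero, which is the kind of redundancy a compressor handles well.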
Posted on 2006-03-27 17:49 by xnabx
Tags: java hash solution, java unicode






