Unicode; The Main Idea
Topic Started: Oct 24 2009, 09:32 PM (1,065 Views)
NGen
Advanced Member
[ *  *  * ]
Since many of us started out with GameMaker, we became accustomed to the familiar, easy-to-understand ASCII system. ASCII proper is a 7-bit set of 128 characters covering the English alphabet, digits, punctuation, and control codes. The 8-bit "extended" sets built on top of it add another 128 slots for accented letters, symbols, and other region-specific characters.

However, as computing spread to other parts of the world, a set of 256 characters just wouldn't cut it. A character set covering the world's major writing systems was needed, and thus Unicode was created. It originally used 16-bit values instead of 8-bit ones, which gives 65,536 possible characters. That seemed like plenty at the time, but Unicode has since grown beyond 16 bits (code points now run up to 0x10FFFF), so in the 16-bit encoding used by Windows (UTF-16) the less common characters take two 16-bit units each.

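Microsoft adopted Unicode fairly early, and on Windows wide characters are stored in the wchar_t type. wchar_t itself comes from the C standard library, not from Microsoft (in C++ it is a built-in type). Microsoft's C headers define it in <stddef.h> as: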
Code:
 
typedef unsigned short wchar_t;

As you can see, on Windows this is an unsigned 16-bit type, big enough for one UTF-16 unit. Don't rely on that size elsewhere: with GCC on Linux, for example, wchar_t is normally 32 bits.
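If you want to check what your own compiler does, something like the following should work. It is only a minimal sketch, and the helper name wchar_bits is made up for this example:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper: width of wchar_t in bits on the current compiler.
// Typically 16 with Visual C++ on Windows and 32 with GCC/Clang on Linux.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// A wchar_t is never narrower than a char.
static_assert(wchar_bits() >= CHAR_BIT, "wchar_t narrower than char?");
```

Print wchar_bits() once on each platform you target before assuming that a wchar_t array has a particular memory layout.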


Currently, the Windows API supports Unicode by providing two versions of every function that takes text. MessageBox is a good example.

MessageBox is actually a #define that expands to either MessageBoxA (ANSI, i.e. 8-bit char strings) or MessageBoxW (wide-char, aka Unicode). Which one you get is decided at compile time: if the UNICODE macro is defined, MessageBoxW is used in place of MessageBox. Its text parameters are LPCWSTRs (Long Pointer to Constant Wide String; the "Long" is a leftover from 16-bit Windows, where near and far pointers were different things), which is equivalent to const wchar_t*. If UNICODE is not defined, MessageBoxA is used, where the text parameters are LPCSTRs (Long Pointer to Constant String), which is equivalent to const char*.
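Here is a rough sketch of how that A/W switching works. It does not use the real Windows headers: ShowTextA, ShowTextW, ShowText and TEXT_LIT are hypothetical stand-ins for MessageBoxA, MessageBoxW, MessageBox and the TEXT() macro, and they only return the text length so the example compiles on any platform.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// Hypothetical stand-ins for a pair like MessageBoxA / MessageBoxW.
// They only return the text length so the sketch runs anywhere.
std::size_t ShowTextA(const char* text)    { return std::strlen(text); }
std::size_t ShowTextW(const wchar_t* text) { return std::wcslen(text); }

// The same trick the Windows headers use: one generic name that the
// preprocessor maps to the W or the A version depending on UNICODE.
#ifdef UNICODE
#  define ShowText ShowTextW
#  define TEXT_LIT(s) L##s     // hypothetical analogue of TEXT()
#else
#  define ShowText ShowTextA
#  define TEXT_LIT(s) s
#endif

// Compiles either way, because the literal and the function switch together.
inline std::size_t show_greeting() {
    return ShowText(TEXT_LIT("Hello"));
}
```

The point of the design is that code written against the generic name builds in both modes without any changes, as long as every string literal goes through the matching literal macro.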


Now, compilers treat whatever is enclosed in quotation marks as a char array, regardless of whether UNICODE is defined. This is a problem when you need text for a wide-character array. The solution is simply to put an L in front of the literal, like so:

Code:
 
wchar_t N[] = L"This is treated as a Wide-Character string!";

In the above code, the compiler generates an array of wchar_t to hold the text instead of a char array. Note that N[0] still has the value 84 ('T'), just as it would in a char array using ASCII. That's the great thing about it: Unicode is an extension of ASCII, and the ASCII table sits at the same positions in the Unicode table. However, don't think that you can simply cast a wchar_t array to a char array. The cast only reinterprets the same bytes, and since the upper byte of every ASCII character is zero, the result usually looks like a string that ends after its first character. The correct way to convert a wchar_t string to a char string is to convert each character individually. For example:
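To see for yourself what a plain pointer cast does, you can compare the two lengths. This is just a sketch with made-up helper names. The exact result depends on byte order, but on a little-endian machine such as x86 the "char string" stops after one character.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// Length of the wide string, counted in wchar_t elements.
std::size_t wide_length(const wchar_t* text) {
    return std::wcslen(text);
}

// Length seen when the same memory is (wrongly) read as a char string.
// Reading the bytes through char* is allowed, but the result is not the
// text: strlen stops at the first zero byte, which for ASCII characters
// is the upper part of the very first wchar_t.
std::size_t length_after_pointer_cast(const wchar_t* text) {
    return std::strlen(reinterpret_cast<const char*>(text));
}
```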

Code:
 
wchar_t N[] = L"Hi!";
char NN[] = "...";  // same length as N, already null-terminated
for ( int i = 0; i < 3; i++ ) {
    NN[i] = (char) N[i];  // only safe for ASCII characters
}


However, this method is only recommended if you need nothing beyond the ASCII character set. For example, converting a wchar_t holding a character from the CJK range (something like 0x3000) to a char does not give you a char version of that character. In this case you would get 0x00, since the conversion keeps only the low byte, i.e. the last two hex digits.
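The truncation is easy to reproduce. In this sketch (the helper names are hypothetical) the cast goes to unsigned char so that the result is well defined, and a second function tells you beforehand whether a character would survive:

```cpp
// Narrow one wide character the way the loop above does: by a plain cast.
// Converting to unsigned char keeps exactly the low 8 bits of the value.
inline unsigned char low_byte_of(wchar_t c) {
    return static_cast<unsigned char>(c);
}

// True if the character is 7-bit ASCII and so survives the cast intact.
// The cast to unsigned long also rejects negative values on platforms
// where wchar_t is signed.
inline bool is_ascii(wchar_t c) {
    return static_cast<unsigned long>(c) <= 0x7F;
}
```

With these, 0x3000 narrows to 0x00 and 0x4E2D narrows to 0x2D, which is the ASCII minus sign and has nothing to do with the original character.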



Now, most of us are accustomed to using the std::string class to manage text. Fortunately, there is already a wide-character counterpart, std::wstring. However, there is no overloaded operator for assigning a string to a wide-character string or vice versa. For ASCII-only text, the per-character conversion shown above does the job; for anything else you need a real encoding conversion, such as the Windows API functions WideCharToMultiByte and MultiByteToWideChar.
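For the ASCII-only case, the per-character conversion can be wrapped in two small helpers. This is a sketch, not a full solution: narrow_ascii and widen_ascii are names I made up, and anything outside ASCII is replaced with '?' rather than converted properly.

```cpp
#include <string>

// Hypothetical helper: wide -> narrow, keeping ASCII and replacing
// everything else with '?' instead of silently truncating it.
std::string narrow_ascii(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t c : wide) {
        const bool ascii = static_cast<unsigned long>(c) <= 0x7F;
        out.push_back(ascii ? static_cast<char>(c) : '?');
    }
    return out;
}

// Hypothetical helper: narrow ASCII -> wide. ASCII code points keep
// their values in Unicode, so each char widens directly.
std::wstring widen_ascii(const std::string& narrow) {
    std::wstring out;
    out.reserve(narrow.size());
    for (char c : narrow) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Replacing unknown characters with a visible '?' makes data loss obvious in the output, whereas a plain cast would quietly produce a wrong character.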
Edited by NGen, Oct 24 2009, 09:47 PM.
 
Dr. Best
Administrator
[ *  *  *  *  *  * ]
Nice tutorial. I never really started using Unicode. It can make things a lot more complicated, because quite often you will stumble upon libraries that do not support it. Also, you just can't get full Unicode character sets into glyph textures. That causes problems in 3D applications.

One more thing I'd like to see mentioned in the tutorial is that you can switch between the Multi-Byte character set (char-based, the A functions) and Unicode in the project settings when you are using Visual Studio. This defines or undefines UNICODE and thereby influences which versions of the functions will be used.
 
NGen
Advanced Member
[ *  *  * ]
Quote:
 
Nice tutorial. I never really started using Unicode. It can make things a lot more complicated, because quite often you will stumble upon libraries that do not support it. Also, you just can't get full Unicode character sets into glyph textures. That causes problems in 3D applications.

Which is why I made that string class. ^_^

The Windows API has a large library of Font-related functions, so I'm wondering if you can use something in there for creating the textures.
Edited by NGen, Oct 25 2009, 11:53 PM.
 
Dr. Best
Administrator
[ *  *  *  *  *  * ]
NGen
Oct 25 2009, 11:53 PM
Quote:
 
Nice tutorial. I never really started using Unicode. It can make things a lot more complicated, because quite often you will stumble upon libraries that do not support it. Also, you just can't get full Unicode character sets into glyph textures. That causes problems in 3D applications.

Which is why I made that string class. ^_^
Yeah, but if you cast back to ASCII all the time, using Unicode does not help much. The idea behind Unicode is good, and things would be nicer if everybody used it everywhere. But that is simply not the case, so using it can mean significantly more work. If your project requires support for many languages, the effort is worth it; otherwise it may be better to stick with ASCII. Using the country-specific part of the character set (the second 128 values), you can cover quite a few languages. For example, German umlauts and Russian letters are no problem. Logographic languages kill this approach, though, so big parts of the Asian market in particular remain locked.

NGen
 
The Windows API has a large library of Font-related functions, so I'm wondering if you can use something in there for creating the textures.
Creating the textures is no problem. I have written a class that does that. It is used in Ultimate 3D 2.1 and all later versions. The problem is that languages like Chinese use a great many characters, so you would need huge glyph textures. That takes a lot of video memory.
 
NGen
Advanced Member
[ *  *  * ]
Quote:
 
Yeah, but if you cast back to ASCII all the time, using Unicode does not help much.

Unfortunately, it's that kind of thinking that's making me wonder if the string class will actually be of any use. Still, it got me open to the idea of using templates to cover multiple classes, so I suppose it wasn't a complete waste.
 
Dr. Best
Administrator
[ *  *  *  *  *  * ]
NGen
Oct 26 2009, 02:32 AM
Quote:
 
Yeah, but if you cast back to ASCII all the time, using Unicode does not help much.

Unfortunately, it's that kind of thinking that's making me wonder if the string class will actually be of any use. Still, it got me open to the idea of using templates to cover multiple classes, so I suppose it wasn't a complete waste.
Yup, it is always nice to use opportunities for playing around with C++ language features. Being familiar with templates and inheritance can often lead you to solutions that avoid a lot of redundant code.