/**
\page encoding_helper Encoding Helper

The YYCC::EncodingHelper namespace includes all encoding-related functions:

\li The conversion between native strings and UTF8 strings, which was introduced in chapter \ref library_encoding.
\li Windows-specific conversion between \c WCHAR strings, UTF8 strings and strings in other encodings.
\li The conversion among UTF8, UTF16 and UTF32.

\section encoding_helper__native_utf8_conv Native & UTF8 Conversion

These conversion functions were introduced on the previous page.
See \ref library_encoding for more information.

\section encoding_helper__win_conv Windows Specific Conversion

In Windows programming, converting between Microsoft's \c wchar_t and \c char is an essential operation,
because the Windows API comes in 2 families: the functions ending with A and the functions ending with W.
(Microsoft's \c wchar_t is 2 bytes long, unlike the 4-byte \c wchar_t common on Linux.)
YYCC therefore provides these conversion functions on Windows to make programming more convenient.

These functions are Windows specific, so they are not visible on other platforms.
Please use them carefully (make sure that you only use them in a Windows environment).

YYCC supports the following conversions:

\li \c WcharToChar: Convert a \c wchar_t string to a string in the specified code page.
\li \c CharToWchar: The reverse conversion of WcharToChar.
\li \c CharToChar: Convert a string between 2 different code pages. It is a shortcut for calling CharToWchar and WcharToChar in succession.
\li \c WcharToUTF8: Convert a \c wchar_t string to a UTF8 string.
\li \c UTF8ToWchar: The reverse conversion of WcharToUTF8.

Code page is a Windows concept.
If you are not familiar with it, please consult the corresponding Microsoft documentation.

\section encoding_helper__utf_conv UTF8 UTF16 UTF32 Conversion

The conversion between UTF8, UTF16 and UTF32 is not common, but it is essential when it is needed.
These conversions can be achieved with standard library functions and classes
(our implementation actually relies on the standard library),
but the functions we provide are easier to use and have a clearer interface.

Unlike the functions introduced above,
they can be used on any platform, not only on Windows.

YYCC supports the following conversions:

\li \c UTF8ToUTF16: Convert a UTF8 string to a UTF16 string.
\li \c UTF16ToUTF8: The reverse conversion of UTF8ToUTF16.
\li \c UTF8ToUTF32: Convert a UTF8 string to a UTF32 string.
\li \c UTF32ToUTF8: The reverse conversion of UTF8ToUTF32.
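
To give an idea of what such a conversion involves, the following is a minimal, self-contained C++17 sketch of UTF8 to UTF32 decoding.
It is not YYCC's implementation, which relies on the standard library.
The names \c DecodeUtf8 and \c DecodeUtf8Example are hypothetical and exist only for this illustration.
For brevity the sketch rejects truncated and malformed sequences, but it does not check for overlong encodings, surrogates or out-of-range code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of YYCC: decode UTF8 bytes into UTF32 code points.
// Returns false on truncated or malformed input; dst may hold partial output then.
bool DecodeUtf8(std::string_view src, std::u32string& dst) {
    dst.clear();
    for (std::size_t i = 0; i < src.size();) {
        const unsigned char lead = static_cast<unsigned char>(src[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80) { cp = lead; len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else return false; // stray continuation byte or invalid lead byte
        if (src.size() - i < len) return false; // truncated sequence
        // Each continuation byte has the form 10xxxxxx and contributes 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(src[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        dst.push_back(cp);
        i += len;
    }
    return true;
}

// Usage: the 2-byte sequence C3 A9 decodes to the single code point U+00E9.
void DecodeUtf8Example() {
    std::u32string out;
    const bool ok = DecodeUtf8("\xC3\xA9", out);
    assert(ok);
    assert(out.size() == 1 && out[0] == 0xE9);
}
```

The sketch reports success through a \c bool and writes the result to an output argument, which is the first overload style described in \ref encoding_helper__overloads.
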

\section encoding_helper__overloads Function Overloads

Every encoding conversion function (except the conversion between UTF8 and native strings) has 4 different overloads for different scenarios.
Take YYCC::EncodingHelper::WcharToChar as an example.
It has the following 4 overloads:

\code
bool WcharToChar(const std::wstring_view& src, std::string& dst, UINT code_page);
bool WcharToChar(const wchar_t* src, std::string& dst, UINT code_page);
std::string WcharToChar(const std::wstring_view& src, UINT code_page);
std::string WcharToChar(const wchar_t* src, UINT code_page);
\endcode

\subsection encoding_helper__overloads_destination Destination String

By return value, these 4 overloads can be divided into 2 types.
The first type returns \c bool. The second type returns a \c std::string instance.

The first type always returns \c bool to indicate whether the conversion succeeded.
Because the return value is already taken, the function needs an argument to hold the result string.
That is why the functions of this type always take a reference to \c std::string as an argument.

In contrast, the second type returns the result directly as its return value.
It does not report whether the conversion succeeded and returns an empty string if the conversion failed.
It is more natural to use because the return value itself is the result,
so there is no need to declare a variable to hold the result before calling the conversion function.

All in all, the first type should be used in strict scenarios,
where the success of the conversion significantly affects the behavior of the following code.
For example, the converted string is passed to a system function that does not accept an empty string.
The second type is usually used in loose scenarios.
For example, it is often used for console output, where a failed conversion does no harm
(a blank string is printed).

For the first type, please note that there is \b NO guarantee that the argument holding the result is left unchanged.
Even if the conversion fails, the function may still have modified that argument.

In this example, the type of the result is \c std::string because this function requires it.
In other functions, such as YYCC::EncodingHelper::WcharToUTF8, the type of the result can be \c yycc_u8string and so on.
So please note that the type of the result is decided by the conversion function itself and is not always \c std::string.
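
The difference between the 2 destination styles can be sketched with a self-contained C++17 stand-in.
\c NarrowAscii and \c OverloadStyleExample are hypothetical names invented for this illustration and are not YYCC functions.
The stand-in only accepts ASCII characters, so that a failing conversion is easy to trigger.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a conversion function (not a YYCC API):
// narrows a wide string to ASCII and fails on any non-ASCII character.
// First style: report success by bool, store the result in dst.
bool NarrowAscii(const std::wstring_view& src, std::string& dst) {
    dst.clear(); // dst is modified even if the conversion fails later
    for (wchar_t ch : src) {
        if (static_cast<unsigned long>(ch) > 0x7F) return false;
        dst.push_back(static_cast<char>(ch));
    }
    return true;
}

// Second style: return the result directly, an empty string on failure.
std::string NarrowAscii(const std::wstring_view& src) {
    std::string dst;
    if (!NarrowAscii(src, dst)) dst.clear();
    return dst;
}

void OverloadStyleExample() {
    // Strict scenario: check the return value before using the result.
    std::string strict;
    const bool ok = NarrowAscii(L"caf\u00E9", strict);
    assert(!ok);
    // Partial output remains after the failure, so strict must not be used.
    assert(strict == "caf");

    // Loose scenario: an empty string on failure is acceptable.
    assert(NarrowAscii(L"hello") == "hello");
    assert(NarrowAscii(L"caf\u00E9").empty());
}
```

Note that the second style cannot distinguish a failed conversion from the successful conversion of an empty string, which is another reason to prefer the first style in strict scenarios.
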

\subsection encoding_helper__overloads__source Source String

By the way the source string is provided,
these 4 overloads can also be divided into 2 types.
The first type takes a reference to a constant \c std::wstring_view.
The second type takes a pointer to constant \c wchar_t.

The first type converts the whole string, including any \b embedded NUL characters.
Please note that we use a string view as the argument.
It is compatible with both raw string pointers and string containers,
so it is safe to pass a \c std::wstring directly to this function.

The second type assumes that the argument is a NUL-terminated string and converts it up to the terminator.

The consequence is clear.
If you want to process a string with \b embedded NUL characters, please choose the first type.
Otherwise the second type is enough.

As with the destination string, the type of the source is also decided by the conversion function itself.
For example, the source types of YYCC::EncodingHelper::UTF8ToWchar are \c yycc_u8string_view and \c yycc_char8_t,
not \c std::wstring_view and \c wchar_t.
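
The effect of an embedded NUL on the 2 source styles can be shown with a small self-contained C++17 sketch.
\c SourceLength and \c SourceStyleExample are hypothetical names, not YYCC functions.
They only report how many characters each kind of overload would see, without converting anything.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-ins (not YYCC APIs) that only report how many
// characters each kind of overload would hand to the conversion.
std::size_t SourceLength(const std::wstring_view& src) {
    return src.size(); // the view carries an explicit length
}
std::size_t SourceLength(const wchar_t* src) {
    return std::wstring_view(src).size(); // scans up to the first NUL
}

void SourceStyleExample() {
    // 5 characters with a NUL embedded in the middle.
    const std::wstring text(L"ab\0cd", 5);
    assert(SourceLength(text) == 5);         // string view overload sees everything
    assert(SourceLength(text.c_str()) == 2); // pointer overload stops at the NUL
}
```

Passing a \c std::wstring selects the string view overload, while passing a raw pointer or a string literal selects the pointer overload, so the same data can yield different results depending on how it is passed.
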

\subsection encoding_helper__overloads__extra Extra Argument

YYCC::EncodingHelper::WcharToChar takes an extra argument called \c code_page.
It indicates the code page of the destination string,
because this function converts a \c wchar_t string to a string encoded in the specified code page.

Some conversion functions have an extra argument like this
because they need more information to decide what to do.
Other conversion functions have no extra argument.
For example, the conversion between \c wchar_t strings and UTF8 strings takes none,
because the encodings of both the source string and the destination string are already fixed,
so there is no need to provide any more information.

\subsection encoding_helper__overloads__conclusion Conclusion

Combining the 2 types of source string with the 2 types of destination string
gives the 4 different overloads illustrated before.
You can choose among them freely according to your requirements.
And do not forget to provide the extra argument if the function requires one.

*/