doc: finish encoding doc
@@ -1,4 +1,4 @@
|
|||||||
namespace YYCC::ConsoleHelper {
|
namespace yycc::carton::csconsole {
|
||||||
/**
|
/**
|
||||||
|
|
||||||
\page csconsole Universal IO Function
|
\page csconsole Universal IO Function
|
||||||
@@ -8,8 +8,8 @@ because Windows is lacking in UTF8 console IO.
|
|||||||
|
|
||||||
\section csconsole__deprecation Deprecation Notes
|
\section csconsole__deprecation Deprecation Notes
|
||||||
|
|
||||||
This namespace, or thie module is deprecated.
|
This namespace, or this module is deprecated.
|
||||||
It provided functions are too aggressive and can not cover all use scenarios.
|
Its provided functions are too aggressive and can not cover all use scenarios.
|
||||||
So it is suggested not to use this namespace.
|
So it is suggested not to use this namespace.
|
||||||
Programmers should handle Windows UTF8 issues on their own.
|
Programmers should handle Windows UTF8 issues on their own.
|
||||||
|
|
||||||
|
|||||||
@@ -1,148 +0,0 @@
|
|||||||
namespace YYCC::EncodingHelper {
|
|
||||||
/**
|
|
||||||
|
|
||||||
\page encoding_helper Encoding Helper
|
|
||||||
|
|
||||||
YYCC::EncodingHelper namespace include all encoding related functions:
|
|
||||||
|
|
||||||
\li The convertion between ordinary string and UTF8 string which has been introduced in chapter \ref library_encoding.
|
|
||||||
\li Windows specific convertion between \c WCHAR, UTF8 string and string encoded by other encoding.
|
|
||||||
\li The convertion among UTF8, UTF16 and UTF32.
|
|
||||||
|
|
||||||
\section encoding_helper__ordinary_utf8_conv Ordinary & UTF8 Convertion
|
|
||||||
|
|
||||||
These convertion functions have been introduced in previous page.
|
|
||||||
See \ref library_encoding for more infomation.
|
|
||||||
|
|
||||||
YYCC supports following convertions:
|
|
||||||
|
|
||||||
\li #ToUTF8: Convert ordinary string to UTF8 string.
|
|
||||||
\li #ToUTF8View: Same as ToUTF8, but return string view instead.
|
|
||||||
\li #ToOrdinary: Convert UTF8 string to ordinary string.
|
|
||||||
\li #ToOrdinaryView: Same as ToOrdinary, but return string view instead.
|
|
||||||
|
|
||||||
\section encoding_helper__win_conv Windows Specific Convertion
|
|
||||||
|
|
||||||
During Windows programming, the convertion between Microsoft specified \c wchar_t and \c char is an essential operation.
|
|
||||||
Because Windows has 2 different function system, the functions ended with A and the functions ended with W.
|
|
||||||
(Microsoft specified \c wchar_t is \c 2 bytes long. It's different with Linux defined common 4 bytes long).
|
|
||||||
Thus YYCC provides these convertion functions in Windows to help programmer have better programming experience.
|
|
||||||
|
|
||||||
These functions are Windows specific, so they will be invisible in other platforms.
|
|
||||||
Please use them carefully (make sure that you are using them only in Windows environment).
|
|
||||||
|
|
||||||
YYCC supports following convertions:
|
|
||||||
|
|
||||||
\li #WcharToChar: Convert \c wchar_t string to code page specified string.
|
|
||||||
\li #CharToWchar: The reversed convertion of WcharToChar.
|
|
||||||
\li #CharToChar: Convert string between 2 different code pages. It's a shortcut of calling CharToWchar and WcharToChar successively.
|
|
||||||
\li #WcharToUTF8: Convert \c wchar_t string to UTF8 string.
|
|
||||||
\li #UTF8ToWchar: The reversed convertion of WcharToUTF8.
|
|
||||||
\li #CharToUTF8: Convert code page specified string to UTF8 string.
|
|
||||||
\li #UTF8ToChar: The reversed convertion of CharToUTF8.
|
|
||||||
|
|
||||||
Code Page is a Windows concept.
|
|
||||||
If you don't understand it, please view corresponding Microsoft documentation.
|
|
||||||
|
|
||||||
\section encoding_helper__utf_conv UTF8 UTF16 UTF32 Convertion
|
|
||||||
|
|
||||||
The convertion between UTF8, UTF16 and UTF32 is not common but essential.
|
|
||||||
These convertions can be achieved by standard library functions and classes.
|
|
||||||
(they are actually done by standard library functions in our implementation)
|
|
||||||
But we provided functions are easy to use and have clear interface.
|
|
||||||
|
|
||||||
These functions are different with the functions introduced above.
|
|
||||||
They can be used in any platform, not confined in Windows platforms.
|
|
||||||
|
|
||||||
YYCC supports following convertions:
|
|
||||||
|
|
||||||
\li #UTF8ToUTF16: Convert UTF8 string to UTF16 string.
|
|
||||||
\li #UTF16ToUTF8: The reversed convertion of UTF8ToUTF16.
|
|
||||||
\li #UTF8ToUTF32: Convert UTF8 string to UTF32 string.
|
|
||||||
\li #UTF32ToUTF8: The reversed convertion of UTF8ToUTF32.
|
|
||||||
|
|
||||||
\section encoding_helper__overloads Function Overloads
|
|
||||||
|
|
||||||
Every encoding convertion functions (except the convertion between UTF8 and ordinary string) have 4 different overloads for different scenarios.
|
|
||||||
Take #WcharToChar for example.
|
|
||||||
There are following 4 overloads:
|
|
||||||
|
|
||||||
\code
|
|
||||||
bool WcharToChar(const std::wstring_view& src, std::string& dst, UINT code_page);
|
|
||||||
bool WcharToChar(const wchar_t* src, std::string& dst, UINT code_page);
|
|
||||||
std::string WcharToChar(const std::wstring_view& src, UINT code_page);
|
|
||||||
std::string WcharToChar(const wchar_t* src, UINT code_page);
|
|
||||||
\endcode
|
|
||||||
|
|
||||||
\subsection encoding_helper__overloads_destination Destination String
|
|
||||||
|
|
||||||
According to the return value, these 4 overload can be divided into 2 types.
|
|
||||||
The first type returns bool. The second type returns \c std::string instance.
|
|
||||||
|
|
||||||
For the first type, it always return bool to indicate whether the convertion is success.
|
|
||||||
Due to this, the function must require an argument for holding the result string.
|
|
||||||
So you can see the functions belonging to this type always require a reference to \c std::string in argument.
|
|
||||||
|
|
||||||
Oppositely, the second directly returns result by return value.
|
|
||||||
It doesn't care the success of convertion and will return empty string if convertion failed.
|
|
||||||
Programmer can more naturally use it because the retuen value itself is the result.
|
|
||||||
There is no need to declare a variable before calling convertion function for holding result.
|
|
||||||
|
|
||||||
All in all, the first type overload should be used in strict scope.
|
|
||||||
The success of convertion will massively affect the behavior of your following code.
|
|
||||||
For example, the convertion code is delivered to some system function and it should not be empty and etc.
|
|
||||||
The second type overload usually is used in lossen scenarios.
|
|
||||||
For exmaple, this overload usually is used in console output because it usually doesn't matter.
|
|
||||||
There is no risk even if the convertion failed (just output a blank string).
|
|
||||||
|
|
||||||
For the first type, please note that there is \b NO guarantee that the argument holding return value is not changed.
|
|
||||||
Even the convertion is failed, the argument holding return value may still be changed by function itself.
|
|
||||||
|
|
||||||
In this case, the type of result is \c std::string because this is function required.
|
|
||||||
In other functions, such as #WcharToUTF8, the type of result can be \c yycc_u8string or etc.
|
|
||||||
So please note the type of result is decided by convertion function itself, not only \c std::string.
|
|
||||||
|
|
||||||
\subsection encoding_helper__overloads__source Source String
|
|
||||||
|
|
||||||
According to the way providing source string,
|
|
||||||
these 4 overload also can be divided into 2 types.
|
|
||||||
The first type take a reference to constant \c std::wstring_view.
|
|
||||||
The second type take a pointer to constant \c wchar_t.
|
|
||||||
|
|
||||||
For first type, it will take the whole string for convertion, including \b embedded NUL terminal.
|
|
||||||
Please note we use string view as argument.
|
|
||||||
It is compatible with corresponding raw string pointer and string container.
|
|
||||||
So it is safe to directly pass \c std::wstring for this function.
|
|
||||||
|
|
||||||
For second type, it will assume that you passed argument is a NUL terminated string and send it for convertion.
|
|
||||||
|
|
||||||
The result is clear.
|
|
||||||
If you want to process string with \b embedded NUL terminal, please choose first type overload.
|
|
||||||
Otherwise the second type overload is enough.
|
|
||||||
|
|
||||||
Same as destination string, the type of source is also decided by the convertion function itself.
|
|
||||||
For exmaple, the type of source in #UTF8ToWchar is \c yycc_u8string_view and \c yycc_char8_t,
|
|
||||||
not \c std::wstring and \c wchar_t.
|
|
||||||
|
|
||||||
\subsection encoding_helper__overloads__extra Extra Argument
|
|
||||||
|
|
||||||
There is an extra argument called \c code_page for #WcharToChar.
|
|
||||||
It indicates the code page of destination string,
|
|
||||||
because this function will convert \c wchar_t string to the string with specified code page encoding.
|
|
||||||
|
|
||||||
Some convertion functions have extra argument like this,
|
|
||||||
because they need more infomations to decide what they need to do.
|
|
||||||
Some convertion functions don't have extra argument.
|
|
||||||
For exmaple, the convertion between \c wchar_t string and UTF8 string.
|
|
||||||
Because both source string and destination string are concrete.
|
|
||||||
There is no need to provide any more infomations.
|
|
||||||
|
|
||||||
\subsection encoding_helper__overloads__conclusion Conclusion
|
|
||||||
|
|
||||||
Mixing 2 types of source string and 2 types of destination string,
|
|
||||||
we have 4 different overload as we illustrated before.
|
|
||||||
Programmer can use them freely according to your requirements.
|
|
||||||
And don't forget to provide extra argument if function required.
|
|
||||||
|
|
||||||
*/
|
|
||||||
}
|
|
||||||
202
doc/src/carton/pycodec.dox
Normal file
@@ -0,0 +1,202 @@
|
|||||||
|
namespace yycc::carton::pycodec {
|
||||||
|
/**
|
||||||
|
\page pycodec Unified Codec (Python-like Codec)
|
||||||
|
|
||||||
|
\section pycodec__overview Overview
|
||||||
|
|
||||||
|
The unified encoding conversion module provides a consistent interface for character encoding conversion across different platforms.
|
||||||
|
It automatically selects the appropriate backend implementation based on the platform and available features.
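A minimal sketch of how compile-time backend selection of this kind can work. The `_WIN32` macro and `__has_include` are standard, but the selection order, the `kBackend` constant and the `selected_backend` function are assumptions made for illustration; they are not taken from the YYCC sources.

```cpp
#include <string_view>

// Hypothetical sketch: pick a backend label at compile time.
// The order of the checks and all names here are assumptions,
// not the logic used by YYCC itself.
#if defined(_WIN32)
constexpr std::string_view kBackend = "win32";
#elif defined(__has_include)
#  if __has_include(<iconv.h>)
constexpr std::string_view kBackend = "iconv";
#  else
constexpr std::string_view kBackend = "none";
#  endif
#else
constexpr std::string_view kBackend = "none";
#endif

// Report which backend label was chosen for this build.
constexpr std::string_view selected_backend() { return kBackend; }
```

Because the choice is made by the preprocessor, calling code sees a single interface and never needs its own platform checks.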
|
||||||
|
|
||||||
|
\section pycodec__classes Available Classes
|
||||||
|
|
||||||
|
\subsection pycodec__classes__char Character to/from UTF-8 Conversion
|
||||||
|
|
||||||
|
Convert between named encodings and UTF-8 using a unified interface:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/carton/pycodec.hpp>
|
||||||
|
|
||||||
|
// Example: Converting from a named encoding to UTF-8
|
||||||
|
CharToUtf8 converter("GBK"); // or "ISO-8859-1", "SHIFT-JIS", etc.
|
||||||
|
|
||||||
|
// Note: the literal holds GBK bytes only if the execution charset is GBK
std::string gbk_text = "你好,世界!";
|
||||||
|
auto result = converter.to_utf8(gbk_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting from UTF-8 to a named encoding
|
||||||
|
Utf8ToChar converter("GBK");
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = converter.to_char(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string gbk_text = result.value();
|
||||||
|
// Use gbk_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection pycodec__classes__wchar Wide Character to/from UTF-8 Conversion
|
||||||
|
|
||||||
|
Convert between wide character strings and UTF-8:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/carton/pycodec.hpp>
|
||||||
|
|
||||||
|
// Example: Converting wide character to UTF-8
|
||||||
|
WcharToUtf8 converter;
|
||||||
|
|
||||||
|
std::wstring wide_text = L"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf8(wide_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to wide character
|
||||||
|
Utf8ToWchar converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = converter.to_wchar(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::wstring wide_text = result.value();
|
||||||
|
// Use wide_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection pycodec__classes__utf16_utf32 UTF-8 to/from UTF-16/UTF-32 Conversion
|
||||||
|
|
||||||
|
Convert between UTF encodings:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/carton/pycodec.hpp>
|
||||||
|
|
||||||
|
// Example: Converting UTF-8 to UTF-16
|
||||||
|
Utf8ToUtf16 converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf16(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u16string utf16_text = result.value();
|
||||||
|
// Use utf16_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-16 to UTF-8
|
||||||
|
Utf16ToUtf8 converter;
|
||||||
|
|
||||||
|
std::u16string utf16_text = u"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf8(utf16_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to UTF-32
|
||||||
|
Utf8ToUtf32 converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界! 🌍";
|
||||||
|
auto result = converter.to_utf32(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u32string utf32_text = result.value();
|
||||||
|
// Use utf32_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-32 to UTF-8
|
||||||
|
Utf32ToUtf8 converter;
|
||||||
|
|
||||||
|
std::u32string utf32_text = U"Hello, 世界! 🌍";
|
||||||
|
auto result = converter.to_utf8(utf32_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
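The examples with the 🌍 character (U+1F30D) are worth a closer look, because code points above U+FFFF need two UTF-16 code units (a surrogate pair) but only one UTF-32 unit. The sketch below shows that arithmetic. `encode_utf16` is a hypothetical helper written for this illustration, not a YYCC function, and it assumes its input is a valid Unicode scalar value.

```cpp
#include <string>

// Encode one Unicode scalar value as UTF-16 code units.
// Hypothetical helper for illustration; input is assumed to be valid
// (not a surrogate, not above U+10FFFF).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single code unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000; // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}
```

For U+1F30D this yields the pair `0xD83C 0xDF0D`, which is why a UTF-16 string can be longer than the UTF-32 string holding the same text.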
|
||||||
|
|
||||||
|
\section pycodec__utility Utility Functions
|
||||||
|
|
||||||
|
\subsection pycodec__utility__validation Encoding Name Validation
|
||||||
|
|
||||||
|
Check if an encoding name is valid in the current environment:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/carton/pycodec.hpp>
|
||||||
|
|
||||||
|
// Example: Validating an encoding name
|
||||||
|
bool is_valid = is_valid_encoding_name(u8"UTF-8");
|
||||||
|
if (is_valid) {
|
||||||
|
std::cout << "UTF-8 is a valid encoding name\n";
|
||||||
|
} else {
|
||||||
|
std::cout << "UTF-8 is not supported\n";
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test another encoding
|
||||||
|
is_valid = is_valid_encoding_name(u8"GBK");
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\section pycodec__error_handling Error Handling
|
||||||
|
|
||||||
|
All functions in this module return a result containing either
|
||||||
|
a ConvError struct describing the conversion error, or the final converted string.
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/carton/pycodec.hpp>
|
||||||
|
|
||||||
|
CharToUtf8 converter("INVALID_ENCODING_NAME");
|
||||||
|
std::string text = "Hello";
|
||||||
|
|
||||||
|
auto result = converter.to_utf8(text);
|
||||||
|
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string converted = result.value();
|
||||||
|
// Process successfully converted string
|
||||||
|
} else {
|
||||||
|
// Handle conversion failure
|
||||||
|
std::cout << "Conversion failed\n";
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\section pycodec__backend_specifics Platform-Specific Backends
|
||||||
|
|
||||||
|
For detailed information about the specific platform backends, see:
|
||||||
|
|
||||||
|
\li \ref encoding__windows : Windows-specific implementation using Win32 APIs
|
||||||
|
\li \ref encoding__iconv : Iconv-based implementation for POSIX-like systems
|
||||||
|
|
||||||
|
\section pycodec__notes Notes
|
||||||
|
|
||||||
|
For all supported encoding names and their aliases,
|
||||||
|
please browse the code in <TT>script/pycodec</TT> in our source tree.
|
||||||
|
|
||||||
|
Please also note that not every encoding name is implemented on every platform.
|
||||||
|
Some uncommon encoding names are unsupported on some backends due to the limitations of the corresponding backend.
|
||||||
|
These gaps are also listed in the directory mentioned above.
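As a rough illustration of how a Python-like name lookup can behave, the sketch below normalizes an encoding name and resolves it through a small alias table. Everything in it is hypothetical: `normalize_name`, `find_code_page` and the table entries are written for this example only, and the real tables live in <TT>script/pycodec</TT>. The Windows code page numbers used (65001, 936, 932, 28591) are the standard identifiers for UTF-8, GBK, Shift-JIS and ISO-8859-1.

```cpp
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Lowercase the name and fold '-' and ' ' into '_', so that
// "UTF-8", "utf_8" and "Utf 8" all become "utf_8".
std::string normalize_name(std::string name) {
    for (char& c : name) {
        if (c == '-' || c == ' ') {
            c = '_';
        } else {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
    }
    return name;
}

// Hypothetical alias table: normalized name -> Windows code page.
std::optional<unsigned int> find_code_page(const std::string& name) {
    static const std::map<std::string, unsigned int> table = {
        {"utf_8", 65001}, {"utf8", 65001},
        {"gbk", 936}, {"cp936", 936},
        {"shift_jis", 932},
        {"latin_1", 28591}, {"iso_8859_1", 28591},
    };
    auto it = table.find(normalize_name(name));
    if (it == table.end()) return std::nullopt; // unknown on this backend
    return it->second;
}
```

An empty result models the case described above, where a name is valid in general but has no implementation on the current backend.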
|
||||||
|
|
||||||
|
*/
|
||||||
|
}
|
||||||
166
doc/src/encoding/iconv.dox
Normal file
@@ -0,0 +1,166 @@
|
|||||||
|
namespace yycc::encoding::iconv {
|
||||||
|
/**
|
||||||
|
\page encoding__iconv Iconv-based Codec
|
||||||
|
|
||||||
|
\section encoding__iconv__overview Overview
|
||||||
|
|
||||||
|
The Iconv-based encoding conversion module provides encoding conversion functionality using the iconv library.
|
||||||
|
This module is available on POSIX systems, or when you enable iconv support manually while configuring the library.
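For readers unfamiliar with iconv, the following is a minimal sketch of the raw API that a wrapper of this kind sits on top of. `convert_with_iconv` is a hypothetical helper written for this page and is not part of YYCC; only `iconv_open`, `iconv` and `iconv_close` are real library calls. The sketch assumes a stateless target encoding such as UTF-8, so it skips the final flush call that stateful encodings need.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <optional>
#include <string>

// Hypothetical helper (not a YYCC API): convert `input` from the encoding
// `from_code` to `to_code` with plain iconv calls.
std::optional<std::string> convert_with_iconv(const std::string& input,
                                              const char* from_code,
                                              const char* to_code) {
    // Note the argument order: destination encoding comes first.
    iconv_t cd = iconv_open(to_code, from_code);
    if (cd == (iconv_t)(-1)) return std::nullopt; // unknown encoding name

    std::string output;
    char buffer[256];
    char* in_ptr = const_cast<char*>(input.data());
    std::size_t in_left = input.size();
    bool ok = true;

    while (in_left > 0) {
        char* out_ptr = buffer;
        std::size_t out_left = sizeof(buffer);
        std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        int err = errno; // capture before any other call can change it
        output.append(buffer, sizeof(buffer) - out_left);
        // E2BIG only means the buffer filled up; loop again with a fresh one.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            ok = false; // EILSEQ or EINVAL: invalid or truncated input
            break;
        }
    }

    iconv_close(cd);
    if (!ok) return std::nullopt;
    return output;
}
```

The converter classes on this page hide the descriptor management, buffer loop and error codes shown here behind a single call that returns a result.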
|
||||||
|
|
||||||
|
\section encoding__iconv__classes Available Classes
|
||||||
|
|
||||||
|
\subsection encoding__iconv__classes__char Char to/from UTF-8 Conversion
|
||||||
|
|
||||||
|
Convert between character encodings and UTF-8:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/iconv.hpp>
|
||||||
|
|
||||||
|
// Example: Creating a converter from Latin-1 to UTF-8
|
||||||
|
CharToUtf8 converter("ISO-8859-1");
|
||||||
|
|
||||||
|
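// Latin-1 bytes for "Café résumé naïve" (a plain literal would use the source charset)
std::string latin1_text = "Caf\xE9 r\xE9sum\xE9 na\xEFve";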
|
||||||
|
auto result = converter.to_utf8(latin1_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Creating a converter from UTF-8 to Latin-1
|
||||||
|
Utf8ToChar converter("ISO-8859-1");
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Café résumé naïve";
|
||||||
|
auto result = converter.to_char(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string latin1_text = result.value();
|
||||||
|
// Use latin1_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__iconv__classes__wchar WChar to/from UTF-8 Conversion
|
||||||
|
|
||||||
|
Convert between wide character and UTF-8:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/iconv.hpp>
|
||||||
|
|
||||||
|
// Example: Converting wide character to UTF-8
|
||||||
|
WcharToUtf8 converter;
|
||||||
|
|
||||||
|
std::wstring wide_text = L"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf8(wide_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to wide character
|
||||||
|
Utf8ToWchar converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = converter.to_wchar(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::wstring wide_text = result.value();
|
||||||
|
// Use wide_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__iconv__classes__utf16_utf32 UTF-8 to/from UTF-16/UTF-32 Conversion
|
||||||
|
|
||||||
|
Convert between UTF encodings:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/iconv.hpp>
|
||||||
|
|
||||||
|
// Example: Converting UTF-8 to UTF-16
|
||||||
|
Utf8ToUtf16 converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf16(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u16string utf16_text = result.value();
|
||||||
|
// Use utf16_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-16 to UTF-8
|
||||||
|
Utf16ToUtf8 converter;
|
||||||
|
|
||||||
|
std::u16string utf16_text = u"Hello, 世界!";
|
||||||
|
auto result = converter.to_utf8(utf16_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to UTF-32
|
||||||
|
Utf8ToUtf32 converter;
|
||||||
|
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界! 🌍";
|
||||||
|
auto result = converter.to_utf32(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u32string utf32_text = result.value();
|
||||||
|
// Use utf32_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-32 to UTF-8
|
||||||
|
Utf32ToUtf8 converter;
|
||||||
|
|
||||||
|
std::u32string utf32_text = U"Hello, 世界! 🌍";
|
||||||
|
auto result = converter.to_utf8(utf32_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\section encoding__iconv__error_handling Error Handling
|
||||||
|
|
||||||
|
All functions in this module return a result containing either
|
||||||
|
a ConvError struct describing the conversion error, or the final converted string.
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/iconv.hpp>
|
||||||
|
|
||||||
|
CharToUtf8 converter("INVALID_ENCODING");
|
||||||
|
// Note: Constructor errors might be detected during conversion
|
||||||
|
|
||||||
|
std::string text = "Hello";
|
||||||
|
auto result = converter.to_utf8(text);
|
||||||
|
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string converted = result.value();
|
||||||
|
// Process successfully converted string
|
||||||
|
} else {
|
||||||
|
// Handle conversion failure
|
||||||
|
std::cout << "Conversion failed\n";
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
*/
|
||||||
|
}
|
||||||
98
doc/src/encoding/stl.dox
Normal file
@@ -0,0 +1,98 @@
|
|||||||
|
namespace yycc::encoding::stl {
|
||||||
|
/**
|
||||||
|
\page encoding__stl STL-based Codec
|
||||||
|
|
||||||
|
\section encoding__stl__overview Overview
|
||||||
|
|
||||||
|
The STL-based encoding conversion module provides cross-platform encoding conversion functionality using the standard library's codecvt facets.
|
||||||
|
This module is designed to handle conversions between UTF-8, UTF-16, and UTF-32 encodings using the standard C++ locale facilities.
|
||||||
|
|
||||||
|
\section encoding__stl__attentions Attentions
|
||||||
|
|
||||||
|
This module relies on standard library facilities that the C++ standard has deprecated and that may be removed in future versions of C++.
|
||||||
|
So please use this module with care, or consider using our \ref pycodec module instead.
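To make the deprecation concrete, here is a minimal sketch of a UTF-8 to UTF-16 conversion built on `std::wstring_convert` and `std::codecvt_utf8_utf16`, both deprecated since C++17. Whether this module uses exactly these facilities is an assumption; `utf8_to_utf16` is a hypothetical wrapper written for this illustration, and compilers may warn when building it.

```cpp
#include <codecvt>
#include <locale>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the deprecated std::wstring_convert.
// Expect deprecation warnings from recent compilers.
std::optional<std::u16string> utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    try {
        return conv.from_bytes(utf8);
    } catch (const std::range_error&) {
        return std::nullopt; // malformed input
    }
}
```

The standard facility reports failure by throwing, so a wrapper like this has to translate the exception into a result value before it can offer the `has_value()` style used on this page.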
|
||||||
|
|
||||||
|
\section encoding__stl__functions Available Functions
|
||||||
|
|
||||||
|
\subsection encoding__stl__functions__utf16 UTF-8 to/from UTF-16 Conversion
|
||||||
|
|
||||||
|
Convert between UTF-8 and UTF-16 encodings using standard library facilities:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/stl.hpp>
|
||||||
|
|
||||||
|
// Example: Converting UTF-8 to UTF-16
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = to_utf16(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u16string utf16_text = result.value();
|
||||||
|
// Use utf16_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-16 to UTF-8
|
||||||
|
std::u16string utf16_text = u"Hello, 世界!";
|
||||||
|
auto result = to_utf8(utf16_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__stl__functions__utf32 UTF-8 to/from UTF-32 Conversion
|
||||||
|
|
||||||
|
Convert between UTF-8 and UTF-32 encodings:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/stl.hpp>
|
||||||
|
|
||||||
|
// Example: Converting UTF-8 to UTF-32
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界! 🌍";
|
||||||
|
auto result = to_utf32(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u32string utf32_text = result.value();
|
||||||
|
// Use utf32_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-32 to UTF-8
|
||||||
|
std::u32string utf32_text = U"Hello, 世界! 🌍";
|
||||||
|
auto result = to_utf8(utf32_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\section encoding__stl__error_handling Error Handling
|
||||||
|
|
||||||
|
All functions in this module return a result containing either
|
||||||
|
a ConvError struct describing the conversion error, or the final converted string.
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/stl.hpp>
|
||||||
|
|
||||||
|
std::u8string invalid_utf8 = u8"\xFF\xFE"; // Invalid UTF-8 sequence
|
||||||
|
auto result = to_utf16(invalid_utf8);
|
||||||
|
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u16string converted = result.value();
|
||||||
|
// Process successfully converted string
|
||||||
|
} else {
|
||||||
|
// Handle conversion failure
|
||||||
|
std::cout << "Conversion failed\n";
|
||||||
|
}
|
||||||
|
\endcode
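The bytes `FF FE` fail because `0xFF` can never start a UTF-8 sequence. The sketch below spells out the well-formedness rules a decoder has to enforce. `is_valid_utf8` is a hypothetical standalone checker written for this illustration; it is not part of YYCC and not necessarily how the standard library validates input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical checker: returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(std::string_view s) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b0 < 0x80) { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false; // stray continuation byte, or 0xF8..0xFF

        if (i + len > s.size()) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += len;
    }
    return true;
}
```

Inputs that break any of these rules are the ones for which the conversion functions are expected to return an error instead of a string.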
|
||||||
|
|
||||||
|
*/
|
||||||
|
}
|
||||||
191
doc/src/encoding/windows.dox
Normal file
@@ -0,0 +1,191 @@
|
|||||||
|
namespace yycc::encoding::windows {
|
||||||
|
/**
|
||||||
|
\page encoding__windows Win32-based Codec
|
||||||
|
|
||||||
|
\section encoding__windows__overview Overview
|
||||||
|
|
||||||
|
The Windows-specific encoding conversion module provides encoding conversion functionality
|
||||||
|
using Windows API functions such as `WideCharToMultiByte` and `MultiByteToWideChar`.
|
||||||
|
This module is available only on Windows platforms and offers efficient conversion
|
||||||
|
between various character encodings including wide character, multi-byte, and UTF-8.
|
||||||
|
|
||||||
|
\section encoding__windows__functions Available Functions
|
||||||
|
|
||||||
|
\subsection encoding__windows__functions__wchar Wide Character to/from Multi-byte Conversion
|
||||||
|
|
||||||
|
Convert between wide character strings and multi-byte strings using Windows code pages:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/windows.hpp>
|
||||||
|
|
||||||
|
// Example: Converting wide character string to multi-byte with specific code page
|
||||||
|
std::wstring wide_text = L"Hello, 世界!";
|
||||||
|
auto result = to_char(wide_text, CP_UTF8); // Using UTF-8 code page
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string multi_byte_text = result.value();
|
||||||
|
// Use multi_byte_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting multi-byte string to wide character with specific code page
|
||||||
|
std::string multi_byte_text = "Hello, 世界!";
|
||||||
|
auto result = to_wchar(multi_byte_text, CP_UTF8);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::wstring wide_text = result.value();
|
||||||
|
// Use wide_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__windows__functions__mbcs Multi-byte to/from Multi-byte Conversion
|
||||||
|
|
||||||
|
Convert between different multi-byte encodings by using wide character as an intermediate:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/windows.hpp>
|
||||||
|
|
||||||
|
// Example: Converting between two different code pages
|
||||||
|
std::string source_text = "Hello, world!";
|
||||||
|
auto result = to_char(source_text, CP_ACP, CP_UTF8); // ANSI to UTF-8
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string utf8_text = result.value();
|
||||||
|
// Use converted UTF-8 text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__windows__functions__utf8 UTF-8 Specific Conversions
|
||||||
|
|
||||||
|
Specialized functions for UTF-8 conversion without requiring explicit code page specification:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/windows.hpp>
|
||||||
|
|
||||||
|
// Example: Converting wide character to UTF-8
|
||||||
|
std::wstring wide_text = L"Hello, 世界!";
|
||||||
|
auto result = to_utf8(wide_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to wide character
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = to_wchar(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::wstring wide_text = result.value();
|
||||||
|
// Use wide_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting multi-byte to UTF-8
|
||||||
|
std::string multi_byte_text = "Hello, world!";
|
||||||
|
auto result = to_utf8(multi_byte_text, CP_ACP);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to multi-byte
|
||||||
|
std::u8string utf8_text = u8"Hello, world!";
|
||||||
|
auto result = to_char(utf8_text, CP_ACP);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string multi_byte_text = result.value();
|
||||||
|
// Use multi_byte_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\subsection encoding__windows__functions__utf16_utf32 UTF-8 to/from UTF-16/UTF-32 Conversion
|
||||||
|
|
||||||
|
Available on Windows with Microsoft STL for conversion between UTF encodings:
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/windows.hpp>
|
||||||
|
|
||||||
|
// Example: Converting UTF-8 to UTF-16
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界!";
|
||||||
|
auto result = to_utf16(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u16string utf16_text = result.value();
|
||||||
|
// Use utf16_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-16 to UTF-8
|
||||||
|
std::u16string utf16_text = u"Hello, 世界!";
|
||||||
|
auto result = to_utf8(utf16_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-8 to UTF-32
|
||||||
|
std::u8string utf8_text = u8"Hello, 世界! 🌍";
|
||||||
|
auto result = to_utf32(utf8_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u32string utf32_text = result.value();
|
||||||
|
// Use utf32_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\code
|
||||||
|
// Example: Converting UTF-32 to UTF-8
|
||||||
|
std::u32string utf32_text = U"Hello, 世界! 🌍";
|
||||||
|
auto result = to_utf8(utf32_text);
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::u8string utf8_text = result.value();
|
||||||
|
// Use utf8_text...
|
||||||
|
} else {
|
||||||
|
// Handle conversion error
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\section encoding__windows__error_handling Error Handling
|
||||||
|
|
||||||
|
All functions in this module return a result containing either
|
||||||
|
a ConvError struct describing the conversion error, or the final converted string.
|
||||||
|
|
||||||
|
\code
|
||||||
|
#include <yycc/encoding/windows.hpp>
|
||||||
|
|
||||||
|
std::wstring invalid_text = /* some problematic string */;
|
||||||
|
auto result = to_char(invalid_text, CP_UTF8);
|
||||||
|
|
||||||
|
if (result.has_value()) {
|
||||||
|
std::string converted = result.value();
|
||||||
|
// Process successfully converted string
|
||||||
|
} else {
|
||||||
|
// Handle conversion failure
|
||||||
|
std::cout << "Conversion failed\n";
|
||||||
|
}
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
*/
|
||||||
|
}
|
||||||
@@ -45,6 +45,16 @@
|
|||||||
|
|
||||||
\li \subpage patch
|
\li \subpage patch
|
||||||
|
|
||||||
|
<B>Text Encoding</B>
|
||||||
|
|
||||||
|
\li \subpage encoding__stl
|
||||||
|
|
||||||
|
\li \subpage encoding__windows
|
||||||
|
|
||||||
|
\li \subpage encoding__iconv
|
||||||
|
|
||||||
|
\li \subpage pycodec
|
||||||
|
|
||||||
</TD>
|
</TD>
|
||||||
<TD ALIGN="LEFT" VALIGN="TOP">
|
<TD ALIGN="LEFT" VALIGN="TOP">
|
||||||
|
|
||||||
|
|||||||