namespace YYCC {
/**

\page library_encoding Library Encoding

Before using this library, you should first understand its encoding strategy.
In short, this library uses UTF8 encoding everywhere except in a few special cases,
for example, when a function explicitly specifies the encoding of its input parameters.

The rest of this article explains in detail how UTF8 is used in this library.

\section library_encoding__utf8_type UTF8 Type

YYCC uses a custom UTF8 char type, string container and string view throughout the library, from parameters to return values.
The following content introduces how they are defined.

\subsection library_encoding__utf8_type__char_type Char Type

The YYCC library has its own UTF8 char type, \c yycc_char8_t.
This is how it is defined:

\code
#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif
\endcode

If your environment (C++ 20 or higher) supports the built-in \c char8_t, \c yycc_char8_t is just an alias for \c char8_t.
Otherwise (lower than C++ 20, e.g. C++ 17), \c yycc_char8_t is defined as \c unsigned \c char, which is the underlying type of \c char8_t in C++ 20 (this can be seen as a polyfill).

This means that if you already use \c char8_t,
you do not need to make any extra modification before using this library,
because the types are identical.
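
As a minimal sketch of what this definition implies, the following code repeats the alias and checks a few of its properties at compile time.
The helper \c is_non_ascii is hypothetical and not part of YYCC; it only illustrates working with a single code unit.

```cpp
#include <type_traits>

// Same definition as shown above, repeated so that the sketch compiles on its own.
#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif

// A UTF8 code unit is one byte wide in both branches.
static_assert(sizeof(yycc_char8_t) == 1, "a UTF8 code unit must be one byte");
// The type is unsigned in both branches, so a unit such as 0xE6 is never negative.
static_assert(std::is_unsigned<yycc_char8_t>::value, "a UTF8 code unit must be unsigned");

#if defined(__cpp_char8_t)
// With char8_t support, the alias is exactly char8_t.
static_assert(std::is_same<yycc_char8_t, char8_t>::value, "expected an alias of char8_t");
#endif

// Hypothetical helper (not part of YYCC): true if the unit belongs to a
// multi-byte UTF8 sequence, false if it is a plain ASCII character.
constexpr bool is_non_ascii(yycc_char8_t unit) {
    return unit >= 0x80;
}
```

Because the checks are \c static_assert declarations, a mismatch would surface as a compile error rather than at run time.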

\subsection library_encoding__utf8_type__container_type String Container and View

The string container and string view are defined like this:

\code
using yycc_u8string = std::basic_string<yycc_char8_t>;
using yycc_u8string_view = std::basic_string_view<yycc_char8_t>;
\endcode

The actual code in the library may differ slightly, but it has the same meaning.

In a \c char8_t environment, they are just aliases for \c std::u8string and \c std::u8string_view respectively.
So if you already use them, you do not need to modify your code before using this library.
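
The following sketch, assuming a C++ 17 or newer compiler, repeats these aliases and checks the claim about \c std::u8string at compile time.
The function \c count_code_points is hypothetical and not part of YYCC; it only shows a function that takes the view type, and it assumes well-formed UTF8 input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif
using yycc_u8string = std::basic_string<yycc_char8_t>;
using yycc_u8string_view = std::basic_string_view<yycc_char8_t>;

#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
// With char8_t support, the aliases name the standard types themselves.
static_assert(std::is_same<yycc_u8string, std::u8string>::value, "expected std::u8string");
static_assert(std::is_same<yycc_u8string_view, std::u8string_view>::value, "expected std::u8string_view");
#endif

// Hypothetical helper (not part of YYCC): counts code points by skipping
// continuation units, which have the bit pattern 10xxxxxx.
constexpr std::size_t count_code_points(yycc_u8string_view text) {
    std::size_t count = 0;
    for (yycc_char8_t unit : text) {
        if ((unit & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

A \c yycc_u8string converts implicitly to \c yycc_u8string_view, so a function declared like this accepts both the container and the view.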

\subsection library_encoding__utf8_type__why Why?

You may be curious why I created a new UTF8 char type rather than using the standard UTF8 char type directly. There are 2 reasons.

First, I noticed too late that I could use the standard UTF8 char type.
My UTF8 char type was already used everywhere in the library, and it is tough to fully replace it with the standard one.

Second, the UTF8 related content of the standard library is \e volatile.
I have noticed that the standard library changes its UTF8 related functions frequently and that its API is not stable.
For example, the standard library introduced \c std::codecvt_utf8 in C++ 11, deprecated it in C++ 17 and even removes it in C++ 26.
That's unacceptable! So I created my own UTF8 type to guard against the scenario in which \c char8_t is removed in the future.

\section library_encoding__concept Concepts

In the following content, you will encounter 2 terms: ordinary string and UTF8 string.

A UTF8 string, as its name suggests, is a string encoded with UTF8.
Its char type must be \c yycc_char8_t
(equivalent to \c char8_t since C++ 20).

An ordinary string means the plain, native string.
A C++ string literal without any prefix, such as \c "foo bar", is an ordinary string.
Its char type is \c char.
Its encoding depends on the compiler and environment
(typically UTF8 on Linux, or the system code page on Windows if the UTF8 switch of MSVC is not enabled).

For more information, please browse CppReference:
https://en.cppreference.com/w/cpp/language/string_literal

\section library_encoding__utf8_literal UTF8 Literal

String literal is a C++ concept.
If you are not familiar with it, please read a related article first, such as the one on CppReference.

\subsection library_encoding__utf8_literal__single Single Literal

In short, YYCC allows you to declare a UTF8 literal like this:

\code
YYCC_U8("This is UTF8 literal.")
\endcode

YYCC_U8 is a macro.
You do not need to add a \c u8 prefix to the string given to the macro;
the macro adds it automatically.

In detail, this macro uses \c reinterpret_cast to forcibly change the type of the given argument to \c const \c yycc_char8_t*.
This ensures that the declared UTF8 literal is compatible with YYCC UTF8 types.
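
To make the mechanism concrete, here is a minimal sketch of how a macro with this behavior could be written.
\c DEMO_U8, \c demo_length and \c demo_single_literal are hypothetical names, and the real definition of YYCC_U8 may differ; the sketch only follows the description above (add the \c u8 prefix, then \c reinterpret_cast).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif

// Hypothetical stand-in for YYCC_U8; the real definition may differ.
// "u8 ## text" pastes the u8 prefix onto the literal, and the cast
// yields a pointer to yycc_char8_t under every language standard.
#define DEMO_U8(text) (reinterpret_cast<const yycc_char8_t*>(u8 ## text))

// Counts the code units before the terminating zero.
inline std::size_t demo_length(const yycc_char8_t* text) {
    std::size_t count = 0;
    while (text[count] != 0) ++count;
    return count;
}

inline void demo_single_literal() {
    // The argument carries no prefix; U+00E9 becomes the two code units C3 A9.
    const yycc_char8_t* text = DEMO_U8("caf\u00e9");
    assert(demo_length(text) == 5);
    assert(text[3] == 0xC3 && text[4] == 0xA9);
}
```

Before C++ 20 the cast converts from \c const \c char*, and with \c char8_t support it is an identity conversion, so the same source line yields the same pointer type in both environments.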

\subsection library_encoding__utf8_literal__char Single Char

As with UTF8 literals, YYCC allows you to cast an ordinary \c char into \c yycc_char8_t as in the following code:

\code
YYCC_U8_CHAR('A')
\endcode

YYCC_U8_CHAR is a macro.
It simply uses \c static_cast to cast the given value to \c yycc_char8_t.
This does not mean that you can cast non-ASCII characters,
because such a character is encoded as more than one UTF8 code unit and does not fit in a single \c yycc_char8_t.
For example, the following code is \b invalid:

\code
YYCC_U8_CHAR('文') // INVALID!
\endcode
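
The size limit can be checked at compile time.
In the sketch below, \c DEMO_U8_CHAR is a hypothetical stand-in that follows the description above; the real YYCC_U8_CHAR may be defined differently.

```cpp
#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif

// Hypothetical stand-in for YYCC_U8_CHAR; the real definition may differ.
#define DEMO_U8_CHAR(ch) (static_cast<yycc_char8_t>(ch))

// An ASCII character keeps its value (assuming an ASCII compatible execution charset).
static_assert(DEMO_U8_CHAR('A') == 0x41, "ASCII characters fit in one code unit");

// U+6587 needs three UTF8 code units. sizeof counts them plus the
// terminating zero, so the character can not fit in one yycc_char8_t.
static_assert(sizeof(u8"\u6587") == 4, "U+6587 occupies three code units");
```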

\subsection library_encoding__utf8_literal__concatenation Literal Concatenation

The YYCC_U8 macro also works with string literal concatenation:

\code
YYCC_U8("Error code: %" PRIu32 ". Please contact me.");
\endcode

According to the C++ standard's rule for string literal concatenation,
<I>"If one of the strings has an encoding prefix and the other does not, the one that does not will be considered to have the same encoding prefix as the other."</I>
The YYCC_U8 macro automatically adds the \c u8 prefix to the first component of the concatenation,
so the whole string becomes a UTF8 literal.
This also means that you should \b not add any prefix to the other components of the concatenation.
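
The sketch below runs such a concatenated literal through \c std::snprintf.
\c DEMO_U8 and \c format_error are hypothetical names, the macro only mimics the behavior described above, and the cast back to \c const \c char* is used here solely because \c std::snprintf expects an ordinary format string.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdio>
#include <cstring>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif

// Hypothetical stand-in for YYCC_U8; the real definition may differ.
#define DEMO_U8(text) (reinterpret_cast<const yycc_char8_t*>(u8 ## text))

// Formats an error message from a concatenated UTF8 literal.
inline int format_error(char* buffer, std::size_t size, std::uint32_t code) {
    // Only the first component receives the u8 prefix. PRIu32 expands to an
    // unprefixed literal and adopts that prefix during concatenation.
    const yycc_char8_t* format = DEMO_U8("Error code: %" PRIu32 ". Please contact me.");
    return std::snprintf(buffer, size, reinterpret_cast<const char*>(format), code);
}

inline void demo_concatenation() {
    char buffer[64];
    format_error(buffer, sizeof(buffer), 42u);
    assert(std::strcmp(buffer, "Error code: 42. Please contact me.") == 0);
}
```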

\subsection library_encoding__utf8_literal__why Why?

You may know that the C++ standard allows programmers to declare a UTF8 literal explicitly by writing code like this:

\code
u8"foo bar"
\endcode

This is okay, but it may be incompatible with the YYCC UTF8 char type.
According to the C++ standard, this syntax only produces \c const \c char8_t* if your C++ standard is C++ 20 or higher;
otherwise it produces \c const \c char*.
As a result, you can not assign this UTF8 literal to \c yycc_u8string in an environment that does not support \c char8_t,
because the types are different.
Therefore you can not use the functions provided by this library either, because they all use the UTF8 char type defined by YYCC.
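
The type difference can be observed directly.
The following sketch, assuming a C++ 17 or newer compiler, inspects the element type of a \c u8 literal and whether a \c yycc_u8string can be built from it; the names \c u8_literal_char and \c u8_literal_fits are hypothetical.

```cpp
#include <string>
#include <type_traits>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif
using yycc_u8string = std::basic_string<yycc_char8_t>;

// Element type of a u8 literal, with reference, array extent and const removed.
using u8_literal_char = std::remove_cv<
    std::remove_extent<
        std::remove_reference<decltype(u8"foo bar")>::type
    >::type
>::type;

// Whether a yycc_u8string can be constructed from a plain u8 literal.
constexpr bool u8_literal_fits =
    std::is_constructible<yycc_u8string, decltype(u8"foo bar")>::value;

#if defined(__cpp_char8_t)
static_assert(std::is_same<u8_literal_char, char8_t>::value, "u8 literals hold char8_t here");
static_assert(u8_literal_fits, "the literal can initialize yycc_u8string");
#else
static_assert(std::is_same<u8_literal_char, char>::value, "u8 literals hold char here");
static_assert(!u8_literal_fits, "the literal can not initialize yycc_u8string");
#endif
```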

\section library_encoding__utf8_pointer UTF8 String Pointer

A string pointer is a raw pointer to a string, such as \c const \c char*, \c char*, \c char32_t* and so on.

Much legacy code assumes that \c char* is encoded with UTF8 (Windows is the exception). But \c char* is incompatible with \c yycc_char8_t*.
YYCC provides YYCC::EncodingHelper::ToUTF8 to resolve this issue. Here is an example:

\code
const char* absolutely_is_utf8 = "I confirm this is encoded with UTF8.";
const yycc_char8_t* converted = YYCC::EncodingHelper::ToUTF8(absolutely_is_utf8);

char* mutable_utf8 = const_cast<char*>(absolutely_is_utf8); // This is not safe. Just for example.
yycc_char8_t* mutable_converted = YYCC::EncodingHelper::ToUTF8(mutable_utf8);
\endcode

YYCC::EncodingHelper::ToUTF8 has 2 overloads, which handle the conversion of constant and mutable string pointers respectively.

YYCC can also convert the YYCC UTF8 char type to the ordinary char type with YYCC::EncodingHelper::ToOrdinary.
Here is an example:

\code
const yycc_char8_t* yycc_utf8 = YYCC_U8("I am UTF8 string.");
const char* converted = YYCC::EncodingHelper::ToOrdinary(yycc_utf8);

yycc_char8_t* mutable_yycc_utf8 = const_cast<yycc_char8_t*>(yycc_utf8); // Not safe. Also just for example.
char* mutable_converted = YYCC::EncodingHelper::ToOrdinary(mutable_yycc_utf8);
\endcode

Like YYCC::EncodingHelper::ToUTF8, YYCC::EncodingHelper::ToOrdinary has 2 overloads to handle constant and mutable string pointers.
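
For illustration, the sketch below shows one way such overload pairs could be written.
The functions live in a hypothetical \c demo namespace, and implementing them with \c reinterpret_cast is an assumption; the actual YYCC implementation is not shown in this article.

```cpp
#include <cassert>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif

namespace demo {
    // Hypothetical counterparts of the pointer overloads described above.
    // Each one changes only the pointer type; no bytes are copied or validated.
    inline const yycc_char8_t* ToUTF8(const char* src) {
        return reinterpret_cast<const yycc_char8_t*>(src);
    }
    inline yycc_char8_t* ToUTF8(char* src) {
        return reinterpret_cast<yycc_char8_t*>(src);
    }
    inline const char* ToOrdinary(const yycc_char8_t* src) {
        return reinterpret_cast<const char*>(src);
    }
    inline char* ToOrdinary(yycc_char8_t* src) {
        return reinterpret_cast<char*>(src);
    }
}

inline void demo_round_trip() {
    char buffer[] = "plain ASCII is valid UTF8";
    // Mutable overloads: the round trip returns the original address.
    yycc_char8_t* as_utf8 = demo::ToUTF8(buffer);
    assert(demo::ToOrdinary(as_utf8) == buffer);
    // Constant overloads are selected for pointers to const.
    const char* fixed = buffer;
    assert(demo::ToOrdinary(demo::ToUTF8(fixed)) == fixed);
}
```

Because overload resolution follows the constness of the argument, no \c const_cast is needed at the call site.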

\section library_encoding__utf8_container UTF8 String Container

A string container usually means a standard library string container, such as \c std::string, \c std::wstring, \c std::u32string and so on.

In many personal projects, programmers use \c std::string everywhere because \c std::u8string may not have been available when the project was written.
How do you convert between an ordinary string container and the YYCC UTF8 string container?
Forcibly casting one to the other is definitely illegal, because they may have different class layouts.
Calm down and I will tell you how to convert correctly.
YYCC provides YYCC::EncodingHelper::ToUTF8 to convert an ordinary string container to a YYCC UTF8 string container.
Here is an example:

\code
std::string ordinary_string("I am UTF8");
yycc_u8string yycc_string = YYCC::EncodingHelper::ToUTF8(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
\endcode

Actually, YYCC::EncodingHelper::ToUTF8 accepts a reference to \c std::string_view as its argument.
However, there is an implicit conversion from \c std::string to \c std::string_view,
so you can pass a \c std::string instance to it directly.

A string view avoids unnecessary memory copies.
If you just want to pass an ordinary string container to a function that accepts \c yycc_u8string_view as its argument,
you can use the alternative YYCC::EncodingHelper::ToUTF8View.

\code
std::string ordinary_string("I am UTF8");
yycc_u8string_view yycc_string = YYCC::EncodingHelper::ToUTF8View(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
\endcode

Compared with the previous example, this one uses less memory.
The memory saved is the copy held by \c yycc_string in the previous example, because a string view is a view of the original string, not a copy of it.
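
The difference between the copying and the viewing conversion can be sketched as follows, assuming a C++ 17 or newer compiler.
The functions in the hypothetical \c demo namespace only imitate the behavior described above; how YYCC implements them is an assumption.

```cpp
#include <cassert>
#include <string>
#include <string_view>

#if defined(__cpp_char8_t)
using yycc_char8_t = char8_t;
#else
using yycc_char8_t = unsigned char;
#endif
using yycc_u8string = std::basic_string<yycc_char8_t>;
using yycc_u8string_view = std::basic_string_view<yycc_char8_t>;

namespace demo {
    // Hypothetical copying conversion: allocates a new string and copies every unit.
    inline yycc_u8string ToUTF8(std::string_view src) {
        yycc_u8string out;
        out.reserve(src.size());
        for (char ch : src) {
            out.push_back(static_cast<yycc_char8_t>(static_cast<unsigned char>(ch)));
        }
        return out;
    }
    // Hypothetical viewing conversion: reuses the storage of the source string.
    inline yycc_u8string_view ToUTF8View(std::string_view src) {
        return yycc_u8string_view(reinterpret_cast<const yycc_char8_t*>(src.data()), src.size());
    }
}

inline void demo_container_conversion() {
    std::string ordinary_string("I am UTF8");
    yycc_u8string copied = demo::ToUTF8(ordinary_string);
    yycc_u8string_view viewed = demo::ToUTF8View(ordinary_string);
    assert(copied.size() == ordinary_string.size());
    assert(viewed.size() == ordinary_string.size());
    // The view shares storage with the original string, the copy does not.
    assert(reinterpret_cast<const char*>(viewed.data()) == ordinary_string.data());
    assert(reinterpret_cast<const char*>(copied.data()) != ordinary_string.data());
}
```

A view produced this way is only valid while the source string is alive and unmodified, which is the price of avoiding the copy.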

As with UTF8 string pointers, YYCC::EncodingHelper::ToOrdinary and YYCC::EncodingHelper::ToOrdinaryView perform the corresponding reverse conversions.
Try to do your own research and figure out how to use them.
It's pretty easy.

\section library_encoding__windows Warnings to Windows Programmer

Due to the legacy of MSVC, the encoding of \c char* is not UTF8 in most cases.
If you run the conversion code introduced in this article on a string that is not encoded with UTF8, it may cause undefined behavior.

To enable the UTF8 mode of MSVC, please pass the \c /utf-8 switch to MSVC.
Then you can use the functions introduced in this article safely.
Otherwise, you must manually guarantee that the arguments you provide to these functions are encoded with UTF8.
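
One way to detect a missing switch early is a compile-time probe of how the compiler encodes ordinary literals.
The function \c ordinary_literals_are_utf8 below is a hypothetical helper, not part of YYCC, and it only covers string literals in your own source code; strings that arrive at run time still have to be checked by other means.

```cpp
// Hypothetical helper (not part of YYCC): true if ordinary string literals
// are encoded with UTF8 by the current compiler configuration.
constexpr bool ordinary_literals_are_utf8() {
    // U+00E9 is the two code units C3 A9 in UTF8, so the literal must
    // occupy exactly three bytes including the terminating zero.
    return sizeof("\u00e9") == 3
        && static_cast<unsigned char>("\u00e9"[0]) == 0xC3
        && static_cast<unsigned char>("\u00e9"[1]) == 0xA9;
}
```

A project could place a \c static_assert on this function in a common header, so that a build with a non-UTF8 literal encoding fails immediately.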

Linux users do not need to care about this,
because almost all Linux distributions use UTF8 by default.

*/
}