namespace yycc {
/**

\page string_reinterpret String Reinterpret

As introduced in \ref premise_and_principle__string_encoding, this project uses UTF8 strings everywhere.
This page explains how to obtain a UTF8 string from the user or from any other source.


\section string_reinterpret__utf8_type UTF8 Type

Since the whole project was upgraded to C++23, \c char8_t is the only valid UTF8 char type.
\c std::u8string and \c std::u8string_view are the only valid UTF8 string container and view.
Additionally, the \c u8 string literal prefix is the only way to create a UTF8 string literal.

In short, always pass UTF8 strings to the string functions provided by this library.

\section string_reinterpret__concept Concepts

The following content uses two terms: ordinary string and UTF8 string.

A UTF8 string, as its name suggests, is a string encoded with UTF8.
Its char type must be \c yycc_char8_t
(equivalent to \c char8_t since C++ 20).

An ordinary string is the plain, native string.
The result of a C++ string literal without any prefix \c "foo bar" is an ordinary string.
Its char type is \c char.
Its encoding depends on the compiler and the environment
(typically UTF8 on Linux, or the system code page on Windows when the UTF8 switch of MSVC is not enabled).

For more information, please see CppReference:
https://en.cppreference.com/w/cpp/language/string_literal

\section string_reinterpret__utf8_literal UTF8 Literal

A string literal is a C++ concept.
If you are not familiar with it, please read a related article first, such as the one on CppReference.

\subsection string_reinterpret__utf8_literal__single Single Literal

In short, YYCC allows you to declare a UTF8 literal like this:

\code
YYCC_U8("This is UTF8 literal.")
\endcode

YYCC_U8 is a macro.
You do not need to add the \c u8 prefix to the string given to the macro;
the macro adds it automatically.

In detail, this macro uses \c reinterpret_cast to force the type of the given argument to \c const \c yycc_char8_t*.
This ensures that the declared UTF8 literal is compatible with YYCC UTF8 types.
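
The idea can be sketched in standard C++ as follows.
This is only an illustration, not the actual definition of YYCC_U8:
the names \c DEMO_U8, \c demo_char8_t and \c demo_u8_length are hypothetical,
and the fallback to \c char before C++ 20 is an assumption made so that the sketch compiles with older standards too.

```cpp
#include <cstring>

// Hypothetical stand-in for the library's UTF8 char type (assumption):
// char8_t where the compiler provides it, plain char otherwise.
#if defined(__cpp_char8_t)
using demo_char8_t = char8_t;
#else
using demo_char8_t = char;
#endif

// Paste the u8 prefix onto the given literal, then force the pointer type.
#define DEMO_U8(s) (reinterpret_cast<const demo_char8_t*>(u8 ## s))

// Counts the code units (bytes) before the terminating NUL.
inline std::size_t demo_u8_length(const demo_char8_t* str) {
    // Inspecting the bytes through const char* is always permitted.
    return std::strlen(reinterpret_cast<const char*>(str));
}
```

Whatever the language standard, \c DEMO_U8("...") always yields the same pointer type, which is the property that the real macro is described as providing.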

\subsection string_reinterpret__utf8_literal__char Single Char

As with UTF8 literals, YYCC allows you to cast an ordinary \c char to \c yycc_char8_t as follows:

\code
YYCC_U8_CHAR('A')
\endcode

YYCC_U8_CHAR is a macro.
It simply uses \c static_cast to cast the given value to \c yycc_char8_t.
This does not mean that you can cast non-ASCII characters,
because such a character is encoded as more than one UTF8 code unit and therefore does not fit in a single \c char.
For example, the following code is \b invalid:

\code
YYCC_U8_CHAR('文') // INVALID!
\endcode
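
The size limit can be checked at compile time.
The following is a minimal sketch with hypothetical names (\c DEMO_U8_CHAR and \c demo_char8_t are not part of YYCC),
and the character from the invalid example is written as the escape for U+6587 so that the result does not depend on the source file encoding.

```cpp
// Hypothetical stand-in for the library's UTF8 char type (assumption).
#if defined(__cpp_char8_t)
using demo_char8_t = char8_t;
#else
using demo_char8_t = char;
#endif

// Hypothetical equivalent of a single-char cast macro.
#define DEMO_U8_CHAR(c) (static_cast<demo_char8_t>(c))

// An ASCII character is exactly one UTF8 code unit, so the cast keeps its value.
static_assert(DEMO_U8_CHAR('A') == 0x41, "ASCII 'A' is the single byte 0x41");

// U+6587 needs three UTF8 code units (plus the terminating NUL),
// so it can never be represented by one char-sized value.
static_assert(sizeof(u8"\u6587") == 4, "U+6587 occupies 3 bytes in UTF8");
```

A non-ASCII character therefore has to be written as a UTF8 string literal rather than as a single char.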

\subsection string_reinterpret__utf8_literal__concatenation Literal Concatenation

The YYCC_U8 macro also works with string literal concatenation:

\code
YYCC_U8("Error code: %" PRIu32 ". Please contact me.");
\endcode

According to the C++ standard on string literal concatenation,
<I>"If one of the strings has an encoding prefix and the other does not, the one that does not will be considered to have the same encoding prefix as the other."</I>
The YYCC_U8 macro automatically adds the \c u8 prefix to the first component of the concatenation,
so the whole string becomes a UTF8 literal.
This also means that you should \b not add any prefix to the other components of the concatenation.
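
The concatenation rule can be observed with plain \c u8 literals.
The sketch below does not use YYCC at all; \c format_error is a hypothetical helper,
and the format string is viewed as \c const \c char* only because \c std::snprintf expects that type.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>

// Only the first piece carries the u8 prefix; the adjacent pieces inherit it.
static_assert(std::is_same<decltype(u8"ab" "cd"), decltype(u8"abcd")>::value,
              "concatenation keeps the u8 encoding prefix");

// Formats an error message from a UTF8 format string built by concatenation.
inline std::string format_error(std::uint32_t code) {
    // std::snprintf expects const char*, so view the UTF8 bytes as char.
    const char* format = reinterpret_cast<const char*>(
        u8"Error code: %" PRIu32 ". Please contact me.");
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), format, code);
    return std::string(buffer);
}
```

Calling \c format_error(42) produces the text "Error code: 42. Please contact me.".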

\subsection string_reinterpret__utf8_literal__why Why?

You may know that the C++ standard allows programmers to declare a UTF8 literal explicitly by writing code like this:

\code
u8"foo bar"
\endcode

This is fine, but it may be incompatible with the YYCC UTF8 char type.
According to the C++ standard, this UTF8 literal syntax yields \c const \c char8_t* only if your C++ standard is C++ 20 or later;
otherwise it yields \c const \c char*.
As a result, you cannot assign this UTF8 literal to \c yycc_u8string in an environment that does not support \c char8_t,
because the types differ.
Consequently, you cannot use the functions provided by this library, because they all use the YYCC-defined UTF8 char type.
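
The difference between the two standards can be verified with a small compile-time check.
This sketch uses only the standard feature test macro \c __cpp_char8_t;
the names \c u8_literal_char and \c u8_literal_is_plain_char are hypothetical.

```cpp
#include <type_traits>

#if defined(__cpp_char8_t)
// C++ 20 and later: u8 literals are arrays of char8_t.
using u8_literal_char = char8_t;
#else
// Before C++ 20: u8 literals are arrays of plain char.
using u8_literal_char = char;
#endif

// "foo bar" has 7 characters plus the terminating NUL.
static_assert(std::is_same<decltype(u8"foo bar"), const u8_literal_char(&)[8]>::value,
              "the element type of a u8 literal depends on the language standard");

// Reports whether u8 literals and ordinary literals share one char type.
constexpr bool u8_literal_is_plain_char() {
    return std::is_same<u8_literal_char, char>::value;
}
```

The same source line therefore has two different types depending on the compiler settings, which is the incompatibility that a fixed UTF8 char type avoids.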

\section string_reinterpret__utf8_pointer UTF8 String Pointer

A string pointer is a raw pointer to a string, such as \c const \c char*, \c char* or \c char32_t*.

Much legacy code assumes that \c char* is encoded with UTF8 (Windows being the exception), but \c char* is incompatible with \c yycc_char8_t*.
YYCC provides YYCC::EncodingHelper::ToUTF8 to resolve this issue. Here is an example:

\code
const char* absolutely_is_utf8 = "I confirm this is encoded with UTF8.";
const yycc_char8_t* converted = YYCC::EncodingHelper::ToUTF8(absolutely_is_utf8);

char* mutable_utf8 = const_cast<char*>(absolutely_is_utf8); // This is not safe. Just for example.
yycc_char8_t* mutable_converted = YYCC::EncodingHelper::ToUTF8(mutable_utf8);
\endcode

YYCC::EncodingHelper::ToUTF8 has 2 overloads, which handle constant and mutable string pointer conversion respectively.

YYCC can also convert the YYCC UTF8 char type to the ordinary char type with YYCC::EncodingHelper::ToOrdinary.
Here is an example:

\code
const yycc_char8_t* yycc_utf8 = YYCC_U8("I am UTF8 string.");
const char* converted = YYCC::EncodingHelper::ToOrdinary(yycc_utf8);

yycc_char8_t* mutable_yycc_utf8 = const_cast<yycc_char8_t*>(yycc_utf8); // Not safe. Also just for example.
char* mutable_converted = YYCC::EncodingHelper::ToOrdinary(mutable_yycc_utf8);
\endcode

Like YYCC::EncodingHelper::ToUTF8, YYCC::EncodingHelper::ToOrdinary also has 2 overloads that handle constant and mutable string pointers.
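
A pointer reinterpretation of this kind could be written as below.
This is a sketch under assumptions, not the library's implementation:
\c to_utf8, \c to_ordinary and \c demo_char8_t are hypothetical names,
and the functions only change the pointer type without copying or converting any byte.

```cpp
#include <cstring>

// Hypothetical stand-in for the library's UTF8 char type (assumption).
#if defined(__cpp_char8_t)
using demo_char8_t = char8_t;
#else
using demo_char8_t = char;
#endif

// Ordinary to UTF8: one overload for constant pointers, one for mutable ones.
inline const demo_char8_t* to_utf8(const char* src) {
    return reinterpret_cast<const demo_char8_t*>(src);
}
inline demo_char8_t* to_utf8(char* src) {
    return reinterpret_cast<demo_char8_t*>(src);
}

// UTF8 to ordinary: the same pair of overloads in the reverse direction.
inline const char* to_ordinary(const demo_char8_t* src) {
    return reinterpret_cast<const char*>(src);
}
inline char* to_ordinary(demo_char8_t* src) {
    return reinterpret_cast<char*>(src);
}
```

Because nothing is converted, the caller remains responsible for the source really being encoded with UTF8.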

\section string_reinterpret__utf8_container UTF8 String Container

A string container usually means a standard library string container, such as \c std::string, \c std::wstring or \c std::u32string.

In many personal projects, programmers use \c std::string everywhere because \c std::u8string may not have been available when the project was written.
How do you convert between an ordinary string container and a YYCC UTF8 string container?
Forcibly casting one to the other is definitely illegal, because they may have different class layouts.
Don't worry: there is a correct way to do this conversion.
YYCC provides YYCC::EncodingHelper::ToUTF8 to convert an ordinary string container to a YYCC UTF8 string container.
Here is an example:

\code
std::string ordinary_string("I am UTF8");
yycc_u8string yycc_string = YYCC::EncodingHelper::ToUTF8(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
\endcode

Actually, YYCC::EncodingHelper::ToUTF8 accepts a reference to \c std::string_view as its argument.
However, there is an implicit conversion from \c std::string to \c std::string_view,
so you can pass a \c std::string instance to it directly.

A string view avoids unnecessary memory copies.
If you just want to pass an ordinary string container to a function that accepts \c yycc_u8string_view as its argument,
you can use the alternative YYCC::EncodingHelper::ToUTF8View.

\code
std::string ordinary_string("I am UTF8");
yycc_u8string_view yycc_string = YYCC::EncodingHelper::ToUTF8View(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
\endcode

Compared with the previous example, this one uses less memory.
The memory saved is the content of \c yycc_string, because a string view is only a view of the original string, not a copy of it.
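
The difference between the copying and the non-copying conversion can be sketched with the standard library alone.
The names below (\c to_utf8, \c to_utf8_view and the \c demo_ aliases) are hypothetical and only illustrate the technique; they are not the YYCC functions themselves.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-ins for the library's UTF8 types (assumption).
#if defined(__cpp_char8_t)
using demo_char8_t = char8_t;
#else
using demo_char8_t = char;
#endif
using demo_u8string = std::basic_string<demo_char8_t>;
using demo_u8string_view = std::basic_string_view<demo_char8_t>;

// Copying conversion: every byte is copied into a new container.
inline demo_u8string to_utf8(std::string_view src) {
    return demo_u8string(src.begin(), src.end());
}

// Non-copying conversion: the view points at the bytes owned by the source.
inline demo_u8string_view to_utf8_view(std::string_view src) {
    return demo_u8string_view(
        reinterpret_cast<const demo_char8_t*>(src.data()), src.size());
}
```

The view returned by \c to_utf8_view is only valid while the source string is alive and unmodified, which is the price of avoiding the copy.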

As with UTF8 string pointers, YYCC::EncodingHelper::ToOrdinary and YYCC::EncodingHelper::ToOrdinaryView perform the corresponding reverse conversions.
Their usage mirrors the functions above, so it should be easy to work out how to use them.

\section string_reinterpret__windows Warnings to Windows Programmer

Due to the legacy of MSVC, the encoding of \c char* is not UTF8 in most cases.
If you run the conversion code introduced in this article on a string that is not encoded with UTF8, it may cause undefined behavior.

To enable the UTF8 mode of MSVC, pass the \c /utf-8 switch to MSVC.
Then you can use the functions introduced in this article safely.
Otherwise, you must manually guarantee that the arguments you provide to these functions are encoded with UTF8.

Linux users do not need to care about this,
because almost all Linux distros use UTF8 by default.
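
If you are unsure how your compiler encodes ordinary string literals, a small probe can tell you.
The following is a sketch with a hypothetical function name, not something provided by YYCC;
it compares the bytes of an ordinary literal containing U+6587 with the UTF8 encoding of that character.

```cpp
#include <cstring>

// Returns true when the compiler encodes ordinary (unprefixed) string
// literals as UTF8. Hypothetical helper for illustration only.
inline bool ordinary_literals_are_utf8() {
    // The ordinary literal is encoded with the compiler's execution charset.
    static const char probe[] = "\u6587";
    // U+6587 is the byte sequence E6 96 87 in UTF8.
    static const unsigned char expected[] = {0xE6, 0x96, 0x87, 0x00};
    return sizeof(probe) == sizeof(expected)
        && std::memcmp(probe, expected, sizeof(expected)) == 0;
}
```

With GCC or Clang in their default configuration this is expected to return true; with MSVC it should only do so when the UTF8 switch is enabled.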

*/
}