refactor: refactor doc layout

2025-12-25 15:12:29 +08:00
parent 45f32297da
commit 337734d340
18 changed files with 52 additions and 137 deletions
--- a/doc/src/string/string_helper.dox
+++ b/doc/src/string/string_helper.dox
@@ -0,0 +1,149 @@
+namespace YYCC::StringHelper {
+/**
+
+\page string_helper String Helper
+
+\section string_helper__printf Printf VPrintf
+
+YYCC::StringHelper provides 4 functions for formatting strings.
+These functions are mainly intended for programmers who cannot use the C++ 20 \c std::format feature.
+
+\code
+bool Printf(yycc_u8string&, const yycc_char8_t*, ...);
+bool VPrintf(yycc_u8string&, const yycc_char8_t*, va_list argptr);
+yycc_u8string Printf(const yycc_char8_t*, ...);
+yycc_u8string VPrintf(const yycc_char8_t*, va_list argptr);
+\endcode
+
+#Printf and #VPrintf are similar to \c std::sprintf and \c std::vsprintf.
+#Printf accepts a UTF8 format string and variadic arguments specifying the data to print.
+This is the form most programmers will use.
+#VPrintf does the same work, but it takes a \c va_list,
+the representation of variadic arguments, instead of the variadic arguments themselves.
+It is mostly used by other functions which take variadic arguments.
+
+The only difference between these functions and the standard library functions is
+that you don't need to worry about whether the given buffer is large enough,
+because these functions calculate the required size internally.
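+
+The size calculation mentioned above is commonly done with a two-pass \c std::vsnprintf call.
+The following is only a sketch of that general technique, not the YYCC implementation:
+\c VFormat and \c Format are hypothetical names, and \c std::string stands in for \c yycc_u8string.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of YYCC): formats into a std::string
// without the caller having to size a buffer.
bool VFormat(std::string& out, const char* fmt, va_list args) {
    // A va_list may be consumed by a v*printf call, so measure on a copy.
    va_list measure_args;
    va_copy(measure_args, args);
    const int length = std::vsnprintf(nullptr, 0, fmt, measure_args);
    va_end(measure_args);
    if (length < 0) {
        out.clear();
        return false;
    }

    // Second pass: write into storage of the measured size.
    // vsnprintf also needs room for the terminating NUL.
    out.resize(static_cast<std::size_t>(length) + 1);
    const int written = std::vsnprintf(&out[0], out.size(), fmt, args);
    if (written != length) {
        out.clear();
        return false;
    }
    out.resize(static_cast<std::size_t>(length)); // drop the NUL
    return true;
}

// Variadic front end that forwards to the va_list version.
std::string Format(const char* fmt, ...) {
    std::string out;
    va_list args;
    va_start(args, fmt);
    VFormat(out, fmt, args);
    va_end(args);
    return out; // empty when formatting failed
}

void FormatDemo() {
    // No buffer size appears anywhere in the calling code.
    assert(Format("%d-%s", 42, "ab") == "42-ab");
}
```

+
+The variadic function forwards to the \c va_list one, which mirrors the relationship between #Printf and #VPrintf described above.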
+
+This follows the same design as we introduced in \ref encoding_helper.
+There are 2 overloads for #Printf and #VPrintf respectively.
+The first overload returns a bool value and requires a string container as an argument for storing the result.
+The second overload returns the result string directly.
+As you would expect, the first overload returns false if it fails to format the string (this rarely happens),
+and the second overload returns an empty string when formatting fails.
+
+\section string_helper__replace Replace
+
+YYCC::StringHelper provides 2 functions for string replacement:
+
+\code
+void Replace(yycc_u8string&, const yycc_u8string_view&, const yycc_u8string_view&);
+yycc_u8string Replace(const yycc_u8string_view&, const yycc_u8string_view&, const yycc_u8string_view&);
+\endcode
+
+The first overload performs the replacement directly in the given string container.
+The second overload produces a copy of the original string and performs the replacement on the copy.
+
+#Replace has special treatment for the following scenarios:
+
+\li If the given string is empty, the return value will be empty.
+\li If the character sequence to be replaced is an empty string, no replacement will happen.
+\li If the replacement character sequence is empty, every found character sequence is simply deleted from the given string.
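+
+The three special cases above fall out naturally from a simple find-and-replace loop.
+The following is a sketch only, assuming \c std::string in place of the YYCC UTF8 types;
+\c ReplaceInPlace is a hypothetical name and this is not necessarily how #Replace is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the in-place overload, using std::string.
void ReplaceInPlace(std::string& text, const std::string& from, const std::string& to) {
    // Empty source or empty search pattern: nothing to do.
    if (text.empty() || from.empty()) return;

    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // skip the inserted text so it is never rescanned
    }
}

void ReplaceDemo() {
    std::string text = "a.b.c";
    ReplaceInPlace(text, ".", ""); // empty replacement deletes every match
    assert(text == "abc");
}
```

+
+Advancing past the inserted text keeps the loop finite even when the replacement contains the search pattern.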
+
+\section string_helper__join Join
+
+YYCC::StringHelper provides a universal way of joining strings, as well as various specialized join functions.
+
+\subsection string_helper__join__universal Universal Join Function
+
+Because C++ has many different list types,
+there is no single convenient way to create a universal join function.
+So we create #JoinDataProvider to describe the join context.
+
+Before using the universal join function,
+you should set up a #JoinDataProvider first, which is the context of the join function.
+It is actually an \c std::function object which can be easily created with C++ lambda syntax.
+This function object accepts a reference to \c yycc_u8string_view,
+which the programmer should set to the next string to be joined on each call.
+It returns a bool value to indicate whether the join should continue.
+You can simply return \c false to terminate the join process.
+The value you assigned to the argument is ignored when you return false.
+
+Then, you can pass the created #JoinDataProvider object to the #Join function,
+specifying the delimiter at the same time,
+to get the final joined string.
+Here is an example:
+
+\code
+std::vector<yycc_u8string> data {
+    YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
+};
+auto delimiter = YYCC_U8(", "); // example delimiter, choose your own
+auto iter = data.cbegin();
+auto stop = data.cend();
+auto joined_string = YYCC::StringHelper::Join(
+    [&iter, &stop](yycc_u8string_view& view) -> bool {
+        if (iter == stop) return false;
+        view = *iter;
+        ++iter;
+        return true;
+    }, 
+    delimiter
+);
+\endcode
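+
+To make the provider protocol more concrete, here is a rough sketch of how a join function can consume such a provider.
+It is an illustration under assumptions, not the YYCC implementation:
+\c Provider and \c JoinSketch are hypothetical names, and \c std::string stands in for the YYCC UTF8 types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the provider type: fills in the next item
// and returns false once there is nothing left to join.
using Provider = std::function<bool(std::string&)>;

std::string JoinSketch(const Provider& provider, const std::string& delimiter) {
    std::string result;
    std::string item;
    bool first = true;
    // Keep pulling items until the provider returns false.
    // Whatever the provider wrote on that last call is ignored.
    while (provider(item)) {
        if (!first) result += delimiter;
        result += item;
        first = false;
    }
    return result;
}

void JoinDemo() {
    const std::vector<std::string> data {"", "1", "2", ""};
    auto iter = data.cbegin();
    const auto stop = data.cend();
    const std::string joined = JoinSketch(
        [&iter, stop](std::string& item) -> bool {
            if (iter == stop) return false;
            item = *iter;
            ++iter;
            return true;
        },
        ", ");
    // In this sketch, empty items still contribute a delimiter.
    assert(joined == ", 1, 2, ");
}
```

+
+The provider keeps all iteration state in its captures, which is what lets one join function serve any container type.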
+
+\subsection string_helper__join__specialized Specialized Join Function
+
+Besides the universal join function,
+YYCC::StringHelper also provides specialized join functions for standard library containers.
+For example, the code above can be rewritten as follows using this specialized overload.
+The first two arguments are just the begin and end iterators.
+However, you must make sure that dereferencing the iterator yields something implicitly convertible to yycc_u8string_view.
+Otherwise this overload will fail to compile with a template error.
+
+\code
+std::vector<yycc_u8string> data {
+    YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
+};
+auto delimiter = YYCC_U8(", "); // example delimiter, choose your own
+auto joined_string = YYCC::StringHelper::Join(data.begin(), data.end(), delimiter);
+\endcode
+
+\section string_helper__lower_upper Lower Upper
+
+String helper provides Python-like string lower and upper functions.
+Both the lower and upper functions have 2 overloads:
+
+\code
+yycc_u8string Lower(const yycc_u8string_view&);
+void Lower(yycc_u8string&);
+\endcode
+
+The first overload accepts a string view as argument and returns a \b copy whose content is the lower case of the original string.
+The second overload accepts a mutable string container as argument and converts all characters stored in it to lower case.
+You can choose one of them according to your taste and requirements.
+Upper has the same 2 overloads.
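+
+The copy and in-place pair described above is often built by letting the copying version reuse the in-place one.
+The sketch below shows that pattern with \c std::string and hypothetical names.
+It only handles ASCII letters, which is an assumption made for brevity and says nothing about which characters the YYCC functions convert.

```cpp
#include <cassert>
#include <string>

// In-place version: rewrites the caller's string (ASCII letters only).
void LowerInPlaceSketch(std::string& text) {
    for (char& c : text) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
}

// Copying version: built on top of the in-place one.
std::string LowerCopySketch(const std::string& text) {
    std::string copy(text);
    LowerInPlaceSketch(copy);
    return copy;
}

void LowerDemo() {
    const std::string original = "Hello, World 42";
    assert(LowerCopySketch(original) == "hello, world 42");
    assert(original == "Hello, World 42"); // the source is untouched
}
```

+
+Use the in-place form when you already own a mutable string, and the copying form when the source must stay unchanged.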
+
+\section string_helper__split Split
+
+String helper provides a Python-like string split function.
+It comes in 2 variants:
+
+\code
+std::vector<yycc_u8string> Split(const yycc_u8string_view&, const yycc_u8string_view&);
+std::vector<yycc_u8string_view> SplitView(const yycc_u8string_view&, const yycc_u8string_view&);
+\endcode
+
+Both functions take a string view as the first argument representing the string to be split.
+The second argument is a string view representing the delimiter for splitting.
+The difference between these 2 split functions is evident from their names.
+The first split function returns a list of copied strings as its split result.
+The second split function returns a list of string views as its split result,
+and these views stay valid only as long as the string behind your first argument stays alive.
+It also means that the second function costs less memory if you don't need copies of the original string.
+
+If the source string (the string to be split) is empty, or the delimiter is empty,
+the result will have only 1 item, and this item is the source string itself.
+These functions never return an empty list, unless the code is buggy.
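+
+The rules above, including the guarantee of at least one item, can be illustrated with a small sketch of the copying variant.
+It assumes \c std::string in place of the YYCC UTF8 types; \c SplitSketch is a hypothetical name and this is not the YYCC implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the copying split function.
std::vector<std::string> SplitSketch(const std::string& source, const std::string& delimiter) {
    std::vector<std::string> parts;
    // Empty source or empty delimiter: the source itself is the only item.
    if (source.empty() || delimiter.empty()) {
        parts.push_back(source);
        return parts;
    }

    std::size_t begin = 0;
    while (true) {
        const std::size_t end = source.find(delimiter, begin);
        if (end == std::string::npos) break;
        parts.push_back(source.substr(begin, end - begin));
        begin = end + delimiter.size();
    }
    parts.push_back(source.substr(begin)); // the tail after the last delimiter
    return parts;
}

void SplitDemo() {
    // Adjacent delimiters produce an empty item, as in Python's str.split(",").
    assert((SplitSketch("a,b,,c", ",") == std::vector<std::string>{"a", "b", "", "c"}));
    // An empty source still yields exactly one item.
    assert(SplitSketch("", ",").size() == 1);
}
```

+
+Because the tail is always pushed after the loop, the result can never be empty.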
+
+*/
+}
--- a/doc/src/string/string_reinterpret.dox
+++ b/doc/src/string/string_reinterpret.dox
@@ -0,0 +1,184 @@
+namespace yycc {
+/**
+
+\page string_reinterpret String Reinterpret
+
+By now, you know that we use UTF8 strings everywhere in this project,
+as we introduced in \ref premise_and_principle__string_encoding.
+Now it's time to learn how to fetch UTF8 strings from the user or anywhere else.
+
+
+\section string_reinterpret__utf8_type UTF8 Type
+
+After upgrading the whole project to C++23, \c char8_t is the only valid UTF8 char type.
+And \c std::u8string and \c std::u8string_view are the only valid UTF8 string container and viewer.
+Additionally, the \c u8 string literal prefix is the only way to create a UTF8 string literal.
+
+All in all, please use the string functions provided by this library with UTF8 strings.
+
+\section string_reinterpret__concept Concepts
+
+In the following content, you will come across 2 terms: ordinary string and UTF8 string.
+
+UTF8 string, as its name suggests, is a string encoded with UTF8.
+Its char type must be \c yycc_char8_t.
+(equivalent to \c char8_t since C++ 20.)
+
+Ordinary string means the plain, native string.
+The result of a C++ string literal without any prefix, such as \c "foo bar", is an ordinary string.
+Its char type is \c char.
+Its encoding depends on the compiler and environment.
+(UTF8 on Linux, or the system code page on Windows if the UTF8 switch is not enabled in MSVC.)
+
+For more information, please see CppReference:
+https://en.cppreference.com/w/cpp/language/string_literal
+
+\section string_reinterpret__utf8_literal UTF8 Literal
+
+String literal is a C++ concept.
+If you are not familiar with it, please read a related article first, such as the one on CppReference.
+
+\subsection string_reinterpret__utf8_literal__single Single Literal
+
+In short, YYCC allows you to declare a UTF8 literal like this:
+
+\code
+YYCC_U8("This is UTF8 literal.")
+\endcode
+
+YYCC_U8 is a macro.
+You don't need to add an extra \c u8 prefix to the string given to the macro.
+The macro does this automatically.
+
+In detail, this macro does a \c reinterpret_cast to forcibly change the type of the given argument to \c const \c yycc_char8_t*.
+This ensures that the declared UTF8 literal is compatible with YYCC UTF8 types.
+
+\subsection string_reinterpret__utf8_literal__char Single Char
+
+Like the UTF8 literal, YYCC allows you to cast a normal \c char into \c yycc_char8_t with the following code:
+
+\code
+YYCC_U8_CHAR('A')
+\endcode
+
+YYCC_U8_CHAR is a macro.
+It simply uses \c static_cast to cast the given value to \c yycc_char8_t.
+This doesn't mean that you can cast non-ASCII characters,
+because such a character is encoded as more than one byte in UTF8 and therefore does not fit in a single \c char.
+For example, the following code is \b invalid:
+
+\code
+YYCC_U8_CHAR('文') // INVALID!
+\endcode
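+
+A quick way to see why the character above cannot fit is to count its UTF8 code units at compile time.
+This small sketch uses only standard C++ (no YYCC macros); the constant names are hypothetical,
+and the character is written as the escape \c \\u6587 so that the result does not depend on the source file encoding.

```cpp
#include <cstddef>

// sizeof counts the terminating NUL, so subtract 1 to get the code units.
constexpr std::size_t kCjkCodeUnits = sizeof(u8"\u6587") - 1;   // U+6587
constexpr std::size_t kAsciiCodeUnits = sizeof(u8"A") - 1;      // U+0041

static_assert(kCjkCodeUnits == 3, "U+6587 takes three bytes in UTF8");
static_assert(kAsciiCodeUnits == 1, "an ASCII character takes one byte");
```

+
+Only characters that take exactly one code unit, which means ASCII, are safe to pass to YYCC_U8_CHAR.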
+
+\subsection string_reinterpret__utf8_literal__concatenation Literal Concatenation
+
+YYCC_U8 macro also works for string literal concatenation:
+
+\code
+YYCC_U8("Error code: %" PRIu32 ". Please contact me.");
+\endcode
+
+According to C++ standard for string literal concatenation, 
+<I>"If one of the strings has an encoding prefix and the other does not, the one that does not will be considered to have the same encoding prefix as the other."</I>
+At the same time, the YYCC_U8 macro automatically adds the \c u8 prefix to the first component of the concatenation.
+So the whole string will be a UTF8 literal.
+It also means that you should \b not add any prefix to the other components of the concatenation.
+
+\subsection string_reinterpret__utf8_literal__why Why?
+
+You may know that the C++ standard allows programmers to declare a UTF8 literal explicitly by writing code like this:
+
+\code
+u8"foo bar"
+\endcode
+
+This is okay, but it may be incompatible with the YYCC UTF8 char type.
+According to the C++ standard, this UTF8 literal syntax only yields \c const \c char8_t* if your C++ standard is C++ 20 or higher;
+otherwise it yields \c const \c char*.
+As a result, you cannot assign this UTF8 literal to \c yycc_u8string in an environment which does not support \c char8_t,
+because their types are different.
+Therefore you cannot use the functions provided by this library, because they all use the YYCC-defined UTF8 char type.
+
+\section string_reinterpret__utf8_pointer UTF8 String Pointer
+
+String pointer means a raw pointer pointing to a string, such as \c const \c char*, \c char*, \c char32_t*, etc.
+
+Much legacy code assumes \c char* is encoded with UTF8 (the exception is Windows). But \c char* is incompatible with \c yycc_char8_t.
+YYCC provides YYCC::EncodingHelper::ToUTF8 to resolve this issue. Here is an example:
+
+\code
+const char* absolutely_is_utf8 = "I confirm this is encoded with UTF8.";
+const yycc_char8_t* converted = YYCC::EncodingHelper::ToUTF8(absolutely_is_utf8);
+
+char* mutable_utf8 = const_cast<char*>(absolutely_is_utf8); // This is not safe. Just for example.
+yycc_char8_t* mutable_converted = YYCC::EncodingHelper::ToUTF8(mutable_utf8);
+\endcode
+
+YYCC::EncodingHelper::ToUTF8 has 2 overloads which handle constant and mutable string pointer conversion respectively.
+
+YYCC can also convert the YYCC UTF8 char type to the ordinary char type with YYCC::EncodingHelper::ToOrdinary.
+Here is an example:
+
+\code
+const yycc_char8_t* yycc_utf8 = YYCC_U8("I am UTF8 string.");
+const char* converted = YYCC::EncodingHelper::ToOrdinary(yycc_utf8);
+
+yycc_char8_t* mutable_yycc_utf8 = const_cast<yycc_char8_t*>(yycc_utf8); // Not safe. Also just for example.
+char* mutable_converted = YYCC::EncodingHelper::ToOrdinary(mutable_yycc_utf8);
+\endcode
+
+Like YYCC::EncodingHelper::ToUTF8, YYCC::EncodingHelper::ToOrdinary also has 2 overloads to handle constant and mutable string pointers.
+
+\section string_reinterpret__utf8_container UTF8 String Container
+
+String container usually means a standard library string container, such as \c std::string, \c std::wstring, \c std::u32string, etc.
+
+In many personal projects, programmers may use \c std::string everywhere because \c std::u8string may not have been available when the project was written.
+How do you convert between an ordinary string container and a YYCC UTF8 string container?
+Directly force-casting one to the other is definitely illegal, because they may have different class layouts.
+Don't worry, here is how to do the conversion correctly.
+YYCC provides YYCC::EncodingHelper::ToUTF8 to convert an ordinary string container to a YYCC UTF8 string container.
+Here is an example:
+
+\code
+std::string ordinary_string("I am UTF8");
+yycc_u8string yycc_string = YYCC::EncodingHelper::ToUTF8(ordinary_string);
+auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
+\endcode
+
+Actually, YYCC::EncodingHelper::ToUTF8 accepts a reference to \c std::string_view as its argument.
+However, there is an implicit conversion from \c std::string to \c std::string_view,
+so you can directly pass a \c std::string instance to it.
+
+A string view avoids unnecessary memory copies.
+If you just want to pass an ordinary string container to a function which accepts \c yycc_u8string_view as its argument,
+you can use the alternative YYCC::EncodingHelper::ToUTF8View.
+
+\code
+std::string ordinary_string("I am UTF8");
+yycc_u8string_view yycc_string = YYCC::EncodingHelper::ToUTF8View(ordinary_string);
+auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
+\endcode
+
+Compared with the previous one, this example uses less memory.
+The saving is the content of \c yycc_string, because a string view is a view, not a copy of the original string.
+
+As with UTF8 string pointers, we also have YYCC::EncodingHelper::ToOrdinary and YYCC::EncodingHelper::ToOrdinaryView to do the corresponding reverse conversions.
+Try to figure out how to use them on your own.
+It's pretty easy.
+
+\section string_reinterpret__windows Warnings to Windows Programmer
+
+Due to the legacy of MSVC, the encoding of \c char* is not UTF8 in most cases.
+If you run the conversion code introduced in this article on a string which is not encoded with UTF8, it may cause undefined behavior.
+
+To enable the UTF8 mode of MSVC, please pass the \c /utf-8 switch to MSVC.
+Then you can use the functions introduced in this article safely.
+Otherwise, you must manually guarantee that the arguments you provide to these functions are encoded with UTF8.
+
+Linux users do not need to care about this,
+because almost all Linux distros use UTF8 by default.
+
+*/
+}