
doc: write document

This commit is contained in:
2025-12-28 16:54:22 +08:00
parent 6dbd031e00
commit e929ba3776
7 changed files with 270 additions and 320 deletions


@@ -39,19 +39,17 @@
\li \subpage string__reinterpret
<!--
\li \subpage string_helper
\li \subpage string__op
\li \subpage patch
<!--
\li \subpage encoding_helper
\li \subpage parser_helper
\li \subpage console_helper
\li \subpage io_helper
\li \subpage std_patch
-->
<!--

75
doc/src/patch.dox Normal file

@@ -0,0 +1,75 @@
namespace yycc::patch {
/**
\page patch Other STL Patches
This library contains several other STL patches which are too small to be organized into a document file of their own.
So I put them together here.
\section patch__ptr_pad Pointer Print Padding
When printing a pointer on screen, programmers usually left-pad it with zeros to make it look good.
However, the number of padding zeros differs between the x86 and x64 architectures (8 for x86 and 16 for x64).
Macro \c PRIXPTR_LPAD resolves this issue.
It expands to one of the following values according to the target system architecture.
\li \c "08": On x86 system.
\li \c "016": On x64 system.
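Here is an example of how to use it:
\code
void* raw_ptr = blabla();
// PRIXPTR formats a std::uintptr_t, so the pointer is cast before printing.
std::printf("Raw Pointer 0x%" PRIXPTR_LPAD PRIXPTR, reinterpret_cast<std::uintptr_t>(raw_ptr));
\endcode
Note that \c PRIXPTR is defined by the standard library (in the \c cinttypes header) for formatting a \c std::uintptr_t value as uppercase hexadecimal.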
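As a minimal sketch of how such a padding macro could be built, the snippet below selects the pad width from \c UINTPTR_MAX.
The names \c DEMO_PTR_LPAD and \c demo_format_pointer are hypothetical and are not part of this library; the real \c PRIXPTR_LPAD may be defined differently.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for PRIXPTR_LPAD: choose the pad width from the pointer size.
#if UINTPTR_MAX > 0xFFFFFFFFu
#define DEMO_PTR_LPAD "016"
#else
#define DEMO_PTR_LPAD "08"
#endif

// Format a pointer as zero-padded uppercase hexadecimal with a "0x" prefix.
std::string demo_format_pointer(const void* ptr) {
    char buffer[2 + 16 + 1] = {};
    std::snprintf(buffer, sizeof(buffer), "0x%" DEMO_PTR_LPAD PRIXPTR,
                  reinterpret_cast<std::uintptr_t>(ptr));
    return std::string(buffer);
}
```

The returned string always has the same width on a given architecture, which keeps pointer columns aligned in log output.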
\section patch__smart_file Smart FILE Pointer
fopen::SmartStdFile uses \c std::unique_ptr with a custom deleter to implement a smart \c FILE*.
It is useful when you want an opened file to be closed automatically when leaving the corresponding scope.
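The idea can be sketched as follows. This is only an illustration of the technique:
\c DemoFileCloser, \c DemoSmartFile and \c demo_write_greeting are hypothetical names, and the real fopen::SmartStdFile may differ in detail.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical deleter: closes the stream when the smart pointer is destroyed.
struct DemoFileCloser {
    void operator()(std::FILE* fp) const noexcept {
        if (fp != nullptr) std::fclose(fp);
    }
};

// Hypothetical alias playing the role of a smart FILE pointer.
using DemoSmartFile = std::unique_ptr<std::FILE, DemoFileCloser>;

// Write one line to the given path; the file is closed on every return path.
bool demo_write_greeting(const char* path) {
    DemoSmartFile file(std::fopen(path, "w"));
    if (!file) return false;
    return std::fputs("hello\n", file.get()) >= 0;
}
```

Because the deleter runs in the destructor, early returns and exceptions can not leak the file handle.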
\section patch__utf8_fopen UTF8 fopen
On Windows, the standard \c std::fopen can not handle UTF8 file names in a common environment.
So we create fopen::fopen to give programmers a universal \c fopen in UTF8 style.
On Windows, this function tries to convert its arguments to \c wchar_t
and calls the Microsoft-specific \c _wfopen function to open the file.
If the encoding conversion or \c _wfopen fails, this function returns \c nullptr like \c std::fopen does.
On other platforms, it simply forwards the call to \c std::fopen.
Here is a simple example:
\code
FILE* fs = fopen::fopen(u8"/path/to/file", u8"rb");
\endcode
\section patch__utf8_stream UTF8 Stream Support
The namespace yycc::patch::stream provides UTF8 support for \c std::ostream.
This namespace contains operator overloads that give \c std::ostream the ability to write UTF8 strings and UTF8 characters.
To use this feature, include its header file first,
and then use <TT>using namespace ::yycc::patch::stream;</TT> to import this namespace.
\section patch__utf8_format UTF8 Format Support
The namespace yycc::patch::format provides a patch for \c std::format to allow UTF8 strings as arguments.
As of C++ 23, \c std::format only allows \c char and \c wchar_t as its char type,
so it is impossible to use a UTF8 string with \c std::format, either as the format string or as an argument.
This namespace patches this shortcoming.
First, it defines a brand-new format::format function, which resolves the issue that we can not use UTF8 as the format string.
The implementation of this function is simple. We convert the given UTF8 format string into an ordinary string,
and then delegate it to \c std::vformat, the runtime format function available since C++ 20.
Because the format string is parsed at runtime, this function may be a little slower than \c std::format, but it's not a big deal.
We suggest that you use the format::format function provided by this namespace in your code
to enable this UTF8 format string feature.
Additionally, this namespace provides \c std::formatter specializations for UTF8 strings.
Thus we can safely use UTF8 strings as arguments in \c std::format, as well as in our brand-new format::format function.
*/
}


@@ -1,50 +0,0 @@
namespace YYCC::IOHelper {
/**
\page io_helper IO Helper
Actually, YYCC::IOHelper includes functions which can not be placed in other place.
\section io_helper__ptr_pri_padding Pointer Print Padding
When printing pointer on screen, programmer usually left-pad zero to make it looks good.
However, the count of zero for padding is different in x86 and x64 architecture (8 for x86 and 16 for x64).
Macro \c PRI_XPTR_LEFT_PADDING will help you to resolve this issue.
Macro \c PRI_XPTR_LEFT_PADDING will be defined to following value according to the target system architecture.
\li \c "08": On x86 system.
\li \c "016": On x64 system.
There is an example for how to use it:
\code
void* raw_ptr = blabla();
std::printf("Raw Pointer 0x%" PRI_XPTR_LEFT_PADDING PRIXPTR, reinterpret_cast<std::uintptr_t>(raw_ptr));
\endcode
Note \c PRIXPTR is defined by standard library for formatting pointer as hexadecimal style.
\section io_helper__smart_file Smart FILE Pointer
#SmartStdFile use \c std::unique_ptr with custom deleter to implement smart \c FILE*.
It is useful in the cases that you want to automatically free opened file when leaving corresponding scope.
\section io_helper__utf8_fopen UTF8 fopen
In Windows, standard \c std::fopen can not handle UTF8 file name in common environment.
So we create this function to give programmer an universal \c fopen in UTF8 style.
In Windows platform, this function will try to convert its argument to \c wchar_t
and calling Microsoft specific \c _wfopen function to open file.
If encoding conversion or \c _wfopen failed, this function will return \c nullptr like \c std::fopen does.
In other platforms, it will simply redirect calling to \c std::fopen.
There is a simple example:
\code
FILE* fs = YYCC::IOHelper::FOpen(YYCC_U8("/path/to/file"), YYCC_U8("rb"));
\endcode
*/
}


@@ -1,112 +0,0 @@
namespace YYCC::StdPatch {
/**
\page std_patch Standard Library Patch
\section std_patch__starts_with_ends_with Starts With & Ends With
\c std::basic_string::starts_with and \c std::basic_string::ends_with (also available in \c std::basic_string_view)
are functions introduced in C++ 20 and unavailable in C++ 17.
YYCC::StdPatch provides a patch for these function in C++ 17 environment.
Please note these implementations are following implementation instruction presented by CppReference website.
And it should have the same performance with vanilla functions because Microsoft STL use the same way to implement.
These implementations will not fallback to vanilla function even they are available.
Because their performance are good.
To use these functions, you just need to call them like corresponding vanilla functions.
Our implementations provide all necessary overloads.
The only thing you need to do is provide the string self as the first argument,
because our implementations can not be inserted as a class member of string.
There is an example:
\code
YYCC::StdPatch::StartsWith(YYCC_U8("aabbcc"), YYCC_U8("aa"));
YYCC::StdPatch::EndsWith(YYCC_U8("aabbcc"), YYCC_U8("cc"));
\endcode
\section std_patch__contains Contains
\c Contains function in standard library ordered and unordered successive container are also introduced in C++ 20.
YYCC::StdPatch provides a patch for this function in C++ 17 environment.
Please note this implementation will fallback to vanilla function if it is available.
Because our implementation is a remedy (there is no way to use public class member to have the same performance of vanilla function).
There is an example about how to use it:
\code
std::set<int> test { 1, 5 };
YYCC::StdPatch::Contains(test, static_cast<int>(5));
\endcode
\section std_patch__fs_path std::filesystem::path Patch
As you know, the underlying char type of \c std::filesystem::path is \c wchar_t on Windows,
and in other platforms, it is simple \c char.
Due to this, if you try to create a \c std::filesystem::path instance by calling constructor with an UTF8 char sequence on Windows,
the library implementation will assume your input is based on current Windows code page, not UTF8.
And the final path stored in \c std::filesystem::path is not what you expected.
This patch gives you a way to create \c std::filesystem::path
and extract path string stored in \c std::filesystem::path with UTF8 encoding.
This patch namespace always use UTF8 as its argument.
You should use the functions provided by this namespace on any platforms
instead of vanilla \c std::filesystem::path functions.
However, if your C++ standard is higher than C++ 20,
you can directly use UTF8 string pointer and string container in \c std::filesystem::path,
because standard library has supported them.
This patch only just want to provide an uniform programming experience.
This patch is served for Windows but also works on other platforms.
If you are in Windows, this patch will perform extra operations to achieve goals,
and in other platforms, they just redirect request to corresponding vanilla C++ functions.
\subsection std_patch__fs_path__from_utf8_path Create Path from UTF8 String
#ToStdPath provides this feature.
It accepts an string pointer to UTF8 string and try to create \c std::filesystem::path from it.
Function will throw exception if encoding conversion or constructor self failed.
There are some example:
\code
auto foobar_path = YYCC::StdPatch::ToStdPath(YYCC_U8("/foo/bar"));
auto slashed_path = foobar_path / YYCC::StdPatch::ToStdPath(YYCC_U8("test"));
auto replaced_ext = foobar_path.replace_extension(YYCC::StdPatch::ToStdPath(YYCC_U8(".txt")));
\endcode
For first line in example, it is obvious that you can create a \c std::filesystem::path from this function.
However, for the second and third line in example, what we want to tell you is
that you should always use this function in other \c std::filesystem::path functions requiring path string.
\c std::filesystem::path is a very \e conservative class.
Most of its functions only accept \c std::filesystem::path self as argument.
For example, \c std::filesystem::path::replace_extension do not accept string as argument.
It accepts a reference to \c std::filesystem::path as argument.
(it still is possible that pass string pointer or string container to it because they can be converted to \c std::filesystem::path implicitly.)
It's great. This is what we expected!
We now can safely deliver the result generated by our function to these functions,
and don't need to worry about the encoding of we provided string.
Because all strings have been converted to \c std::filesystem::path by our function before passing them.
So, the second line will produce \c "/foo/bar/test"
and the third line will produce \c "/foo/bar.txt" in any platforms.
You may notice std::filesystem::u8path.
However it is deprecated since C++ 20,
because \c std::filesystem::path directly supports UTF8 by \c char8_t since C++ 20.
Because C++ standard is volatile, we create this function to have an uniform programming experience.
\subsection std_patch__fs_path__to_utf8_path Extract UTF8 Path String from Path
#ToUTF8Path provides this feature.
It basically is the reversed operation of #ToStdPath.
It is usually used when you have done all path work in \c std::filesystem::path
and want to get the result.
There is an example:
\code
auto foobar_path = YYCC::StdPatch::ToStdPath(YYCC_U8("/foo/bar"));
auto result = YYCC::StdPatch::ToUTF8Path(foobar_path / YYCC::StdPatch::ToStdPath(YYCC_U8("test")));
\endcode
*/
}

176
doc/src/string/op.dox Normal file

@@ -0,0 +1,176 @@
namespace yycc::string::op {
/**
\page string__op String Operations
\section string__op__printf Printf VPrintf
yycc::string::op provides 4 functions for formatting strings.
These functions were originally provided for programmers who could not use the C++ 20 \c std::format feature.
However, since this project was migrated to the C++ 23 standard, \c std::format is finally available.
So we now keep these functions as a complement to \c std::format.
\code
std::u8string printf(const char8_t* format, ...);
std::u8string vprintf(const char8_t* format, va_list argptr);
std::string printf(const char* format, ...);
std::string vprintf(const char* format, va_list argptr);
\endcode
#printf and #vprintf are similar to \c std::sprintf and \c std::vsprintf.
#printf accepts a UTF8 format string and variadic arguments specifying the data to print.
This is the form most commonly used by programmers.
#vprintf does the same work, but its second argument is a \c va_list,
the representation of variadic arguments.
It is mostly used by other functions which take variadic arguments themselves.
The only difference between these functions and the standard library functions is
that you don't need to worry about whether the given buffer is large enough,
because these functions calculate the required size internally.
If any error occurs, such as running out of memory or bad format string syntax,
these functions throw an exception immediately.
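The "calculate the size internally" behavior is commonly implemented with two \c std::vsnprintf passes.
The following sketch shows that technique for the \c char overloads only; \c demo_printf and \c demo_vprintf are hypothetical names, and the exception type is an assumption.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical vprintf-style helper returning an owned string.
std::string demo_vprintf(const char* format, std::va_list argptr) {
    // First pass: measure the required length on a copy of the argument list.
    std::va_list probe;
    va_copy(probe, argptr);
    const int needed = std::vsnprintf(nullptr, 0, format, probe);
    va_end(probe);
    if (needed < 0) throw std::runtime_error("failed to format string");
    // Second pass: write into a buffer of exactly the right size.
    std::string result(static_cast<std::size_t>(needed), '\0');
    std::vsnprintf(result.data(), result.size() + 1, format, argptr);
    return result;
}

// Hypothetical printf-style front end forwarding to the va_list version.
std::string demo_printf(const char* format, ...) {
    std::va_list argptr;
    va_start(argptr, format);
    std::string result;
    try {
        result = demo_vprintf(format, argptr);
    } catch (...) {
        va_end(argptr);
        throw;
    }
    va_end(argptr);
    return result;
}
```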
\section string__op__replace Replace
yycc::string::op provides 2 functions for string replacement:
\code
void replace(std::u8string& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
std::u8string replace(const std::u8string_view& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
\endcode
The first overload does the replacement in the given string container directly.
The second overload produces a copy of the original string and does the replacement on the copy.
These #replace functions treat boundary scenarios specially:
\li If the given string is empty, the result is empty.
\li If the character sequence to be replaced is empty, no replacement happens.
\li If the replacement character sequence is empty, every found character sequence is simply deleted from the given string.
\section string__op__join Join
yycc::string::op provides a universal way of joining strings and various specialized join functions.
\subsection string__op__join__universal Universal Join Function
Because C++ has many different list types,
there is no single convenient way to create a universal join function.
So we create #JoinDataProvider to describe the join context.
Before using the universal join function,
you should set up a #JoinDataProvider first, the context of the join function.
It is actually a \c std::function object which can easily be created with C++ lambda syntax.
This function object returns \c std::optional<std::u8string_view>:
a \c std::u8string_view holding the next piece of data to be joined, or \c std::nullopt if there is no more data.
As you may have noticed, this is similar to a Rust iterator.
Then, you can pass the created #JoinDataProvider object to the #join function
and specify the delimiter at the same time
to get the final joined string.
Here is an example:
\code
std::vector<std::u8string> data {
u8"", u8"1", u8"2", u8""
};
auto iter = data.cbegin();
auto stop = data.cend();
std::u8string joined_string = yycc::string::op::join(
[&iter, &stop]() -> std::optional<std::u8string_view> {
if (iter == stop) return std::nullopt;
return *iter++;
},
delimiter
);
\endcode
\subsection string__op__join__specialized Specialized Join Function
Besides the universal join function,
yycc::string::op also provides specialized join functions for standard library containers.
For example, the code above can be rewritten as follows by using this specialized overload.
The first two arguments are just the begin and end iterators.
However, you must make sure that the iterator can be dereferenced and the result implicitly converted to \c std::u8string_view.
\code
std::vector<std::u8string> data {
u8"", u8"1", u8"2", u8""
};
std::u8string joined_string = yycc::string::op::join(data.begin(), data.end(), delimiter);
\endcode
\section string__op__lower_upper Lower Upper
This namespace provides Python-like string lower and upper functions.
\code
void lower(std::u8string& strl);
std::u8string to_lower(const std::u8string_view& strl);
void upper(std::u8string& strl);
std::u8string to_upper(const std::u8string_view& strl);
\endcode
The functions with the "to_" prefix accept a string view as argument
and return a \b copy in which all characters are the lower/upper case of the original string.
The other functions accept a mutable string container as argument and modify it in place.
\section string__op__strip_trim Strip and Trim
This namespace provides functions for removing leading and trailing characters.
There are two sets of functions:
\subsection string__op__strip Unicode-aware functions
These functions properly handle Unicode characters when stripping:
\code
std::u8string_view strip(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view lstrip(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view rstrip(const std::u8string_view& strl, const std::u8string_view& words);
\endcode
The prefixes "l" and "r" stand for left and right strip respectively, like in Python.
\subsection string__op__trim ASCII-only functions
These functions treat each byte as an individual character and are faster for ASCII-only scenarios:
\code
std::u8string_view trim(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view ltrim(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view rtrim(const std::u8string_view& strl, const std::u8string_view& words);
\endcode
The names "trim" and "strip" follow the order in which they were introduced in Java.
"trim" was invented first, so here its function is confined to ASCII-only strings.
"strip" was introduced later, so here it covers more scenarios such as Unicode.
Note that in Java itself, both "trim" and "strip" can handle Unicode.
\section string__op__split Split
This namespace provides Python-like string split functions.
It has 3 variants for different use cases:
\code
LazySplit lazy_split(const std::u8string_view& strl, const std::u8string_view& delimiter);
std::vector<std::u8string_view> split(const std::u8string_view& strl, const std::u8string_view& delimiter);
std::vector<std::u8string> split_owned(const std::u8string_view& strl, const std::u8string_view& delimiter);
\endcode
All these overloads take a string view as the first argument, representing the string to be split.
The second argument is a string view representing the delimiter for splitting.
The first function #lazy_split returns a #LazySplit object that can be used in range-based for loops.
It is lazily computed and memory-efficient for large datasets.
The second function #split returns a vector of string views, which is memory-efficient
but the views are only valid as long as the original string remains valid.
The third function #split_owned returns a vector of strings, which are copies of the original parts.
If the source string (the string to be split) is empty, or the delimiter is empty,
the result has only 1 item, and this item is the source string itself.
These functions never return an empty list unless the code is buggy.
*/
}


@@ -89,13 +89,25 @@ Same as UTF8 string pointer, we also have as_ordinary() and as_ordinary_view() d
Try to do your own research and figure out how to use them.
It's pretty easy.
\section string__reinterpret__windows_warns Warnings to Windows Programmer
\section string__reinterpret__clarification Clarification about Usage Scenario
Due to the legacy of MSVC, the encoding of \c char* may not be UTF8 in most cases.
Let us clarify what this chapter is talking about.
This chapter is about the conversion between UTF8 strings and ordinary strings
which are actually encoded in UTF-8 but represented by the \c char type.
This point is crucial. If you apply any function provided by this namespace to a string which is not encoded in UTF-8,
for example, trying to convert a CP1252-encoded Western European string to UTF-8 via a function in this namespace,
it will cause <B>undefined behavior</B>.
The correct functions for such conversions are located in the yycc::encoding namespace,
or in a more generic module located in yycc::carton::pycodec.
This namespace is only suitable for converting UTF-8 strings which are mis-represented by non-<TT>char8_t</TT> types.
Once you understand this point, you can safely use this namespace.
Additionally, due to the legacy of MSVC, the encoding of \c char* may not be UTF8 in most cases.
If you run the conversion code introduced in this article on a string which is not encoded in UTF8,
it may cause undefined behavior.
To enable UTF8 mode of MSVC, please deliver \c /utf-8 switch to MSVC.
To enable UTF8 mode of MSVC, please pass the \c /utf-8 switch to the MSVC compiler.
Thus you can use the functions introduced in this article safely.
Otherwise, you must manually guarantee that the arguments you provide to these functions are encoded in UTF8.


@@ -1,149 +0,0 @@
namespace YYCC::StringHelper {
/**
\page string_helper String Helper
\section string_helper__printf Printf VPrintf
YYCC::StringHelper provides 4 functions for formatting string.
These functions are mainly provided to programmer who can not use C++ 20 \c std::format feature.
\code
bool Printf(yycc_u8string&, const yycc_char8_t*, ...);
bool VPrintf(yycc_u8string&, const yycc_char8_t*, va_list argptr);
yycc_u8string Printf(const yycc_char8_t*, ...);
yycc_u8string VPrintf(const yycc_char8_t*, va_list argptr);
\endcode
#Printf and #VPrintf is similar to \c std::sprintf and \c std::vsprintf.
#Printf accepts UTF8 format string and variadic arguments specifying data to print.
This is commonly used by programmer.
However, #VPrintf also do the same work but its second argument is \c va_list,
the representation of variadic arguments.
It is mostly used by other function which has variadic arguments.
The only difference between these function and standard library functions is
that you don't need to worry about whether the space of given buffer is enough,
because these functions help you to calculate this internally.
There is the same design like we introduced in \ref encoding_helper.
There are 2 overloads for #Printf and #VPrintf respectively.
First overload return bool value and require a string container as argument for storing result.
The second overload return result string directly.
As you expected, first overload will return false if fail to format string (this is barely happened).
and second overload will return empty string when formatter failed.
\section string_helper__replace Replace
YYCC::StringHelper provide 2 functions for programmer do string replacement:
\code
void Replace(yycc_u8string&, const yycc_u8string_view&, const yycc_u8string_view&);
yycc_u8string Replace(const yycc_u8string_view&, const yycc_u8string_view&, const yycc_u8string_view&);
\endcode
The first overload will do replacement in given string container directly.
The second overload will produce a copy of original string and do replacement on the copied string.
#Replace has special treatments for following scenarios:
\li If given string is empty, the return value will be empty.
\li If the character sequence to be replaced is empty string, no replacement will happen.
\li If the replacement character sequence is empty, every found character sequence is simply deleted from the given string.
\section string_helper__join Join
YYCC::StringHelper provide an universal way for joining string and various specialized join functions.
\subsection string_helper__join__universal Universal Join Function
Because C++ list types are various.
There is no unique and convenient way to create an universal join function.
So we create #JoinDataProvider to describe join context.
Before using universal join function,
you should setup #JoinDataProvider first, the context of join function.
It actually is an \c std::function object which can be easily fetched by C++ lambda syntax.
This function pointer accept a reference to \c yycc_u8string_view,
programmer should set it to the string to be joined when at each calling.
And this function pointer return a bool value to indicate the end of join.
You can simply return \c false to terminate join process.
The argument you assigned to argument will not be taken into join process when you return false.
Then, you can pass the created #JoinDataProvider object to #Join function.
And specify delimiter at the same time.
Then you can get the final joined string.
There is an example:
\code
std::vector<yycc_u8string> data {
YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
};
auto iter = data.cbegin();
auto stop = data.cend();
auto joined_string = YYCC::StringHelper::Join(
[&iter, &stop](yycc_u8string_view& view) -> bool {
if (iter == stop) return false;
view = *iter;
++iter;
return true;
},
delimiter
);
\endcode
\subsection string_helper__join__specialized Specialized Join Function
Despite universal join function,
YYCC::StringHelper also provide a specialized join functions for standard library container.
For example, the code written above can be written in following code by using this specialized overload.
The first two argument is just the begin and end iterator.
However, you must make sure that we can dereference it and then implicitly convert it to yycc_u8string_view.
Otherwise this overload will throw template error.
\code
std::vector<yycc_u8string> data {
YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
};
auto joined_string = YYCC::StringHelper::Join(data.begin(), data.end(), delimiter);
\endcode
\section string_helper__lower_upper Lower Upper
String helper provides Python-like string lower and upper function.
Both lower and upper function have 2 overloads:
\code
yycc_u8string Lower(const yycc_u8string_view&);
void Lower(yycc_u8string&);
\endcode
First overload accepts a string view as argument and return a \b copy whose content are all the lower case of original string.
Second overload accepts a mutable string container as argument and will make all characters stored in it become their lower case.
You can choose on of them for your flavor and requirements.
Upper also has similar 2 overloads.
\section string_helper__split Split
String helper provides Python-like string split function.
It has 2 types for you:
\code
std::vector<yycc_u8string> Split(const yycc_u8string_view&, const yycc_u8string_view&);
std::vector<yycc_u8string_view> SplitView(const yycc_u8string_view&, const yycc_u8string_view&);
\endcode
All these overloads take a string view as the first argument representing the string need to be split.
The second argument is a string view representing the delimiter for splitting.
The only difference between these 2 split function are overt according to their names.
The first split function will return a list of copied string as its split result.
The second split function will return a list of string view as its split result,
and it will keep valid as long as the life time of your given string view argument.
It also means that the last overload will cost less memory if you don't need the copy of original string.
If the source string (the string need to be split) is empty, or the delimiter is empty,
the result will only has 1 item and this item is source string itself.
There is no way that these methods return an empty list, except the code is buggy.
*/
}