
doc: write document

2025-12-28 16:54:22 +08:00
parent 6dbd031e00
commit e929ba3776
7 changed files with 270 additions and 320 deletions

doc/src/string/op.dox

@@ -0,0 +1,176 @@
namespace yycc::string::op {
/**
\page string__op String Operations
\section string__op__printf Printf VPrintf
yycc::string::op provides 4 functions for formatting strings.
These functions were originally provided for programmers who could not use the C++20 \c std::format feature.
Now that this project has migrated to the C++23 standard, \c std::format is available,
and these functions serve as a complement to it.
\code
std::u8string printf(const char8_t* format, ...);
std::u8string vprintf(const char8_t* format, va_list argptr);
std::string printf(const char* format, ...);
std::string vprintf(const char* format, va_list argptr);
\endcode
#printf and #vprintf are similar to \c std::sprintf and \c std::vsprintf.
#printf accepts a UTF-8 format string and variadic arguments specifying the data to print.
This is the form programmers commonly use.
#vprintf does the same work, but its second argument is a \c va_list,
the representation of variadic arguments.
It is mostly used inside other functions that take variadic arguments themselves.
The only difference between these functions and the standard library functions is
that you don't need to worry about whether the given buffer is large enough,
because these functions calculate the required size internally.
If an error occurs, such as running out of memory or a malformed format string,
these functions throw an exception immediately.
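The internal size calculation mentioned above is commonly done with the standard "measure, then format" idiom. First, \c std::vsnprintf is called with a null buffer to learn the required length. Then exactly that much space is allocated and the text is formatted again. The sketch below shows this idiom for \c char strings only. \c vprintf_sketch and \c printf_sketch are hypothetical names. This is a minimal illustration, not necessarily how yycc implements these functions.

```cpp
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical names: vprintf_sketch and printf_sketch are NOT yycc functions;
// they only illustrate the "measure first, then format" idiom for char strings.
std::string vprintf_sketch(const char* format, va_list argptr) {
    // vsnprintf consumes a va_list, so keep a copy for the second pass.
    va_list args_copy;
    va_copy(args_copy, argptr);

    // First pass: a null buffer of size 0 only reports the length the output
    // would have (without the terminating NUL), or a negative value on error.
    const int needed = std::vsnprintf(nullptr, 0, format, argptr);
    if (needed < 0) {
        va_end(args_copy);
        throw std::runtime_error("vsnprintf failed while measuring");
    }

    // Second pass: allocate room for the text plus the NUL, then format.
    std::string result(static_cast<std::size_t>(needed) + 1, '\0');
    const int written = std::vsnprintf(result.data(), result.size(), format, args_copy);
    va_end(args_copy);
    if (written < 0) {
        throw std::runtime_error("vsnprintf failed while formatting");
    }

    result.resize(static_cast<std::size_t>(needed));  // drop the trailing NUL
    return result;
}

std::string printf_sketch(const char* format, ...) {
    va_list argptr;
    va_start(argptr, format);
    try {
        std::string result = vprintf_sketch(format, argptr);
        va_end(argptr);
        return result;
    } catch (...) {
        va_end(argptr);  // va_start must always be paired with va_end
        throw;
    }
}
```

The \c va_copy step matters: once one \c vsnprintf call has consumed a \c va_list, that list cannot be reused for the second pass.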
\section string__op__replace Replace
yycc::string::op provides 2 functions for string replacement:
\code
void replace(std::u8string& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
std::u8string replace(const std::u8string_view& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
\endcode
The first overload will do replacement in given string container directly.
The second overload will produce a copy of original string and do replacement on the copied string.
These #replace functions treat the following boundary cases specially:
\li If the given string is empty, the result is empty.
\li If the character sequence to be replaced is an empty string, no replacement happens.
\li If the replacement character sequence is empty, every found occurrence is simply deleted from the given string.
\section string__op__join Join
yycc::string::op provides a universal way of joining strings, as well as various specialized join functions.
\subsection string__op__join__universal Universal Join Function
Because C++ has many different list types,
there is no single convenient way to write a universal join function.
So we created #JoinDataProvider to describe the join context.
Before using the universal join function,
you should set up a #JoinDataProvider first.
It is a \c std::function object, which can easily be created from a C++ lambda.
This function returns \c std::optional<std::u8string_view>:
it should return the next \c std::u8string_view to be joined, or \c std::nullopt if there is no more data.
As you may notice, this is similar to the \c next method of a Rust iterator.
Then pass the created #JoinDataProvider object to the #join function,
together with the delimiter,
and you get the final joined string.
Here is an example:
There is an example:
\code
std::vector<std::u8string> data {
    u8"", u8"1", u8"2", u8""
};
auto iter = data.cbegin();
auto stop = data.cend();
std::u8string joined_string = yycc::string::op::join(
    [&iter, &stop]() -> std::optional<std::u8string_view> {
        if (iter == stop) return std::nullopt;
        return *iter++;
    },
    u8", "  // the delimiter placed between items
);
\endcode
\subsection string__op__join__specialized Specialized Join Function
Besides the universal join function,
yycc::string::op also provides a specialized join function for standard library containers.
For example, the code written above can be rewritten as follows using this specialized overload.
The first two arguments are simply the begin and end iterators.
However, you must make sure that the iterator can be dereferenced and that the result is implicitly convertible to \c std::u8string_view.
\code
std::vector<std::u8string> data {
    u8"", u8"1", u8"2", u8""
};
// Join with ", " as the delimiter.
std::u8string joined_string = yycc::string::op::join(data.begin(), data.end(), u8", ");
\endcode
\section string__op__lower_upper Lower Upper
This namespace provides Python-like string lower and upper functions.
\code
void lower(std::u8string& strl);
std::u8string to_lower(const std::u8string_view& strl);
void upper(std::u8string& strl);
std::u8string to_upper(const std::u8string_view& strl);
\endcode
The functions with the "to_" prefix accept a string view as argument
and return a \b copy of the original string converted to lower or upper case.
The other functions accept a mutable string container as argument and modify it in place.
\section string__op__strip_trim Strip and Trim
This namespace provides functions for removing leading and trailing characters.
There are two sets of functions:
\subsection string__op__strip Unicode-aware functions
These functions properly handle Unicode characters when stripping:
\code
std::u8string_view strip(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view lstrip(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view rstrip(const std::u8string_view& strl, const std::u8string_view& words);
\endcode
The prefixes "l" and "r" stand for left and right strip respectively, as in Python.
\subsection string__op__trim ASCII-only functions
These functions treat each byte as an individual character and are faster for ASCII-only scenarios:
\code
std::u8string_view trim(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view ltrim(const std::u8string_view& strl, const std::u8string_view& words);
std::u8string_view rtrim(const std::u8string_view& strl, const std::u8string_view& words);
\endcode
The naming follows the order in which these methods appeared in Java.
"trim" came first, so here its function is confined to ASCII-only strings.
"strip" was introduced later (in Java 11), so here it is expected to cover more scenarios, such as Unicode.
Note, however, that in Java both "trim" and "strip" can handle Unicode strings.
\section string__op__split Split
This namespace provides Python-like string split functions.
It has 3 variants for different use cases:
\code
LazySplit lazy_split(const std::u8string_view& strl, const std::u8string_view& delimiter);
std::vector<std::u8string_view> split(const std::u8string_view& strl, const std::u8string_view& delimiter);
std::vector<std::u8string> split_owned(const std::u8string_view& strl, const std::u8string_view& delimiter);
\endcode
All these functions take a string view as the first argument, representing the string to be split.
The second argument is a string view representing the delimiter.
The first function, #lazy_split, returns a #LazySplit object that can be used in range-based for loops.
Its items are computed lazily, which keeps memory usage low for large inputs.
The second function, #split, returns a vector of string views, which avoids copying,
but the views are only valid as long as the original string remains alive.
The third function, #split_owned, returns a vector of strings, which are copies of the original parts.
If the source string (the string to be split) is empty, or the delimiter is empty,
the result has exactly 1 item, which is the source string itself.
These functions never return an empty list, unless the code is buggy.
*/
}


@@ -89,13 +89,25 @@ Same as UTF8 string pointer, we also have as_ordinary() and as_ordinary_view() d
Try to do your own research and figure out how to use them.
It's pretty easy.
\section string__reinterpret__clarification Clarification about Usage Scenario
Let us clarify what this chapter is about.
This chapter covers the conversion between UTF-8 strings and ordinary strings,
where the ordinary string is already encoded in UTF-8 but represented by the \c char type.
This point is crucial. If you apply any function provided by this namespace to a string which is not encoded in UTF-8,
for example, trying to convert a CP1252-encoded Western European string to UTF-8 with a function from this namespace,
it causes <B>undefined behavior</B>.
The correct functions for such conversions are located in the yycc::encoding namespace,
or in the more generic module yycc::carton::pycodec.
This namespace is only suitable for converting UTF-8 strings which were mis-represented by non-<TT>char8_t</TT> types.
Once you understand this point, you can safely use this namespace.
Additionally, due to the legacy of MSVC, the encoding of \c char* is often not UTF-8.
If you run the conversion code introduced in this article on a string which is not encoded in UTF-8,
it may cause undefined behavior.
To enable the UTF-8 mode of MSVC, pass the \c /utf-8 switch to the MSVC compiler.
Then you can use the functions introduced in this article safely.
Otherwise, you must manually guarantee that every argument you pass to these functions is encoded in UTF-8.


@@ -1,149 +0,0 @@
namespace YYCC::StringHelper {
/**
\page string_helper String Helper
\section string_helper__printf Printf VPrintf
YYCC::StringHelper provides 4 functions for formatting strings.
These functions are mainly provided for programmers who cannot use the C++20 \c std::format feature.
\code
bool Printf(yycc_u8string&, const yycc_char8_t*, ...);
bool VPrintf(yycc_u8string&, const yycc_char8_t*, va_list argptr);
yycc_u8string Printf(const yycc_char8_t*, ...);
yycc_u8string VPrintf(const yycc_char8_t*, va_list argptr);
\endcode
#Printf and #VPrintf are similar to \c std::sprintf and \c std::vsprintf.
#Printf accepts a UTF-8 format string and variadic arguments specifying the data to print.
This is the form programmers commonly use.
#VPrintf does the same work, but its second argument is a \c va_list,
the representation of variadic arguments.
It is mostly used inside other functions that take variadic arguments themselves.
The only difference between these functions and the standard library functions is
that you don't need to worry about whether the given buffer is large enough,
because these functions calculate the required size internally.
This follows the same design as introduced in \ref encoding_helper.
There are 2 overloads for each of #Printf and #VPrintf.
The first overload returns a bool value and requires a string container as argument for storing the result.
The second overload returns the result string directly.
As you might expect, the first overload returns false if formatting fails (this rarely happens),
and the second overload returns an empty string when formatting fails.
\section string_helper__replace Replace
YYCC::StringHelper provides 2 functions for string replacement:
\code
void Replace(yycc_u8string&, const yycc_u8string_view&, const yycc_u8string_view&);
yycc_u8string Replace(const yycc_u8string_view&, const yycc_u8string_view&, const yycc_u8string_view&);
\endcode
The first overload will do replacement in given string container directly.
The second overload will produce a copy of original string and do replacement on the copied string.
#Replace treats the following boundary cases specially:
\li If the given string is empty, the result is empty.
\li If the character sequence to be replaced is an empty string, no replacement happens.
\li If the replacement character sequence is empty, every found occurrence is simply deleted from the given string.
\section string_helper__join Join
YYCC::StringHelper provides a universal way of joining strings, as well as various specialized join functions.
\subsection string_helper__join__universal Universal Join Function
Because C++ has many different list types,
there is no single convenient way to write a universal join function.
So we created #JoinDataProvider to describe the join context.
Before using the universal join function,
you should set up a #JoinDataProvider first.
It is a \c std::function object, which can easily be created from a C++ lambda.
This function accepts a reference to \c yycc_u8string_view,
which you should set to the string to be joined on each call.
It returns a bool value indicating whether joining should continue.
Simply return \c false to terminate the join process;
whatever you assigned to the argument is not joined when you return false.
Then pass the created #JoinDataProvider object to the #Join function,
together with the delimiter,
and you get the final joined string.
Here is an example:
\code
std::vector<yycc_u8string> data {
    YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
};
auto iter = data.cbegin();
auto stop = data.cend();
auto joined_string = YYCC::StringHelper::Join(
    [&iter, &stop](yycc_u8string_view& view) -> bool {
        if (iter == stop) return false;
        view = *iter;
        ++iter;
        return true;
    },
    YYCC_U8(", ")  // the delimiter placed between items
);
\endcode
\subsection string_helper__join__specialized Specialized Join Function
Besides the universal join function,
YYCC::StringHelper also provides a specialized join function for standard library containers.
For example, the code written above can be rewritten as follows using this specialized overload.
The first two arguments are simply the begin and end iterators.
However, you must make sure that the iterator can be dereferenced and that the result is implicitly convertible to yycc_u8string_view.
Otherwise, this overload fails to compile with a template error.
\code
std::vector<yycc_u8string> data {
    YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
};
// Join with ", " as the delimiter.
auto joined_string = YYCC::StringHelper::Join(data.begin(), data.end(), YYCC_U8(", "));
\endcode
\section string_helper__lower_upper Lower Upper
String helper provides Python-like string lower and upper functions.
Both the lower and upper functions have 2 overloads:
\code
yycc_u8string Lower(const yycc_u8string_view&);
void Lower(yycc_u8string&);
\endcode
The first overload accepts a string view as argument and returns a \b copy of the original string converted to lower case.
The second overload accepts a mutable string container as argument and converts all characters stored in it to lower case in place.
You can choose either of them according to your taste and requirements.
Upper also has 2 similar overloads.
\section string_helper__split Split
String helper provides Python-like string split functions.
They come in 2 variants:
\code
std::vector<yycc_u8string> Split(const yycc_u8string_view&, const yycc_u8string_view&);
std::vector<yycc_u8string_view> SplitView(const yycc_u8string_view&, const yycc_u8string_view&);
\endcode
Both functions take a string view as the first argument, representing the string to be split.
The second argument is a string view representing the delimiter.
The difference between these 2 split functions is evident from their names.
The first function returns a list of copied strings as its split result.
The second function returns a list of string views as its split result,
which stay valid only as long as the string referenced by your string view argument is alive.
This also means that the second function uses less memory if you don't need copies of the original string.
If the source string (the string to be split) is empty, or the delimiter is empty,
the result has exactly 1 item, which is the source string itself.
These functions never return an empty list, unless the code is buggy.
*/
}