doc: write document
176
doc/src/string/op.dox
Normal file
@@ -0,0 +1,176 @@
|
||||
namespace yycc::string::op {
|
||||
/**
|
||||
|
||||
\page string__op String Operations
|
||||
|
||||
\section string__op__printf Printf VPrintf
|
||||
|
||||
yycc::string::op provides 4 functions for formatting strings.
These functions were originally provided for programmers who could not use the C++20 \c std::format feature.
Now that this project has migrated to the C++23 standard, \c std::format is available,
so these functions serve as a complement to \c std::format.
|
||||
|
||||
\code
|
||||
std::u8string printf(const char8_t* format, ...);
|
||||
std::u8string vprintf(const char8_t* format, va_list argptr);
|
||||
std::string printf(const char* format, ...);
|
||||
std::string vprintf(const char* format, va_list argptr);
|
||||
\endcode
|
||||
|
||||
#printf and #vprintf are similar to \c std::sprintf and \c std::vsprintf.
#printf accepts a UTF-8 format string and variadic arguments specifying the data to print.
This is the form programmers commonly use.
#vprintf does the same work, but its second argument is a \c va_list,
the representation of variadic arguments.
It is mostly used by other functions which themselves take variadic arguments.
|
||||
|
||||
The only difference between these functions and the standard library functions is
that you don't need to worry about whether the given buffer is large enough,
because these functions calculate the required size internally.
|
||||
|
||||
If an error occurs, such as running out of memory or a malformed format string,
these functions throw an exception immediately.
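
The "calculate the size internally" idea can be illustrated with the common two-pass \c std::vsnprintf technique.
The sketch below is only an illustration written against the standard library:
the names \c sketch_printf and \c sketch_vprintf are hypothetical, it only mirrors the \c std::string overloads,
and the exception type is an assumption rather than what yycc::string::op actually throws.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the std::string vprintf overload (not the library's code).
std::string sketch_vprintf(const char* format, va_list argptr) {
    assert(format != nullptr);
    // A va_list may only be walked once, so the measuring pass uses a copy.
    va_list measure;
    va_copy(measure, argptr);
    const int count = std::vsnprintf(nullptr, 0, format, measure);
    va_end(measure);
    if (count < 0) throw std::runtime_error("bad format string");

    // Allocate exactly the needed size. vsnprintf also writes a trailing NUL,
    // which lands on the terminator slot that std::string already owns.
    std::string result(static_cast<std::size_t>(count), '\0');
    const int written = std::vsnprintf(&result[0], result.size() + 1, format, argptr);
    if (written != count) throw std::runtime_error("formatting failed");
    return result;
}

// Hypothetical stand-in for the std::string printf overload:
// it only packs the variadic arguments into a va_list and forwards them.
std::string sketch_printf(const char* format, ...) {
    va_list argptr;
    va_start(argptr, format);
    std::string result;
    try {
        result = sketch_vprintf(format, argptr);
    } catch (...) {
        va_end(argptr);
        throw;
    }
    va_end(argptr);
    return result;
}
```

The caller never supplies a buffer: the first pass measures the output, and the second pass writes it into a string of exactly that size.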
|
||||
|
||||
\section string__op__replace Replace
|
||||
|
||||
yycc::string::op provides 2 functions for string replacement:
|
||||
|
||||
\code
|
||||
void replace(std::u8string& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
|
||||
std::u8string replace(const std::u8string_view& strl, const std::u8string_view& from_strl, const std::u8string_view& to_strl);
|
||||
\endcode
|
||||
|
||||
The first overload performs the replacement directly in the given string container.
The second overload makes a copy of the original string and performs the replacement on the copy.
|
||||
|
||||
These #replace functions handle boundary scenarios as follows:
|
||||
|
||||
\li If the given string is empty, the result is empty.
\li If the character sequence to be replaced is an empty string, no replacement happens.
\li If the replacement character sequence is an empty string, every found occurrence is simply deleted from the given string.
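
The boundary rules above can be reproduced with a small standard-library sketch.
This is not the library's implementation: \c sketch_replace and \c sketch_replace_copy are hypothetical names,
and \c std::string / \c std::string_view are used in place of the UTF-8 types purely to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// In-place variant (hypothetical name): replaces every occurrence of from_strl.
void sketch_replace(std::string& strl, std::string_view from_strl, std::string_view to_strl) {
    // An empty source string or an empty search pattern means nothing to do.
    if (strl.empty() || from_strl.empty()) return;
    std::string::size_type pos = 0;
    while ((pos = strl.find(from_strl, pos)) != std::string::npos) {
        strl.replace(pos, from_strl.size(), to_strl);
        // Skip past the inserted text so it is never scanned again.
        pos += to_strl.size();
    }
}

// Copying variant (hypothetical name): leaves the input untouched.
std::string sketch_replace_copy(std::string_view strl, std::string_view from_strl, std::string_view to_strl) {
    std::string result(strl);
    sketch_replace(result, from_strl, to_strl);
    return result;
}

// Usage: one call per boundary rule listed above.
void sketch_replace_demo() {
    assert(sketch_replace_copy("", "a", "b").empty());         // empty input
    assert(sketch_replace_copy("abc", "", "x") == "abc");      // empty pattern
    assert(sketch_replace_copy("aabbcc", "bb", "") == "aacc"); // empty replacement deletes
}
```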
|
||||
|
||||
\section string__op__join Join
|
||||
|
||||
yycc::string::op provides a universal way of joining strings, plus specialized join functions.
|
||||
|
||||
\subsection string__op__join__universal Universal Join Function
|
||||
|
||||
Because C++ has many different list types,
there is no single convenient way to write a universal join function.
So we created #JoinDataProvider to describe the join context.
|
||||
|
||||
Before using the universal join function,
you should first set up a #JoinDataProvider, the context of the join function.
It is actually a \c std::function object, which can easily be created with C++ lambda syntax.
This function object returns \c std::optional<std::u8string_view>:
a \c std::u8string_view for the next piece of data to be joined, or \c std::nullopt if there is no more data.
As you may have noticed, this is similar to a Rust iterator.
|
||||
|
||||
Then, you can pass the created #JoinDataProvider object to the #join function,
specifying the delimiter at the same time,
to get the final joined string.
Here is an example:
|
||||
|
||||
\code
|
||||
std::u8string_view delimiter = u8", "; // example delimiter, value chosen only for illustration
std::vector<std::u8string> data {
|
||||
u8"", u8"1", u8"2", u8""
|
||||
};
|
||||
auto iter = data.cbegin();
|
||||
auto stop = data.cend();
|
||||
std::u8string joined_string = yycc::string::op::join(
|
||||
[&iter, &stop]() -> std::optional<std::u8string_view> {
|
||||
if (iter == stop) return std::nullopt;
|
||||
return *iter++;
|
||||
},
|
||||
delimiter
|
||||
);
|
||||
\endcode
|
||||
|
||||
\subsection string__op__join__specialized Specialized Join Function
|
||||
|
||||
Besides the universal join function,
yycc::string::op also provides a specialized join function for standard library containers.
For example, the code above can be rewritten as follows using this specialized overload.
The first two arguments are simply the begin and end iterators.
However, you must make sure that the iterator can be dereferenced and the result implicitly converted to \c std::u8string_view.
|
||||
|
||||
\code
|
||||
std::u8string_view delimiter = u8", "; // example delimiter, value chosen only for illustration
std::vector<std::u8string> data {
|
||||
u8"", u8"1", u8"2", u8""
|
||||
};
|
||||
std::u8string joined_string = yycc::string::op::join(data.begin(), data.end(), delimiter);
|
||||
\endcode
|
||||
|
||||
\section string__op__lower_upper Lower Upper
|
||||
|
||||
This namespace provides Python-like string lower and upper functions.
|
||||
|
||||
\code
|
||||
void lower(std::u8string& strl);
|
||||
std::u8string to_lower(const std::u8string_view& strl);
|
||||
void upper(std::u8string& strl);
|
||||
std::u8string to_upper(const std::u8string_view& strl);
|
||||
\endcode
|
||||
|
||||
The functions with the "to_" prefix accept a string view as argument
and return a \b copy whose content is the lower/upper case form of the original string.
The remaining functions accept a mutable string container as argument and modify it in place.
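
The split between the in-place form and the copying "to_" form can be sketched as follows.
The names \c sketch_lower and \c sketch_to_lower are hypothetical, \c std::string stands in for \c std::u8string,
and the sketch assumes that only ASCII letters are converted, which this page does not state about the real functions.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// In-place form (hypothetical name): mutates the given container.
// Assumption: only ASCII letters A-Z are touched, all other bytes are kept.
void sketch_lower(std::string& strl) {
    for (char& c : strl) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
}

// Copying form (hypothetical name): takes a view and returns a new string.
std::string sketch_to_lower(std::string_view strl) {
    std::string result(strl);
    sketch_lower(result);
    return result;
}

// Usage: the copying form leaves its input alone, the in-place form does not.
void sketch_lower_demo() {
    std::string text = "HeLLo 123";
    assert(sketch_to_lower(text) == "hello 123");
    assert(text == "HeLLo 123");
    sketch_lower(text);
    assert(text == "hello 123");
}
```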
|
||||
|
||||
\section string__op__strip_trim Strip and Trim
|
||||
|
||||
This namespace provides functions for removing leading and trailing characters.
|
||||
There are two sets of functions:
|
||||
|
||||
\subsection string__op__strip Unicode-aware functions
|
||||
|
||||
These functions properly handle Unicode characters when stripping:
|
||||
|
||||
\code
|
||||
std::u8string_view strip(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
std::u8string_view lstrip(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
std::u8string_view rstrip(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
\endcode
|
||||
|
||||
The prefixes "l" and "r" stand for left and right strip respectively, as in Python.
|
||||
|
||||
\subsection string__op__trim ASCII-only functions
|
||||
|
||||
These functions treat each byte as an individual character and are faster for ASCII-only scenarios:
|
||||
|
||||
\code
|
||||
std::u8string_view trim(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
std::u8string_view ltrim(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
std::u8string_view rtrim(const std::u8string_view& strl, const std::u8string_view& words);
|
||||
\endcode
|
||||
|
||||
The difference between "trim" and "strip" mirrors the order in which they were introduced in Java.
"trim" was invented first, so here its function is confined to ASCII-only strings.
"strip" was introduced later, so here it covers more scenarios such as Unicode.
Note that in Java itself, both "trim" and "strip" can handle Unicode.
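
To make the "each byte is an individual character" behaviour concrete, here is a byte-wise trim sketch.
It is an illustration only: the \c sketch_ names are hypothetical, \c std::string_view stands in for \c std::u8string_view,
and the real #trim family may be implemented differently.

```cpp
#include <cassert>
#include <string_view>

// Byte-wise left trim (hypothetical name): drops leading bytes found in words.
std::string_view sketch_ltrim(std::string_view strl, std::string_view words) {
    const auto first = strl.find_first_not_of(words);
    if (first == std::string_view::npos) return std::string_view{};
    return strl.substr(first);
}

// Byte-wise right trim (hypothetical name): drops trailing bytes found in words.
std::string_view sketch_rtrim(std::string_view strl, std::string_view words) {
    const auto last = strl.find_last_not_of(words);
    if (last == std::string_view::npos) return std::string_view{};
    return strl.substr(0, last + 1);
}

// Trim on both sides; the result is a view into the original string.
std::string_view sketch_trim(std::string_view strl, std::string_view words) {
    return sketch_rtrim(sketch_ltrim(strl, words), words);
}

// Usage: fine for ASCII, but a multi-byte character in words is matched byte by byte.
void sketch_trim_demo() {
    assert(sketch_trim("  hi  ", " ") == "hi");
    // "\xC3\xA3" is U+00E3 and "\xC3\xA9" is U+00E9 in UTF-8: they share the lead byte 0xC3,
    // so the byte-wise trim cuts the first character in half.
    assert(sketch_ltrim("\xC3\xA3x", "\xC3\xA9") == "\xA3x");
}
```

The second assertion shows why a byte-wise trim should only be given ASCII characters to remove, and why the Unicode-aware "strip" family exists.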
|
||||
|
||||
\section string__op__split Split
|
||||
|
||||
This namespace provides Python-like string split functions.
|
||||
It has 3 variants for different use cases:
|
||||
|
||||
\code
|
||||
LazySplit lazy_split(const std::u8string_view& strl, const std::u8string_view& delimiter);
|
||||
std::vector<std::u8string_view> split(const std::u8string_view& strl, const std::u8string_view& delimiter);
|
||||
std::vector<std::u8string> split_owned(const std::u8string_view& strl, const std::u8string_view& delimiter);
|
||||
\endcode
|
||||
|
||||
All these functions take a string view as the first argument, representing the string to be split.
The second argument is a string view representing the delimiter for splitting.
|
||||
|
||||
The first function #lazy_split returns a #LazySplit object that can be used in range-based for loops.
|
||||
It is computed lazily, which is memory-efficient for large inputs.
|
||||
The second function #split returns a vector of string views, which is memory-efficient
|
||||
but the views are only valid as long as the original string remains valid.
|
||||
The third function #split_owned returns a vector of strings, which are copies of the original parts.
|
||||
|
||||
If the source string (the string to be split) is empty, or the delimiter is empty,
the result has exactly 1 item, which is the source string itself.
These functions never return an empty list unless the code is buggy.
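
The "never an empty list" rule can be seen in a view-returning split sketch.
This is a hedged illustration, not the library's code: \c sketch_split is a hypothetical name,
\c std::string_view stands in for \c std::u8string_view, and keeping empty parts between adjacent delimiters
is an assumption taken from Python's \c str.split with an explicit separator.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// View-returning split (hypothetical name). The views point into strl,
// so they are only valid while the original string is alive.
std::vector<std::string_view> sketch_split(std::string_view strl, std::string_view delimiter) {
    std::vector<std::string_view> parts;
    // Boundary rule: empty source or empty delimiter yields the source itself.
    if (strl.empty() || delimiter.empty()) {
        parts.push_back(strl);
        return parts;
    }
    std::string_view::size_type start = 0;
    while (true) {
        const auto pos = strl.find(delimiter, start);
        if (pos == std::string_view::npos) {
            // The tail after the last delimiter is always emitted, even if empty.
            parts.push_back(strl.substr(start));
            break;
        }
        parts.push_back(strl.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    return parts;
}

// Usage: the result always has at least one item.
void sketch_split_demo() {
    assert(sketch_split("", ",").size() == 1);
    assert(sketch_split("abc", "").front() == "abc");
    assert(sketch_split("a,b,,c", ",").size() == 4);
}
```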
|
||||
|
||||
*/
|
||||
}
|
||||
@@ -89,13 +89,25 @@ Same as UTF8 string pointer, we also have as_ordinary() and as_ordinary_view() d
|
||||
Try to do your own research and figure out how to use them.
|
||||
It's pretty easy.
|
||||
|
||||
\section string__reinterpret__windows_warns Warnings to Windows Programmer
|
||||
\section string__reinterpret__clarification Clarification about Usage Scenario
|
||||
|
||||
Due to the legacy of MSVC, the encoding of \c char* may not be UTF8 in most cases.
|
||||
Let us clarify what this chapter is talking about.
This chapter is about the conversion between a UTF-8 string and an ordinary string
that is encoded in UTF-8 but represented with the \c char type.
This point is crucial. If you apply any function provided by this namespace to a string which is not encoded in UTF-8,
for example, trying to convert a CP1252-encoded Western European string to UTF-8 via a function from this namespace,
it will cause <B>undefined behavior</B>.
|
||||
|
||||
The correct functions for such encoding conversions are located in the yycc::encoding namespace,
or in the more generic module yycc::carton::pycodec.
This namespace is only suited to converting UTF-8 strings which are mis-represented by non-<TT>char8_t</TT> types.
Once you understand this point, you can safely use this namespace.
|
||||
|
||||
Additionally, due to the legacy of MSVC, the encoding of \c char* may not be UTF-8 in most cases.
If you run the conversion code introduced in this article on a string which is not encoded in UTF-8,
it may cause undefined behavior.
|
||||
|
||||
To enable UTF8 mode of MSVC, please deliver \c /utf-8 switch to MSVC.
|
||||
To enable UTF8 mode of MSVC, please deliver \c /utf-8 switch to MSVC compiler.
|
||||
Then you can use the functions introduced in this article safely.
Otherwise, you must manually guarantee that the arguments you provide to these functions are encoded in UTF-8.
|
||||
|
||||
@@ -1,149 +0,0 @@
|
||||
namespace YYCC::StringHelper {
|
||||
/**
|
||||
|
||||
\page string_helper String Helper
|
||||
|
||||
\section string_helper__printf Printf VPrintf
|
||||
|
||||
YYCC::StringHelper provides 4 functions for formatting string.
|
||||
These functions are mainly provided to programmer who can not use C++ 20 \c std::format feature.
|
||||
|
||||
\code
|
||||
bool Printf(yycc_u8string&, const yycc_char8_t*, ...);
|
||||
bool VPrintf(yycc_u8string&, const yycc_char8_t*, va_list argptr);
|
||||
yycc_u8string Printf(const yycc_char8_t*, ...);
|
||||
yycc_u8string VPrintf(const yycc_char8_t*, va_list argptr);
|
||||
\endcode
|
||||
|
||||
#Printf and #VPrintf is similar to \c std::sprintf and \c std::vsprintf.
|
||||
#Printf accepts UTF8 format string and variadic arguments specifying data to print.
|
||||
This is commonly used by programmer.
|
||||
However, #VPrintf also do the same work but its second argument is \c va_list,
|
||||
the representation of variadic arguments.
|
||||
It is mostly used by other function which has variadic arguments.
|
||||
|
||||
The only difference between these function and standard library functions is
|
||||
that you don't need to worry about whether the space of given buffer is enough,
|
||||
because these functions help you to calculate this internally.
|
||||
|
||||
There is the same design like we introduced in \ref encoding_helper.
|
||||
There are 2 overloads for #Printf and #VPrintf respectively.
|
||||
First overload return bool value and require a string container as argument for storing result.
|
||||
The second overload return result string directly.
|
||||
As you expected, first overload will return false if fail to format string (this is barely happened).
|
||||
and second overload will return empty string when formatter failed.
|
||||
|
||||
\section string_helper__replace Replace
|
||||
|
||||
YYCC::StringHelper provide 2 functions for programmer do string replacement:
|
||||
|
||||
\code
|
||||
void Replace(yycc_u8string&, const yycc_u8string_view&, const yycc_u8string_view&);
|
||||
yycc_u8string Replace(const yycc_u8string_view&, const yycc_u8string_view&, const yycc_u8string_view&);
|
||||
\endcode
|
||||
|
||||
The first overload will do replacement in given string container directly.
|
||||
The second overload will produce a copy of original string and do replacement on the copied string.
|
||||
|
||||
#Replace has special treatments for following scenarios:
|
||||
|
||||
\li If given string is empty, the return value will be empty.
|
||||
\li If the character sequence to be replaced is empty string, no replacement will happen.
|
||||
\li If the character sequence will be replaced into string is or empty, it will simply delete found character sequence from given string.
|
||||
|
||||
\section string_helper__join Join
|
||||
|
||||
YYCC::StringHelper provide an universal way for joining string and various specialized join functions.
|
||||
|
||||
\subsection string_helper__join__universal Universal Join Function
|
||||
|
||||
Because C++ list types are various.
|
||||
There is no unique and convenient way to create an universal join function.
|
||||
So we create #JoinDataProvider to describe join context.
|
||||
|
||||
Before using universal join function,
|
||||
you should setup #JoinDataProvider first, the context of join function.
|
||||
It actually is an \c std::function object which can be easily fetched by C++ lambda syntax.
|
||||
This function pointer accept a reference to \c yycc_u8string_view,
|
||||
programmer should set it to the string to be joined when at each calling.
|
||||
And this function pointer return a bool value to indicate the end of join.
|
||||
You can simply return \c false to terminate join process.
|
||||
The argument you assigned to argument will not be taken into join process when you return false.
|
||||
|
||||
Then, you can pass the created #JoinDataProvider object to #Join function.
|
||||
And specify delimiter at the same time.
|
||||
Then you can get the final joined string.
|
||||
There is an example:
|
||||
|
||||
\code
|
||||
std::vector<yycc_u8string> data {
|
||||
YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
|
||||
};
|
||||
auto iter = data.cbegin();
|
||||
auto stop = data.cend();
|
||||
auto joined_string = YYCC::StringHelper::Join(
|
||||
[&iter, &stop](yycc_u8string_view& view) -> bool {
|
||||
if (iter == stop) return false;
|
||||
view = *iter;
|
||||
++iter;
|
||||
return true;
|
||||
},
|
||||
delimiter
|
||||
);
|
||||
\endcode
|
||||
|
||||
\subsection string_helper__join__specialized Specialized Join Function
|
||||
|
||||
Despite universal join function,
|
||||
YYCC::StringHelper also provide a specialized join functions for standard library container.
|
||||
For example, the code written above can be written in following code by using this specialized overload.
|
||||
The first two argument is just the begin and end iterator.
|
||||
However, you must make sure that we can dereference it and then implicitly convert it to yycc_u8string_view.
|
||||
Otherwise this overload will throw template error.
|
||||
|
||||
\code
|
||||
std::vector<yycc_u8string> data {
|
||||
YYCC_U8(""), YYCC_U8("1"), YYCC_U8("2"), YYCC_U8("")
|
||||
};
|
||||
auto joined_string = YYCC::StringHelper::Join(data.begin(), data.end(), delimiter);
|
||||
\endcode
|
||||
|
||||
\section string_helper__lower_upper Lower Upper
|
||||
|
||||
String helper provides Python-like string lower and upper function.
|
||||
Both lower and upper function have 2 overloads:
|
||||
|
||||
\code
|
||||
yycc_u8string Lower(const yycc_u8string_view&);
|
||||
void Lower(yycc_u8string&);
|
||||
\endcode
|
||||
|
||||
First overload accepts a string view as argument and return a \b copy whose content are all the lower case of original string.
|
||||
Second overload accepts a mutable string container as argument and will make all characters stored in it become their lower case.
|
||||
You can choose on of them for your flavor and requirements.
|
||||
Upper also has similar 2 overloads.
|
||||
|
||||
\section string_helper__split Split
|
||||
|
||||
String helper provides Python-like string split function.
|
||||
It has 2 types for you:
|
||||
|
||||
\code
|
||||
std::vector<yycc_u8string> Split(const yycc_u8string_view&, const yycc_u8string_view&);
|
||||
std::vector<yycc_u8string_view> SplitView(const yycc_u8string_view&, const yycc_u8string_view&);
|
||||
\endcode
|
||||
|
||||
All these overloads take a string view as the first argument representing the string need to be split.
|
||||
The second argument is a string view representing the delimiter for splitting.
|
||||
The only difference between these 2 split function are overt according to their names.
|
||||
The first split function will return a list of copied string as its split result.
|
||||
The second split function will return a list of string view as its split result,
|
||||
and it will keep valid as long as the life time of your given string view argument.
|
||||
It also means that the last overload will cost less memory if you don't need the copy of original string.
|
||||
|
||||
If the source string (the string need to be split) is empty, or the delimiter is empty,
|
||||
the result will only has 1 item and this item is source string itself.
|
||||
There is no way that these methods return an empty list, except the code is buggy.
|
||||
|
||||
*/
|
||||
}
|
||||