doc: finish string reinterpret doc
LICENSE
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2024-2025 yyc12345
+Copyright (c) 2024-2026 yyc12345
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -1,35 +1,37 @@
-namespace YYCC::EnumHelper {
+namespace yycc::cenum {
 /**
 
-\page enum_helper Scoped Enum Helper
+\page cenum Scoped Enum Helper
 
-\section enum_helper__intro Intro
+\section cenum__intro Intro
 
 C++11 introduced a new kind of enum called scoped enum.
 It is better than the legacy C enum because it does not leak names into the namespace where it is located,
 and it can also specify an underlying type to make sure it is stored with the specified size.
 However, its shortcoming is that it lacks the bitwise operators which the legacy C enum has.
-Programmers must implement them for each scoped enum one by one.
-It is a hardship and inconvenient.
-This is the reason why I invented this class
+Programmers must implement them for each scoped enum one by one, which is tedious and inconvenient.
+This is the reason why I invented this class.
+And this is the reason why I call this module "cenum":
+it gives scoped enum types the same abilities as the legacy C enum.
 
-\section enum_helper__Usage Usage
+\section cenum__Usage Usage
 
 In this namespace, we provide all bitwise functions related to scoped enum types which may be used.
-See YYCC::EnumHelper for more detail (it is clearer to read the function annotations than to have me repeat them here).
+See yycc::cenum for more detail (it is clearer to read the function annotations than to have me repeat them here).
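To make the idea of explicit bitwise functions concrete, here is a minimal sketch written for this page. The names `merge`, `has`, `FileMode` and `cenum_demo` are hypothetical stand-ins chosen for illustration; they are not taken from yycc::cenum, whose real function list should be read from its annotations.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical scoped enum used as a set of bit flags.
enum class FileMode : std::uint8_t { None = 0, Read = 1, Write = 2, Exec = 4 };

// Hypothetical helper: OR together one or more values of the same enum type.
template<typename E, typename... Rest>
constexpr E merge(E first, Rest... rest) {
    static_assert(std::is_enum_v<E>, "E must be an enum type");
    static_assert((std::is_same_v<E, Rest> && ...), "all arguments must share one enum type");
    using U = std::underlying_type_t<E>;
    return static_cast<E>((static_cast<U>(first) | ... | static_cast<U>(rest)));
}

// Hypothetical helper: true if `value` and `flags` share at least one set bit.
template<typename E>
constexpr bool has(E value, E flags) {
    static_assert(std::is_enum_v<E>, "E must be an enum type");
    using U = std::underlying_type_t<E>;
    return (static_cast<U>(value) & static_cast<U>(flags)) != 0;
}

// Small self-check showing the intended call style.
inline void cenum_demo() {
    constexpr FileMode rw = merge(FileMode::Read, FileMode::Write);
    assert(has(rw, FileMode::Write));
    assert(!has(rw, FileMode::Exec));
}
```

The caller names the operation at every call site, so nothing is applied to an enum type behind the programmer's back.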
 
-\section enum_helper__why Why not Operator Overload
+\section cenum__why Why not Operator Overload Way
 
 I have tried it (and you can even see the relic of it in the source code).
 But it needs an extra statement, written as follows, to bring it into scope; otherwise the compiler can not see it.
 
 \code
-using namespace YYCC::EnumHelper;
+using namespace yycc::cenum;
 \endcode
 
-Another reason why I do not use this method is that
+The last and most important reason why I do not use this method is that
 this overload strategy may be applied by accident to some types which it should not be applied to, such as non-scoped enum types.
 So I gave up this solution.
+It is much better to require the user to explicitly specify when to use them.
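The accident described above can be reproduced with a small sketch. Everything here (`overload_demo`, `Scoped`, `Legacy`, `overload_accident_demo`) is hypothetical and written for illustration; it is not the relic code mentioned above. It assumes an operator template that accepts every enum type.

```cpp
#include <cassert>
#include <type_traits>

namespace overload_demo {
    // Hypothetical operator template that accepts every enum type.
    template<typename E, std::enable_if_t<std::is_enum_v<E>, int> = 0>
    constexpr E operator|(E lhs, E rhs) {
        using U = std::underlying_type_t<E>;
        return static_cast<E>(static_cast<U>(lhs) | static_cast<U>(rhs));
    }
}

enum class Scoped { A = 1, B = 2 };
enum Legacy { LEGACY_A = 1, LEGACY_B = 2 };

// The operator is not visible here, so the built-in one is used
// and the legacy enum is promoted to int as usual.
static_assert(std::is_same_v<decltype(LEGACY_A | LEGACY_B), int>, "built-in operator expected");

inline void overload_accident_demo() {
    using namespace overload_demo;  // the extra statement the text mentions
    // Intended use: scoped enums gain operator|.
    assert((Scoped::A | Scoped::B) == static_cast<Scoped>(3));
    // Accident: the legacy enum now selects the template as well,
    // so the result type silently changes from int to Legacy.
    static_assert(std::is_same_v<decltype(LEGACY_A | LEGACY_B), Legacy>, "template was selected");
}
```

The template wins for `Legacy` because it is an exact match, while the built-in operator needs an integral promotion.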
 
 */
 }
@@ -10,7 +10,7 @@
 <TD><CENTER>
 <B>YYCCommonplace Programming Manual</B>
 
-Copyright 2024 by yyc12345.
+Copyright 2024-2026 by yyc12345.
 </CENTER></TD>
 </TR>
 </TABLE>
@@ -25,7 +25,7 @@
 <TR>
 <TD ALIGN="LEFT" VALIGN="TOP">
 
-<B>General Features</B>
+<B>Overviews</B>
 
 \li \subpage intro
 
@@ -35,7 +35,9 @@
 
 \li \subpage macro
 
-\li \subpage string_reinterpret
+\li \subpage cenum
+
+\li \subpage string__reinterpret
 
 <!--
 \li \subpage string_helper
@@ -50,7 +52,6 @@
 
 \li \subpage std_patch
 
-\li \subpage enum_helper
 -->
 
 <!--
@@ -64,7 +65,7 @@
 </TD>
 <TD ALIGN="LEFT" VALIGN="TOP">
 
-<B>Advanced Features</B>
+<B>Advanced Features (Carton)</B>
 
 <B>Windows Specific Features</B>
 
@@ -6,7 +6,7 @@ namespace yycc::macro {
 In this page we will introduce the macros defined by this library
 which can not be grouped into any other topic.
 
-\section macro__version_cmp Library Version and Version Comparison
+\section macro__version Library Version
 
 Versioning is an important thing in modern software development, especially for a library.
 In YYCC, we use Semantic Versioning as our version standard.
@@ -16,6 +16,8 @@ First, YYCC has its own version and it can be visited by
 \c YYCC_VER_MAJOR, \c YYCC_VER_MINOR, and \c YYCC_VER_PATCH.
 Each part of Semantic Versioning is provided individually.
 
+\section macro__version_cmp Version Comparison
+
 YYCC also provides a bunch of macros to compare 2 versions.
 This also gives programs using YYCC a way to check the YYCC version,
 because some of them rely on a specific version of YYCC.
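As an illustration of how such comparison macros can be built and used, here is a minimal sketch. All `DEMO_` names and `version_demo` are hypothetical and invented for this example; they are not the macros YYCC defines, and packing the version into one integer is only one possible technique.

```cpp
#include <cassert>

// Hypothetical stand-ins for a library's version triple.
#define DEMO_VER_MAJOR 1
#define DEMO_VER_MINOR 3
#define DEMO_VER_PATCH 0

// Pack a semantic version into one integer so it can be compared inside #if.
// Assumption of this sketch: every part is below 1000.
#define DEMO_VER_PACK(major, minor, patch) \
    ((major) * 1000000L + (minor) * 1000L + (patch))

// True if version a (a1.a2.a3) is greater than or equal to version b (b1.b2.b3).
#define DEMO_VERCMP_GE(a1, a2, a3, b1, b2, b3) \
    (DEMO_VER_PACK(a1, a2, a3) >= DEMO_VER_PACK(b1, b2, b3))

// A dependent program can refuse to build against a library that is too old.
#if !DEMO_VERCMP_GE(DEMO_VER_MAJOR, DEMO_VER_MINOR, DEMO_VER_PATCH, 1, 2, 0)
#error "This program requires library version 1.2.0 or later."
#endif

// The same macro also works in ordinary expressions.
inline void version_demo() {
    assert(DEMO_VERCMP_GE(DEMO_VER_MAJOR, DEMO_VER_MINOR, DEMO_VER_PATCH, 1, 3, 0));
    assert(!DEMO_VERCMP_GE(1, 2, 9, 1, 3, 0));
}
```

Because the check runs in the preprocessor, an incompatible version is reported at build time instead of at run time.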
@@ -1,28 +1,18 @@
-namespace yycc {
+namespace yycc::string::reinterpret {
 /**
 
-\page string_reinterpret String Reinterpret
+\page string__reinterpret String Reinterpret
 
 Now, you know that we use UTF8 strings everywhere in this project
 as we introduced in \ref premise_and_principle__string_encoding.
 Now it's time to know how to fetch a UTF8 string from the user or anywhere else.
 
-\section string_reinterpret__utf8_type UTF8 Type
-
-After upgrading the whole project to C++23, \c char8_t is the only valid UTF8 char type.
-And \c std::u8string and \c std::u8string_view are the only valid UTF8 string container and viewer.
-Additionally, the \c u8 string literal prefix is the only way to create a UTF8 string literal.
-
-All in all, please use the string functions provided by this library with UTF8 format.
-
-\section string_reinterpret__concept Concepts
+\section string__reinterpret__concept Concepts
 
 In the following content, you will meet 2 terms: ordinary string and UTF8 string.
 
 UTF8 string, as its name suggests, is a string encoded with UTF8.
-Its char type must be \c yycc_char8_t.
-(equivalent to \c char8_t after C++ 20.)
+Its char type must be \c char8_t.
 
 Ordinary string means the plain, native string.
 A C++ string literal without any prefix, such as \c "foo bar", produces an ordinary string.
@@ -33,145 +23,77 @@ Its encoding depends on compiler and environment.
 For more information, please browse CppReference:
 https://en.cppreference.com/w/cpp/language/string_literal
 
-\section string_reinterpret__utf8_literal UTF8 Literal
-
-String literal is a C++ concept.
-If you are not familiar with it, please browse related articles first, such as CppReference.
-
-\subsection string_reinterpret__utf8_literal__single Single Literal
-
-In short, YYCC allows you to declare a UTF8 literal like this:
-
-\code
-YYCC_U8("This is UTF8 literal.")
-\endcode
-
-YYCC_U8 is a macro.
-You don't need to add an extra \c u8 prefix to the string given to the macro.
-This macro will do this automatically.
-
-In detail, this macro does a \c reinterpret_cast to forcibly change the type of the given argument to \c const \c yycc_char8_t*.
-This ensures that the declared UTF8 literal is compatible with YYCC UTF8 types.
-
-\subsection string_reinterpret__utf8_literal__char Single Char
-
-Same as UTF8 literal, YYCC allows you to cast a normal \c char into \c yycc_char8_t with the following code:
-
-\code
-YYCC_U8_CHAR('A')
-\endcode
-
-YYCC_U8_CHAR is a macro.
-It simply uses \c static_cast to cast the given value to \c yycc_char8_t.
-It doesn't mean that you can cast non-ASCII characters,
-because the space these characters occupy is usually more than the maximum value of \c char.
-For example, the following code is \b invalid:
-
-\code
-YYCC_U8_CHAR('文') // INVALID!
-\endcode
-
-\subsection string_reinterpret__utf8_literal__concatenation Literal Concatenation
-
-YYCC_U8 macro also works for string literal concatenation:
-
-\code
-YYCC_U8("Error code: " PRIu32 ". Please contact me.");
-\endcode
-
-According to the C++ standard for string literal concatenation,
-<I>"If one of the strings has an encoding prefix and the other does not, the one that does not will be considered to have the same encoding prefix as the other."</I>
-At the same time, the YYCC_U8 macro automatically adds the \c u8 prefix to the first component of this string literal concatenation.
-So the whole string will be a UTF8 literal.
-It also means that you should \b not add any prefix to the other components of this string literal concatenation.
-
-\subsection string_reinterpret__utf8_literal__why Why?
-
-You may know that the C++ standard allows programmers to declare a UTF8 literal explicitly by writing code like this:
-
-\code
-u8"foo bar"
-\endcode
-
-This is okay. But it may be incompatible with the YYCC UTF8 char type.
-According to the C++ standard, this UTF8 literal syntax only returns \c const \c char8_t* if your C++ standard is higher than or equal to C++ 20,
-otherwise it returns \c const \c char*.
-This behavior means that you can not assign this UTF8 literal to \c yycc_u8string if you are in an environment which does not support \c char8_t,
-because their types are different.
-Therefore you can not use the functions provided by this library because they all use the YYCC defined UTF8 char type.
-
-\section string_reinterpret__utf8_pointer UTF8 String Pointer
+\section string__reinterpret__pointer UTF8 String Pointer
 
 String pointer means the raw pointer pointing to a string, such as \c const \c char*, \c char*, \c char32_t* and so on.
 
-Much legacy code assumes \c char* is encoded with UTF8 (the exception is Windows). But \c char* is incompatible with \c yycc_char8_t.
-YYCC provides YYCC::EncodingHelper::ToUTF8 to resolve this issue. Here is an example:
+Much legacy code assumes \c char* is encoded with UTF8 (the exception is Windows). But \c char* is incompatible with \c char8_t.
+YYCC provides as_utf8() to resolve this issue. Here is an example:
 
 \code
 const char* absolutely_is_utf8 = "I confirm this is encoded with UTF8.";
-const yycc_char8_t* converted = YYCC::EncodingHelper::ToUTF8(absolutely_is_utf8);
+const char8_t* converted = as_utf8(absolutely_is_utf8);
 
 char* mutable_utf8 = const_cast<char*>(absolutely_is_utf8); // This is not safe. Just for example.
-yycc_char8_t* mutable_converted = YYCC::EncodingHelper::ToUTF8(mutable_utf8);
+char8_t* mutable_converted = as_utf8(mutable_utf8);
 \endcode
 
-YYCC::EncodingHelper::ToUTF8 has 2 overloads which handle constant and mutable string pointer conversion respectively.
+as_utf8() has 2 overloads which handle constant and mutable string pointer conversion respectively.
 
-YYCC can also convert the YYCC UTF8 char type to the ordinary char type with YYCC::EncodingHelper::ToOrdinary.
+YYCC can also convert the UTF8 char type to the ordinary char type with as_ordinary().
 Here is an example:
 
 \code
-const yycc_char8_t* yycc_utf8 = YYCC_U8("I am UTF8 string.");
-const char* converted = YYCC::EncodingHelper::ToOrdinary(yycc_utf8);
+const char8_t* utf8 = u8"I am UTF8 string.";
+const char* converted = as_ordinary(utf8);
 
-yycc_char8_t* mutable_yycc_utf8 = const_cast<yycc_char8_t*>(yycc_utf8); // Not safe. Also just for example.
-char* mutable_converted = YYCC::EncodingHelper::ToOrdinary(mutable_yycc_utf8);
+char8_t* mutable_utf8 = const_cast<char8_t*>(utf8); // Not safe. Also just for example.
+char* mutable_converted = as_ordinary(mutable_utf8);
 \endcode
 
-Same as YYCC::EncodingHelper::ToUTF8, YYCC::EncodingHelper::ToOrdinary also has 2 overloads to handle constant and mutable string pointers.
+Same as as_utf8(), as_ordinary() also has 2 overloads to handle constant and mutable string pointers.
 
-\section string_reinterpret__utf8_container UTF8 String Container
+\section string__reinterpret__container UTF8 String Container
 
 String container usually means the standard library string container, such as \c std::string, \c std::wstring, \c std::u32string and so on.
 
 In many personal projects, programmers may use \c std::string everywhere because \c std::u8string may not have been available when the project was written.
-How to do the conversion between an ordinary string container and a YYCC UTF8 string container?
+How to do the conversion between an ordinary string container and a UTF8 string container?
 It is definitely illegal to force a cast between them directly, because they may have different class layouts.
 Calm down and I will tell you how to do the correct conversion.
-YYCC provides YYCC::EncodingHelper::ToUTF8 to convert an ordinary string container to a YYCC UTF8 string container.
+YYCC provides as_utf8() to convert an ordinary string container to a UTF8 string container.
 Here is an example:
 
 \code
 std::string ordinary_string("I am UTF8");
-yycc_u8string yycc_string = YYCC::EncodingHelper::ToUTF8(ordinary_string);
-auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
+std::u8string utf8_string = as_utf8(ordinary_string);
 \endcode
 
-Actually, YYCC::EncodingHelper::ToUTF8 accepts a reference to \c std::string_view as argument.
+Actually, as_utf8() accepts a reference to \c std::string_view as argument.
 However, there is an implicit conversion from \c std::string to \c std::string_view,
 so you can directly pass a \c std::string instance to it.
 
 String view reduces unnecessary memory copies.
-If you just want to pass an ordinary string container to a function, and this function accepts \c yycc_u8string_view as its argument,
-you can use the alternative YYCC::EncodingHelper::ToUTF8View.
+If you just want to pass an ordinary string container to a function, and this function accepts \c std::u8string_view as its argument,
+you can use the alternative as_utf8_view().
 
 \code
 std::string ordinary_string("I am UTF8");
-yycc_u8string_view yycc_string = YYCC::EncodingHelper::ToUTF8View(ordinary_string);
-auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
+std::u8string_view utf8_string = as_utf8_view(ordinary_string);
 \endcode
 
 Compared with the previous one, this example uses less memory.
-The memory saved is the copy of the string content: \c yycc_string is only a view, not a copy of the original string.
+The memory saved is the copy of the string content: \c utf8_string is only a view, not a copy of the original string.
 
-Same as UTF8 string pointer, we also have YYCC::EncodingHelper::ToOrdinary and YYCC::EncodingHelper::ToOrdinaryView to do the corresponding reverse conversion.
+Same as UTF8 string pointer, we also have as_ordinary() and as_ordinary_view() to do the corresponding reverse conversion.
 Try to do your own research and figure out how to use them.
 It's pretty easy.
 
-\section string_reinterpret__windows Warnings to Windows Programmer
+\section string__reinterpret__windows_warns Warnings to Windows Programmer
 
 Due to the legacy of MSVC, the encoding of \c char* is not UTF8 in most cases.
-If you run the conversion code introduced in this article with a string which is not encoded with UTF8, it may cause undefined behavior.
+If you run the conversion code introduced in this article with a string which is not encoded with UTF8,
+it may cause undefined behavior.
 
 To enable the UTF8 mode of MSVC, please pass the \c /utf-8 switch to MSVC.
 Thus you can use the functions introduced in this article safely.
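One way to notice a missing \c /utf-8 switch early is a compile-time probe on a narrow string literal. The sketch below is a heuristic written for this page, not something the library is known to ship; `narrow_literals_are_utf8` is a hypothetical name. It relies on the fact that U+6587 is encoded as the three bytes E6 96 87 in UTF8.

```cpp
#include <cassert>

// Returns true if narrow string literals are encoded with UTF8 in this build.
// "\u6587" is written as a universal character name, so the result depends only
// on the literal encoding chosen by the compiler, not on the source file encoding.
constexpr bool narrow_literals_are_utf8() {
    const char sample[] = "\u6587";
    return sizeof(sample) == 4  // 3 UTF8 bytes plus the terminating '\0'
        && static_cast<unsigned char>(sample[0]) == 0xE6
        && static_cast<unsigned char>(sample[1]) == 0x96
        && static_cast<unsigned char>(sample[2]) == 0x87;
}

// Fail the build instead of hitting undefined behavior at run time.
static_assert(narrow_literals_are_utf8(),
              "Narrow string literals are not UTF8; with MSVC, pass the /utf-8 switch.");
```

This only covers literals compiled in the same translation unit; strings arriving from the system at run time still need their own handling.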