
doc: finish string reinterpret doc

2025-12-25 15:43:43 +08:00
parent 337734d340
commit 6dbd031e00
5 changed files with 52 additions and 125 deletions

View File

@@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (c) 2024-2025 yyc12345
Copyright (c) 2024-2026 yyc12345
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

View File

@@ -1,35 +1,37 @@
namespace YYCC::EnumHelper {
namespace yycc::cenum {
/**
\page enum_helper Scoped Enum Helper
\page cenum Scoped Enum Helper
\section enum_helper__intro Intro
\section cenum__intro Intro
C++11 introduced a new kind of enum called scoped enum.
It is better than the legacy C enum because it does not leak its enumerator names into the enclosing namespace,
and it can be given an explicit underlying type so that it is stored with the specified size.
However, its shortcoming is that, unlike the legacy C enum, it has no bitwise operators.
Programmer must implement them for scoped enum one by one.
It is a hardship and inconvenient.
This is the reason why I invent this class
Programmers must implement them for each scoped enum one by one, which is tedious and inconvenient.
This is the reason why I created this module.
It is also the reason why I call this module "cenum":
it gives scoped enum types the same abilities as the legacy C enum.
\section enum_helper__Usage Usage
\section cenum__Usage Usage
In this namespace, we provide all the bitwise functions that you may need for scoped enum types.
See YYCC::EnumHelper for more detail (reading the function annotations is clearer than repeating them here).
See yycc::cenum for more detail (reading the function annotations is clearer than repeating them here).
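To give an idea of what such bitwise functions look like, here is a minimal sketch.
The namespace \c demo, the functions \c merge and \c has, and the enum \c Permission are hypothetical names made up for this example.
They are not the actual yycc::cenum API, so please check the function annotations for the real names and signatures.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helpers written only for illustration.
// They are NOT the actual yycc::cenum API.
namespace demo {

    // Combine two scoped enum values with bitwise OR.
    template<typename E>
    constexpr E merge(E lhs, E rhs) {
        static_assert(std::is_enum_v<E>, "E must be an enum type");
        using U = std::underlying_type_t<E>;
        return static_cast<E>(static_cast<U>(lhs) | static_cast<U>(rhs));
    }

    // Check whether every flag set in `probe` is also set in `value`.
    template<typename E>
    constexpr bool has(E value, E probe) {
        static_assert(std::is_enum_v<E>, "E must be an enum type");
        using U = std::underlying_type_t<E>;
        return (static_cast<U>(value) & static_cast<U>(probe)) == static_cast<U>(probe);
    }

    // A scoped enum used as a set of flags.
    enum class Permission : std::uint8_t { Read = 1, Write = 2, Execute = 4 };

}
```

With these helpers, <TT>demo::merge(demo::Permission::Read, demo::Permission::Write)</TT> produces a value for which \c demo::has reports \c Write as present and \c Execute as absent.
The caller names the operation explicitly, so nothing happens behind the back of the type system.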
\section enum_helper__why Why not Operator Overload
\section cenum__why Why not Operator Overload Way
I have tried it (you can even see the relics of it in the source code).
But it needs the extra statement shown below to bring the operators into scope, otherwise the compiler can not see them.
\code
using namespace YYCC::EnumHelper;
using namespace yycc::cenum;
\endcode
Another reason why I do not use this method is that
The last and most important reason why I do not use this method is that
this overload strategy may be applied by accident to some types which it should not touch, such as non-scoped enum types.
So I gave up this solution.
It is much better to require users to state explicitly when they want these operations.
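The following sketch shows one way such an accident can happen.
It assumes an overload that is constrained only by \c std::is_enum, which is an assumption made for this example and not necessarily how the abandoned code was written.
The namespace \c demo and the enum \c LegacyColor are hypothetical.

```cpp
#include <type_traits>

namespace demo {
    // Hypothetical overload accepting ANY enum type, scoped or not.
    template<typename E, std::enable_if_t<std::is_enum_v<E>, int> = 0>
    constexpr E operator|(E lhs, E rhs) {
        using U = std::underlying_type_t<E>;
        return static_cast<E>(static_cast<U>(lhs) | static_cast<U>(rhs));
    }
}

// A legacy non-scoped enum which already works with the built-in operator.
enum LegacyColor { Red = 1, Green = 2 };

// Before the using-directive, the built-in operator is used and the result is int.
static_assert(std::is_same_v<decltype(Red | Green), int>);

using namespace demo;

// After the using-directive, the template is an exact match and silently takes over,
// so the type of the very same expression changes.
static_assert(std::is_same_v<decltype(Red | Green), LegacyColor>);
```

The value of the expression stays the same, but its type changes without any visible hint at the call site.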
*/
}

View File

@@ -10,7 +10,7 @@
<TD><CENTER>
<B>YYCCommonplace Programming Manual</B>
Copyright 2024 by yyc12345.
Copyright 2024-2026 by yyc12345.
</CENTER></TD>
</TR>
</TABLE>
@@ -25,7 +25,7 @@
<TR>
<TD ALIGN="LEFT" VALIGN="TOP">
<B>General Features</B>
<B>Overviews</B>
\li \subpage intro
@@ -35,7 +35,9 @@
\li \subpage macro
\li \subpage string_reinterpret
\li \subpage cenum
\li \subpage string__reinterpret
<!--
\li \subpage string_helper
@@ -50,7 +52,6 @@
\li \subpage std_patch
\li \subpage enum_helper
-->
<!--
@@ -64,7 +65,7 @@
</TD>
<TD ALIGN="LEFT" VALIGN="TOP">
<B>Advanced Features</B>
<B>Advanced Features (Carton)</B>
<B>Windows Specific Features</B>

View File

@@ -6,7 +6,7 @@ namespace yycc::macro {
In this page we will introduce the macros defined by this library
which can not be grouped in other topic.
\section macro__version_cmp Library Version and Version Comparison
\section macro__version Library Version
Versioning is an important matter in modern software development, especially for a library.
In YYCC, we use Semantic Versioning as our version standard.
@@ -16,6 +16,8 @@ First, YYCC has its own version and it can be visited by
\c YYCC_VER_MAJOR, \c YYCC_VER_MINOR, and \c YYCC_VER_PATCH.
Each part of Semantic Versioning is provided individually.
\section macro__version_cmp Version Comparison
YYCC also provides a bunch of macros to compare 2 versions.
They also give programs using YYCC a way to check the YYCC version,
because some of these programs rely on a specific version of YYCC.
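As an illustration of how such a comparison can be built, here is a minimal sketch.
All \c DEMO_ macros below are hypothetical and exist only in this example.
They are not the macros shipped by YYCC, so please look up the real names in yycc::macro.

```cpp
// Hypothetical version of an imaginary library.
#define DEMO_VER_MAJOR 2
#define DEMO_VER_MINOR 1
#define DEMO_VER_PATCH 0

// Pack the 3 parts into one number so that they compare in the right order.
// This sketch assumes that every part is smaller than 1000.
#define DEMO_VER_NUM(major, minor, patch) ((major) * 1000000L + (minor) * 1000L + (patch))

// "Greater than or equal" comparison between version A and version B.
#define DEMO_VER_GE(a1, a2, a3, b1, b2, b3) \
    (DEMO_VER_NUM(a1, a2, a3) >= DEMO_VER_NUM(b1, b2, b3))

// A program relying on version 2.0.0 or newer can refuse to build otherwise.
#if !DEMO_VER_GE(DEMO_VER_MAJOR, DEMO_VER_MINOR, DEMO_VER_PATCH, 2, 0, 0)
#error "This program requires version 2.0.0 or newer."
#endif
```

Because the comparison is a plain integer expression, it works in \c \#if directives as well as in ordinary code.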

View File

@@ -1,28 +1,18 @@
namespace yycc {
namespace yycc::string::reinterpret {
/**
\page string_reinterpret String Reinterpret
\page string__reinterpret String Reinterpret
By now you know that we use UTF8 strings everywhere in this project,
as we introduced in \ref premise_and_principle__string_encoding.
Now it's time to learn how to obtain a UTF8 string from the user or from anywhere else.
\section string_reinterpret__utf8_type UTF8 Type
After upgrade the whole project into C++23, \c char8_t is the only valid UTF8 char type.
And \c std::u8string and \c std::u8string_view are the only valid UTF8 string container and viewer.
Additionally, \c u8 string literal prefix is the only way to create UTF8 string literal.
All in all, please use this library provided string functions with UTF8 format.
\section string_reinterpret__concept Concepts
\section string__reinterpret__concept Concepts
In the following content, you will meet 2 terms: ordinary string and UTF8 string.
A UTF8 string, as its name suggests, is a string encoded with UTF8.
The char type of it must is \c yycc_char8_t.
(equivalent to \c char8_t after C++ 20.)
Its char type must be \c char8_t.
An ordinary string means the plain, native string.
The result of a C++ string literal without any prefix, such as \c "foo bar", is an ordinary string.
@@ -33,145 +23,77 @@ Its encoding depends on compiler and environment.
For more information, please browse CppReference:
https://en.cppreference.com/w/cpp/language/string_literal
\section string_reinterpret__utf8_literal UTF8 Literal
String literal is a C++ concept.
If you are not familar with it, please browse related article first, such as CppReference.
\subsection string_reinterpret__utf8_literal__single Single Literal
In short words, YYCC allow you declare an UTF8 literal like this:
\code
YYCC_U8("This is UTF8 literal.")
\endcode
YYCC_U8 is macro.
You don't need add extra \c u8 prefix in string given to the macro.
This macro will do this automatically.
In detail, this macro do a \c reinterpret_cast to change the type of given argument to \c const \c yycc_char8_t* forcely.
This ensure that declared UTF8 literal is compatible with YYCC UTF8 types.
\subsection string_reinterpret__utf8_literal__char Single Char
Same as UTF8 literal, YYCC allow you cast normal \c char into \c yycc_char8_t as following code:
\code
YYCC_U8_CHAR('A')
\endcode
YYCC_U8_CHAR is a macro.
It just simply use \c static_cast to cast given value to \c yycc_char8_t.
It doesn't mean that you can cast non-ASCII characters,
because the space these characters occupied usually more than the maximum value of \c char.
For example, following code is \b invalid:
\code
YYCC_U8_CHAR('文') // INVALID!
\endcode
\subsection string_reinterpret__utf8_literal__concatenation Literal Concatenation
YYCC_U8 macro also works for string literal concatenation:
\code
YYCC_U8("Error code: " PRIu32 ". Please contact me.");
\endcode
According to C++ standard for string literal concatenation,
<I>"If one of the strings has an encoding prefix and the other does not, the one that does not will be considered to have the same encoding prefix as the other."</I>
At the same time, YYCC_U8 macro will automatically add \c u8 prefix for the first component of this string literal concatenation.
So the whole string will be UTF8 literal.
It also order you should \b not add any prefix for other components of this string literal concatenation.
\subsection string_reinterpret__utf8_literal__why Why?
You may know that C++ standard allows programmer declare an UTF8 literal explicitly by writing code like this:
\code
u8"foo bar"
\endcode
This is okey. But it may incompatible with YYCC UTF8 char type.
According to C++ standard, this UTF8 literal syntax will only return \c const \c char8_t* if your C++ standard higher or equal to C++ 20,
otherwise it will return \c const \c char*.
This behavior cause that you can not assign this UTF8 literal to \c yycc_u8string if you are in the environment which do not support \c char8_t,
because their types are different.
Thereas you can not use the functions provided by this library because they are all use YYCC defined UTF8 char type.
\section string_reinterpret__utf8_pointer UTF8 String Pointer
\section string__reinterpret__pointer UTF8 String Pointer
A string pointer means a raw pointer pointing to a string, such as \c const \c char*, \c char*, \c char32_t* and so on.
Many legacy code assume \c char* is encoded with UTF8 (the exception is Windows). But \c char* is incompatible with \c yycc_char8_t.
YYCC provides YYCC::EncodingHelper::ToUTF8 to resolve this issue. There is an exmaple:
Much legacy code assumes that \c char* strings are encoded with UTF8 (Windows is the exception). But \c char* is incompatible with \c char8_t*.
YYCC provides as_utf8() to resolve this issue. Here is an example:
\code
const char* absolutely_is_utf8 = "I confirm this is encoded with UTF8.";
const yycc_char8_t* converted = YYCC::EncodingHelper::ToUTF8(absolutely_is_utf8);
const char8_t* converted = as_utf8(absolutely_is_utf8);
char* mutable_utf8 = const_cast<char*>(absolutely_is_utf8); // This is not safe. Just for example.
yycc_char8_t* mutable_converted = YYCC::EncodingHelper::ToUTF8(mutable_utf8);
char8_t* mutable_converted = as_utf8(mutable_utf8);
\endcode
YYCC::EncodingHelper::ToUTF8 has 2 overloads which can handle constant and mutable stirng pointer convertion respectively.
as_utf8() has 2 overloads which handle the conversion of constant and mutable string pointers respectively.
YYCC also has ability that convert YYCC UTF8 char type to ordinary char type by YYCC::EncodingHelper::ToOrdinary.
YYCC can also convert the UTF8 char type back to the ordinary char type with as_ordinary().
Here is an example:
\code
const yycc_char8_t* yycc_utf8 = YYCC_U8("I am UTF8 string.");
const char* converted = YYCC::EncodingHelper::ToOrdinary(yycc_utf8);
const char8_t* utf8 = u8"I am UTF8 string.";
const char* converted = as_ordinary(utf8);
yycc_char8_t* mutable_yycc_utf8 = const_cast<char*>(yycc_utf8); // Not safe. Also just for example.
char* mutable_converted = YYCC::EncodingHelper::ToOrdinary(mutable_yycc_utf8);
char8_t* mutable_utf8 = const_cast<char8_t*>(utf8); // Not safe. Also just for example.
char* mutable_converted = as_ordinary(mutable_utf8);
\endcode
Same as YYCC::EncodingHelper::ToUTF8, YYCC::EncodingHelper::ToOrdinary also has 2 overloads to handle constant and mutable string pointer.
Like as_utf8(), as_ordinary() also has 2 overloads to handle constant and mutable string pointers.
\section string_reinterpret__utf8_container UTF8 String Container
\section string__reinterpret__container UTF8 String Container
A string container usually means a standard library string container, such as \c std::string, \c std::wstring, \c std::u32string and so on.
In many personal projects, programmers use \c std::string everywhere because \c std::u8string may not have been available when the project was written.
How to do convertion between ordinary string container and YYCC UTF8 string container?
How do we convert between an ordinary string container and a UTF8 string container?
Directly force-casting one container type to the other is definitely illegal, because they may have different class layouts.
Calm down and I will tell you how to do the conversion correctly.
YYCC provides YYCC::EncodingHelper::ToUTF8 to convert ordinary string container to YYCC UTF8 string container.
YYCC provides as_utf8() to convert an ordinary string container to a UTF8 string container.
Here is an example:
\code
std::string ordinary_string("I am UTF8");
yycc_u8string yycc_string = YYCC::EncodingHelper::ToUTF8(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
std::u8string utf8_string = as_utf8(ordinary_string);
\endcode
Actually, YYCC::EncodingHelper::ToUTF8 accepts a reference to \c std::string_view as argument.
Actually, as_utf8() accepts a reference to \c std::string_view as its argument.
However, there is an implicit conversion from \c std::string to \c std::string_view,
so you can directly pass a \c std::string instance to it.
Taking a string view avoids an unnecessary memory copy of the argument.
If you just want to pass ordinary string container to function, and this function accepts \c yycc_u8string_view as its argument,
you can use alternative YYCC::EncodingHelper::ToUTF8View.
If you just want to pass an ordinary string container to a function which accepts \c std::u8string_view as its argument,
you can use the alternative as_utf8_view().
\code
std::string ordinary_string("I am UTF8");
yycc_u8string_view yycc_string = YYCC::EncodingHelper::ToUTF8View(ordinary_string);
auto result = YYCC::EncodingHelper::UTF8ToUTF32(yycc_string);
std::u8string_view utf8_string = as_utf8_view(ordinary_string);
\endcode
Compared with the previous one, this example uses less memory.
The reduced memory is the content of \c yycc_string because string view is a view, not the copy of original string.
The memory saved is the content of \c utf8_string, because a string view is only a view and not a copy of the original string.
Same as UTF8 string pointer, we also have YYCC::EncodingHelper::ToOrdinary and YYCC::EncodingHelper::ToOrdinaryView do correspondant reverse convertion.
As with UTF8 string pointers, we also have as_ordinary() and as_ordinary_view() to do the corresponding reverse conversion.
Try to do your own research and figure out how to use them.
It's pretty easy.
\section string_reinterpret__windows Warnings to Windows Programmer
\section string__reinterpret__windows_warns Warnings to Windows Programmer
Due to the legacy of MSVC, in most cases the encoding of \c char* is not UTF8 on Windows.
If you run the convertion code introduced in this article with the string which is not encoded with UTF8, it may cause undefined behavior.
If you run the conversion code introduced in this article on a string which is not encoded with UTF8,
it may cause undefined behavior.
To enable the UTF8 mode of MSVC, please pass the \c /utf-8 switch to MSVC.
Then you can use the functions introduced in this article safely.
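If you want the compiler to tell you when this switch is missing, a compile-time check like the following sketch may help.
It is a heuristic suggested for this article, not something provided by YYCC.
It relies on the fact that the character U+6587 occupies exactly 3 bytes in UTF8, while typical legacy code pages encode it differently or can not encode it at all.

```cpp
// Heuristic check that ordinary string literals are encoded with UTF8.
// U+6587 takes 3 bytes in UTF8 (E6 96 87), plus the terminating NUL makes 4.
static_assert(sizeof("\u6587") == 4,
              "Ordinary string literals do not look like UTF8. With MSVC, pass /utf-8.");

// Additionally check the leading byte of the UTF8 sequence.
static_assert(static_cast<unsigned char>("\u6587"[0]) == 0xE6,
              "Ordinary string literals do not look like UTF8. With MSVC, pass /utf-8.");
```

This only checks the literals produced by the compiler.
Strings coming from files, the console or the operating system at runtime still need their own care.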