mirror of https://github.com/RGBCube/serenity synced 2025-07-27 00:37:45 +00:00

LibUnicode: Support code point names that apply to ranges of code points

For example, consider the following adjacent entries in UnicodeData.txt:

    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
    4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

Our current implementation assigns the display name "CJK Ideograph
Extension A" to code points U+3400 and U+4DBF only, not to the code
points in between. Not only should those code points be assigned a name,
but the Unicode spec also has formatting rules for what the names should
be (the names for these ranged code points are not as they appear in
UnicodeData.txt).
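
To make the formatting rule concrete, here is a minimal sketch of a ranged name lookup. It assumes the Unicode name derivation rule in which a ranged name is a fixed prefix followed by the code point in uppercase hexadecimal. The struct, table, and function names are hypothetical and use the standard library rather than AK types; this is not the code added by this commit.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical table entry: a range of code points that share one name prefix.
struct CodePointRangeName {
    uint32_t first;
    uint32_t last;
    char const* prefix;
};

// Only the two ranges mentioned in this commit message; a real table would hold more.
static constexpr std::array<CodePointRangeName, 2> s_range_names { {
    { 0x3400, 0x4DBF, "CJK UNIFIED IDEOGRAPH-" },
    { 0x2F800, 0x2FA1D, "CJK COMPATIBILITY IDEOGRAPH-" },
} };

// Returns the formatted name if the code point falls inside a known range.
std::optional<std::string> ranged_code_point_name(uint32_t code_point)
{
    for (auto const& range : s_range_names) {
        if (code_point < range.first || code_point > range.last)
            continue;

        // The suffix is the code point itself, in uppercase hex, at least 4 digits.
        char hex[9];
        std::snprintf(hex, sizeof(hex), "%04lX", static_cast<unsigned long>(code_point));
        return std::string(range.prefix) + hex;
    }
    return std::nullopt;
}
```

Because the name is assembled at lookup time, a function like this has to return an owned string rather than a view into a static table.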

The spec also defines names for code point ranges that are actually
listed individually in UnicodeData.txt. For example:

    2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
    2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
    2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;

Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generate by nearly 2,000.
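
As a rough illustration of the range detection, the sketch below collects only the "<Name, First>" / "<Name, Last>" pairs from UnicodeData.txt-style input. It does not attempt the coalescing of individually listed code points described above, and all names in it (NamedRange, collect_first_last_ranges, ends_with) are hypothetical rather than taken from the actual generator.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one entry per First/Last pair.
struct NamedRange {
    uint32_t first;
    uint32_t last;
    std::string name; // e.g. "CJK Ideograph Extension A"
};

static bool ends_with(std::string const& text, std::string const& suffix)
{
    return text.size() >= suffix.size()
        && text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

std::vector<NamedRange> collect_first_last_ranges(std::string const& unicode_data)
{
    std::string const first_suffix = ", First>";
    std::string const last_suffix = ", Last>";

    std::vector<NamedRange> ranges;
    std::optional<NamedRange> open_range;

    std::istringstream stream(unicode_data);
    std::string line;
    while (std::getline(stream, line)) {
        // Only the first two ';'-separated fields matter here: code point and name.
        auto first_semicolon = line.find(';');
        if (first_semicolon == std::string::npos)
            continue;
        auto second_semicolon = line.find(';', first_semicolon + 1);
        if (second_semicolon == std::string::npos)
            continue;

        auto code_point = static_cast<uint32_t>(std::stoul(line.substr(0, first_semicolon), nullptr, 16));
        auto name = line.substr(first_semicolon + 1, second_semicolon - first_semicolon - 1);

        if (!name.empty() && name.front() == '<' && ends_with(name, first_suffix)) {
            // Open a range; strip the leading '<' and the ", First>" suffix.
            open_range = NamedRange { code_point, code_point, name.substr(1, name.size() - 1 - first_suffix.size()) };
        } else if (open_range.has_value() && ends_with(name, last_suffix)) {
            // Close the range at the matching ", Last>" entry.
            open_range->last = code_point;
            ranges.push_back(*open_range);
            open_range.reset();
        }
    }
    return ranges;
}
```

Fed the two Extension A lines quoted earlier, this yields a single range from U+3400 to U+4DBF, so every code point in between can be named from one table entry.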
Timothy Flynn 2021-11-23 08:24:13 -05:00 committed by Andreas Kling
parent f2f4980f15
commit 7e6ad172a4
4 changed files with 137 additions and 51 deletions

@@ -222,13 +222,10 @@ u32 to_unicode_uppercase(u32 code_point)
 #endif
 }
-Optional<StringView> code_point_display_name([[maybe_unused]] u32 code_point)
+Optional<String> code_point_display_name([[maybe_unused]] u32 code_point)
 {
 #if ENABLE_UNICODE_DATA
-    auto name = Detail::code_point_display_name(code_point);
-    if (name.is_null())
-        return {};
-    return name;
+    return Detail::code_point_display_name(code_point);
 #else
     return {};
 #endif

@@ -19,7 +19,7 @@ namespace Unicode {
 u32 to_unicode_lowercase(u32 code_point);
 u32 to_unicode_uppercase(u32 code_point);
-Optional<StringView> code_point_display_name(u32 code_point);
+Optional<String> code_point_display_name(u32 code_point);
 String to_unicode_lowercase_full(StringView, Optional<StringView> locale = {});
 String to_unicode_uppercase_full(StringView, Optional<StringView> locale = {});