mirror of https://github.com/RGBCube/serenity synced 2025-05-14 11:34:59 +00:00

LibUnicode: Perform code point case conversion lookups in constant time

Similar to commit 0652cc4, we now generate 2-stage lookup tables for
case conversion information. Only about 1500 code points are actually
cased. This means that case information is highly compressible, as
most of the blocks we break the code points into will have no casing
information at all.

In total, this change:

    * Does not change the size of libunicode.so (which is nice because,
      generally, the 2-stage lookup tables are expected to trade a bit
      of size for performance).

    * Reduces the runtime of the new benchmark test case added here from
      1.383s to 1.127s (about an 18.5% improvement).
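
For readers unfamiliar with the technique, the following is a minimal sketch of a 2-stage lookup table, not the generator code from this commit. The names (`CasingTables`, `build_tables`, `to_lowercase`), the block size of 256, and the choice to store a signed lowercase delta per code point are all assumptions made for illustration. It shows why sparse casing data compresses well: every block with no cased code points collapses into one shared all-zero block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical constants; the real generator may use a different block size.
constexpr uint32_t block_size = 256;
constexpr uint32_t code_point_count = 0x110000;

struct CasingTables {
    // Stage 1: (code_point / block_size) -> index of a unique block in stage 2.
    std::vector<uint16_t> stage1;
    // Stage 2: unique blocks stored back to back. Each entry is the delta to
    // add to a code point to get its lowercase form (0 means "not cased").
    std::vector<int32_t> stage2;
};

// Builds both stages from a sparse map of code point -> lowercase code point.
CasingTables build_tables(std::map<uint32_t, uint32_t> const& lowercase_mappings)
{
    CasingTables tables;
    std::map<std::vector<int32_t>, uint16_t> seen_blocks;

    for (uint32_t start = 0; start < code_point_count; start += block_size) {
        std::vector<int32_t> block(block_size, 0);

        auto it = lowercase_mappings.lower_bound(start);
        for (; it != lowercase_mappings.end() && it->first < start + block_size; ++it)
            block[it->first - start] = static_cast<int32_t>(it->second) - static_cast<int32_t>(it->first);

        // Identical blocks (most commonly the all-zero block) are stored once.
        assert(seen_blocks.size() < 65536);
        auto [entry, inserted] = seen_blocks.try_emplace(block, static_cast<uint16_t>(seen_blocks.size()));
        if (inserted)
            tables.stage2.insert(tables.stage2.end(), block.begin(), block.end());

        tables.stage1.push_back(entry->second);
    }

    return tables;
}

// Constant-time lookup: two array reads, with no searching.
uint32_t to_lowercase(CasingTables const& tables, uint32_t code_point)
{
    if (code_point >= code_point_count)
        return code_point;

    size_t block_index = tables.stage1[code_point / block_size];
    int32_t delta = tables.stage2[block_index * block_size + code_point % block_size];
    return static_cast<uint32_t>(static_cast<int64_t>(code_point) + delta);
}
```

In this sketch, building the tables from just two mappings (U+0041 to U+0061 and U+0391 to U+03B1) produces 4352 stage-1 entries but only three stage-2 blocks: one for each block containing a cased code point, plus the single shared empty block.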
Timothy Flynn 2023-07-26 12:54:05 -04:00 committed by Andreas Kling
parent 0ee133af90
commit 456211932f
3 changed files with 149 additions and 57 deletions


@@ -124,6 +124,16 @@ TEST_CASE(to_unicode_casefold)
     EXPECT_EQ(result, "\u03B1\u0342\u03B9"sv);
 }
 
+BENCHMARK_CASE(casing)
+{
+    for (size_t i = 0; i < 50'000; ++i) {
+        __test_to_unicode_lowercase();
+        __test_to_unicode_uppercase();
+        __test_to_unicode_titlecase();
+        __test_to_unicode_casefold();
+    }
+}
+
 TEST_CASE(to_unicode_lowercase_unconditional_special_casing)
 {
     // LATIN SMALL LETTER SHARP S