1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-05-14 12:54:58 +00:00
Commit graph

6 commits

Author SHA1 Message Date
Timothy Flynn
7dce2bfe23 LibUnicode: Generate separate tables for General Category properties
Previously, each code point's General Category was part of the generated
UnicodeData structure. This ultimately presented two problems, one
functional and one performance related:

  * Some General Categories are applied to unassigned code points, for
    example the Unassigned (Cn) category. Unassigned code points are
    strictly excluded from UnicodeData.txt, so by relying on that file,
    the generator is unable to handle these categories.

  * Lookups for General Categories are slower when searching through the
    large UnicodeData hash map. Even though lookups are O(1), the hash
    function turned out to be slower than binary searching through a
    category-specific table.

So, now a table is generated for each General Category. When querying a
code point for a category, a binary search is done on each code point
range in that category's table to check if code point has that category.

Further, General Categories are now parsed from the UCD file
DerivedGeneralCategory.txt. This file is a normal "prop list" file and
contains the categories for unassigned code points.
2021-08-11 13:11:01 +02:00
Timothy Flynn
c4bfda7f7f LibUnicode: Handle code points that are both cased and case-ignorable
Apparently, some code points fit both categories, for example U+0345
(COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if
a code point is a final code point in a string.
2021-07-28 23:42:29 +02:00
Timothy Flynn
7827aede6f LibUnicode: Check word break when deciding on case-ignorable code points 2021-07-28 23:42:29 +02:00
Timothy Flynn
c45a014645 LibUnicode: Check property list when deciding if a code point is cased 2021-07-28 23:42:29 +02:00
Timothy Flynn
39f971e42b LibUnicode: Begin implementing special Unicode case folding
This implements unconditional special case folding, and conditional
folding for non-locale cases. Worth noting that the only conditional,
non-locale special case is for converting an uppercase sigma to
lowercase.
2021-07-27 21:04:36 +01:00
Timothy Flynn
4dda3edc9e LibUnicode: Introduce a Unicode library for interacting with UCD files
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.

As a start, LibUnicode includes upper- and lower-case code point
converters.
2021-07-26 17:03:55 +01:00