serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-28 23:22:33 +00:00

Author	SHA1	Message	Date
Timothy Flynn	5ac23d244d	LibUnicode: Generate separate tables for Unicode properties Similar to General Categories, this generates separate tables for the Property list.	2021-08-11 13:11:01 +02:00
Timothy Flynn	7dce2bfe23	LibUnicode: Generate separate tables for General Category properties Previously, each code point's General Category was part of the generated UnicodeData structure. This ultimately presented two problems, one functional and one performance related: * Some General Categories are applied to unassigned code points, for example the Unassigned (Cn) category. Unassigned code points are strictly excluded from UnicodeData.txt, so by relying on that file, the generator is unable to handle these categories. * Lookups for General Categories are slower when searching through the large UnicodeData hash map. Even though lookups are O(1), the hash function turned out to be slower than binary searching through a category-specific table. So, now a table is generated for each General Category. When querying a code point for a category, a binary search is done on the code point ranges in that category's table to check if the code point has that category. Further, General Categories are now parsed from the UCD file DerivedGeneralCategory.txt. This file is a normal "prop list" file and contains the categories for unassigned code points.	2021-08-11 13:11:01 +02:00
Timothy Flynn	f5c1bbc00b	LibUnicode: Parse UCD Scripts.txt and generate as a Unicode property There are a couple of minor nuances with parsing script values, compared to other properties. In Scripts.txt, the UCD file lists the full name of each script; other properties, like General Category, list the shorter name in their primary files. This means that the aliases listed in PropertyValueAliases.txt are reversed for script values.	2021-08-04 13:50:32 +01:00
Timothy Flynn	16e86ae743	LibUnicode: Generate General Category unions and aliases This downloads the PropertyValueAliases.txt UCD file, which contains a set of General Category aliases. This changes the General Category enumeration to now be generated as a bitmask. This is to easily allow General Category unions. For example, the LC (Cased_Letter) category is the union of the Ll, Lu, and Lt categories.	2021-08-02 21:02:09 +04:30
Timothy Flynn	f1809db994	LibUnicode: Add public methods to compare and lookup Unicode properties Adds methods to retrieve a Unicode property from a string and to check if a code point matches a Unicode property. Also adds a <LibUnicode/Forward.h> header.	2021-07-30 21:26:31 +01:00
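
Commit 7dce2bfe23 describes looking up a category by binary searching a per-category table of code point ranges. The following is a minimal sketch of that idea and not the code LibUnicode generates: the CodePointRange struct, the s_sample_uppercase_letter table (a small hand-written subset of Lu), and the code_point_in_table helper are all hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration; not LibUnicode's generated structure.
struct CodePointRange {
    uint32_t first;
    uint32_t last; // inclusive
};

// Hand-written sample of the Lu (Uppercase_Letter) category.
// Ranges must be sorted and non-overlapping for the binary search to work.
static constexpr std::array<CodePointRange, 3> s_sample_uppercase_letter { {
    { 0x0041, 0x005A }, // A-Z
    { 0x00C0, 0x00D6 }, // À-Ö
    { 0x00D8, 0x00DE }, // Ø-Þ
} };

template<std::size_t N>
bool code_point_in_table(std::array<CodePointRange, N> const& table, uint32_t code_point)
{
    // Binary search for the first range that does not end before the code point.
    auto it = std::lower_bound(table.begin(), table.end(), code_point,
        [](CodePointRange const& range, uint32_t value) { return range.last < value; });

    // The code point has the category only if that range also starts at or before it.
    return it != table.end() && it->first <= code_point;
}
```

Because the table stores ranges instead of individual code points, a category that covers large unassigned blocks (such as Cn) can be represented without an entry per code point.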

5 commits
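
Commit 16e86ae743 describes generating the General Category enumeration as a bitmask so that unions such as LC (Cased_Letter) can be expressed directly. A rough sketch of that approach follows, assuming one bit per base category. The enumerator values, the Cased_Letter alias spelling, and the has_category helper are illustrative assumptions and may not match the generated code.

```cpp
#include <cstdint>

// Hypothetical enumerators; the values LibUnicode generates may differ.
enum class GeneralCategory : uint32_t {
    Lu = 1u << 0, // Uppercase_Letter
    Ll = 1u << 1, // Lowercase_Letter
    Lt = 1u << 2, // Titlecase_Letter
    Nd = 1u << 3, // Decimal_Number

    // Union of Ll, Lu, and Lt, as described in the commit message.
    LC = Lu | Ll | Lt,
    // Long-name alias of the kind listed in PropertyValueAliases.txt.
    Cased_Letter = LC,
};

// Hypothetical helper: true if the code point's category is part of the
// (possibly unioned) category being queried.
constexpr bool has_category(GeneralCategory actual, GeneralCategory wanted)
{
    return (static_cast<uint32_t>(actual) & static_cast<uint32_t>(wanted)) != 0;
}
```

With this layout, testing a lowercase letter against LC is a single bitwise AND, and an alias costs nothing beyond an extra enumerator name.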