serenity

mirror of https://github.com/RGBCube/serenity synced 2025-10-21 04:22:08 +00:00

Author	SHA1	Message	Date
matcool	70d0c1616f	LibUnicode: Add decomposition mappings and Unicode normalization The mappings are exposed via `Unicode::code_point_decomposition(u32)` and `Unicode::code_point_decompositions()`, the latter being useful for reverse searching a code point from its decomposition. The normalization code does not make use of `Quick_Check` props (https://www.unicode.org/reports/tr44/#Decompositions_and_Normalization), meaning no quick check optimizations.	2022-10-06 08:24:39 -04:00
Timothy Flynn	b61eca0a1e	LibUnicode: Parse and generate emoji code point data According to TR #51, the "best definition of the full set [of emojis] is in the emoji-test.txt file". This defines not only the emoji themselves, but the order in which they should be displayed, and what "group" of emojis they belong to.	2022-09-08 23:12:31 +01:00
Timothy Flynn	9e860d973e	LibLocale: Move locale source files to the LibLocale library Everything is now set up to create the LibLocale library and link it where needed.	2022-09-05 14:37:16 -04:00
Timothy Flynn	43a3471298	LibLocale: Move locale source files to the LibLocale folder These are still included in LibUnicode, but this updates their location and the include paths of other files which include them.	2022-09-05 14:37:16 -04:00
Timothy Flynn	1e0276f541	LibLocale+LibUnicode: Move generated CLDR data files to LibLocale folder They are still included into LibUnicode, but this moves their generated location to be under LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	fc8bf7ac3e	LibUnicode+Userland: Migrate generated CLDR data to LibLocaleData Currently, LibUnicodeData contains the generated UCD and CLDR data. Move the UCD data to the main LibUnicode library, and rename LibUnicodeData to LibLocaleData. This is another preparatory change to migrate to LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	89d1813b5d	LibUnicode: Move CLDR data generators to a LibLocale subfolder To prepare for placing all CLDR generated data in a new library, LibLocale, this moves the code generators for the CLDR data to the LibLocale subfolder.	2022-09-05 14:37:16 -04:00
Timothy Flynn	ea78bac36d	LibUnicode: Parse and generate per-locale plural rules from the CLDR Plural rules in the CLDR are of the form: "cs": { "pluralRule-count-one": "i = 1 and v = 0 @integer 1", "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4", "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...", "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..." } The syntax is described here: https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax There are up to 2 sets of rules for each locale, a cardinal set and an ordinal set. The approach here is to generate a C++ function for each set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is transpiled to a C++ if-statement within its function. Then lookup tables are generated to match locales to their generated functions. NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile flags because the generated plural rules have lots of extra parentheses (because e.g. we need to selectively negate and combine rules). The code to generate only exactly the right number of parentheses is quite hairy, so this just tells the compiler to ignore the extras.	2022-07-08 11:51:54 +02:00
Timothy Flynn	789f093b2e	LibUnicode: Parse and generate relative-time format patterns Relative-time format patterns are of one of two forms: * Tensed - refer to the past or the future, e.g. "N years ago" or "in N years". * Numbered - refer to a specific numeric value, e.g. "in 1 year" becomes "next year" and "in 0 years" becomes "this year". In ECMA-402, tensed and numbered refer to the numeric formatting options of "always" and "auto", respectively.	2022-01-27 21:16:44 +00:00
Timothy Flynn	0a4430fc41	LibJS+LibTimeZone+LibUnicode: Remove direct linkage to LibTimeZone This is no longer needed now that LibTimeZone is included within LibC. Remove the direct linkage so that others do not mistakenly copy-paste the CMakeLists text elsewhere.	2022-01-23 12:48:26 +00:00
Timothy Flynn	8d35563f28	LibUnicode: Implement TR-35's localized GMT offset formatting This adds an API to use LibTimeZone to convert a time zone such as "America/New_York" to a GMT offset string like "GMT-5" (short form) or "GMT-05:00" (long form).	2022-01-11 23:56:35 +01:00
Timothy Flynn	498b741434	LibUnicode: Use LibTimeZone's list of time zone names LibUnicode no longer needs to generate a list of time zone names that it parsed from metaZones.json. We can defer to the TZDB for a golden list of time zones.	2022-01-08 12:45:34 +01:00
Timothy Flynn	1116a29c19	LibUnicode: Remove now unused Unicode symbol loader All generated sources are now linked via weak symbols.	2022-01-04 22:49:43 +00:00
Timothy Flynn	c417374dd6	LibUnicode: Remove linkage from LibUnicode to LibUnicodeData LibUnicodeData can now be loaded dynamically at runtime.	2021-12-21 13:09:49 -08:00
Timothy Flynn	3fd53baa25	LibUnicode: Dynamically load the generated UnicodeData symbols The generated data for libunicodedata.so is quite large, and loading it is a price paid by nearly every application by way of depending on LibRegex. In order to defer this cost until an application actually uses one of the surrounding APIs, dynamically load the generated symbols. To be able to load the symbols dynamically, the generated methods must have demangled names. Typically, this is accomplished with `extern "C"` blocks. The clang toolchain complains about this here because the types returned from the generators are strictly C++ types. So to demangle the names, we use the asm() compiler directive to manually define a symbol name; the caveat is that we must be sure the symbols are unique. As an extra precaution, we prefix each symbol name with "unicode_". For more details, see: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html The symbol loader used in this implementation provides the additional benefit of removing many [[maybe_unused]] attributes from the LibUnicode methods. Internally, if ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF, the loader is able to stub out the function pointers it returns. Note that as of this commit, LibUnicode is still directly linked against LibUnicodeData. This commit is just a first step towards removing that.	2021-12-21 13:09:49 -08:00
Timothy Flynn	92233660b8	LibUnicode: Compile generated sources optimized for size This breaks LibUnicode into two libraries: LibUnicode containing the public APIs for accessing the library, and LibUnicodeData containing the generated source files. LibUnicodeData has compile options optimized for size, which save about 1MB of data in total.	2021-12-15 13:26:03 +00:00
Timothy Flynn	f471ecdbe9	LibUnicode: Parse and generate date, time, and date-time format patterns	2021-11-29 22:48:46 +00:00
Timothy Flynn	914675e826	LibJS+LibUnicode: Separate number formatting methods from Locale.h Currently, we generate separate data files for locale and number format related tables/methods, but provide public accessors for all of the data in one Locale.h file. Rather than continuing this trend for date-time, relative time, etc. formatting, it's a bit easier to reason about if the public accessors are also in separate files.	2021-11-29 22:48:46 +00:00
Timothy Flynn	e6334cb856	LibUnicode: Add some data related to currency codes This data is published under ISO-4217 as an XML file. Since we can't parse XML files yet, and the data isn't very large, it was translated to C++ manually here.	2021-09-11 11:05:50 +01:00
Andrew Kaster	e88761b2b9	Meta+LibUnicode: Move unicode_data helper to Meta/CMake Moving this helper CMake file to the centralized Meta/CMake folder helps to get a better grasp on what extra files are required for the build, and what files are generated. While we're at it, don't use add_compile_definitions for ENABLE_UNICODE_DATA, which only needs to be seen by LibUnicode sources.	2021-08-28 08:44:17 +01:00
Timothy Flynn	b7a95cba65	LibUnicode: Implement grammar validators for Unicode TR-35 ECMA-402 requires validating user input against the EBNF grammar for Unicode locales described in TR-35: https://www.unicode.org/reports/tr35 This commit adds validators for that grammar, as well as other helpers to, e.g., canonicalize a locale string.	2021-08-26 22:04:09 +01:00
Timothy Flynn	4dda3edc9e	LibUnicode: Introduce a Unicode library for interacting with UCD files The Unicode standard publishes the Unicode Character Database (UCD) with information about every code point, such as each code point's upper case mapping. LibUnicode exists to download and parse UCD files at build time and to provide accessors to that data. As a start, LibUnicode includes upper- and lower-case code point converters.	2021-07-26 17:03:55 +01:00

22 commits
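
Commit ea78bac36d above describes transpiling each CLDR plural-rule condition into a C++ if-statement, with a lookup table matching locales to the generated functions. The following is a minimal hand-written sketch of what that could look like for the "cs" cardinal rules quoted in that commit message. It is not the generated LibUnicode code: every name here (`PluralCategory`, `PluralOperands`, `cs_cardinal_category`, `select_cardinal`) is hypothetical, the operands are passed in precomputed rather than derived from a number, and C++17 is assumed for `std::string_view`.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical names throughout; an illustration, not LibUnicode's API.
enum class PluralCategory { One, Few, Many, Other };

// Two of the plural operands defined by TR-35:
//   i = the integer digits of the number
//   v = the count of visible fraction digits (trailing zeros included)
struct PluralOperands {
    uint64_t i;
    uint64_t v;
};

// Hand-transpiled from the "cs" rules in the commit message:
//   one:   i = 1 and v = 0
//   few:   i = 2..4 and v = 0
//   many:  v != 0
//   other: everything else
constexpr PluralCategory cs_cardinal_category(PluralOperands ops)
{
    if (ops.i == 1 && ops.v == 0)
        return PluralCategory::One;
    if (ops.i >= 2 && ops.i <= 4 && ops.v == 0)
        return PluralCategory::Few;
    if (ops.v != 0)
        return PluralCategory::Many;
    return PluralCategory::Other;
}

// A lookup table matching a locale to its rule function.
struct CardinalRuleEntry {
    std::string_view locale;
    PluralCategory (*rule)(PluralOperands);
};

constexpr CardinalRuleEntry s_cardinal_rules[] = {
    { "cs", cs_cardinal_category },
};

// Falls back to Other when the locale has no entry (an assumption of this sketch).
constexpr PluralCategory select_cardinal(std::string_view locale, PluralOperands ops)
{
    for (auto const& entry : s_cardinal_rules) {
        if (entry.locale == locale)
            return entry.rule(ops);
    }
    return PluralCategory::Other;
}
```

With this sketch, the value 3 (i = 3, v = 0) selects the "few" category for "cs", while 1.5 (i = 1, v = 1) selects "many". Writing each rule set as a plain function keeps the conditions visible to the compiler, at the cost of the generator having to emit valid C++ expressions, which is where the extra parentheses mentioned in the commit come from.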