serenity

mirror of https://github.com/RGBCube/serenity synced 2025-06-28 20:22:08 +00:00


Timothy Flynn fa96811a22 LibUnicode: Skip over emoji sequences in grapheme boundary segmentation		2023-02-25 22:23:39 +01:00

    Emoji sequences in the grapheme segmentation spec are a bit tricky:

        \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

    Our current strategy of tracking a boolean to indicate if we are in an emoji sequence was causing us to break up emoji made of multiple sub-sequences. For example, in the "family: man, woman, girl, boy" sequence:

        U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

    We would break at indices 0 (correctly) and 6 (incorrectly).

    Instead of tracking a boolean, it's quite a bit simpler to reason about emoji sequences by just skipping past them entirely. Note that in cases like the above emoji, we skip one sub-sequence at a time.
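
The skipping strategy described in the commit message above can be sketched roughly as follows. This is a standalone C++ illustration under stated assumptions, not LibUnicode's actual code: `grapheme_boundaries`, `is_extended_pictographic` and `is_extend` are hypothetical names, the property lookups are coarse hard-coded approximations rather than real Unicode Character Database data, boundaries are reported as code point indices, and only rules GB9 and GB11 from UAX #29 are modeled (every other pair of code points is treated as a break).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, deliberately coarse property lookups (NOT real UCD data).
// Approximates Extended_Pictographic with a few emoji blocks, skipping the
// skin tone modifiers U+1F3FB..U+1F3FF, which are Extend instead.
bool is_extended_pictographic(char32_t cp) {
    return (cp >= 0x1F300 && cp <= 0x1F3FA) || (cp >= 0x1F400 && cp <= 0x1FAFF)
        || (cp >= 0x2600 && cp <= 0x27BF);
}

// Hypothetical approximation of Grapheme_Cluster_Break=Extend: combining
// diacritics, variation selectors and emoji skin tone modifiers.
bool is_extend(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F) || (cp >= 0xFE00 && cp <= 0xFE0F)
        || (cp >= 0x1F3FB && cp <= 0x1F3FF);
}

constexpr char32_t ZWJ = 0x200D;

// Returns the code point indices of grapheme boundaries, including 0 and
// cps.size() for non-empty input. Only GB9 and GB11 are honored here.
std::vector<std::size_t> grapheme_boundaries(std::vector<char32_t> const& cps) {
    std::vector<std::size_t> boundaries;
    std::size_t i = 0;
    while (i < cps.size()) {
        boundaries.push_back(i);
        if (is_extended_pictographic(cps[i])) {
            ++i;
            // Instead of tracking an "in emoji sequence" flag, skip past the
            // whole sequence, one sub-sequence per iteration:
            //   Extend* ZWJ ExtPict   (GB11: ExtPict Extend* ZWJ x ExtPict)
            while (true) {
                while (i < cps.size() && is_extend(cps[i]))
                    ++i;
                if (i + 1 < cps.size() && cps[i] == ZWJ && is_extended_pictographic(cps[i + 1])) {
                    i += 2; // consume the ZWJ and the next pictograph
                    continue;
                }
                break;
            }
        } else {
            ++i;
        }
        // GB9: never break before Extend or ZWJ.
        while (i < cps.size() && (is_extend(cps[i]) || cps[i] == ZWJ))
            ++i;
    }
    boundaries.push_back(cps.size());
    return boundaries;
}
```

With this sketch, the family sequence U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 yields boundaries only at 0 and 7, so no boundary is placed inside the emoji. Because each loop iteration consumes a complete `ZWJ ExtPict` pair, the ZWJ that follows the second, third and later pictographs is still recognized, which is exactly the case a single boolean flag got wrong.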
CharacterTypes.cpp	LibUnicode+LibJS: Move text segmentation algorithms to their own files	2023-02-15 12:36:47 +01:00
CharacterTypes.h	LibUnicode+LibJS: Move text segmentation algorithms to their own files	2023-02-15 12:36:47 +01:00
CMakeLists.txt	LibUnicode+LibJS: Move text segmentation algorithms to their own files	2023-02-15 12:36:47 +01:00
CurrencyCode.cpp	LibUnicode+LibJS: Move Unicode::get_available_currencies() to Locale.h	2022-09-05 14:37:16 -04:00
CurrencyCode.h	LibUnicode+LibJS: Move Unicode::get_available_currencies() to Locale.h	2022-09-05 14:37:16 -04:00
Emoji.cpp	LibUnicode: Add a method to check if a code point could start an emoji	2023-02-24 19:48:47 +01:00
Emoji.h	LibUnicode: Add a method to check if a code point could start an emoji	2023-02-24 19:48:47 +01:00
Forward.h	LibUnicode: Add decomposition mappings and Unicode normalization	2022-10-06 08:24:39 -04:00
Normalize.cpp	LibUnicode: Return a String from Unicode normalization	2023-01-15 01:00:20 +00:00
Normalize.h	LibUnicode: Return a String from Unicode normalization	2023-01-15 01:00:20 +00:00
Segmentation.cpp	LibUnicode: Skip over emoji sequences in grapheme boundary segmentation	2023-02-25 22:23:39 +01:00
Segmentation.h	LibUnicode: Remove non-iterative text segmentation algorithms	2023-02-16 11:18:53 +01:00
String.cpp	AK+LibUnicode: Provide Unicode-aware caseless String matching	2023-01-18 14:43:40 +00:00
UnicodeUtils.cpp	LibUnicode: Use iterative text segmentation algorithms for titlecasing	2023-02-16 11:18:53 +01:00
UnicodeUtils.h	LibUnicode: Parse and generate case folding code point data	2023-01-18 14:43:40 +00:00