mirror of
https://github.com/RGBCube/serenity
synced 2025-05-14 07:24:58 +00:00
Emoji sequences in the grapheme segmentation spec are a bit tricky:

    \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

Our current strategy of tracking a boolean to indicate if we are in an emoji sequence was causing us to break up emoji made of multiple sub-sequences. For example, in the "family: man, woman, girl, boy" sequence:

    U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

We would break at indices 0 (correctly) and 6 (incorrectly).

Instead of tracking a boolean, it's quite a bit simpler to reason about emoji sequences by just skipping past them entirely. Note that in cases like the above emoji, we skip one sub-sequence at a time.

 | ||
---|---|---|
CharacterTypes.cpp | ||
CharacterTypes.h | ||
CMakeLists.txt | ||
CurrencyCode.cpp | ||
CurrencyCode.h | ||
Emoji.cpp | ||
Emoji.h | ||
Forward.h | ||
Normalize.cpp | ||
Normalize.h | ||
Segmentation.cpp | ||
Segmentation.h | ||
String.cpp | ||
UnicodeUtils.cpp | ||
UnicodeUtils.h |
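
The "skip one sub-sequence at a time" approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual code in `Segmentation.cpp`. The function names (`skip_emoji_subsequence`, `grapheme_boundaries`) are hypothetical, the property predicates are crude range checks standing in for real Unicode data lookups, and every segmentation rule other than the emoji rule and "do not break before Extend or ZWJ" is omitted. Indices are code point offsets, and the end of the text is reported as a final boundary.

```cpp
#include <cstddef>
#include <vector>

// Crude stand-ins for real Unicode property lookups (assumptions for illustration only).
static bool is_zwj(char32_t cp) { return cp == 0x200D; }

static bool is_extend(char32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)    // combining diacritical marks
        || (cp >= 0xFE00 && cp <= 0xFE0F)    // variation selectors
        || (cp >= 0x1F3FB && cp <= 0x1F3FF); // emoji skin tone modifiers
}

static bool is_extended_pictographic(char32_t cp)
{
    // Very rough approximation of \p{Extended_Pictographic}.
    return cp >= 0x1F300 && cp <= 0x1FAFF && !is_extend(cp);
}

// If text[start..] begins with `ExtPict Extend* ZWJ` and is followed by another
// ExtPict, return the index of that following ExtPict (one sub-sequence skipped).
// Otherwise return `start` unchanged.
static size_t skip_emoji_subsequence(std::vector<char32_t> const& text, size_t start)
{
    if (start >= text.size() || !is_extended_pictographic(text[start]))
        return start;

    size_t i = start + 1;
    while (i < text.size() && is_extend(text[i]))
        ++i;

    if (i + 1 < text.size() && is_zwj(text[i]) && is_extended_pictographic(text[i + 1]))
        return i + 1;
    return start;
}

// Returns grapheme boundaries as code point indices, including 0 and text.size().
std::vector<size_t> grapheme_boundaries(std::vector<char32_t> const& text)
{
    std::vector<size_t> boundaries;
    if (text.empty())
        return boundaries;

    boundaries.push_back(0);
    size_t i = 0;
    while (i < text.size()) {
        // Skip chained emoji sub-sequences one at a time instead of tracking a flag.
        while (true) {
            size_t next = skip_emoji_subsequence(text, i);
            if (next == i)
                break;
            i = next;
        }

        // Consume the base code point, then anything that must not be broken before.
        ++i;
        while (i < text.size() && (is_extend(text[i]) || is_zwj(text[i])))
            ++i;

        boundaries.push_back(i);
    }
    return boundaries;
}
```

With the family sequence from the commit message as input, the skip loop advances from index 0 to 2, 4, and then 6 before the final code point is consumed, so no boundary is reported inside the emoji.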