Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their 0 initial
values. Now, we initialize to EOFs instead. The if/else inside the loops
always ran the else branch so I have removed the if branches.
With `IconPath`, you can override the icon used for the application
shortcut. This currently only supports resolving the icon through
`GUI::FileIconProvider`, the implementation for pointing to actual
image files is left as an exercise for the reader.
There are a couple of minor nuances with parsing script values, compared
to other properties. In Scripts.txt, the UCD file lists the full name of
each script; other properties, like General Category, list the shorter
name in their primary files. This means that the aliases listed in
PropertyValueAliases.txt are reversed for script values.
It takes a non-neglible amount of time to parse all of the UCD files and
generate the Unicode data files. To help compile times, only invoke the
generator once.
The current strategy of searching for a code point within the generated
table is slow for code points > U+0377 (the last code point whose index
is the same value as the code point itself). For larger code points, we
are doing a linear search through the table.
Instead, generate a HashMap of each code point to its entry in the table
for faster runtime lookups.
This had the added benefit of being able to remove a fair amount of code
from the generator. We no longer need to track that last contiguous code
point (U+0377) nor each code point's index in the generated table.
PrimitiveString may currently only be created with a UTF-8 string, and
it transcodes on the fly when a UTF-16 string is needed. Allow creating
a PrimitiveString from a UTF-16 string to avoid unnecessary transcoding
when the caller only wants UTF-16.
Rather than deferring this decoding to PrimitiveString, we can decode
surrogate pairs when parsing the string. This prevents a string copy
when constructing the PrimitiveString.
In non-Unicode mode, the existing MatchState::string_position is tracked
in code units; in Unicode mode, it is tracked in code points.
In order for some RegexStringView operations to be performant, it is
useful for the MatchState to have a field to always track the position
in code units. This will allow RegexStringView methods (e.g. operator[])
to perform lookups based on code unit offsets, rather than needing to
iterate over the entire string to find a code point offset.
The current method of iterating through the string to access a code
point hurts performance quite badly for very large strings. The test262
test "RegExp/property-escapes/generated/Any.js" previously took 3 hours
to complete; this one change brings it down to under 10 seconds.
Before this change the file stream was generated two times:
one time in the parse_header(), and another time for the whole class
in the constructor.
The previous commit moved the m_stream initialization before
executing the parse_header function, so we can now reuse that here.
Before this change opening the file in the system resulted in crash
caused by assertion saying:
SoundPlayer(32:32): ASSERTION FAILED: m_ptr
../.././AK/OwnPtr.h:139
[#0 SoundPlayer(32:32)]: Terminating SoundPlayer(32) due to signal 6
[#0 FinalizerTask(4:4)]: 0xdeadc0de
The issue was that 845d403b8c started
using m_stream in the parse_header() function, but that variable wasn't
initialized if the Loader plugin was created using a file path
(which is used everywhere except for the fuzz testing),
resulting in a crash mentioned above.
Problem:
- New `all_of` implementation takes the entire container so the user
does not need to pass explicit begin/end iterators. This is unused
except is in tests.
Solution:
- Make use of the new and more user-friendly version where possible.
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extension (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.