Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their 0 initial
values. Now, we initialize to EOFs instead. The if/else inside the loops
always ran the else branch so I have removed the if branches.
With `IconPath`, you can override the icon used for the application
shortcut. This currently only supports resolving the icon through
`GUI::FileIconProvider`, the implementation for pointing to actual
image files is left as an exercise for the reader.
There are a couple of minor nuances with parsing script values, compared
to other properties. In Scripts.txt, the UCD file lists the full name of
each script; other properties, like General Category, list the shorter
name in their primary files. This means that the aliases listed in
PropertyValueAliases.txt are reversed for script values.
It takes a non-neglible amount of time to parse all of the UCD files and
generate the Unicode data files. To help compile times, only invoke the
generator once.
The current strategy of searching for a code point within the generated
table is slow for code points > U+0377 (the last code point whose index
is the same value as the code point itself). For larger code points, we
are doing a linear search through the table.
Instead, generate a HashMap of each code point to its entry in the table
for faster runtime lookups.
This had the added benefit of being able to remove a fair amount of code
from the generator. We no longer need to track that last contiguous code
point (U+0377) nor each code point's index in the generated table.
PrimitiveString may currently only be created with a UTF-8 string, and
it transcodes on the fly when a UTF-16 string is needed. Allow creating
a PrimitiveString from a UTF-16 string to avoid unnecessary transcoding
when the caller only wants UTF-16.
Rather than deferring this decoding to PrimitiveString, we can decode
surrogate pairs when parsing the string. This prevents a string copy
when constructing the PrimitiveString.
In non-Unicode mode, the existing MatchState::string_position is tracked
in code units; in Unicode mode, it is tracked in code points.
In order for some RegexStringView operations to be performant, it is
useful for the MatchState to have a field to always track the position
in code units. This will allow RegexStringView methods (e.g. operator[])
to perform lookups based on code unit offsets, rather than needing to
iterate over the entire string to find a code point offset.
The current method of iterating through the string to access a code
point hurts performance quite badly for very large strings. The test262
test "RegExp/property-escapes/generated/Any.js" previously took 3 hours
to complete; this one change brings it down to under 10 seconds.
Before this change the file stream was generated two times:
one time in the parse_header(), and another time for the whole class
in the constructor.
The previous commit moved the m_stream initialization before
executing the parse_header function, so we can now reuse that here.
Before this change opening the file in the system resulted in crash
caused by assertion saying:
SoundPlayer(32:32): ASSERTION FAILED: m_ptr
../.././AK/OwnPtr.h:139
[#0 SoundPlayer(32:32)]: Terminating SoundPlayer(32) due to signal 6
[#0 FinalizerTask(4:4)]: 0xdeadc0de
The issue was that 845d403b8c started
using m_stream in the parse_header() function, but that variable wasn't
initialized if the Loader plugin was created using a file path
(which is used everywhere except for the fuzz testing),
resulting in a crash mentioned above.
Problem:
- New `all_of` implementation takes the entire container so the user
does not need to pass explicit begin/end iterators. This is unused
except is in tests.
Solution:
- Make use of the new and more user-friendly version where possible.
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extension (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
This downloads the PropertyValueAliases.txt UCD file, which contains a
set of General Category aliases.
This changes the General Category enumeration to now be generated as a
bitmask. This is to easily allow General Category unions. For example,
the LC (Cased_Letter) category is the union of the Ll, Lu, and Lt
categories.
Change all the places that were including the deprecated parser, to
include the new one instead, and then delete the old parser code.
`ParentNode::query_selector[_all]()` now treat their input as a
comma-separated list of selectors, instead of just one, and return
elements that match any of the selectors in that list. This is according
to these specs:
- querySelector/querySelectorAll:
https://dom.spec.whatwg.org/#ref-for-dom-parentnode-queryselector%E2%91%A0
- selector matching algorithm:
https://www.w3.org/TR/selectors-4/#match-against-tree
These mostly match the API in `DeprecatedCSSParser.h`. The exception is
that `parse_selector()` returns a `SelectorList` instead of just one
`Selector`. The only uses of that are in
`ParentNode::query_selector[_all]()` which should be matching against a
list, according to the spec.
`parse_html_length()` is an odd case. It's used for `width="200"` cases
in HTML, so is not really CSS related, but does produce a StyleValue.
The values allowed in `width/height` in HTML vary per element, but they
are a lot more restricted than in CSS, so it's slightly inappropriate to
use the CSS parser for them, even though it's convenient.
We also ignore a few functions:
- `parse_line_width()`
- `parse_line_style()`
- `parse_color()`
These are all only used in `StyleResolver`, when it is given a property
value as a String. That won't happen once the old parser is removed.
`parse_as_foo()` implies that the Parser's internal data is used,
whereas `parse_a_foo()` implies that the passed-in data is used.
Also, made all the `parse_a_foo()` methods private, as they are only
required within the Parser, and this makes the API clearer to outsiders.
The `parse_a(s)_foo()` naming is a little awkward, but it comes from
section 5.3 of the spec, so seemed worth keeping:
https://www.w3.org/TR/css-syntax-3/#parser-entry-points
This ensures that the user can't copy/cut text from password boxes which
would reveal the password. It also makes sure that the undo/redo actions
stay disabled because it's difficult to reason about what these do
exactly without being able to see the result.