1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-24 01:22:32 +00:00
Commit graph

48 commits

Author SHA1 Message Date
Sam Atkins
3685a8813a LibWeb: Port CSS Tokenizer to new Strings
Specifically, this uses FlyString, because the data gets held long-term
as a FlyString anyway.
2023-02-15 12:48:26 -05:00
Sam Atkins
8af65108e4 LibWeb: Construct CSS Tokenizer and Parser with a StringView encoding
This doesn't need to be a full (Deprecated)String, so let's not force it
to be.
2023-02-15 12:48:26 -05:00
Sam Atkins
7fc72d3838 LibWeb: Convert CSS Token value to new FlyString 2023-02-13 14:35:40 +00:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
davidot
8abd4f6102 LibWeb: Make the CSS parser use the new double parser
This could potentially be sped up by tracking the up to three different
ranges of characters known to be digits. This would save the double
parser from checking whether these are digits and because it has the
size it can use the fast parsing method.
2022-10-23 15:48:45 +02:00
Sam Atkins
164094e161 LibWeb: Bring CSS tokenization preprocessing closer to spec
This is based on an editorial change in the December 2021 version of
SYNTAX-3: https://www.w3.org/TR/2021/CRD-css-syntax-3-20211224/

They named this step "filter code points", so let's use that name.
2022-10-03 17:09:41 +01:00
Sam Atkins
97e174afcd LibWeb: Use the term "ident sequence" instead of "name"
This is an editorial change in the December 2021 version of SYNTAX-3:
https://www.w3.org/TR/2021/CRD-css-syntax-3-20211224/
2022-10-03 17:09:41 +01:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
stelar7
cd73d5c1d0 LibWeb: Add missing preprocessing step to the css tokenizer 2022-05-08 16:29:46 +02:00
Sam Atkins
bf786d66b1 LibWeb: Move Token and Tokenizer into Parser namespace 2022-04-12 23:03:46 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Sam Atkins
13e1232d79 LibWeb: Remove separate Token::m_unit field
Dimension tokens don't make use of the m_value string for anything else,
so we can sneak the unit string in there.

- Token goes from 72 to 64 bytes
- StyleComponentValueRule goes from 80 to 72 bytes
2022-03-22 15:47:36 +01:00
Sam Atkins
fe372cd073 LibWeb: Use CSS::Number for Token numeric values 2022-03-22 15:47:36 +01:00
Sam Atkins
0795b9f7bb LibWeb: Use floats instead of doubles for CSS numbers
Using doubles isn't necessary, and they make things slightly bigger and
slower, so let's use floats instead.
2022-03-22 15:47:36 +01:00
Sam Atkins
1f5b5d3f99 LibWeb: Use intermediate ints when converting strings to numbers in CSS
These three are all integers - we just repeatedly multiply them by 10
and then add a digit - so using an integer here is both faster and more
accurate. :^)
2022-03-22 15:47:36 +01:00
Karol Kosek
fd235d8a06 LibWeb: Don't put a backslash after escape sequences in text-like tokens
Previously, a string token like '\41' would be tokenized to 'A\'. This
could be seen on Wikipedia headlines.
2022-03-19 13:10:00 -07:00
Lenny Maiorani
f912a48315 Userland: Change static const variables to static constexpr
`static const` variables can be computed and initialized at run-time
during initialization or the first time a function is called. Change
them to `static constexpr` to ensure they are computed at
compile-time.

This allows some removal of `strlen` because the length of the
`StringView` can be used which is pre-computed at compile-time.
2022-03-18 19:58:57 +01:00
Sam Atkins
2a7a8d2cab LibWeb: Don't verify that a dimension unit isn't whitespace
Raw whitespace is not allowed inside a name, but escaped whitespace is,
for example `\9`, which is the tab character.

This stops yakzz.com from crashing the Browser, since it was using `\9`
in various places as a hack to only apply those properties to IE8/9.
2022-02-02 18:29:05 +01:00
Sam Atkins
5d0851cb0e LibWeb: Use start_of_input_stream_twin() for is_valid_escape_sequence()
This means we can get rid of the hacks where we were peeking a code
point instead of getting the next one so that we could peek_twin()
later. Now, we follow the spec more closely. :^)
2021-12-27 22:56:08 +01:00
Sam Atkins
269a24d4ca LibWeb: Pass correct values to would_start_an_identifier()
Same as with would_start_a_number(), we were skipping a code point.
2021-12-27 22:56:08 +01:00
Sam Atkins
bb82ee5530 LibWeb: Pass correct values to would_start_a_number()
This fixes the crash that Luke found using Domato:
```css
. foo {
    mso-border-alt: solid  .-1pt;
}
```

The spec distinguishes between "If the next 3 code points would
start..." and "If the input stream starts with..." but we were treating
them the same way, skipping the first code point in the process.
2021-12-27 22:56:08 +01:00
Sam Atkins
981badb45f LibWeb: Add CSS::Tokenizer::start_of_input_stream_[twin|triplet]()
These correspond to "If the input stream starts with..." in the spec,
which up until now we were not handling correctly, which led to some fun
bugs.

As noted, reconsuming the input code point in order to read its value is
hacky, but works. Keeping track of the current code point in Tokenizer
would be nicer, when I'm feeling brave enough to mess with it!
2021-12-27 22:56:08 +01:00
Sam Atkins
85e5586a27 LibWeb: Add spec comments to CSS Tokenizer
Some of the code has been slightly rearranged to match the spec order,
but otherwise I've tried not to mess with it.
2021-11-19 22:35:05 +01:00
Sam Atkins
9403cc42f9 LibWeb: Convert CSS Token::m_value from StringBuilder to FlyString
Again, this value does not change once we have finished creating the
Token, so it can be more lightweight.
2021-11-19 22:35:05 +01:00
Sam Atkins
75e7c2c5c0 LibWeb: Convert CSS Token::m_unit from StringBuilder to FlyString
This value doesn't change once it's assigned to the Token, so it can be
more lightweight than a StringBuilder.
2021-11-19 22:35:05 +01:00
Sam Atkins
f6869797a7 LibWeb: Convert numeric tokens to numbers in CSS Tokenizer
The spec wants us to produce numeric values as the Tokenizer sees them,
rather than waiting until the parse stage. This is a first step towards
that.
2021-11-19 22:35:05 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Sam Atkins
ecf5368535 LibWeb: Record position information in CSS Tokens
This is a requirement to be able to use the Tokens for syntax
highlighting.
2021-10-23 19:07:44 +02:00
Sam Atkins
9a2eecaca4 LibWeb: Add CSS Tokenizer::consume_as_much_whitespace_as_possible()
This is a step in the spec in 3 places, and we had it implemented
differently in each one. This unifies them and makes it clearer what
we're doing.
2021-10-23 19:07:44 +02:00
Sam Atkins
dfbdc20f87 LibWeb: Add spec links to CSS Tokenizer
Also renamed `starts_with_a_number()` -> `would_start_a_number()` to
better match spec terminology.
2021-10-23 19:07:44 +02:00
Sam Atkins
bb1cc99750 LibWeb: Stop treating EOF as a valid part of an identifier
This was specifically causing the string "0" to be parsed as an invalid
Dimension token with no units, instead of as a Number. That then caused
out generated `property_initial_value()` function to fail for those
values.
2021-09-17 23:06:45 +02:00
sin-ack
d9900ece2f LibWeb: Preprocess the CSS stream in the Tokenizer
This commit implements the input preprocessing algorithm that CSS Syntax
Module Level 3 defines.
2021-08-30 00:08:40 +02:00
Sam Atkins
74c9587798 LibWeb: Fix EOF handling in CSS Tokenizer peek_{twin,triplet}()
Previously, the loops would stop before reaching EOF, meaning that the
values that should have been set to EOF were left with their 0 initial
values. Now, we initialize to EOFs instead. The if/else inside the loops
always ran the else branch so I have removed the if branches.
2021-08-04 19:04:12 +04:30
Sam Atkins
e54531244f LibWeb: Define proper debug symbols for CSS Parser and Tokenizer
You can now turn debug logging for them on using `CSS_PARSER_DEBUG` and
`CSS_TOKENIZER_DEBUG`.
2021-07-31 00:18:11 +02:00
Sam Atkins
7439fbd896 LibWeb: Get CSS @import rules working in new parser
Also added css-import.html, which tests the 3 syntax variations on
`@import` statements. Note that the optional media-query parameter to
`@import` is not handled yet.
2021-07-31 00:18:11 +02:00
Sam Atkins
c249fbd17c LibWeb: Correct escape handling in CSS Tokenizer
Calling is_valid_escape_sequence() with no arguments hides what it
is operating on, so I have removed that, so that you must explicitly
tell it what you are testing.

The call from consume_a_token() was using the wrong tokens, so it
returned false incorrectly. This was resulting in corrupted output
when faced with this code from Acid2. (Abbreviated)

```css
.parser { error: \}; }
.parser { }
```
2021-07-11 23:19:56 +02:00
Sam Atkins
b7116711bf LibWeb: Add TokenStream class to CSS Parser
The entry points for CSS parsing in the spec are defined as accepting
any of a stream of Tokens, or a stream of ComponentValues, or a String.
TokenStream is an attempt to reduce the duplication of code for that.
2021-07-11 23:19:56 +02:00
Sam Atkins
6c03123b2d LibWeb: Give CSS Token and StyleComponentValueRule matching is() funcs
The end goal here is to make the two classes mostly interchangeable, as
the CSS spec requires that the various parser algorithms can take a
stream of either class, and we want to have that functionality without
needing to duplicate all of the code.
2021-07-11 23:19:56 +02:00
Sam Atkins
9c14504bbb LibWeb: Rename CSS::Token::TokenType -> Type 2021-07-11 23:19:56 +02:00
Sam Atkins
985ed47a38 LibWeb: Use EOF code point instead of Optional in CSS Tokenizer
Optional seems like a good idea, but in many places we were not
checking if it had a value, which was causing crashes when the
Tokenizer was given malformed input. Using an EOF value along with
is_eof() makes things a lot simpler.
2021-07-11 23:19:56 +02:00
Sam Atkins
9115c23bd5 LibWeb: Fix greedy CSS Tokenizer whitespace parsing
Whitespace parsing was too greedy, consuming the first non-
whitespace character after it.
2021-07-11 23:19:56 +02:00
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Linus Groh
649d2faeab Everywhere: Use "the SerenityOS developers." in copyright headers
We had some inconsistencies before:

- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."

I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.

By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
2021-04-29 00:59:26 +02:00
Brian Gianforcaro
1411ae1bc7 LibWeb: Utilize SourceLocation for CSS/Tokenizer logging 2021-04-25 09:32:03 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
078f0a5c67 LibWeb: Add specification-based CSS tokenizer
Original work by @stelar7 for #2628.
2021-03-09 17:35:38 +01:00