Sometimes people put a '}' where it doesn't belong, or various other
things go wrong. 99% of the time, it's our fault, but either way,
this patch makes us not crash or infinite-loop in some common cases.
The real solution here is to write a proper CSS lexer and parser according
to the language spec; this is just a hack to make more sites load at all.
We now implement the somewhat fuzzy shrink-to-fit algorithm when laying
out inline-block elements with both block and inline children.
Shrink-to-fit works by doing two speculative layouts of the entire
subtree inside the current block, to compute two things:
1. Preferred minimum width: If we made a line break at every chance we
had, how wide would the widest line be?
2. Preferred width: If we broke only where explicitly told to (e.g. "<br>"),
how wide would the widest line be?
We then shrink the width of the inline-block element to an appropriate
value based on the above, taking the available width in the containing
block into consideration (sans all the box model fluff).
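For reference, the value we end up with is the usual CSS 2.1 shrink-to-fit
choice between the two speculative results. A minimal sketch of that
selection (made-up names, not the actual LibWeb code):

    // Hedged sketch: effectively clamps the available width between the
    // preferred minimum width and the preferred width.
    float shrink_to_fit_width(float preferred_minimum_width, float preferred_width, float available_width)
    {
        return min(max(preferred_minimum_width, available_width), preferred_width);
    }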
To make the speculative layouts possible, plumb a LayoutMode enum
throughout the layout system since it needs to be respected in various
places.
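The enum itself can be as simple as something like this (a sketch; the
actual value names may differ):

    enum class LayoutMode {
        Default,                // normal layout, honoring computed widths
        AllPossibleLineBreaks,  // speculative pass for the preferred minimum width
        OnlyRequiredLineBreaks, // speculative pass for the preferred width
    };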
Note that this is quite hackish and I'm sure there are smarter ways to
do a lot of this. But it does kinda work! :^)
In step 4 of the "reconstruct the active formatting elements" algorithm it
says:
Rewind: If there are no entries before entry in the list of active
formatting elements, then jump to the step labeled create.
Prior to this patch, the implementation only followed the spec on the first
loop iteration.
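Roughly, the rewind part of the loop should look like this (a simplified
sketch with made-up names; only steps 4-6 are shown):

    // Step 4, "Rewind", has to be re-checked on every iteration.
    size_t index = m_list_of_active_formatting_elements.size() - 1;
    for (;;) {
        // Step 4: no entries before this one? Jump to the "Create" step.
        if (index == 0)
            break;
        // Step 5: let entry be the entry one earlier in the list.
        auto& entry = m_list_of_active_formatting_elements.at(index - 1);
        // Step 6: stop rewinding once we hit a marker or an open element;
        // the "Advance"/"Create" steps take over from here.
        if (entry.is_marker() || m_stack_of_open_elements.contains(entry.element))
            break;
        index--;
    }
    // ... "Advance" and "Create" continue from here ...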
This was causing very tall lines on many websites. We can now see the
section header thingy on google.com (although it's broken into lines
where it should not be...) :^)
And move canonicalized_path() to a static method on LexicalPath.
This is to make it clear that FileSystemPath/canonicalized_path() only
perform *lexical* canonicalization.
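A quick illustration of what "lexical" means here (hypothetical usage):

    // "." and ".." are resolved purely by string manipulation; symlinks are
    // not followed and the filesystem is never consulted.
    auto canonical = LexicalPath::canonicalized_path("/home/anon/./Documents/../file.txt");
    // => "/home/anon/file.txt", even if "Documents" is a symlink (or missing).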
Unless otherwise stated, we shouldn't stop parsing just because there's
a parse error, so let's allow ourselves to continue.
With this change, we can now tokenize and parse the ACID1 test. :^)
Now that we've gotten rid of the misguided character buffering in the
tokenizer, it actually spits out character tokens that we have to deal
with in the parser.
This patch implements enough to bring us back up to speed with simple.html.
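For example, the tree builder now needs something along these lines (a
sketch with made-up names, not the actual implementation):

    void HTMLDocumentParser::handle_in_body(HTMLToken& token)
    {
        if (token.is_character()) {
            // Character tokens arrive one at a time now; insert them ourselves.
            reconstruct_the_active_formatting_elements();
            insert_character(token.code_point());
            return;
        }
        // ... start tags, end tags, comments, etc. are handled below ...
    }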
When watching the video of the new HTML parser, I noticed a small
copy-and-paste error. In one of the cases in `handle_after_head`, the code
was checking for end tags when it should have been checking for start tags.
I haven't tested this change, just looking at the spec.
Add reasonable support for all values of the white-space CSS property.
Values of the property are translated into a 3-tuple of rules:
do_collapse: whether whitespace is to be collapsed
do_wrap_lines: whether to wrap on word boundaries when
lines get too long
do_wrap_breaks: whether to wrap on linebreaks
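As a rough illustration, the translation amounts to something like this
(a sketch based on the standard white-space semantics; the actual names
may differ):

    struct WhiteSpaceRules {
        bool do_collapse;
        bool do_wrap_lines;
        bool do_wrap_breaks;
    };

    static WhiteSpaceRules rules_for_white_space(CSS::WhiteSpace value)
    {
        switch (value) {
        case CSS::WhiteSpace::Normal:
            return { true, true, false };
        case CSS::WhiteSpace::Nowrap:
            return { true, false, false };
        case CSS::WhiteSpace::Pre:
            return { false, false, true };
        case CSS::WhiteSpace::PreLine:
            return { true, true, true };
        case CSS::WhiteSpace::PreWrap:
            return { false, true, true };
        }
        return { true, true, false };
    }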
The previously separate handling of per-line splitting and per-word
splitting has been unified. The Word structure is now a more
general Chunk, which represents different amounts of text depending
on splitting rules.
The "audible bell" character ('\a' U+0007) was treated as whitespace
while the "line feed" character ('\n' U+000a) was not.
'\a' is no longer considered whitespace.
'\n' is now considered whitespace.
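The corrected check boils down to something like this (a sketch; the real
helper name may differ, and only the characters relevant here are shown):

    static bool is_breakable_whitespace(char ch)
    {
        // '\n' now counts as whitespace; '\a' no longer does.
        return ch == ' ' || ch == '\t' || ch == '\n';
    }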
We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <div></div>
  </body>
</html>
This is pretty slow work, but the incremental progress is satisfying!
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.
The names and idioms in this class follow the actual HTML parsing spec as
closely as possible, to make development as easy and bug-free as
possible. :^)
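The core loop is conceptually just this (a simplified sketch; method and
member names are assumptions):

    void HTMLDocumentParser::run()
    {
        for (;;) {
            // Pull one token at a time from our own tokenizer...
            auto token = m_tokenizer.next_token();
            if (!token.has_value())
                break;
            // ...and dispatch it according to the current insertion mode.
            process_using_the_rules_for(m_insertion_mode, token.value());
        }
    }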
This is going to become pretty large, but it's pretty cool!
When hit testing encountered a block with inline children, we assumed
that the inline children were nothing but text boxes. An inline-block
box is actually a block child of a block with inline children, so we
have to handle that scenario as well. :^)
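Roughly, the inline-children path of the hit test now looks like this
(a sketch with made-up names, not the actual LibWeb code):

    HitTestResult LayoutBlock::hit_test(const Gfx::IntPoint& position) const
    {
        // Only the inline-children path is sketched here.
        for (auto& line_box : m_line_boxes) {
            for (auto& fragment : line_box.fragments()) {
                if (!fragment.rect().contains(position))
                    continue;
                // New: a fragment backed by an inline-block is a block child,
                // so recurse into its own hit test instead of assuming text.
                if (fragment.layout_node().is_block())
                    return to<LayoutBlock>(fragment.layout_node()).hit_test(position);
                return { &fragment.layout_node() };
            }
        }
        return {};
    }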
Fixes #2353.
Instead of emitting data-bearing tokens immediately, do it lazily at
the next state change. This allows us to accumulate full bursts of
text in between tags instead of having one token per character. :^)
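The mechanism is roughly this (a sketch; the member and method names are
assumptions):

    void HTMLTokenizer::emit_character(u32 code_point)
    {
        // Don't emit right away; accumulate into a pending character token.
        if (!m_has_pending_character_token) {
            m_pending_character_token = HTMLToken { HTMLToken::Type::Character };
            m_has_pending_character_token = true;
        }
        m_pending_character_token.append_code_point(code_point);
    }

    void HTMLTokenizer::will_switch_to(State new_state)
    {
        // Flush the accumulated burst of text before entering the new state.
        if (m_has_pending_character_token) {
            m_queued_tokens.enqueue(move(m_pending_character_token));
            m_has_pending_character_token = false;
        }
        m_state = new_state;
    }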
This makes it a compile error to omit the END_STATE. Also add some more
missing END_STATEs exposed by this (nice!)
Thanks to @predmond for suggesting the multi-pair trick! :^)
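For the curious, the general shape of the trick is that BEGIN_STATE opens
several nested scopes and END_STATE closes the same number, so a forgotten
END_STATE leaves the braces unbalanced and the tokenizer no longer
compiles. A sketch (not necessarily the exact macros):

    #define BEGIN_STATE(state) \
        state:                 \
        case State::state: {   \
            {                  \
                {

    #define END_STATE         \
        ASSERT_NOT_REACHED(); \
        break;                \
        }                     \
        }                     \
        }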