This patch adds an IndexedProperties object for storing indexed
properties within an Object. This accomplishes two goals: indexed
properties now have an associated descriptor, and objects now gracefully
handle sparse properties.
The IndexedProperties class is a wrapper around two other classes, one
for simple indexed properties storage, and one for general indexed
property storage. Simple indexed property storage is the common-case,
and is simply a vector of properties which all have attributes of
default_attributes (writable, enumerable, and configurable).
General indexed property storage is for a collection of indexed
properties where EITHER one or more properties have attributes other
than default_attributes OR there is a property with a large index (in
particular, large is '200' or higher).
Indexed properties are now treated relatively the same as storage within
the various Object methods. Additionally, there is a custom iterator
class for IndexedProperties which makes iteration easy. The iterator
skips empty values by default, but can be configured otherwise.
Likewise, it evaluates getters by default, but can be set not to.
This seems to have a higher chance of generating somewhat recognizable
content compared to inline layout. This problem will gradually go away
as we implement more display values.
We still don't handle non-ASCII input correctly, but at least now we'll
convert e.g ISO-8859-1 to UTF-8 before starting to tokenize.
This patch also makes "view source" work with the new parser. :^)
This is enough to parse the Google front page! (Note: I did have to
hack the tokenizer while parsing Google, in order to avoid named
character references screwing everything up. We'll fix that too soon
enough!)
While we're still supporting both the old and the new parser, we have
to deal with the way they load inline stylesheet (and scripts) a bit
differently.
The old parser loads all the text content up front, and then notifies
the containing element. The new parser creates the containing element
up front and appends text inside it afterwards.
For now, we simply do an empty "children_changed" notification when
first inserting a text node inside an element. This at least prevents
the CSS parser from choking on a single-character stylesheet.
The AAA is a somewhat daunting algorithm you have to run for certain
tag when inserted inside the <body> element. The purpose of it is to
resolve issues with mismatched tags.
This patch implements the first half of the AAA. We also move the
"list of active formatting elements" to its own class, since it kept
accumulating little behaviors. "Marker" entries are now signified by
null Element pointers in the list.
You can now pass "-n" to the browser to use the new HTML parser.
It's not turned on by default since it's still very immature, but this
is a huge step towards bringing it into maturity. :^)
Previously, the Object class had many different types of functions for
each action. For example: get_by_index, get(PropertyName),
get(FlyString). This is a bit verbose, so these methods have been
shortened to simply use the PropertyName structure. The methods then
internally call _by_index if necessary. Note that the _by_index
have been made private to enforce this change.
Secondly, a clear distinction has been made between "putting" and
"defining" an object property. "Putting" should mean modifying a
(potentially) already existing property. This is akin to doing "a.b =
'foo'".
This implies two things about put operations:
- They will search the prototype chain for setters and call them, if
necessary.
- If no property exists with a particular key, the put operation
should create a new property with the default attributes
(configurable, writable, and enumerable).
In contrast, "defining" a property should completely overwrite any
existing value without calling setters (if that property is
configurable, of course).
Thus, all of the many JS objects have had any "put" calls changed to
"define_property" calls. Additionally, "put_native_function" and
"put_native_property" have had their "put" replaced with "define".
Finally, "put_own_property" has been made private, as all necessary
functionality should be exposed with the put and define_property
methods.
Instead of creating extremely common FlyStrings like "id" and "class"
on demand every time they are needed, we now have AttributeNames.h,
which provides Web::HTML::AttributeNames::{id,class_}
This avoids a bunch of string allocations during selector matching.
Instead of string splitting every time you call Element::has_class(),
we now split the "class" attribute value when it changes, and cache
the individual classes as FlyStrings in Element::m_classes.
This makes has_class() significantly faster and moves the pain point
of selector matching somewhere else.
Sometimes people put a '}' where it doesn't belong, or various other
things go wrong. 99% of the time, it's our fault, but either way,
this patch makes us not crash or infinite-loop in some common cases.
The real solution here is to write a proper CSS lexer-parser according
to the language spec, this is just a hack fix to make more sites load
at all.
We now implement the somewhat fuzzy shrink-to-fit algorithm when laying
out inline-block elements with both block and inline children.
Shrink-to-fit works by doing two speculative layouts of the entire
subtree inside the current block, to compute two things:
1. Preferred minimum width: If we made a line break at every chance we
had, how wide would the widest line be?
2. Preferred width: We break only when explicitly told to (e.g "<br>")
How wide would the widest line be?
We then shrink the width of the inline-block element to an appropriate
value based on the above, taking the available width in the containing
block into consideration (sans all the box model fluff.)
To make the speculative layouts possible, plumb a LayoutMode enum
throughout the layout system since it needs to be respected in various
places.
Note that this is quite hackish and I'm sure there are smarter ways to
do a lot of this. But it does kinda work! :^)
In step 4 of the "renstruct the active formatting elements" algorithm it
says:
Rewind: If there are no entries before entry in the list of active
formatting elements, then jump to the step labeled create.
Prior to this patch, the implementation accorded to the spec only for
the first loop iteration.
This was causing very tall lines on many websites. We can now see the
section header thingy on google.com (although it's broken into lines
where it should not be..) :^)
And move canonicalized_path() to a static method on LexicalPath.
This is to make it clear that FileSystemPath/canonicalized_path() only
perform *lexical* canonicalization.