This patch implements the HTML specification's "encoding sniffing
algorithm", which is used when no encoding can be obtained from the
Content-Type header (either because it doesn't contain a charset=...)
value or the file has not been opened via HTTP (as with local files).
It also modifies the creator of the HTMLDocumentParser to use the new
HTMLDocumentParser::create_with_uncertain_encoding static method, which
runs the encoding sniffing algorithm before instantiating the parser.
This now allows us to load local HTML pages (or remote pages without a
charset specified in the 'Content-Type' header) with a non-UTF-8
encoding such as 'windows-1252'. This would previously crash the
browser. :^)
This modifies the Document class to use Optional<String> for the
encoding. If the encoding is unknown, the Optional will not have a
value. It also implements the has_encoding() and encoding_or_default()
instance methods, the latter of which will return "UTF-8" as a fallback
if no encoding is present.
The usage of Optional<String> instead of the null string is part of an
effort to explicitly indicate that a string could not have a value.
This also modifies the former callers of encoding() to use
encoding_or_default(). Furthermore, the encoding will now only be set if
it is actually known, rather than just guessed by earlier code.
This modifies the Resource class to use Optional<String> for the
encoding. If the encoding is unknown, the Optional will not have a
value (instead of using the null state of the String class). It also
implements a has_encoding() instance method and modifies the callers
of Resource::encoding() to use the new API.
This patch changes the encoding_from_content_type function to only
return an encoding if it actually finds one, and leave it up to the
caller to decided on a default to use.
It also modifies the caller to expect an Optional<String> (instead of
relying on the null state of the String class) as a return value and
separates the encoding and MIME type determination. This will be built
upon in a further commit.
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.
This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
Second batch of detectable formats, this time with verious offsets, as
enabled by the previous commit.
This adds tar, and the three signature variants for iso-9660 image
files.
This commit adds the Renderer class, which is responsible for rendering
a page into a Gfx::Bitmap. There are many improvements to make here,
but this is a great start!
We now discard these strings instead of copying them into a String
which we immediately destruct. This should result in both a perf uplift
and lower memory usage.
This was caused by a double notifier on the TLS socket, which caused
the TLS code to freak out about not being able to read properly. In
addition, the existing loop inside of drain_read() has been replaced by
code that actually works, and which includes new warnings when the
drain method is called before initialization is done or after the
websocket gets closed.
Problem:
- Function local `constexpr` variables do not need to be
`static`. This consumes memory which is unnecessary and can prevent
some optimizations.
Solution:
- Remove `static` keyword.
Problem:
- `size_classes` is a C-style array which makes it difficult to use in
algorithms.
- `all_of` algorithm is re-written for the specific implementation.
Solution:
- Change `size_classes` to be an `Array`.
- Directly use the generic `all_of` algorithm instead of
reimplementing.
We already do this for the SimpleIndexedPropertyStorage, so for indexed
properties with GenericIndexedPropertyStorage this would previously
crash. Since overwriting the array-like size with a larger value won't
magically insert values at previously unset indices, we need to handle
such an out of bounds access gracefully and just return an empty value.
Fixes#7043.
This was a bit hard to find as a local variable - rename it to uppercase
LENGTH_SETTER_GENERIC_STORAGE_THRESHOLD and move it to the top (next to
SPARSE_ARRAY_HOLE_THRESHOLD) for good visibility.
These aren't actually an extra set, without them the fold operation
would be syntactically invalid.
Also remove possible cast of float->double/double->float in Value::to()
This commit is a bit of a mixed bag, but most of the changes are
repetitive enough to just include in a single commit.
The following instructions remain unimplemented:
- br.table
- table.init
- table.get
- table.set
- table.copy
- table.size
- table.grow
- table.fill
- ref.null
- ref.func
- ref.is_null
- drop
- i32/i64.clz
- i32/i64.ctz
- i32/i64.popcnt
- i32/i64.rotl
- i32/i64.rotr
- X.trunc.Y
- X.trunc_sat.Y
- memory.size
- memory.grow
- memory.init
- memory.copy
- memory.fill
- elem.drop
- data.drop
As the parser now flattens out the instructions and inserts synthetic
nesting/structured instructions where needed, we can treat the whole
thing as a simple parsed bytecode stream.
This currently knows how to execute the following instructions:
- unreachable
- nop
- local.get
- local.set
- {i,f}{32,64}.const
- block
- loop
- if/else
- branch / branch_if
- i32_add
- i32_and/or/xor
- i32_ne
This also extends the 'wasm' utility to optionally execute the first
function in the module with optionally user-supplied arguments.
Before this patch, every shape would permanently remember every other
shape it had ever transitioned to. This could lead to pathological
accumulation of unused shape objects in some cases.
Fix this by using WeakPtr instead of a strongly visited Shape* in the
the forward transition chain map. This means that we will now miss out
on some shape sharing opportunities, but since this is not required
for correctness it doesn't matter.
Note that the backward transition chain is still strongly cached,
as it's necessary for the reification of property tables.
An interesting future optimization could be to allow property tables
to get garbage collected (by detaching them from the shape object)
and then reconstituted from the backwards transition chain (if needed.)
When the path component of the request URL was empty we'd end up
sending requests like "GET HTTP/1.1" (note the missing /). This
ensures that we always send a path.
Problem:
- Static variables take memory and can be subject to less optimization
(https://serenityos.godbolt.org/z/7EYebr1aa)
- This static variable is only used in 1 place.
Solution:
- Move the variable into the function and make it non-static.
If we're able to allocate cells from a freelist, we should always
prefer that over the lazy freelist, since this may further defer
faulting in additional memory for the HeapBlock.
Thanks to @gunnarbeutner for pointing this out. :^)