mirror of
https://github.com/RGBCube/serenity
synced 2025-05-31 05:48:12 +00:00

In order to actually view the web as it is, we're gonna need a proper HTML parser. So let's build one! This patch introduces the Web::HTMLTokenizer class, which currently operates on a StringView input stream, fetching code points from it (ASCII only at the moment) and tokenizing them according to the tokenization algorithm in the HTML spec. The tokenizer state machine looks a bit weird, but it's written to mimic the spec as closely as possible, in order to make development easier and bugs less likely. This initial version is far from finished, but it can parse a trivial document with a DOCTYPE and open/close tags. :^)
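To give a rough idea of what "mimic the spec" can look like, here is a minimal standalone sketch of a spec-shaped tokenizer state machine. It is not the Web::HTMLTokenizer code from this patch: the names SketchTokenizer and SketchToken are made up for illustration, it uses std::string_view instead of AK's StringView, and it only knows the handful of states needed for a DOCTYPE plus open/close tags. Attributes, comments, EOF handling and parse errors are all left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// All names here are hypothetical; this is not the Web::HTMLTokenizer interface.
struct SketchToken {
    enum class Type { Doctype, StartTag, EndTag, Character };
    Type type;
    std::string data; // DOCTYPE/tag name, or a single character
};

class SketchTokenizer {
public:
    explicit SketchTokenizer(std::string_view input) : m_input(input) {}

    std::vector<SketchToken> run()
    {
        while (m_pos < m_input.size()) {
            char c = m_input[m_pos++]; // "Consume the next input character"
            switch (m_state) {
            case State::Data:
                if (c == '<') m_state = State::TagOpen;
                else m_tokens.push_back({ SketchToken::Type::Character, std::string(1, c) });
                break;
            case State::TagOpen:
                if (c == '!') m_state = State::MarkupDeclarationOpen;
                else if (c == '/') m_state = State::EndTagOpen;
                else if (is_alpha(c)) { m_current = { SketchToken::Type::StartTag, "" }; reconsume_in(State::TagName); }
                else reconsume_in(State::Data); // simplification: no parse error reporting
                break;
            case State::EndTagOpen:
                if (is_alpha(c)) { m_current = { SketchToken::Type::EndTag, "" }; reconsume_in(State::TagName); }
                else reconsume_in(State::Data); // simplification
                break;
            case State::TagName:
                if (c == '>') emit_current();
                else m_current.data += to_lower(c); // attributes are not handled in this sketch
                break;
            case State::MarkupDeclarationOpen:
                --m_pos; // this state looks ahead instead of consuming one character
                if (next_matches_uppercased("DOCTYPE")) { m_pos += 7; m_state = State::Doctype; }
                else m_state = State::Data; // simplification: comments and CDATA are ignored
                break;
            case State::Doctype:
                if (is_whitespace(c)) m_state = State::BeforeDoctypeName;
                else reconsume_in(State::BeforeDoctypeName);
                break;
            case State::BeforeDoctypeName:
                if (is_whitespace(c)) break; // "Ignore the character"
                m_current = { SketchToken::Type::Doctype, "" };
                if (c == '>') { emit_current(); break; }
                m_current.data += to_lower(c);
                m_state = State::DoctypeName;
                break;
            case State::DoctypeName:
                if (c == '>') emit_current();
                else m_current.data += to_lower(c);
                break;
            }
        }
        return m_tokens;
    }

private:
    enum class State { Data, TagOpen, EndTagOpen, TagName, MarkupDeclarationOpen, Doctype, BeforeDoctypeName, DoctypeName };

    static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool is_whitespace(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\f'; }
    static char to_lower(char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }

    // "Reconsume in the X state": step back one character and switch states.
    void reconsume_in(State state) { assert(m_pos > 0); --m_pos; m_state = state; }

    // Emit the token being built and return to the data state.
    void emit_current() { m_tokens.push_back(m_current); m_state = State::Data; }

    // Case-insensitive lookahead; `word` is expected to be uppercase ASCII.
    bool next_matches_uppercased(std::string_view word) const
    {
        if (m_input.size() - m_pos < word.size()) return false;
        for (std::size_t i = 0; i < word.size(); ++i)
            if (std::toupper(static_cast<unsigned char>(m_input[m_pos + i])) != word[i]) return false;
        return true;
    }

    std::string_view m_input;
    std::size_t m_pos { 0 };
    State m_state { State::Data };
    SketchToken m_current { SketchToken::Type::Character, "" };
    std::vector<SketchToken> m_tokens;
};
```

Each case corresponds to one state in the spec's tokenization section, and the helper names echo the spec's wording ("reconsume", "emit"), which is the point of the slightly odd shape: you can read the code and the spec side by side. Running `SketchTokenizer("<!DOCTYPE html><p>hi</p>").run()` should produce a DOCTYPE token, a start tag, two character tokens and an end tag.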
16 lines
517 B
C++
#include <LibWeb/Parser/HTMLTokenizer.h>
#include <LibCore/File.h>
#include <AK/ByteBuffer.h>
#include <AK/LogStream.h>

int main(int, char**)
{
    // This is a temporary test program to aid with bringing up the new HTML parser. :^)
    auto file_or_error = Core::File::open("/home/anon/www/simple.html", Core::File::ReadOnly);
    if (file_or_error.is_error())
        return 1;
    auto contents = file_or_error.value()->read_all();
    Web::HTMLTokenizer tokenizer(contents);
    tokenizer.run();
    return 0;
}