1
Fork 0
mirror of https://github.com/RGBCube/serenity synced 2025-10-25 11:02:33 +00:00
Commit graph

15 commits

Author SHA1 Message Date
Sam Atkins
d7ffa51424 LibTextCodec: Ignore BYTE ORDER MARK at the start of utf8/16 strings
Before, this was getting included as part of the output text, which was
confusing the HTML parser. Nobody needs the BOM after we have identified
the codec, so now we remove it when converting to UTF-8.
2021-09-15 17:00:18 +02:00
sin-ack
e6818388e4 LibTextCodec: Add "process" API for allocation-free code point iteration
This commit adds a new process method to all Decoder subclasses which
do what to_utf8 used to do, and allows callers to customize the handling
of individiual UTF-8 code points through a callback. Decoder::to_utf8
now uses this API to generate a string via StringBuilder, preserving the
original behavior.
2021-08-30 00:08:40 +02:00
Andreas Kling
ed7a2f21ff LibTextCodec: Remove unused is_standardized_encoding() 2021-08-20 15:31:46 +02:00
Aatos Majava
3b2a528b33 LibTextCodec: Add Turkish (aka ISO-8859-9, Windows-1254) encoding 2021-06-23 16:32:47 +01:00
Aatos Majava
7597cca5c6 LibTextCodec: Add ISO-8859-15 (aka Latin-9) encoding 2021-06-15 15:12:09 +01:00
Max Wipfli
d325403cb5 LibTextCodec: Use Optional<String> for get_standardized_encoding
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.

This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
2021-05-18 21:02:07 +02:00
Idan Horowitz
87cabda80d LibTextCodec: Implement a Windows-1251 decoder
This encoding (a superset of ascii that adds in the cyrillic alphabet)
is currently the third most used encoding on the web, and because
cyrillic glyphs were added by Dmitrii Trifonov recently, we can now
support it as well :^)
2021-05-01 17:59:08 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Idan Horowitz
4a2c0d721f LibTextCodec: Implement a Windows-1255 decoder.
This is a superset of ascii that adds in the hebrew alphabet.
(Google currently assumes we are running windows due to not
recognizing Serenity as the OS in the user agent, resulting
in this encoding instead of UTF8 in google search results)
2021-04-17 18:13:20 +02:00
Idan Horowitz
c9f25bca04 LibTextCodec: Make UTF16BEDecoder read only up to an even offset
Reading up to the end of the input string of odd length results in
an out-of-bounds read
2021-03-15 16:08:12 +01:00
Luke
7427e3a8a6 LibTextCodec: Fix IBM666 => IBM866 typo 2021-03-14 11:27:52 +01:00
Andreas Kling
0538dc4e28 LibTextCodec: Add a simple UTF-16BE decoder 2021-02-16 17:31:22 +01:00
Ben Wiederhake
aa3d915260 LibTextCodec: Avoid duplicate definition of standard encodings 2021-02-01 09:54:32 +01:00
asynts
01879d27c2 Everywhere: Replace a bundle of dbg with dbgln.
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
2021-01-16 11:54:35 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Renamed from Libraries/LibTextCodec/Decoder.cpp (Browse further)