mirror of
https://github.com/RGBCube/serenity
synced 2025-05-31 15:48:12 +00:00
AK+Tests: Avoid creating invalid code points from malformed UTF-8
Instead of doing anything reasonable, Utf8CodePointIterator returned invalid code points, for example U+123456. However, many callers of this iterator assume that a code point is always at most 0x10FFFF.

In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.

In the particular case of URLParser, AK::is_url_code_point got confused:
    return /* ... */ || code_point >= 0xA0;
If code_point is a "code point" beyond 0x10FFFF, this violates the condition given in the preceding comment, but satisfies the given condition, which eventually causes URLParser to crash.

This commit fixes *only* the erroneous UTF-8 decoding, and does not fully resolve OSS-Fuzz#49184.
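To make the URLParser confusion concrete, here is a minimal standalone sketch. It models only the final `code_point >= 0xA0` clause quoted in the commit message; the elided clauses are left out, and both function names are hypothetical, not part of AK. The bounded variant is an assumption about what a caller-side guard could look like and is not something this commit adds.

```cpp
#include <cstdint>

// Hypothetical stand-in for the last clause quoted in the commit message.
// This is NOT the real AK::is_url_code_point; the elided clauses are omitted.
constexpr bool last_clause_only_sketch(uint32_t code_point)
{
    return code_point >= 0xA0;
}

// Assumed caller-side guard: also reject anything past U+10FFFF.
constexpr bool last_clause_bounded_sketch(uint32_t code_point)
{
    return code_point >= 0xA0 && code_point <= 0x10FFFF;
}

// The out-of-range value from the commit message passes the unbounded clause...
static_assert(last_clause_only_sketch(0x123456), "unbounded clause accepts 0x123456");
// ...but is rejected once an upper bound is checked.
static_assert(!last_clause_bounded_sketch(0x123456), "bounded clause rejects 0x123456");
```

Because the unbounded clause has no upper limit, any value the decoder hands over is accepted as long as it is at least 0xA0, which is why fixing the decoder removes this particular source of out-of-range input.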
parent 3aeb57ed09
commit ff8f3814cc
2 changed files with 31 additions and 0 deletions
@@ -274,6 +274,10 @@ u32 Utf8CodePointIterator::operator*() const
         code_point_value_so_far |= m_ptr[offset] & 63;
     }
 
+    if (code_point_value_so_far > 0x10FFFF) {
+        dbgln_if(UTF8_DEBUG, "Multi-byte sequence is otherwise valid, but code point {:#x} is not permissible.", code_point_value_so_far);
+        return 0xFFFD;
+    }
     return code_point_value_so_far;
 }
 
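The effect of the added check can be reproduced outside of AK with a small standalone sketch. The helper below is hypothetical and is not Utf8CodePointIterator: it handles only 4-byte sequences, assumes the caller passes at least four readable bytes, and assumes the lead and continuation bytes already have the expected bit patterns. It reuses the same accumulate-then-check shape as the patched code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of AK. Decodes one 4-byte UTF-8-shaped
// sequence (lead byte 11110xxx, continuation bytes 10xxxxxx) and applies
// the same upper-bound check as the patch. No other validation is done.
constexpr uint32_t decode_four_byte_sketch(unsigned char const* bytes)
{
    uint32_t code_point = bytes[0] & 0x07; // low 3 bits of the lead byte
    for (size_t offset = 1; offset < 4; ++offset) {
        code_point <<= 6;
        code_point |= bytes[offset] & 63; // low 6 bits of each continuation byte
    }
    // A 4-byte sequence carries 21 payload bits, so it can encode values
    // up to 0x1FFFFF, well past the Unicode maximum of 0x10FFFF.
    if (code_point > 0x10FFFF)
        return 0xFFFD; // U+FFFD REPLACEMENT CHARACTER
    return code_point;
}
```

With this sketch, the bytes F4 8F BF BF decode to 0x10FFFF, while F4 A3 91 96 (which would otherwise yield 0x123456, the example from the commit message) come back as 0xFFFD.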