diff --git a/site/dump/unicode/utf-8-bypass.md b/site/dump/unicode/utf-8-bypass.md
new file mode 100644
index 0000000..e1ce72e
--- /dev/null
+++ b/site/dump/unicode/utf-8-bypass.md
@@ -0,0 +1,22 @@
+---
+title: UTF-8 Bypass
+date: 2025-07-03
+---
+
+Did you know that you used to be able to encode the "/" (solidus, also known as
+slash) character in UTF-8 in at least three different ways?
+
+These included `0x2F`, `0xC0 0xAF`, and `0xE0 0x80 0xAF`.
+
+This led to security issues: an attacker could slip a "/" past validation logic
+that only checked for the one-byte form.
+
+To fix this, the Unicode specification was later revised: a UTF-8 encoder must
+produce the shortest possible sequence that can represent a code point, and a
+decoder must reject any byte sequence that is longer than it needs to be.
+
+More reading:
+
+- Corrected UTF-8:
+- CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic:
+
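
The shortest-form rule described in the post can be illustrated with a small Python sketch. `lenient_decode` is a hypothetical helper written only for this example (it is not taken from any real library) that unpacks the payload bits of a 1-, 2-, or 3-byte sequence without checking for shortest form, roughly how a permissive decoder might have behaved. `is_valid_utf8` relies on Python's built-in strict UTF-8 codec, which rejects overlong sequences.

```python
# The three byte sequences from the post that could all be read as "/" (U+002F).
CANDIDATES = [b"\x2f", b"\xc0\xaf", b"\xe0\x80\xaf"]


def lenient_decode(data: bytes) -> int:
    """Hypothetical lenient decoder for a single 1-, 2-, or 3-byte sequence.

    It extracts the payload bits but deliberately skips the shortest-form
    check, so overlong encodings are accepted.
    """
    first = data[0]
    if first < 0x80:              # 0xxxxxxx: one-byte form
        code_point, trailing = first, 0
    elif first & 0xE0 == 0xC0:    # 110xxxxx: two-byte form
        code_point, trailing = first & 0x1F, 1
    elif first & 0xF0 == 0xE0:    # 1110xxxx: three-byte form
        code_point, trailing = first & 0x0F, 2
    else:
        raise ValueError("lead byte not handled by this sketch")
    if len(data) != trailing + 1:
        raise ValueError("wrong number of continuation bytes")
    for byte in data[1:]:
        if byte & 0xC0 != 0x80:   # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for seq in CANDIDATES:
        print(seq.hex(" "), hex(lenient_decode(seq)), is_valid_utf8(seq))
```

All three sequences come out of the lenient decoder as `0x2f`, while the strict codec accepts only the one-byte form. A validator that scans raw bytes for `0x2F` before handing them to a lenient decoder would miss the other two.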