From 9d1b0cbde8ae327bfc9b948e5c3f75e5b4001f1a Mon Sep 17 00:00:00 2001
From: RGBCube
Date: Thu, 3 Jul 2025 18:42:01 +0300
Subject: [PATCH] dump(unicode.utf-8-bypass): update

---
 site/dump/unicode/utf-8-bypass.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 site/dump/unicode/utf-8-bypass.md

diff --git a/site/dump/unicode/utf-8-bypass.md b/site/dump/unicode/utf-8-bypass.md
new file mode 100644
index 0000000..e1ce72e
--- /dev/null
+++ b/site/dump/unicode/utf-8-bypass.md
@@ -0,0 +1,22 @@
+---
+title: UTF-8 Bypass
+date: 2025-07-03
+---
+
+Did you know that you used to be able to encode the "/" (solidus, also known as
+slash) character in UTF-8 in at least 3 different ways?
+
+These included `0x2F`, `0xC0 0xAF`, and `0xE0 0x80 0xAF`.
+
+This led to security issues and let attackers bypass validation logic.
+
+To fix this, the Unicode specification was later revised to say that a UTF-8
+encoder must produce the shortest possible sequence that can represent a
+codepoint, and that a decoder must reject any byte sequence that’s longer than
+it needs to be.
+
+More reading:
+
+- Corrected UTF-8:
+- CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic:
+
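
The overlong encodings described in the post can be reproduced with a small sketch. This is an illustration only, not part of the patch and not how any particular library decodes UTF-8: `decode_utf8` and its `strict` flag are hypothetical names, and the decoder is deliberately minimal (it does not reject surrogates, for instance). With `strict=False` it mimics a lax decoder that accepts the longer forms of "/"; with `strict=True` it applies the shortest-form rule and rejects them.

```python
# Smallest code point that legitimately needs (1 + n) bytes, keyed by the
# number of continuation bytes n. Anything below this is an overlong form.
MIN_CODE_POINT = {0: 0x00, 1: 0x80, 2: 0x800, 3: 0x10000}


def decode_utf8(data: bytes, strict: bool = True) -> str:
    """Minimal UTF-8 decoder (hypothetical helper, for illustration only).

    strict=False accepts overlong sequences, like the lax decoders the post
    describes; strict=True rejects them, per the shortest-form rule.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: single byte
            n, cp = 0, lead
        elif lead & 0xE0 == 0xC0:    # 110xxxxx: one continuation byte
            n, cp = 1, lead & 0x1F
        elif lead & 0xF0 == 0xE0:    # 1110xxxx: two continuation bytes
            n, cp = 2, lead & 0x0F
        elif lead & 0xF8 == 0xF0:    # 11110xxx: three continuation bytes
            n, cp = 3, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X}")
        if i + n >= len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1 : i + 1 + n]:
            if cont & 0xC0 != 0x80:  # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{cont:02X}")
            cp = (cp << 6) | (cont & 0x3F)
        if strict and cp < MIN_CODE_POINT[n]:
            raise ValueError(f"overlong encoding of U+{cp:04X}")
        out.append(chr(cp))
        i += n + 1
    return "".join(out)


# A validator that only looks for the 0x2F byte misses the overlong slash,
# but a lax decoder downstream still turns it into "/".
payload = b"..\xc0\xafetc"
print(b"/" in payload)                     # the byte check sees no slash
print(decode_utf8(payload, strict=False))  # the lax decoder produces one
```

Calling `decode_utf8(payload)` with the default `strict=True` raises `ValueError` instead, and Python's built-in `bytes.decode("utf-8")` likewise refuses both `b"\xc0\xaf"` and `b"\xe0\x80\xaf"`.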