dump(unicode.utf-8-bypass): update

2025-07-30 12:37:50 +00:00 · 2025-07-03 18:42:01 +03:00 · 2025-07-03 18:42:01 +03:00 · 9d1b0cbde8
commit 9d1b0cbde8
parent 82b304945d
1 changed files with 22 additions and 0 deletions
--- a/site/dump/unicode/utf-8-bypass.md
+++ b/site/dump/unicode/utf-8-bypass.md
@@ -0,0 +1,22 @@
+---
+title: UTF-8 Bypass
+date: 2025-07-03
+---
+
+Did you know that lenient UTF-8 decoders used to accept more than one encoding
+of the "/" (solidus, also known as slash) character?
+
+Among them were `0x2F` (the canonical form), `0xC0 0xAF`, and `0xE0 0x80 0xAF`.
+
+A validator that only scanned for the byte `0x2F` missed the longer forms,
+which let attackers smuggle a "/" past validation logic.
+
+The Unicode specification was later revised to fix this: a UTF-8 encoder must
+produce the shortest possible sequence for a codepoint, and a decoder must
+reject any sequence that is longer than necessary (an "overlong" encoding).
+
+More reading:
+
+- Corrected UTF-8: <https://www.owlfolio.org/development/corrected-utf-8/>
+- CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic:
+  <https://capec.mitre.org/data/definitions/80.html>
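
The behaviour described in the note can be reproduced with a rough sketch. `naive_decode` and `strict_rejects` are hypothetical names made up for this illustration, not part of the commit. The naive decoder handles only 1- to 3-byte forms and deliberately skips the shortest-form check, while the strict path relies on Python's built-in `bytes.decode("utf-8")`, which rejects overlong sequences.

```python
def naive_decode(data: bytes) -> str:
    """Lenient decoder for 1- to 3-byte UTF-8 forms, with NO shortest-form check.

    Illustration only: this mimics the old, permissive behaviour.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: single byte
            cp, n = lead, 1
        elif lead & 0xE0 == 0xC0:       # 110xxxxx: two-byte form
            cp, n = lead & 0x1F, 2
        elif lead & 0xF0 == 0xE0:       # 1110xxxx: three-byte form
            cp, n = lead & 0x0F, 3
        else:
            raise ValueError(f"unsupported lead byte 0x{lead:02X}")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1:i + n]:
            if cont & 0xC0 != 0x80:     # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{cont:02X}")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(chr(cp))
        i += n
    return "".join(out)


def strict_rejects(data: bytes) -> bool:
    """Return True if Python's built-in strict UTF-8 decoder rejects `data`."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    for raw in (b"\x2f", b"\xc0\xaf", b"\xe0\x80\xaf"):
        print(raw.hex(" "), "naive:", repr(naive_decode(raw)),
              "strict rejects:", strict_rejects(raw))
```

A filter that searches the raw bytes for `0x2F` before handing them to something like `naive_decode` would let the two longer forms through, which is the bypass the note refers to.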