From 9d1b0cbde8ae327bfc9b948e5c3f75e5b4001f1a Mon Sep 17 00:00:00 2001
From: RGBCube <git@rgbcu.be>
Date: Thu, 3 Jul 2025 18:42:01 +0300
Subject: [PATCH] dump(unicode.utf-8-bypass): update

---
 site/dump/unicode/utf-8-bypass.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 site/dump/unicode/utf-8-bypass.md

diff --git a/site/dump/unicode/utf-8-bypass.md b/site/dump/unicode/utf-8-bypass.md
new file mode 100644
index 0000000..e1ce72e
--- /dev/null
+++ b/site/dump/unicode/utf-8-bypass.md
@@ -0,0 +1,22 @@
+---
+title: UTF-8 Bypass
+date: 2025-07-03
+---
+
+Did you know that you used to be able to encode the "/" (solidus, also known as
+slash) character in UTF-8 in at least 3 different ways?
+
+Among them were `0x2F`, `0xC0 0xAF`, and `0xE0 0x80 0xAF`.
+
+This let attackers bypass validation logic that only looked for the `0x2F` form.
+
+To fix this, the Unicode specification was later revised: a UTF-8 encoder must
+produce the shortest possible sequence that can represent a code point, and a
+decoder must reject any byte sequence that’s longer than it needs to be (an
+"overlong" encoding).
+
+More reading:
+
+- Corrected UTF-8: <https://www.owlfolio.org/development/corrected-utf-8/>
+- CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic:
+  <https://capec.mitre.org/data/definitions/80.html>