CAPEC-80
Using UTF-8 Encoding to Bypass Validation Logic
Abstraction: Detailed
Domain: Software
Likelihood: High
Typical Severity: High
Parents: 267
Threats: T62 T290 T291
This attack is a specific variation on leveraging alternate encodings to bypass validation logic. It exploits the possibility of encoding potentially harmful input in UTF-8 and submitting it to applications that either do not expect this encoding or do not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the shortest possible encoding, but naive decoders may accept encodings that are longer than necessary. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters.
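The RFC 3629 scenario can be reproduced in a few lines. The sketch below is an illustration, not code from any real product: `naive_utf8_decode` and `raw_filter_allows` are hypothetical helpers. The decoder honours each lead byte's length prefix but never enforces the shortest form, and the filter inspects the encoded bytes before decoding, so the overlong pair `C0 AF` slips past the check and only afterwards becomes `/`.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical, deliberately unsafe UTF-8 decoder.

    It honours the length prefix of each lead byte but never checks that the
    result uses the shortest form, so overlong sequences decode silently.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                    # 0xxxxxxx: 1 byte
            length, cp = 1, b
        elif (b & 0xE0) == 0xC0:        # 110xxxxx: 2 bytes
            length, cp = 2, b & 0x1F
        elif (b & 0xF0) == 0xE0:        # 1110xxxx: 3 bytes
            length, cp = 3, b & 0x0F
        elif (b & 0xF8) == 0xF0:        # 11110xxx: 4 bytes
            length, cp = 4, b & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{b:02x} at offset {i}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1 : i + length]:
            if (cont & 0xC0) != 0x80:   # continuation bytes must be 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{cont:02x}")
            cp = (cp << 6) | (cont & 0x3F)
        # MISSING on purpose: rejecting cp below 0x80 / 0x800 / 0x10000 for
        # 2- / 3- / 4-byte sequences (the shortest-form rule).
        out.append(chr(cp))
        i += length
    return "".join(out)


def raw_filter_allows(data: bytes) -> bool:
    """Hypothetical filter that searches the ENCODED bytes for '../'."""
    return b"../" not in data


payload = b"..\xc0\xaf.."
print(raw_filter_allows(payload))   # True: no literal '../' bytes present
print(naive_utf8_decode(payload))   # ../..
```

A conforming decoder behaves differently: Python's built-in codec raises `UnicodeDecodeError` for `payload.decode("utf-8")`, because `0xC0` can never start a valid sequence.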
| External ID | Source | Link | Description |
|---|---|---|---|
| CAPEC-80 | capec | https://capec.mitre.org/data/definitions/80.html | |
| CWE-173 | cwe | http://cwe.mitre.org/data/definitions/173.html | |
| CWE-172 | cwe | http://cwe.mitre.org/data/definitions/172.html | |
| CWE-180 | cwe | http://cwe.mitre.org/data/definitions/180.html | |
| CWE-181 | cwe | http://cwe.mitre.org/data/definitions/181.html | |
| CWE-73 | cwe | http://cwe.mitre.org/data/definitions/73.html | |
| CWE-74 | cwe | http://cwe.mitre.org/data/definitions/74.html | |
| CWE-20 | cwe | http://cwe.mitre.org/data/definitions/20.html | |
| CWE-697 | cwe | http://cwe.mitre.org/data/definitions/697.html | |
| CWE-692 | cwe | http://cwe.mitre.org/data/definitions/692.html | |
| REF-1 | reference_from_CAPEC | | G. Hoglund, G. McGraw, Exploiting Software: How to Break Code, Addison-Wesley, 2004-02 |
| REF-112 | reference_from_CAPEC | http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/character-encoding.html | David Wheeler, Secure Programming for Linux and Unix HOWTO |
| REF-530 | reference_from_CAPEC | | Michael Howard, David LeBlanc, Writing Secure Code, Microsoft Press |
| REF-531 | reference_from_CAPEC | https://www.schneier.com/crypto-gram/archives/2000/0715.html | Bruce Schneier, Security Risks of Unicode, Crypto-Gram Newsletter, 2000-07-15 |
| REF-532 | reference_from_CAPEC | http://en.wikipedia.org/wiki/UTF-8 | Wikipedia, The Wikimedia Foundation, Inc. |
| REF-533 | reference_from_CAPEC | http://www.faqs.org/rfcs/rfc3629.html | F. Yergeau, RFC 3629 - UTF-8, a transformation format of ISO 10646, 2003-11 |
| REF-114 | reference_from_CAPEC | http://www.securityfocus.com/infocus/1232 | Eric Hacker, IDS Evasion with Unicode, 2001-01-03 |
| REF-535 | reference_from_CAPEC | http://www.unicode.org/versions/corrigendum1.html | Corrigendum #1: UTF-8 Shortest Form, The Unicode Standard, Unicode, Inc., 2001-03 |
| REF-525 | reference_from_CAPEC | http://www.cl.cam.ac.uk/~mgk25/unicode.html | Markus Kuhn, UTF-8 and Unicode FAQ for Unix/Linux, 1999-06-04 |
| REF-537 | reference_from_CAPEC | http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples/UTF-8-test.txt | Markus Kuhn, UTF-8 decoder capability and stress test, 2003-02-19 |
Explore
- Survey the application for user-controllable inputs: Using a browser or an automated tool, an attacker follows all public links and actions on a website. They record all the links, the forms, the resources accessed, and all other potential entry points for the web application.
| Techniques |
|---|
| Use a spidering tool to follow and record all links and analyze the web pages to find entry points. Make special note of any links that include parameters in the URL. |
| Use a proxy tool to record all user input entry points visited during a manual traversal of the web application. |
| Use a browser to manually explore the website and analyze how it is constructed. Many browser plugins are available to facilitate the analysis or automate the discovery. |
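The survey step boils down to pulling every link and form target out of each page and flagging the ones that take parameters. A minimal sketch for a single page, using only the standard-library `html.parser` and `urllib.parse` modules; `EntryPointCollector`, `parameterized`, and the sample HTML are hypothetical, and fetching pages or following links (the actual crawling) is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class EntryPointCollector(HTMLParser):
    """Record links and form targets found on ONE page (no crawling)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.forms: list[tuple[str, str, list[str]]] = []  # (method, action, fields)
        self._form = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self.links.append(urljoin(self.base_url, a["href"]))
        elif tag == "form":
            method = (a.get("method") or "get").upper()
            action = urljoin(self.base_url, a.get("action") or "")
            self._form = (method, action, [])
        elif tag in ("input", "textarea", "select") and self._form is not None:
            if a.get("name"):
                self._form[2].append(a["name"])  # each named field is an input

    def handle_endtag(self, tag):
        if tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None


def parameterized(urls):
    """Keep only URLs whose query string carries parameters."""
    return [u for u in urls if parse_qs(urlparse(u).query)]


page = """
<a href="/docs/index.html">Docs</a>
<a href="/search?q=test">Search</a>
<form method="post" action="/login">
  <input name="user"><input name="password" type="password">
</form>
"""
collector = EntryPointCollector("http://servername/")
collector.feed(page)
collector.close()
print(parameterized(collector.links))  # ['http://servername/search?q=test']
print(collector.forms)  # [('POST', 'http://servername/login', ['user', 'password'])]
```

Both the parameterized links and the named form fields become the target list for the next phase.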
Experiment
- Probe entry points to locate vulnerabilities: The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various UTF-8 encoded payloads to determine whether an entry point actually represents a vulnerability with insufficient validation logic and to characterize the extent to which the vulnerability can be exploited.
| Techniques |
|---|
| Try to use UTF-8 encoding of content in scripts in order to bypass validation routines. |
| Try to use UTF-8 encoding of content in HTML in order to bypass validation routines. |
| Try to use UTF-8 encoding of content in CSS in order to bypass validation routines. |
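Probe payloads for this phase are usually built by re-encoding a filtered ASCII character with more bytes than it needs. The sketch below assumes the RFC 3629 limit of four bytes (older 5- and 6-byte forms are not generated); `overlong_utf8` and `percent` are hypothetical helper names, not part of any tool.

```python
def overlong_utf8(ch: str, length: int) -> bytes:
    """Encode one ASCII character as an overlong 2-, 3- or 4-byte UTF-8 sequence.

    Hypothetical probe helper: a conforming decoder must reject every output.
    """
    cp = ord(ch)
    if cp >= 0x80:
        raise ValueError("this sketch only pads ASCII characters")
    if length not in (2, 3, 4):
        raise ValueError("length must be 2, 3 or 4 (RFC 3629 maximum)")
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]  # 110xxxxx / 1110xxxx / 11110xxx
    continuation = []
    for _ in range(length - 1):
        continuation.append(0x80 | (cp & 0x3F))  # 10xxxxxx carries 6 payload bits
        cp >>= 6
    # Any bits left over (always 0 for ASCII) go into the lead byte.
    return bytes([lead_marker | cp] + continuation[::-1])


def percent(data: bytes) -> str:
    """Percent-encode every byte with lowercase hex, as in '%c0%af'."""
    return "".join(f"%{b:02x}" for b in data)


for n in (2, 3, 4):
    slash = percent(overlong_utf8("/", n))
    print(slash, f"..{slash}..")
# %c0%af ..%c0%af..
# %e0%80%af ..%e0%80%af..
# %f0%80%80%af ..%f0%80%80%af..
```

The same helper works for other filtered characters, for example `.` becomes `%c0%ae` in its two-byte overlong form.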
- The application's UTF-8 decoder accepts and interprets illegal UTF-8 sequences or non-shortest-form UTF-8 encodings.
- Input filtering and validation are not done properly, leaving the door open for harmful characters to reach the target host.
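Both prerequisites concern the order of operations: decode leniently, or validate before decoding. A minimal sketch of the opposite order, assuming a hypothetical `safe_resolve` helper and a `/scripts` root: percent-decode, decode UTF-8 strictly (Python's codec rejects overlong forms), then normalize the path and only then validate it. It treats only `/` as a separator, which is an assumption; a Windows server would also have to handle `\`.

```python
import posixpath
from urllib.parse import unquote_to_bytes


def safe_resolve(raw_path: str, root: str = "/scripts") -> str:
    """Hypothetical helper: canonicalize first, validate second."""
    data = unquote_to_bytes(raw_path)
    try:
        # Strict decoding rejects overlong forms such as C0 AF outright.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"malformed UTF-8 in path: {exc}") from None
    # Validate the canonical form, never the encoded one.
    resolved = posixpath.normpath(posixpath.join(root, text))
    if resolved != root and not resolved.startswith(root + "/"):
        raise ValueError(f"path escapes {root}: {resolved}")
    return resolved


print(safe_resolve("test.pl"))  # /scripts/test.pl
for bad in ("..%c0%af../winnt/system32/cmd.exe", "%2e%2e/%2e%2e/winnt"):
    try:
        safe_resolve(bad)
    except ValueError as err:
        print("rejected:", err)
```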
| Skill Level | Description |
|---|---|
| Low | An attacker can inject a different representation of a filtered character in UTF-8 format. |
| Medium | An attacker may craft subtle encodings of input data by using the knowledge they have gathered about the target host. |
| Integrity | Availability | Authorization | Access Control | Confidentiality |
|---|---|---|---|---|
| Execute Unauthorized Commands (Run Arbitrary Code) | Execute Unauthorized Commands (Run Arbitrary Code) | Bypass Protection Mechanism | Bypass Protection Mechanism | Bypass Protection Mechanism |
| Modify Data | Unreliable Execution | | | Execute Unauthorized Commands (Run Arbitrary Code) |
- Perhaps the most famous UTF-8 attack was against unpatched Microsoft Internet Information Server (IIS) 4 and IIS 5 servers. If an attacker made a request such as http://servername/scripts/..%c0%af../winnt/system32/cmd.exe, the server did not correctly handle %c0%af in the URL. What does %c0%af mean? In binary it is 11000000 10101111. Broken up using the UTF-8 mapping rules, the leading 110 marks the first byte of a two-byte sequence and the leading 10 marks a continuation byte, leaving the payload bits 00000 and 101111. Therefore, the character is 00000101111, or 0x2F, the slash (/) character! The %c0%af is an invalid UTF-8 representation of the / character, whose shortest form is the single byte 0x2F. Such an invalid UTF-8 escape is often referred to as an overlong sequence. So when the attacker requested the tainted URL, they accessed http://servername/scripts/../../winnt/system32/cmd.exe. In other words, they walked out of the /scripts virtual directory, which is marked to allow program execution, up to the root and down into the system32 directory, where they could pass commands to the command shell, Cmd.exe. See also: CVE-2000-0884.
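The bit arithmetic in the IIS example can be checked directly. This sketch uses the standard `urllib.parse.unquote_to_bytes` to undo the percent-encoding and a hypothetical `combine_two_byte` helper that merges the payload bits the way a lenient decoder would, without the shortest-form check.

```python
from urllib.parse import unquote_to_bytes


def combine_two_byte(lead: int, cont: int) -> int:
    """Merge a 110xxxxx 10xxxxxx pair into its code point (no shortest-form check)."""
    if (lead & 0xE0) != 0xC0 or (cont & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


raw = unquote_to_bytes("..%c0%af..")         # b'..\xc0\xaf..'
lead, cont = raw[2], raw[3]
print(f"{lead:08b} {cont:08b}")               # 11000000 10101111
cp = combine_two_byte(lead, cont)
print(f"{cp:011b}", hex(cp), repr(chr(cp)))   # 00000101111 0x2f '/'
print("overlong:", cp < 0x80)                 # overlong: True
```

The final check is the one the vulnerable server skipped: any two-byte sequence that decodes below 0x80 is overlong, because such a code point fits in a single byte.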