UTF8 Encode Decode - Convert String to UTF8

UTF8 encoder/decoder – Online converter tools, Encode/Decode strings to UTF8 and vice versa with interactive UTF8 encoding algorithm by ConvertCodes.

Text

UTF8

Additional Condition

Select notation

Ignore convert white space

Optional: Upload file to encode / decode

Source File Encoding

Remark: UTF8 Encode / Decode input box limit 2,000 Characters. For a large data, please convert by upload a file.

What is UTF8

UTF8 is a Unicode standard encoding which encodes by one to four bytes of 8-bits. UTF8 can represent all existing characters in the world. It is compatible with ASCII encoding because it was designed the same as ASCII binary value. While ASCII encoding using 7 bits and UTF8 are using 8 bits with the same binary value, therefore, ASCII encoding will be a subset of UTF8.

Now, UTF8 become the most popular character encoding for all website. Unfortunately, most people did not notice it because the browser has already been converted it to human characters especially on Non-English characters.

Pros and Cons of UTF8 encoding

Pros of UTF8 encoding

UTF8 support many languages.
Most of the programming languages support UTF8.
UTF8 is compatible with ASCII
UTF8 able to convert to other charsets easily by ICONV.

Cons of UTF8 encoding

UTF-8 uses a variable length encoding especially on high code point, so it hard to determine the number of UTF8 bytes.
Require encoding module for programming languages.
UTF8 consume more processing time to find sequence code unit because UTF-8 uses a variable length encoding.

How to encode UTF8 (UTF8 Converter)

Example – Encode string “₹” to UTF8 hexadecimal. (UTF8 Encode)

Search for “₹” or rupee sign code point, which is “U+20B9”

2. Convert “20B9” hexadecimal to binary numbers

Hexadecimal	Binary
2	0010
0	0000
B	1011
9	1001

"20B9" = "0010 0000 1011 1001"

3. Refer to Table UTF8 Code Point Prefix, Binary 16 bits need 3 bytes format below.

Code Point 16 Bits = "1110(XXXX) 10(XXXXXX) 10(XXXXXX)"

Start to rearrange bits from the left-hand side of previous binary 16 bits as UTF8 encoding format.

Rearrange: 0010 0000 1011 1001 -> 0010 000010 111001

Put prefix binary in each byte to rearrange formatted.

UTF8 Prefix: "1110(0010) 10(000010) 10(111001)"

4. Now, you will get 3 bytes of UTF8 binary. Convert all binary back to hexadecimal.

Binary	Hexadecimal
11100010	E2
10000010	82
10111001	B9

The result of “₹” UTF8 encoding will be

Hexadecimal  : E2 82 B9
Hex notation : \xE2\x82\xB9

How to decode UTF8 (UTF8 Converter)

UTF8 Decoding

Convert all hexadecimal to binary bits.
Start to read binary bits and determine the starter prefix of each byte as we see in table UTF8 Code Point Prefix.
Eliminating prefix bits and convert binary data back to Unicode code point.
Mapping code point back to a string.

Table UTF8 Code Point Prefix

utf8-code-point-bit

Link