Question 1

What is a Unicode code point?

Accepted Answer

A Unicode code point is the unique number that the Unicode standard assigns to a character. Code points range from U+0000 to U+10FFFF and are written as U+ followed by at least four hexadecimal digits, for example U+0041 for the letter A or U+1F600 for the grinning face emoji. The Unicode standard covers over 140,000 characters across more than 150 scripts.
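As a rough illustration of the U+ notation, the sketch below lists the code points of a string in TypeScript. The helper names `formatCodePoint` and `codePointsOf` are made up for this example and are not part of any library or of this tool.

```typescript
// Hypothetical helper: format a code point in U+XXXX notation
// (uppercase hex, padded to at least four digits).
function formatCodePoint(codePoint: number): string {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: list the code points of every character in a string.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePointsOf(text: string): string[] {
  return Array.from(text).map((ch) => formatCodePoint(ch.codePointAt(0)!));
}

console.log(codePointsOf("A\u{1F600}")); // [ 'U+0041', 'U+1F600' ]
```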

Question 2

What is the difference between UTF-8 and UTF-16?

Accepted Answer

UTF-8 and UTF-16 are two ways to encode Unicode code points as bytes. UTF-8 uses 1 to 4 bytes per code point and is backward-compatible with ASCII. UTF-16 uses 2 or 4 bytes per code point: one 16-bit code unit for characters in the Basic Multilingual Plane, two for everything else. UTF-8 is the dominant encoding on the web; UTF-16 is the string representation exposed by JavaScript and Java.
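To make the size difference concrete, here is a minimal sketch that compares the encoded length of a few sample characters. It assumes an environment where the standard `TextEncoder` is available (modern browsers and Node.js); the function names are illustrative only.

```typescript
// Bytes needed in UTF-8, measured with the standard TextEncoder
// (which always encodes to UTF-8).
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Bytes needed in UTF-16 without a byte order mark: String.prototype.length
// counts UTF-16 code units, and each code unit is 2 bytes.
function utf16ByteLength(text: string): number {
  return text.length * 2;
}

// A (U+0041), é (U+00E9), € (U+20AC), 😀 (U+1F600)
for (const sample of ["A", "\u00E9", "\u20AC", "\u{1F600}"]) {
  console.log(sample, utf8ByteLength(sample), utf16ByteLength(sample));
}
// UTF-8 sizes:  1, 2, 3, 4 bytes
// UTF-16 sizes: 2, 2, 2, 4 bytes
```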

Question 3

Why do some emoji appear as two characters?

Accepted Answer

Characters with code points above U+FFFF lie outside the Basic Multilingual Plane (BMP). In UTF-16 they are encoded as surrogate pairs, that is, two 16-bit code units. JavaScript strings are sequences of UTF-16 code units, so an emoji like 😀 (U+1F600) occupies two code units and reports a length of 2. This tool shows each Unicode character as a single entry regardless of its UTF-16 representation.
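The surrogate-pair arithmetic can be sketched in a few lines. `toSurrogatePair` is a hypothetical helper written for this example; it follows the standard UTF-16 encoding rule and says nothing about how this tool is implemented.

```typescript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const grin = "\u{1F600}";
console.log(grin.length);             // 2 (UTF-16 code units)
console.log(Array.from(grin).length); // 1 (code points)
console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
```

Counting with `Array.from` (or a `for...of` loop) rather than `.length` is one common way to treat such an emoji as a single entry.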

Question 4

What is a Unicode escape sequence?

Accepted Answer

A Unicode escape sequence represents a character in source code by its code point. In JavaScript/TypeScript, \uXXXX takes exactly four hexadecimal digits and covers BMP characters (e.g. \u0041 for A), while \u{...} (ES2015 and later) takes one to six hexadecimal digits and can express any code point, including supplementary characters (e.g. \u{1F600} for 😀). These escapes let you include any Unicode character in string literals.
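A short sketch of both escape forms follows. `toEscape` is a hypothetical helper for this example that picks the four-digit form for BMP code points and the braced form otherwise; it is one reasonable convention, not a requirement of the language.

```typescript
// Hypothetical helper: build a JavaScript/TypeScript escape sequence
// for a single code point.
function toEscape(codePoint: number): string {
  const hex = codePoint.toString(16).toUpperCase();
  return codePoint <= 0xffff
    ? "\\u" + hex.padStart(4, "0") // BMP: exactly four hex digits
    : "\\u{" + hex + "}";          // supplementary: braced form
}

console.log(toEscape(0x41));    // \u0041
console.log(toEscape(0x1f600)); // \u{1F600}

// The escapes denote the same strings as the literal characters.
console.log("\u0041" === "A");                      // true
console.log("\u{1F600}" === "\uD83D\uDE00");        // true (same surrogate pair)
console.log(String.fromCodePoint(0x1f600) === "\u{1F600}"); // true
```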

Unicode Inspector

What Is the Unicode Inspector?

How to Use the Unicode Inspector

Features
