Unicode Inspector

Displays the Unicode code point, UTF-8 bytes, UTF-16 encoding, escape sequence, and Unicode block for each character in a string.

What Is the Unicode Inspector?

The Unicode Inspector breaks any text string into its individual Unicode characters and displays detailed technical information for each one: the code point (U+XXXX notation), decimal value, UTF-8 byte sequence, UTF-16 encoding, JavaScript/TypeScript escape sequence, and Unicode block. This is useful for debugging encoding issues, understanding how emoji or special characters are stored, verifying that strings are normalized correctly, and learning the Unicode standard.

The tool handles the full Unicode range, including characters outside the Basic Multilingual Plane (BMP) such as emoji and historic scripts that require UTF-16 surrogate pairs.
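
As a rough illustration of what splitting a string into code points involves, here is a minimal TypeScript sketch. The helper name `listCodePoints` and the shape of the returned objects are assumptions made for this example; they are not taken from the tool's own source.

```typescript
// Hypothetical helper: list each code point in a string and flag the ones
// that lie outside the Basic Multilingual Plane (above U+FFFF).
function listCodePoints(
  text: string
): Array<{ char: string; codePoint: number; supplementary: boolean }> {
  // Array.from walks the string by code point, so a surrogate pair
  // is kept together as one entry instead of being split in two.
  return Array.from(text, (char) => {
    const codePoint = char.codePointAt(0) as number;
    return { char, codePoint, supplementary: codePoint > 0xffff };
  });
}

// "a", GOTHIC LETTER AHSA (U+10330, a historic script), and an emoji (U+1F600).
const inspected = listCodePoints("a\u{10330}\u{1F600}");
console.log(inspected.length); // 3 entries, although the string has 5 UTF-16 code units
console.log(inspected.map((entry) => entry.supplementary)); // [ false, true, true ]
```

Iterating by code point rather than by index is what keeps supplementary characters from showing up as two separate halves.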

How to Use the Unicode Inspector

  1. Paste or type any text into the input field.
  2. Each character appears as a tile in the character grid. Click a tile to see its full details in the panel on the right.
  3. Scroll down to the table for a compact view of all characters at once, including their UTF-8 bytes and escape sequences.
  4. Click any table row to highlight that character in the detail panel.

Features

  • Unicode code point in U+XXXX notation and decimal
  • UTF-8 byte sequence in hexadecimal
  • UTF-16 encoding (including surrogate pairs for supplementary characters)
  • JavaScript/TypeScript escape sequence (\uXXXX or \u{XXXXX})
  • Unicode block / category classification
  • Interactive character grid — click to inspect any character
  • Compact table view for all characters at once
  • Runs entirely in your browser — no data uploaded
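
The block classification listed above amounts to a range lookup. The sketch below assumes a small, deliberately incomplete table; `SAMPLE_BLOCKS` and `blockName` are hypothetical names, and a real implementation would need the full block list from the Unicode Character Database.

```typescript
// Hypothetical, partial block table: [first code point, last code point, block name].
const SAMPLE_BLOCKS: Array<[number, number, string]> = [
  [0x0000, 0x007f, "Basic Latin"],
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x3040, 0x309f, "Hiragana"],
  [0x30a0, 0x30ff, "Katakana"],
  [0x4e00, 0x9fff, "CJK Unified Ideographs"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

// Return the name of the block containing the code point, if it is in the sample table.
function blockName(codePoint: number): string {
  const hit = SAMPLE_BLOCKS.find(
    ([first, last]) => codePoint >= first && codePoint <= last
  );
  return hit ? hit[2] : "Unknown (not in this sample table)";
}

console.log(blockName(0x41));    // Basic Latin
console.log(blockName(0x3042));  // Hiragana
console.log(blockName(0x1f600)); // Emoticons
```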

FAQ

What is a Unicode code point?

A Unicode code point is a unique number assigned to each character in the Unicode standard, written as U+ followed by a hexadecimal number — for example U+0041 for the letter A or U+1F600 for the grinning face emoji. Unicode covers over 140,000 characters across more than 150 scripts.
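
To make the notation concrete, the following TypeScript sketch converts between a numeric code point and its U+XXXX form. The function names `toUPlus` and `fromUPlus` are made up for this example.

```typescript
// Hypothetical helper: format a code point as U+XXXX (at least four hex digits).
function toUPlus(codePoint: number): string {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: parse U+XXXX notation back into a number.
function fromUPlus(notation: string): number {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) {
    throw new Error("Not in U+XXXX notation: " + notation);
  }
  const codePoint = parseInt(match[1], 16);
  // U+10FFFF is the highest code point Unicode defines.
  if (codePoint > 0x10ffff) {
    throw new RangeError("Beyond U+10FFFF: " + notation);
  }
  return codePoint;
}

console.log(toUPlus(65));          // U+0041 (the letter A)
console.log(fromUPlus("U+1F600")); // 128512 (the grinning face emoji)
```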

What is the difference between UTF-8 and UTF-16?

UTF-8 uses 1 to 4 bytes per code point and is backward-compatible with ASCII, making it the dominant encoding on the web. UTF-16 uses 2 or 4 bytes per code point (one or two 16-bit code units) and is used internally by JavaScript and Java. Both encode the same Unicode code points; they differ only in the byte representation.
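
The size difference can be seen by encoding a few code points by hand. This is a minimal sketch of the standard UTF-8 bit layout, not the tool's implementation; `utf8Bytes` and `utf16ByteLength` are hypothetical names.

```typescript
// Hypothetical helper: encode one code point as UTF-8 bytes.
function utf8Bytes(cp: number): number[] {
  // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not scalar values.
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("Not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) return [cp]; // 1 byte, identical to ASCII
  if (cp <= 0x7ff) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Hypothetical helper: UTF-16 needs one 16-bit unit in the BMP, two above it.
function utf16ByteLength(cp: number): number {
  return cp > 0xffff ? 4 : 2;
}

const toHex = (bytes: number[]): string =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(utf8Bytes(0x41)), utf16ByteLength(0x41));       // 41 2
console.log(toHex(utf8Bytes(0x20ac)), utf16ByteLength(0x20ac));   // E2 82 AC 2
console.log(toHex(utf8Bytes(0x1f600)), utf16ByteLength(0x1f600)); // F0 9F 98 80 4
```

Note that UTF-8 is smaller for ASCII, larger for characters such as the euro sign, and the same size as UTF-16 for supplementary characters.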

Why do some emoji appear as two characters?

Characters above U+FFFF require UTF-16 surrogate pairs: two 16-bit code units. JavaScript strings are UTF-16 internally, so each of these characters occupies two code units and adds 2 to a string's length. This inspector shows each Unicode scalar value as a single entry, so the full supplementary character range is handled correctly.
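
The arithmetic behind a surrogate pair is short enough to show directly. The sketch below follows the standard UTF-16 formula; the function name `toSurrogatePair` is an assumption for this example.

```typescript
// Hypothetical helper: split a supplementary code point into its UTF-16 surrogate pair.
function toSurrogatePair(cp: number): [number, number] {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("Only code points U+10000..U+10FFFF need a surrogate pair");
  }
  const offset = cp - 0x10000;            // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00

// The same two code units are what JavaScript stores for the emoji.
const grinning = "\u{1F600}";
console.log(grinning.length);                     // 2
console.log(grinning.charCodeAt(0) === high);     // true
console.log(grinning.charCodeAt(1) === low);      // true
```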

What is a Unicode escape sequence?

A Unicode escape sequence lets you write any character using its code point in source code. In JavaScript/TypeScript, BMP characters use \uXXXX (e.g. \u0041 for A) and supplementary characters use \u{XXXXX} (e.g. \u{1F600} for 😀).
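
A short TypeScript sketch of how such an escape could be generated from a code point. The helper name `toEscape` is hypothetical, and the choice to use the braced form only above U+FFFF mirrors the description above rather than any confirmed behavior of the tool.

```typescript
// Hypothetical helper: build the JavaScript/TypeScript escape for one code point.
function toEscape(codePoint: number): string {
  const hex = codePoint.toString(16).toUpperCase();
  return codePoint <= 0xffff
    ? "\\u" + hex.padStart(4, "0") // BMP: exactly four hex digits
    : "\\u{" + hex + "}";          // supplementary: braced form
}

console.log(toEscape(0x41));    // \u0041
console.log(toEscape(0x1f600)); // \u{1F600}

// Written in source code, the escapes produce the characters themselves.
console.log("\u0041" === "A");                              // true
console.log("\u{1F600}" === String.fromCodePoint(0x1f600)); // true
```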