Slide 1: Introduction
This presentation covers Unicode, UTF encodings, and surrogate pairs: fundamental concepts
for working with text in modern programming languages.
Slide 2: What is Unicode?
Unicode is a universal character encoding standard used to represent text in computers. It assigns
a unique number (code point) to every character across all languages.
Slide 3: Code Points
A code point is a number assigned to each character in Unicode. Example: 'A' = U+0041, '😊' =
U+1F60A.
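As a minimal sketch of the notation above, the snippet below prints the code points of a string in
U+XXXX form. The helper name formatCodePoint is hypothetical and chosen for illustration; only
String.runes and int.toRadixString come from the Dart core library.

```dart
/// Formats a code point in the conventional U+XXXX notation
/// (hexadecimal, upper case, padded to at least four digits).
/// NOTE: formatCodePoint is an illustrative helper, not a Dart API.
String formatCodePoint(int codePoint) {
  final hex = codePoint.toRadixString(16).toUpperCase().padLeft(4, '0');
  return 'U+$hex';
}

void main() {
  // String.runes iterates over code points, not UTF-16 code units.
  for (final codePoint in 'A\u{1F60A}'.runes) {
    print(formatCodePoint(codePoint));
  }
  // Prints U+0041, then U+1F60A.
}
```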
Slide 4: Encoding Systems
Encodings like UTF-8, UTF-16, and UTF-32 define how code points are stored in memory using
bytes.
Slide 5: UTF-8 Encoding
UTF-8 is the most common encoding on the web. It uses 1 to 4 bytes per code point; ASCII
characters take exactly 1 byte.
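The variable width can be observed with the utf8 codec from dart:convert. This is a small sketch;
utf8Length is a hypothetical helper name, and the sample characters were picked only to show one
case for each byte length.

```dart
import 'dart:convert';

/// Returns how many bytes the UTF-8 encoding of [text] occupies.
/// NOTE: utf8Length is an illustrative helper, not a Dart API.
int utf8Length(String text) => utf8.encode(text).length;

void main() {
  print(utf8Length('A')); // 1 (U+0041, ASCII)
  print(utf8Length('\u00E9')); // 2 (U+00E9, e with acute accent)
  print(utf8Length('\u20AC')); // 3 (U+20AC, euro sign)
  print(utf8Length('\u{1F60A}')); // 4 (U+1F60A, outside the BMP)
}
```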
Slide 6: UTF-16 and UTF-32
UTF-16 uses 2 or 4 bytes per code point (one or two 16-bit code units); UTF-32 uses a fixed
4 bytes per code point. UTF-16 is common in Windows APIs and Dart.
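To compare the two sizes, the sketch below counts UTF-16 code units and code points for a string
and multiplies by the unit width. The function names utf16Bytes and utf32Bytes are assumptions
made for this example.

```dart
/// Bytes needed to store [text] as UTF-16: 2 bytes per code unit.
/// NOTE: illustrative helper, not a Dart API.
int utf16Bytes(String text) => text.codeUnits.length * 2;

/// Bytes needed to store [text] as UTF-32: 4 bytes per code point.
/// NOTE: illustrative helper, not a Dart API.
int utf32Bytes(String text) => text.runes.length * 4;

void main() {
  print(utf16Bytes('A')); // 2
  print(utf32Bytes('A')); // 4
  print(utf16Bytes('\u{1F60A}')); // 4 (two code units)
  print(utf32Bytes('\u{1F60A}')); // 4 (one code point)
}
```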
Slide 7: What are Surrogate Pairs?
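In UTF-16, characters outside the Basic Multilingual Plane (above U+FFFF) are encoded as two
16-bit code units: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to
U+DFFF). Together they form a surrogate pair.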
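The arithmetic behind a surrogate pair can be sketched as follows: subtract 0x10000 from the code
point, put the top 10 bits of the result into the high surrogate and the bottom 10 bits into the
low surrogate. The function name toSurrogatePair is hypothetical; Dart performs this conversion
internally when it builds a string.

```dart
/// Splits a non-BMP code point into its UTF-16 surrogate pair.
/// NOTE: toSurrogatePair is an illustrative helper, not a Dart API.
List<int> toSurrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw ArgumentError.value(
        codePoint, 'codePoint', 'must be in U+10000..U+10FFFF');
  }
  final offset = codePoint - 0x10000; // 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

void main() {
  final pair = toSurrogatePair(0x1F600);
  print(pair.map((u) => u.toRadixString(16).toUpperCase()).join(' '));
  // Prints D83D DE00.
}
```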
Slide 8: Basic Multilingual Plane (BMP)
The BMP includes characters from U+0000 to U+FFFF. Most common scripts reside here.
Slide 9: Dart & Unicode
Dart strings are sequences of UTF-16 code units. Characters outside the BMP, such as most
emoji, are stored as surrogate pairs, so each one counts as two units in a string's length.
Slide 10: Real-World Example in Dart
Example:
void main() {
  final heart = '💙';
  print(heart.runes); // (128153), i.e. U+1F499
  print(heart.length); // 2 UTF-16 code units
}
Slide 11: Why is Unicode Important?
- Globalization
- Multilingual apps
- Emoji and symbol support
- Security (avoiding spoofing)
Slide 12: Practical Use Cases
- Web development (UTF-8 is the standard encoding for HTML)
- Mobile apps (Flutter/Dart)
- Databases
- APIs and internationalization
Slide 13: Visual Diagram
[BMP] --> UTF-16 (1 unit)
[Non-BMP] --> UTF-16 (2 units = surrogate pair)
U+1F600 ➝ D83D DE00
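The mapping in the diagram can be checked in the reverse direction. The sketch below recombines a
surrogate pair into its code point; fromSurrogatePair is a hypothetical helper written for this
example, and it assumes the two inputs are a valid high and low surrogate.

```dart
/// Recombines a UTF-16 surrogate pair into a single code point.
/// Assumes [high] is in 0xD800..0xDBFF and [low] is in 0xDC00..0xDFFF.
/// NOTE: fromSurrogatePair is an illustrative helper, not a Dart API.
int fromSurrogatePair(int high, int low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

void main() {
  final codePoint = fromSurrogatePair(0xD83D, 0xDE00);
  print(codePoint.toRadixString(16).toUpperCase()); // 1F600

  // Dart's own view of the same character agrees with the diagram.
  final units = '\u{1F600}'.codeUnits;
  print(units.map((u) => u.toRadixString(16).toUpperCase()).join(' '));
  // Prints D83D DE00.
}
```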
Slide 14: Common Issues
- Misinterpreted encoding
- Character corruption
- String length confusion (e.g. emojis)
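The first two issues can be reproduced on purpose. This sketch encodes a string as UTF-8 and then
decodes the bytes as Latin-1, which is one common way text gets corrupted. The helper name
decodeUtf8AsLatin1 is an assumption for this example; utf8 and latin1 are the codecs from
dart:convert.

```dart
import 'dart:convert';

/// Simulates a reader that wrongly assumes Latin-1 for UTF-8 bytes.
/// NOTE: decodeUtf8AsLatin1 is an illustrative helper, not a Dart API.
String decodeUtf8AsLatin1(String text) => latin1.decode(utf8.encode(text));

void main() {
  const original = 'caf\u00E9'; // "café"
  print(decodeUtf8AsLatin1(original)); // cafÃ©

  // Decoding with the encoding that was actually used round-trips cleanly.
  print(utf8.decode(utf8.encode(original)) == original); // true
}
```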
Slide 15: Glossary
- Unicode: Universal character encoding
- Code Point: Numeric value like U+1F600
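- UTF: Unicode Transformation Format, an encoding form such as UTF-8, UTF-16, or UTF-32
- Surrogate Pair: Two UTF-16 code units that together encode one non-BMP character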
Slide 16: Security Aspects
Attackers can disguise malicious input with homoglyphs, characters from different scripts that
look identical (e.g. Cyrillic 'а', U+0430, vs Latin 'a', U+0061).
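One rough way to flag this kind of spoofing is to look for strings that mix Latin and Cyrillic
letters. The sketch below is a deliberately simple heuristic under stated assumptions: it checks
only ASCII Latin letters and the basic Cyrillic block (U+0400 to U+04FF), and all function names
are hypothetical. It is not a substitute for the confusable detection described in the Unicode
security recommendations.

```dart
// NOTE: all names below are illustrative helpers, not Dart APIs.

/// True for ASCII letters A-Z and a-z only.
bool _isLatinLetter(int cp) =>
    (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A);

/// True for code points in the basic Cyrillic block.
bool _isCyrillic(int cp) => cp >= 0x0400 && cp <= 0x04FF;

/// Reports whether [text] contains both Latin and Cyrillic letters.
bool mixesLatinAndCyrillic(String text) {
  var hasLatin = false;
  var hasCyrillic = false;
  for (final cp in text.runes) {
    if (_isLatinLetter(cp)) hasLatin = true;
    if (_isCyrillic(cp)) hasCyrillic = true;
  }
  return hasLatin && hasCyrillic;
}

void main() {
  print(mixesLatinAndCyrillic('paypal')); // false
  print(mixesLatinAndCyrillic('p\u0430ypal')); // true (Cyrillic U+0430)
}
```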
Slide 17: Unicode in Dart Libraries
- 'characters' package for grapheme clusters
- 'intl' for localization
- .runes and .codeUnits for low-level access
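A short sketch of the low-level accessors listed above, using only the Dart core library. The
helper truncateByCodePoints is a hypothetical name; it cuts a string by code points so that a
surrogate pair is never split, which String.substring does not guarantee because it counts code
units. Grapheme clusters (for example, emoji built from several code points) would still need the
'characters' package.

```dart
/// Keeps at most [maxCodePoints] code points of [text] without
/// splitting a surrogate pair.
/// NOTE: truncateByCodePoints is an illustrative helper, not a Dart API.
String truncateByCodePoints(String text, int maxCodePoints) =>
    String.fromCharCodes(text.runes.take(maxCodePoints));

void main() {
  const text = 'Hi\u{1F499}!';
  print(text.codeUnits.length); // 5 UTF-16 code units
  print(text.runes.length); // 4 code points

  // Cutting after 3 code points keeps the emoji whole.
  print(truncateByCodePoints(text, 3) == 'Hi\u{1F499}'); // true

  // substring(0, 3) counts code units and leaves a lone high surrogate.
  print(text.substring(0, 3).codeUnits.last == 0xD83D); // true
}
```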
Slide 18: Summary
• Unicode assigns a unique code point to every character
• UTF encodings turn code points into bytes for storage and transmission
• Dart uses UTF-16 internally
• Surrogate pairs represent non-BMP characters
Slide 19: Questions & Discussion
Any questions?
You can ask about UTFs, Dart handling of Unicode, or encoding practices in web/mobile apps.
Slide 20: Thank You!
Presentation by [Your Name].
Prepared for Dart Programming Lab.