"how numbers are stored and used in computers"
Unicode is a comprehensive character encoding standard that aims to represent every character from every writing system in the world. It was developed to overcome the limitations of earlier character encodings like ASCII, which could only represent a limited set of characters and lacked support for many languages and special symbols.
The development of Unicode began in the late 1980s, driven by the need for a universal character encoding that could handle the diverse writing systems used around the world. The Unicode Consortium was formed in 1991 to develop and maintain the standard. The first version of Unicode was published in 1991, and it has been continuously updated since then to include more characters and features.
The standard has evolved through several major versions, with each version adding support for more characters and writing systems. The latest version, Unicode 15.0, was released in 2022 and includes over 149,000 characters, covering 161 modern and historic scripts, as well as various symbols and emoji.
Unicode uses a unique code point to represent each character. A code point is a number that uniquely identifies a character in the Unicode standard. The standard currently defines code points in the range from 0 to 10FFFF (hexadecimal), which allows for over 1.1 million possible characters.
The Unicode standard is organized into 17 planes, each containing 65,536 code points. The first plane, known as the Basic Multilingual Plane (BMP), contains the most commonly used characters. The remaining planes are used for less common characters, historical scripts, and special symbols.
Each character in Unicode has several properties that define its behavior and usage:
These properties are used by software to correctly display and process text in different languages and writing systems.
Unicode is implemented through various encoding forms, including UTF-8, UTF-16, and UTF-32. These encoding forms define how Unicode code points are represented as sequences of bytes. Each encoding form has its own advantages and use cases:
The choice of encoding form depends on factors such as storage efficiency, processing speed, and compatibility with existing systems.
Unicode has had a profound impact on computing and communication. It has enabled the development of software that can handle text in any language, making it possible to create truly international applications. The standard is used in a wide range of applications, including: