Standards
Many pages on this site reference various standards. Links to each of the standards mentioned are listed below.
Encoding
ISO/IEC 10646 specifies the Universal Coded Character Set (UCS). It is intended to have exactly the same character repertoire as The Unicode Standard.
The Unicode Standard — published by the Unicode Consortium, the standards body for the internationalization of software and services.
Languages
ISO 639 — codes for individual languages and language groups. The oldest part of the standard, ISO 639-1, covers fewer than two hundred well-known languages, each identified by a unique two-letter code. ISO 639-2 covers a larger number of languages and introduced three-letter codes alongside the two-letter ones. ISO 639-3 is intended to be much more comprehensive and assigns each language a unique three-letter code. Although ISO 639-1 is mostly obsolete, ISO 639-2 continues to be used alongside ISO 639-3, and the two-letter codes remain in use for the languages that have been assigned them.
ISO 639-3 — the International Organization for Standardization’s registry of the languages of the world. It consists of the living languages listed in SIL’s Ethnologue, as well as extinct, ancient, reconstructed, and artificial (constructed) languages.
The current registry includes over 7,000 languages, each identified by a unique three-letter code. Each language is also assigned a name and a type, such as living, extinct, ancient, historical, or constructed.
The standard distinguishes between individual languages and macrolanguages. A macrolanguage groups closely related individual languages that may be considered a single language for some purposes. For instance, the Arabic macrolanguage (ara) includes Standard Arabic (arb) as well as Egyptian Arabic (arz), Moroccan Arabic (ary), and many other varieties. Similarly, the Central, Northern, and Southern varieties of Kurdish (ckb, kmr, and sdh, respectively) are grouped together into the Kurdish macrolanguage (kur).
To suggest improvements to ISO 639-3, submit a change request to the registrar by following the instructions in Submitting ISO 639-3 Change Requests.
Ethnologue — a catalog of every known living language. With very few exceptions, the living and recently extinct languages in ISO 639-3 correspond to the languages of the Ethnologue. The most significant difference between the two registries is that the ISO standard also includes ancient, reconstructed, and artificial languages, not only those currently or recently in use. The Ethnologue defines additional categories for nearly extinct languages and for languages spoken only as a second language, and it includes extensive information gained from SIL’s linguistic research.
ROLV (Registry of Language Varieties) — defines standardized codes for identifying the language varieties (sometimes called dialects) of the world.
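The macrolanguage relationship described under ISO 639-3 is essentially a mapping from each macrolanguage code to the codes of its member individual languages. The Python sketch below models that mapping as a plain dictionary. The table is a hypothetical excerpt that contains only the codes mentioned on this page, and the Arabic entry in particular is incomplete. The function names are illustrative and do not come from any library. The complete mappings are published by the ISO 639-3 registrar.

```python
from typing import Dict, Optional, Set

# Hypothetical excerpt of ISO 639-3 macrolanguage mappings, limited to the codes
# mentioned on this page. The real Arabic group has many more members.
MACROLANGUAGES: Dict[str, Set[str]] = {
    "ara": {"arb", "arz", "ary"},  # Arabic: Standard, Egyptian, Moroccan (excerpt)
    "kur": {"ckb", "kmr", "sdh"},  # Kurdish: Central, Northern, Southern
}


def is_macrolanguage(code: str) -> bool:
    """Return True if `code` is a macrolanguage code in the table above."""
    return code in MACROLANGUAGES


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage that groups the individual language `code`, or None."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None
```

With this table, `macrolanguage_of("arz")` returns `"ara"`, and `macrolanguage_of("eng")` returns `None` because English belongs to no macrolanguage here. Software could use a lookup like this to fall back from an individual language to its macrolanguage when no resources exist for the more specific code.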
Scripts
ISO 15924 — the International Organization for Standardization (ISO) has appointed the Unicode Consortium as the Registration Authority for the International Standard ISO 15924, Codes for the representation of names of scripts. Each script is identified by a name, a four-letter code, and a three-digit numeric code. The four-letter codes are used as script subtags in language tags. Suggestions for additions and corrections can be made through the Unicode Consortium’s ISO 15924 page.
Regions
ISO 3166 — two-letter country codes (ISO 3166-1 alpha-2) are used as region subtags in language tags.
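ISO 3166-1 Numeric — three-digit numeric region codes are sometimes used when a country code is insufficient. BCP 47 draws its three-digit region subtags from UN M.49, which also covers multi-country areas such as Latin America and the Caribbean (419).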
Writing system support
BCP 47 / IANA Language Subtag Registry — a BCP 47 tag, such as en-US or sr-Latn-RS, is the standard way of identifying a language and is widely used in the computer industry. The tag structure is standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47, and the valid subtags are maintained in the IANA Language Subtag Registry.
OpenType Layout Tag Registry — the registry of script tags, language system tags, feature tags, and baseline tags used in OpenType Layout.
CLDR (Unicode Common Locale Data Repository) — provides key building blocks for software to support the world’s languages, in the form of the largest standard repository of locale data available.
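A BCP 47 tag combines codes from several of the standards listed on this page: a language code from ISO 639, an optional ISO 15924 script code, and an optional region code. The Python sketch below parses only that common language[-script][-region] shape, and it assumes the language subtag is a two- or three-letter code. It ignores extended language, variant, extension, and private-use subtags. It also does not check any subtag against the IANA Language Subtag Registry. `LanguageTag` and `parse_tag` are hypothetical names and do not belong to any library.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern for language[-script][-region] only. Full BCP 47 tags may
# also carry extlang, variant, extension, and private-use subtags.
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"             # ISO 639 code, e.g. "sr" or "kmr"
    r"(?:-(?P<script>[A-Za-z]{4}))?"           # ISO 15924 code, e.g. "Latn"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"  # alpha-2 country or 3-digit region
)


@dataclass
class LanguageTag:
    language: str
    script: Optional[str]
    region: Optional[str]

    def __str__(self) -> str:
        # Join only the subtags that are present.
        return "-".join(p for p in (self.language, self.script, self.region) if p)


def parse_tag(tag: str) -> LanguageTag:
    """Parse a simple tag and normalize the casing of each subtag."""
    m = _TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    script, region = m["script"], m["region"]
    return LanguageTag(
        language=m["language"].lower(),              # languages: lowercase
        script=script.title() if script else None,   # scripts: title case
        region=region.upper() if region else None,   # regions: uppercase
    )
```

Tags are case-insensitive. For that reason the sketch normalizes each subtag to the casing that BCP 47 recommends: lowercase for the language, title case for the script, and uppercase for the region. For example, `parse_tag("sr-latn-rs")` produces `sr-Latn-RS`, and `parse_tag("es-419")` produces a tag with the numeric region `419`. For real-world input, an implementation backed by the IANA registry or by CLDR would be more appropriate.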