Component Coredata Examples

Instead of writing “U+0634 ش ARABIC LETTER SHEEN” (and trying to figure out how to type the correct character and find the exact Unicode character name), you can just write <Character usv="0634"/> to produce carefully styled text with the USV, character, and Unicode character name: U+0634 ش ARABIC LETTER SHEEN

You can also specify a character by typing it literally, as in <Character char="á"/>, to produce: U+00E1 á LATIN SMALL LETTER A WITH ACUTE
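As a rough illustration of the lookup such a component performs, here is a minimal Python sketch that builds the same "USV, character, name" string from either a hex USV or a literal character. The function name `describe` and its parameters are hypothetical and only mirror the `usv` and `char` attributes above; the names come from Python's standard `unicodedata` module, not from the component's own data source.

```python
import unicodedata


def describe(char=None, usv=None):
    """Return 'U+XXXX <char> <NAME>' for a literal character or a hex USV.

    Hypothetical helper: `char` and `usv` mirror the component attributes.
    """
    if char is None:
        if usv is None:
            raise ValueError("give either char or usv")
        char = chr(int(usv, 16))  # hex string such as "0634" -> character
    if len(char) != 1:
        raise ValueError("expected exactly one codepoint")
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{ord(char):04X} {char} {name}"


print(describe(usv="0634"))
print(describe(char="\u00e1"))  # the precomposed letter a with acute
```

The character names available depend on the Unicode version bundled with the Python build in use.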

Transformation of data to a normal form. For historical reasons,
the Unicode Standard allows some characters to have more than one
encoded representation. For example, _á_ may be represented as a single codepoint,
<span class='USV'>U+00E1</span>&nbsp;<span class='UnicodeCharName'>
LATIN SMALL LETTER A WITH ACUTE</span>, or two codepoints, <span class='USV'>
U+0061</span>&nbsp;<span class='UnicodeCharName'>LATIN SMALL LETTER A</span>
and <span class='USV'>U+0301</span>&nbsp;<span class='UnicodeCharName'>COMBINING
ACUTE ACCENT</span>. A normalization scheme is used to standardize the codepoints
so that every character is always represented by the same sequence of codepoints.
Normalization is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, U+00E1 LATIN SMALL LETTER A WITH ACUTE, or two codepoints, U+0061 LATIN SMALL LETTER A and U+0301 COMBINING ACUTE ACCENT. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode
standard allows some characters to have more than one encoded representation.
For example, á may be represented as a single codepoint, <Character char="á"/>,
or two codepoints, <Character char="a"/> and <Character usv="0301"/>.
A normalization scheme is used to standardize the codepoints so that every
character is always represented by the same sequence of codepoints. Normalization
is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, U+00E1 á LATIN SMALL LETTER A WITH ACUTE, or two codepoints, U+0061 a LATIN SMALL LETTER A and U+0301 ◌́ COMBINING ACUTE ACCENT. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.
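The two representations of á described in this glossary entry can be checked directly. The following is a small sketch using Python's standard `unicodedata.normalize`; the helper `codepoints` is a hypothetical name introduced only for display.

```python
import unicodedata


def codepoints(text):
    """List the codepoints of a string as 'U+XXXX' labels."""
    return [f"U+{ord(c):04X}" for c in text]


precomposed = "\u00e1"   # one codepoint: LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # two codepoints: a + COMBINING ACUTE ACCENT

# The strings look the same when rendered but compare as different.
print(precomposed == decomposed)

# NFC composes where possible, NFD decomposes; after normalizing both
# strings to the same form, they compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(codepoints(nfc), codepoints(nfd))
```

Normalizing both sides to the same form before comparing or storing text is the usual way to avoid mismatches between these representations.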

<CharacterTable startUsv="063D" endUsv="063F" caption="Farsi Yeh forms"/>

Consider these Farsi Yeh forms:

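USV Char Unicode Character Name
063D ؽ ARABIC LETTER FARSI YEH WITH INVERTED V
063E ؾ ARABIC LETTER FARSI YEH WITH TWO DOTS ABOVE
063F ؿ ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE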
Farsi Yeh forms
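
The rows of such a table can be approximated by walking the codepoint range. This is a minimal sketch, assuming the range is inclusive at both ends (as the three rows for 063D to 063F suggest); `character_rows` is a hypothetical name, and the character names come from Python's `unicodedata` module rather than the component's own data.

```python
import unicodedata


def character_rows(start_usv, end_usv):
    """Return (USV, character, name) tuples for an inclusive hex range."""
    rows = []
    for cp in range(int(start_usv, 16), int(end_usv, 16) + 1):
        ch = chr(cp)
        rows.append((f"{cp:04X}", ch, unicodedata.name(ch, "<unnamed>")))
    return rows


for usv, ch, name in character_rows("063D", "063F"):
    print(usv, ch, name)
```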
<ScriptInfo scriptCode="Beng" />
Property Value
script_alternateNames
script_baseline hanging
script_casing no
script_code Beng
script_complexPos yes
script_contextualForms yes
script_diacritics yes
script_direction LTR
script_displayCode Beng
script_family Indic
script_features
script_ligatures required
script_name Bengali (Bangla)
script_numericCode 325
script_numericCodePlus 325 (alphasyllabic)
script_openTypeTags beng, bng2
script_reordering yes
script_sample a:8:{s:9:"file_name";s:16:"Bengali-crop.png";s:14:"file_MIME_type";s:9:"image/png";s:9:"file_type";s:3:"png";s:10:"is_package";b:0;s:7:"caption";s:92:"From the Universal Declaration of Human Rights in Bengali. See Use and History subject area.";s:6:"author";s:0:"";s:9:"copyright";s:0:"";s:5:"notes";s:0:"";}
script_shortName Bengali script
script_sortName Bengali (Bangla)
script_splitGraphs yes
script_status Current
script_subjectTabFlags 123
script_type abugida
script_typicalFont
script_whiteSpace between words
Properties for Bengali (Bangla) (Beng)

Show scripts that do reordering:

<ScriptTable filter="script_reordering=yes" />
Script Code Script Name Script Type Script Family
Bali Balinese abugida Insular Southeast Asian
Beng Bengali (Bangla) abugida Indic
Brah Brahmi abugida Indic
Deva Devanagari (Nagari) abugida Indic
Dogr Dogra abugida Indic
Gujr Gujarati abugida Indic
Guru Gurmukhi abugida Indic
Hmng Pahawh Hmong alphabet East Asian
Khmr Khmer abugida Mainland Southeast Asian
Kthi Kaithi abugida Indic
Laoo Lao abugida Mainland Southeast Asian
Lepc Lepcha (Róng) abugida Central Asian
Maka Makasar abugida Insular Southeast Asian
Mlym Malayalam abugida Indic
Mtei Meitei Mayek (Meithei, Meetei) abugida Indic
Mymr Myanmar (Burmese) abugida Mainland Southeast Asian
Newa Newa, Newar, Newari, Nepāla lipi abugida Indic
Qa74 Bhujinmol abugida Indic
Ranj Ranjana abugida Indic
Shrd Sharada, Śāradā abugida Indic
Sind Khudawadi, Sindhi abugida Indic
Sinh Sinhala abugida Indic
Sund Sundanese abugida Insular Southeast Asian
Talu New Tai Lue abugida Mainland Southeast Asian
Taml Tamil abugida Indic
Tavt Tai Viet abugida Mainland Southeast Asian
Thai Thai abugida Mainland Southeast Asian
Toto Toto alphabet Indic
Tutg Tulu-Tigalari abugida Indic
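
To make the `filter` attribute concrete, here is a minimal sketch of how a `field=value` filter could be applied to script records. The record layout, the function names, and the `Latn` row are illustrative assumptions and are not taken from the component's implementation or its data; the sketch supports only a single equality test.

```python
def parse_filter(expr):
    """Split a 'field=value' filter string into (field, value)."""
    field, sep, value = expr.partition("=")
    if not sep:
        raise ValueError(f"not a field=value filter: {expr!r}")
    return field.strip(), value.strip()


def filter_scripts(scripts, expr):
    """Keep only the records whose field equals the filter value."""
    field, value = parse_filter(expr)
    return [s for s in scripts if s.get(field) == value]


# Hypothetical sample records; the Latn row is invented for contrast.
SCRIPTS = [
    {"script_code": "Beng", "script_name": "Bengali (Bangla)",
     "script_type": "abugida", "script_reordering": "yes"},
    {"script_code": "Latn", "script_name": "Latin",
     "script_type": "alphabet", "script_reordering": "no"},
    {"script_code": "Thai", "script_name": "Thai",
     "script_type": "abugida", "script_reordering": "yes"},
]

for s in filter_scripts(SCRIPTS, "script_reordering=yes"):
    print(s["script_code"], s["script_name"], s["script_type"])
```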

The following table displays data from the characters table, showing USV and character name. The results are filtered to include only records where the character_category is Lm and Unicode version is 3.0, and are sorted by usv in ascending order.

<DataTable
table = "characters"
fields = "character_usv=USV, character_name=Character Name"
filter = "character_category = 'Lm' AND character_unicodeVersion = '3.0'"
order = "character_usv ASC"
caption = "List of Modifier Letters (Lm) added in Unicode 3.0"
/>
USV Character Name
02EC MODIFIER LETTER VOICING
02EE MODIFIER LETTER DOUBLE APOSTROPHE
17D7 KHMER SIGN LEK TOO
1843 MONGOLIAN LETTER TODO LONG VOWEL SIGN
A015 YI SYLLABLE WU
List of Modifier Letters (Lm) added in Unicode 3.0
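
The `fields`, `filter`, and `order` attributes read much like the parts of a SQL query. The sketch below shows that correspondence with an in-memory SQLite database. It is an assumption that the component maps onto SQL this way; the table schema, the function name, and the sample rows are hypothetical and cover only a few characters.

```python
import sqlite3


def modifier_letters_3_0(conn):
    """Run the query that the DataTable attributes appear to describe."""
    return conn.execute(
        """
        SELECT character_usv AS "USV", character_name AS "Character Name"
        FROM characters
        WHERE character_category = 'Lm'
          AND character_unicodeVersion = '3.0'
        ORDER BY character_usv ASC
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters ("
    "character_usv TEXT, character_name TEXT, "
    "character_category TEXT, character_unicodeVersion TEXT)"
)
# Hypothetical sample rows, inserted out of order to show the sort.
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?, ?)",
    [
        ("A015", "YI SYLLABLE WU", "Lm", "3.0"),
        ("0041", "LATIN CAPITAL LETTER A", "Lu", "1.1"),
        ("02EC", "MODIFIER LETTER VOICING", "Lm", "3.0"),
        ("02B0", "MODIFIER LETTER SMALL H", "Lm", "1.1"),
        ("17D7", "KHMER SIGN LEK TOO", "Lm", "3.0"),
    ],
)

for usv, name in modifier_letters_3_0(conn):
    print(usv, name)
```

Because the USV is stored as text here, ascending order matches numeric order only while every value has the same number of uppercase hex digits.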