Component Coredata Examples
Inline USV example #1
Instead of writing “U+0634 ش ARABIC LETTER SHEEN” (and working out how to type the correct character and find the exact Unicode character name), you can write <Character usv="0634"/> to produce carefully styled text with the USV, the character, and the Unicode character name: ش
You can also specify a character by typing it literally, as in <Character char="á"/>, to produce: á
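As a rough illustration of the lookup such a component has to perform, the following Python sketch builds the same "USV, character, name" triple from either a hex USV or a literal character. It is not the Character component's implementation; the function name `describe_character` and its parameters are assumptions made for this example, and the names come from Python's bundled `unicodedata` module.

```python
import unicodedata
from typing import Optional


def describe_character(usv: Optional[str] = None, char: Optional[str] = None) -> str:
    """Return 'U+XXXX <char> <NAME>' for a hex USV string or a literal character.

    Hypothetical helper: it mirrors the usv/char attributes shown above,
    but it is only a sketch, not the component's actual code.
    """
    if (usv is None) == (char is None):
        raise ValueError("pass exactly one of usv or char")
    if usv is not None:
        char = chr(int(usv, 16))  # "0634" -> the character at U+0634
    if len(char) != 1:
        raise ValueError("expected a single code point")
    code = f"U+{ord(char):04X}"
    # Fall back to a placeholder if the local Unicode database has no name.
    name = unicodedata.name(char, "<unnamed>")
    return f"{code} {char} {name}"


print(describe_character(usv="0634"))
# "\u00e1" is the precomposed form of the letter a with acute.
print(describe_character(char="\u00e1"))
```

Accepting either attribute, but not both, keeps the two ways of naming a character from contradicting each other.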
Inline USV example #2
From the glossary, using inline HTML:
Transformation of data to a normal form. For historical reasons, the Unicode Standard allows some characters to have more than one encoded representation. For example, _á_ may be represented as a single codepoint, <span class='USV'>U+00E1</span> <span class='UnicodeCharName'>LATIN SMALL LETTER A WITH ACUTE</span>, or two codepoints, <span class='USV'>U+0061</span> <span class='UnicodeCharName'>LATIN SMALL LETTER A</span> and <span class='USV'>U+0301</span> <span class='UnicodeCharName'>COMBINING ACUTE ACCENT</span>. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode Standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, U+00E1 LATIN SMALL LETTER A WITH ACUTE, or two codepoints, U+0061 LATIN SMALL LETTER A and U+0301 COMBINING ACUTE ACCENT. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.
Using the Character component:
Transformation of data to a normal form. For historical reasons, the Unicode Standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, <Character char="á"/>, or two codepoints, <Character char="a"/> and <Character usv="0301"/>. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode Standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, á, or two codepoints, a and ◌́. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.
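The two representations described in the glossary entry can be checked directly. The sketch below uses Python's standard `unicodedata.normalize` with the NFC (composed) and NFD (decomposed) forms defined by the Unicode Standard; the helper name `usvs` is an assumption made for this example.

```python
import unicodedata

composed = "\u00e1"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # U+0061 LATIN SMALL LETTER A + U+0301 COMBINING ACUTE ACCENT


def usvs(text: str) -> list:
    """List the code points of a string in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two strings look the same when rendered but compare unequal.
print(composed == decomposed)

# Normalizing both to the same form makes them identical.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(usvs(nfc), usvs(nfd))
```

Comparing or indexing text only after normalizing to one agreed form is what makes "the same character" compare as the same sequence of codepoints.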
CharacterTable example
<CharacterTable startUsv="063D" endUsv="063F" caption="Farsi Yeh forms"/>

Consider these Farsi Yeh forms:
| USV | Char | Unicode Character Name |
|---|---|---|
| U+063D | ؽ | ARABIC LETTER FARSI YEH WITH INVERTED V |
| U+063E | ؾ | ARABIC LETTER FARSI YEH WITH TWO DOTS ABOVE |
| U+063F | ؿ | ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE |
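A table like this one can be approximated from a start and end USV. The following Python sketch is only an illustration of the idea, not the CharacterTable component itself; `character_rows` is a hypothetical helper, and the names come from the Unicode database bundled with the Python interpreter, so they depend on that database's version.

```python
import unicodedata


def character_rows(start_usv: str, end_usv: str) -> list:
    """Return (USV, character, name) tuples for an inclusive hex range."""
    rows = []
    for code_point in range(int(start_usv, 16), int(end_usv, 16) + 1):
        ch = chr(code_point)
        # Placeholder name for code points the local database does not know.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{code_point:04X}", ch, name))
    return rows


rows = character_rows("063D", "063F")
print("| USV | Char | Unicode Character Name |")
print("|---|---|---|")
for usv, ch, name in rows:
    print(f"| {usv} | {ch} | {name} |")
```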
ScriptInfo example
<ScriptInfo scriptCode="Beng" />

| Property | Value |
|---|---|
| script_alternateNames | |
| script_baseline | hanging |
| script_casing | no |
| script_code | Beng |
| script_complexPos | yes |
| script_contextualForms | yes |
| script_diacritics | yes |
| script_direction | LTR |
| script_displayCode | Beng |
| script_family | Indic |
| script_features | |
| script_ligatures | required |
| script_name | Bengali (Bangla) |
| script_numericCode | 325 |
| script_numericCodePlus | 325 (alphasyllabic) |
| script_openTypeTags | beng, bng2 |
| script_reordering | yes |
| script_sample | a:8:{s:9:"file_name";s:16:"Bengali-crop.png";s:14:"file_MIME_type";s:9:"image/png";s:9:"file_type";s:3:"png";s:10:"is_package";b:0;s:7:"caption";s:92:"From the Universal Declaration of Human Rights in Bengali. See Use and History subject area.";s:6:"author";s:0:"";s:9:"copyright";s:0:"";s:5:"notes";s:0:"";} |
| script_shortName | Bengali script |
| script_sortName | Bengali (Bangla) |
| script_splitGraphs | yes |
| script_status | Current |
| script_subjectTabFlags | 123 |
| script_type | abugida |
| script_typicalFont | |
| script_whiteSpace | between words |
ScriptTable example
Show scripts that do reordering:
<ScriptTable filter="script_reordering=yes" />

| Script Code | Script Name | Script Type | Script Family |
|---|---|---|---|
| Bali | Balinese | abugida | Insular Southeast Asian |
| Beng | Bengali (Bangla) | abugida | Indic |
| Brah | Brahmi | abugida | Indic |
| Deva | Devanagari (Nagari) | abugida | Indic |
| Dogr | Dogra | abugida | Indic |
| Gujr | Gujarati | abugida | Indic |
| Guru | Gurmukhi | abugida | Indic |
| Hmng | Pahawh Hmong | alphabet | East Asian |
| Khmr | Khmer | abugida | Mainland Southeast Asian |
| Kthi | Kaithi | abugida | Indic |
| Laoo | Lao | abugida | Mainland Southeast Asian |
| Lepc | Lepcha (Róng) | abugida | Central Asian |
| Maka | Makasar | abugida | Insular Southeast Asian |
| Mlym | Malayalam | abugida | Indic |
| Mtei | Meitei Mayek (Meithei, Meetei) | abugida | Indic |
| Mymr | Myanmar (Burmese) | abugida | Mainland Southeast Asian |
| Newa | Newa, Newar, Newari, Nepāla lipi | abugida | Indic |
| Qa74 | Bhujinmol | abugida | Indic |
| Ranj | Ranjana | abugida | Indic |
| Shrd | Sharada, Śāradā | abugida | Indic |
| Sind | Khudawadi, Sindhi | abugida | Indic |
| Sinh | Sinhala | abugida | Indic |
| Sund | Sundanese | abugida | Insular Southeast Asian |
| Talu | New Tai Lue | abugida | Mainland Southeast Asian |
| Taml | Tamil | abugida | Indic |
| Tavt | Tai Viet | abugida | Mainland Southeast Asian |
| Thai | Thai | abugida | Mainland Southeast Asian |
| Toto | Toto | alphabet | Indic |
| Tutg | Tulu-Tigalari | abugida | Indic |
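To make the `filter` attribute concrete, here is a minimal Python sketch of a `field=value` filter applied to script records. It is an assumption about how such a filter could behave, not the ScriptTable component's code: `parse_filter` and `filter_scripts` are hypothetical names, the two matching records reuse rows from the table above, and the `Xxxx` record is an invented placeholder included only so that something gets filtered out.

```python
def parse_filter(expression: str) -> tuple:
    """Split a 'field=value' expression into (field, value)."""
    field, separator, value = expression.partition("=")
    if not separator:
        raise ValueError(f"expected 'field=value', got {expression!r}")
    return field.strip(), value.strip()


def filter_scripts(scripts: list, expression: str) -> list:
    """Keep only the records whose field equals the requested value."""
    field, value = parse_filter(expression)
    return [script for script in scripts if script.get(field) == value]


SCRIPTS = [
    {"script_code": "Beng", "script_name": "Bengali (Bangla)",
     "script_type": "abugida", "script_family": "Indic", "script_reordering": "yes"},
    {"script_code": "Hmng", "script_name": "Pahawh Hmong",
     "script_type": "alphabet", "script_family": "East Asian", "script_reordering": "yes"},
    # Invented placeholder record, not real script data.
    {"script_code": "Xxxx", "script_name": "Placeholder script",
     "script_type": "alphabet", "script_family": "None", "script_reordering": "no"},
]

matches = filter_scripts(SCRIPTS, "script_reordering=yes")
for script in matches:
    print(f"| {script['script_code']} | {script['script_name']} | "
          f"{script['script_type']} | {script['script_family']} |")
```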
DataTable example
The following table displays data from the characters table, showing the USV and character name. The results are filtered to include only records where character_category is Lm and character_unicodeVersion is 3.0, and are sorted by character_usv in ascending order.
<DataTable table = "characters" fields = "character_usv=USV, character_name=Character Name" filter = "character_category = 'Lm' AND character_unicodeVersion = '3.0'" order = "character_usv ASC" caption = "List of Modifier Letters (Lm) added in Unicode 3.0"/>

| USV | Character Name |
|---|---|
| 02EC | MODIFIER LETTER VOICING |
| 02EE | MODIFIER LETTER DOUBLE APOSTROPHE |
| 17D7 | KHMER SIGN LEK TOO |
| 1843 | MONGOLIAN LETTER TODO LONG VOWEL SIGN |
| A015 | YI SYLLABLE WU |
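The `fields`, `filter`, and `order` attributes read like the parts of an SQL query, so one way to picture them is as a `SELECT ... WHERE ... ORDER BY` statement. The sketch below assumes that mapping and uses an in-memory SQLite database; it is not the DataTable component's implementation. The function `data_table`, the table schema, and the two non-matching rows (including their version values) are assumptions for illustration, while the three matching rows are taken from the table above.

```python
import sqlite3


def data_table(conn, table: str, fields: str, where: str, order: str):
    """Run a DataTable-style query; returns (headers, rows).

    The query is assembled from strings, so this sketch is only suitable
    for trusted, author-written attribute values.
    """
    columns, headers = [], []
    for item in fields.split(","):
        column, _, header = item.strip().partition("=")
        columns.append(column.strip())
        headers.append((header or column).strip())
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {where} ORDER BY {order}"
    return headers, conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters (character_usv TEXT, character_name TEXT, "
    "character_category TEXT, character_unicodeVersion TEXT)"
)
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?, ?)",
    [
        ("A015", "YI SYLLABLE WU", "Lm", "3.0"),
        ("02EC", "MODIFIER LETTER VOICING", "Lm", "3.0"),
        ("17D7", "KHMER SIGN LEK TOO", "Lm", "3.0"),
        # Illustrative rows that the filter should exclude (values assumed).
        ("02B0", "MODIFIER LETTER SMALL H", "Lm", "1.1"),
        ("0061", "LATIN SMALL LETTER A", "Ll", "1.1"),
    ],
)

headers, rows = data_table(
    conn,
    table="characters",
    fields="character_usv=USV, character_name=Character Name",
    where="character_category = 'Lm' AND character_unicodeVersion = '3.0'",
    order="character_usv ASC",
)
print("| " + " | ".join(headers) + " |")
for row in rows:
    print("| " + " | ".join(row) + " |")
```

Because `character_usv` is stored as text here, the ascending order is a string comparison; it matches numeric order only while all values have the same number of hex digits.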