Component Coredata Examples

Inline USV example #1

Instead of writing “U+0634 ش ARABIC LETTER SHEEN” by hand (which means working out how to type the character and looking up its exact Unicode character name), you can write <Character usv="0634"/> to produce carefully styled text with the USV, the character, and the Unicode character name: U+0634 ش ARABIC LETTER SHEEN

You can also specify a character by typing it literally, as in <Character char="á"/>, to produce: U+00E1 á LATIN SMALL LETTER A WITH ACUTE
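The lookup behind `Character` can be approximated outside the site. The following is a minimal Python sketch, not the component's actual implementation: `describe_character` is a hypothetical helper, and the names come from the Unicode database bundled with Python's `unicodedata` module rather than from the site's own data. It assumes the component shows combining marks on a dotted circle (U+25CC).

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # conventional base for displaying combining marks


def describe_character(usv=None, char=None):
    """Return 'U+XXXX <char> <NAME>' for a hex USV string or a literal character.

    Hypothetical helper mirroring the usv= and char= attributes.
    """
    if (usv is None) == (char is None):
        raise ValueError("pass exactly one of usv or char")
    if char is None:
        char = chr(int(usv, 16))
    if len(char) != 1:
        raise ValueError("char must be a single code point")
    name = unicodedata.name(char, "<no name>")
    # Put combining marks on a dotted circle so they stay visible.
    shown = DOTTED_CIRCLE + char if unicodedata.combining(char) else char
    return f"U+{ord(char):04X} {shown} {name}"


print(describe_character(usv="0634"))
print(describe_character(char="\u00e1"))
```

Passing both `usv` and `char`, or neither, raises a `ValueError` in this sketch; how the real component handles that case is not covered here.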

Inline USV example #2

From the glossary, using inline HTML:

Transformation of data to a normal form. For historical reasons,
the Unicode Standard allows some characters to have more than one
encoded representation. For example, _á_ may be represented as a single codepoint,
<span class='USV'>U+00E1</span>&nbsp;<span class='UnicodeCharName'>
LATIN SMALL LETTER A WITH ACUTE</span>, or two codepoints, <span class='USV'>
U+0061</span>&nbsp;<span class='UnicodeCharName'>LATIN SMALL LETTER A</span>
and <span class='USV'>U+0301</span>&nbsp;<span class='UnicodeCharName'>COMBINING
ACUTE ACCENT</span>. A normalization scheme is used to standardize the codepoints
so that every character is always represented by the same sequence of codepoints.
Normalization is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode Standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, U+00E1 LATIN SMALL LETTER A WITH ACUTE, or two codepoints, U+0061 LATIN SMALL LETTER A and U+0301 COMBINING ACUTE ACCENT. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.

Using the `Character` component:

Transformation of data to a normal form. For historical reasons, the Unicode
standard allows some characters to have more than one encoded representation.
For example, á may be represented as a single codepoint, <Character char="á"/>,
or two codepoints, <Character char="a"/> and <Character usv="0301"/>.
A normalization scheme is used to standardize the codepoints so that every
character is always represented by the same sequence of codepoints. Normalization
is described in the Unicode Standard Section 5.7, Normalization.

Transformation of data to a normal form. For historical reasons, the Unicode standard allows some characters to have more than one encoded representation. For example, á may be represented as a single codepoint, U+00E1 á LATIN SMALL LETTER A WITH ACUTE, or two codepoints, U+0061 a LATIN SMALL LETTER A and U+0301 ◌́ COMBINING ACUTE ACCENT. A normalization scheme is used to standardize the codepoints so that every character is always represented by the same sequence of codepoints. Normalization is described in the Unicode Standard Section 5.7, Normalization.
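The two representations of á described in this glossary entry can be checked directly. This short Python sketch uses the standard `unicodedata.normalize` function with NFC (composed) and NFD (decomposed), two of the normalization forms defined by Unicode; the variable names and the `code_points` helper are illustrative only.

```python
import unicodedata

precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


# The two strings look the same when rendered but differ as code point sequences.
print(code_points(precomposed), code_points(decomposed))

# NFC composes to the single code point; NFD decomposes to base + combining mark.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(code_points(nfc), code_points(nfd))
```

Comparing strings only after normalizing both to the same form avoids treating the two spellings of á as different text.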

CharacterTable example

<CharacterTable startUsv="063D" endUsv="063F" caption="Farsi Yeh forms"/>

Consider these Farsi Yeh forms:

USV	Char	Unicode Character Name
U+063D	ؽ	ARABIC LETTER FARSI YEH WITH INVERTED V
U+063E	ؾ	ARABIC LETTER FARSI YEH WITH TWO DOTS ABOVE
U+063F	ؿ	ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE

Farsi Yeh forms
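A rough equivalent of the range lookup can be sketched in Python. This is an assumption about what `CharacterTable` does with `startUsv` and `endUsv` (an inclusive range), not its actual implementation; `character_table_rows` is a hypothetical helper, and the names come from Python's bundled Unicode database.

```python
import unicodedata


def character_table_rows(start_usv, end_usv):
    """Return (USV, char, name) tuples for an inclusive range of hex USVs."""
    start, end = int(start_usv, 16), int(end_usv, 16)
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", ch, unicodedata.name(ch, "<no name>")))
    return rows


# Print the rows tab-separated, in the same column order as the rendered table.
for usv, ch, name in character_table_rows("063D", "063F"):
    print(f"{usv}\t{ch}\t{name}")
```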

ScriptInfo example

<ScriptInfo scriptCode="Beng" />

Property	Value
script_alternateNames
script_baseline	hanging
script_casing	no
script_code	Beng
script_complexPos	yes
script_contextualForms	yes
script_diacritics	yes
script_direction	LTR
script_displayCode	Beng
script_family	Indic
script_features
script_ligatures	required
script_name	Bengali (Bangla)
script_numericCode	325
script_numericCodePlus	325 (alphasyllabic)
script_openTypeTags	beng, bng2
script_reordering	yes
script_sample	Bengali-crop.png (image/png). Caption: From the Universal Declaration of Human Rights in Bengali. See Use and History subject area.
script_shortName	Bengali script
script_sortName	Bengali (Bangla)
script_splitGraphs	yes
script_status	Current
script_subjectTabFlags	123
script_type	abugida
script_typicalFont
script_whiteSpace	between words

Properties for Bengali (Bangla) (Beng)

ScriptTable example

Show scripts that use reordering:

<ScriptTable filter="script_reordering=yes" />

Script Code	Script Name	Script Type	Script Family
Bali	Balinese	abugida	Insular Southeast Asian
Beng	Bengali (Bangla)	abugida	Indic
Brah	Brahmi	abugida	Indic
Deva	Devanagari (Nagari)	abugida	Indic
Dogr	Dogra	abugida	Indic
Gujr	Gujarati	abugida	Indic
Guru	Gurmukhi	abugida	Indic
Hmng	Pahawh Hmong	alphabet	East Asian
Khmr	Khmer	abugida	Mainland Southeast Asian
Kthi	Kaithi	abugida	Indic
Laoo	Lao	abugida	Mainland Southeast Asian
Lepc	Lepcha (Róng)	abugida	Central Asian
Maka	Makasar	abugida	Insular Southeast Asian
Mlym	Malayalam	abugida	Indic
Mtei	Meitei Mayek (Meithei, Meetei)	abugida	Indic
Mymr	Myanmar (Burmese)	abugida	Mainland Southeast Asian
Newa	Newa, Newar, Newari, Nepāla lipi	abugida	Indic
Qa74	Bhujinmol	abugida	Indic
Ranj	Ranjana	abugida	Indic
Shrd	Sharada, Śāradā	abugida	Indic
Sind	Khudawadi, Sindhi	abugida	Indic
Sinh	Sinhala	abugida	Indic
Sund	Sundanese	abugida	Insular Southeast Asian
Talu	New Tai Lue	abugida	Mainland Southeast Asian
Taml	Tamil	abugida	Indic
Tavt	Tai Viet	abugida	Mainland Southeast Asian
Thai	Thai	abugida	Mainland Southeast Asian
Toto	Toto	alphabet	Indic
Tutg	Tulu-Tigalari	abugida	Indic
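The `filter` attribute reads as a single `property=value` test applied to each script record. The Python sketch below illustrates that reading under stated assumptions: `parse_filter` and `filter_scripts` are hypothetical helpers, the filter grammar is assumed to be one equality test, and `SCRIPTS` is a small sample (two rows from the table above plus a made-up placeholder row that does not match), not the real data source.

```python
# Sample records only. The "Xxxx" row is a made-up placeholder that fails the filter.
SCRIPTS = [
    {"script_code": "Beng", "script_name": "Bengali (Bangla)",
     "script_type": "abugida", "script_family": "Indic",
     "script_reordering": "yes"},
    {"script_code": "Thai", "script_name": "Thai",
     "script_type": "abugida", "script_family": "Mainland Southeast Asian",
     "script_reordering": "yes"},
    {"script_code": "Xxxx", "script_name": "(placeholder)",
     "script_type": "alphabet", "script_family": "(placeholder)",
     "script_reordering": "no"},
]


def parse_filter(expr):
    """Split 'property=value' into a (property, value) pair."""
    key, sep, value = expr.partition("=")
    if not sep:
        raise ValueError(f"expected property=value, got {expr!r}")
    return key.strip(), value.strip()


def filter_scripts(scripts, expr):
    """Keep only the records whose property equals the requested value."""
    key, value = parse_filter(expr)
    return [s for s in scripts if s.get(key) == value]


for s in filter_scripts(SCRIPTS, "script_reordering=yes"):
    print(s["script_code"], s["script_name"], s["script_type"],
          s["script_family"], sep="\t")
```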

DataTable example

The following table displays data from the characters table, showing the USV and character name. The results are filtered to include only records where character_category is Lm and character_unicodeVersion is 3.0, and are sorted by character_usv in ascending order.

<DataTable
  table   = "characters"
  fields  = "character_usv=USV, character_name=Character Name"
  filter  = "character_category = 'Lm' AND character_unicodeVersion = '3.0'"
  order   = "character_usv ASC"
  caption = "List of Modifier Letters (Lm) added in Unicode 3.0"
/>

USV	Character Name
02EC	MODIFIER LETTER VOICING
02EE	MODIFIER LETTER DOUBLE APOSTROPHE
17D7	KHMER SIGN LEK TOO
1843	MONGOLIAN LETTER TODO LONG VOWEL SIGN
A015	YI SYLLABLE WU

List of Modifier Letters (Lm) added in Unicode 3.0
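The `DataTable` attributes map naturally onto a SQL query. The sketch below shows one way that mapping could work, using Python's `sqlite3` with an in-memory database. It is an assumption, not the component's implementation: `parse_fields` and `build_query` are hypothetical helpers, the sample rows are copied from the table above plus one made-up placeholder row, and the string-built SQL is only reasonable here because the attribute values are trusted page content.

```python
import sqlite3


def parse_fields(spec):
    """Split 'column=Label, column=Label' into (column, label) pairs."""
    pairs = []
    for item in spec.split(","):
        column, _, label = item.strip().partition("=")
        pairs.append((column.strip(), label.strip() or column.strip()))
    return pairs


def build_query(table, fields, filter_expr, order):
    """Assemble a SELECT statement from DataTable-style attributes."""
    columns = ", ".join(f'{col} AS "{label}"' for col, label in parse_fields(fields))
    return f"SELECT {columns} FROM {table} WHERE {filter_expr} ORDER BY {order}"


# Sample data: five rows from the rendered table (deliberately unsorted) and one
# made-up placeholder row that the filter should exclude.
SAMPLE_ROWS = [
    ("A015", "YI SYLLABLE WU", "Lm", "3.0"),
    ("02EC", "MODIFIER LETTER VOICING", "Lm", "3.0"),
    ("0000", "PLACEHOLDER ROW (FILTERED OUT)", "Xx", "0.0"),
    ("1843", "MONGOLIAN LETTER TODO LONG VOWEL SIGN", "Lm", "3.0"),
    ("02EE", "MODIFIER LETTER DOUBLE APOSTROPHE", "Lm", "3.0"),
    ("17D7", "KHMER SIGN LEK TOO", "Lm", "3.0"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters (character_usv TEXT, character_name TEXT, "
    "character_category TEXT, character_unicodeVersion TEXT)"
)
conn.executemany("INSERT INTO characters VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

query = build_query(
    table="characters",
    fields="character_usv=USV, character_name=Character Name",
    filter_expr="character_category = 'Lm' AND character_unicodeVersion = '3.0'",
    order="character_usv ASC",
)
rows = conn.execute(query).fetchall()
for usv, name in rows:
    print(f"{usv}\t{name}")
```

Because `character_usv` is stored as four-digit hex text in this sketch, a plain text sort gives the same order as a numeric sort; longer USVs would need zero-padding or a numeric column.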