Unicode Character Browsing

For the casually interested, browsing among the thousands of characters available in the Unicode standard can be a fascinating experience. For those implementing writing system components, finding exact details about a particular character or set of characters can be essential to a successful design.

Fortunately there are a variety of tools available for browsing the Unicode characters and their properties, and this article discusses a number of them.

Each tool has strengths and weaknesses, so you may not find one tool that does all you need.

The starting place for all Unicode character information (the horse’s mouth, as it were) is the Unicode website itself. There you can browse the latest version of the standard as well as many of the archived previous versions. The text chapters of the standard and the code charts are generally provided as PDFs, while other textual items such as the annexes and technical reports are provided as HTML pages. Most of the details about a particular character, however, are encoded in data files known as the Unicode Character Database, or UCD.

Don’t be frightened off by the term database: all the files in the UCD are human-readable text files, so you can open them directly in your favorite plain text editor. Even so, there are over 60 separate files in the UCD (or, if you prefer XML, the same information is contained in a handful of structured XML files), so finding your way around in this forest can be difficult. The primary UCD file alone contains over 24,000 lines of text, and that is not counting the 300,000 lines describing unified Han characters that are elsewhere in the UCD. Whew!
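To give a feel for how readable these files are, here is a minimal Python sketch that parses one line in the semicolon-separated layout used by the primary UCD file, UnicodeData.txt. The record type and function name are hypothetical, chosen only for this illustration, and the sample line is included inline rather than read from a downloaded copy of the file.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    """A few of the 15 fields in a UnicodeData.txt line (illustrative subset)."""
    codepoint: int
    name: str
    general_category: str
    bidi_class: str


def parse_unicode_data_line(line: str) -> UcdRecord:
    """Split one semicolon-separated UnicodeData.txt line into named fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # Field 0 is the code point in hex, 1 the name, 2 the General_Category,
    # and 4 the Bidi_Class; the remaining fields are ignored in this sketch.
    return UcdRecord(int(fields[0], 16), fields[1], fields[2], fields[4])


# Sample line for U+067B, typed in here for illustration.
sample = "067B;ARABIC LETTER BEEH;Lo;0;AL;;;;;N;;;;;"
record = parse_unicode_data_line(sample)
print(f"U+{record.codepoint:04X} {record.name} "
      f"({record.general_category}, {record.bidi_class})")
```

Even this small example hints at why a browser is handy: the meaning of each position has to be looked up in the UCD documentation before the line makes sense.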

This is where Unicode character browsers help out: they provide easy ways to find and study the characters that interest you.

Arabic block shown using Unibook from Unicode

Unibook’s sophisticated search facilities can find and highlight characters based on up to four criteria including Unicode property values, character names, font coverage, and even user-supplied property files. Other advanced capabilities include Bidi and Line Break algorithm demos, legacy character set views, and flexible font selections.

One of Unibook’s main strengths is that it runs locally and does not require an internet connection. Unfortunately, it has two weaknesses: it is Windows-only, and it can display glyphs only for those characters for which you have fonts installed on your system. Nonetheless, it has been a mainstay of Unicode developers for years.

Arabic block shown using Richard Ishida’s UniView

By default UniView displays server-generated graphics for each character, meaning you don’t have to have the fonts on your local machine. There is an option to display characters as text rather than graphics, in which case you would need to configure the desired fonts into your browser settings.

In addition to browsing the Unicode character database, UniView can serve as a character picker and as a clipboard viewer, allowing one to see exactly what Unicode characters are in the clipboard text.
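The idea behind such a clipboard viewer can be approximated locally. The following is a rough sketch using Python's standard `unicodedata` module, not a description of how UniView itself works; the function name `describe_characters` is hypothetical, and the names returned depend on the Unicode version bundled with your Python.

```python
import unicodedata


def describe_characters(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, name, general category) for each character in text."""
    rows = []
    for ch in text:
        # Some code points (for example, control characters) have no name,
        # so a placeholder is supplied instead of letting name() raise.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, unicodedata.category(ch)))
    return rows


# Paste any text here in place of this sample string.
for codepoint, name, category in describe_characters("a\u067b"):
    print(codepoint, category, name)
```

This reports only what the installed Python knows about each character; a dedicated browser typically shows many more properties.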

FileFormat.Info character detail

One of the links on UniView leads us to another useful Unicode resource at FileFormat.Info. The image shows the information FileFormat.Info displays for a single Arabic character. One of the strengths of this site is that it shows how the character would be expressed in various Unicode encoding forms and programming languages. Another useful aspect of this site is that you can put the code point of the character you want directly in the URL, for example:

http://www.fileformat.info/info/unicode/char/067b/index.htm

Thus you could set up quick-searches in your browser to go directly to the character.
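As an illustration of both points, the sketch below prints a character in three Unicode encoding forms and builds a lookup URL following the pattern shown above. The helper names are hypothetical, and the URL template is inferred from the single example in this article, so treat it as an assumption rather than a documented interface of the site.

```python
def encoding_forms(ch: str) -> dict[str, str]:
    """Hex bytes of one character in UTF-8, UTF-16, and UTF-32 (big-endian)."""
    return {
        "UTF-8": ch.encode("utf-8").hex(),
        "UTF-16BE": ch.encode("utf-16-be").hex(),
        "UTF-32BE": ch.encode("utf-32-be").hex(),
    }


def fileformat_url(ch: str) -> str:
    """Build a lookup URL from the pattern in the article (assumed, not official)."""
    # The example URL uses at least four lowercase hex digits for the code point.
    return f"http://www.fileformat.info/info/unicode/char/{ord(ch):04x}/index.htm"


beeh = "\u067b"  # ARABIC LETTER BEEH, the character in the example URL
for form, hex_bytes in encoding_forms(beeh).items():
    print(f"{form}: {hex_bytes}")
print(fileformat_url(beeh))
```

A browser quick-search works the same way: the hexadecimal code point you type is substituted into the fixed URL template.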

This article, written by Bob Hallissy, formerly appeared on ScriptSource.