Unicode

Tutorial
by Rainer Erich Scheichelbauer

27 July 2022 Published on 12 February 2013

For your font to work properly, some of your glyphs need proper Unicodes. Here is essentially what you need to know about it as a type designer.

Characters versus glyphs

Type designers create new glyphs, i.e. pictures representing characters. Type designers usually do not create new characters, i.e. the meanings of those pictures, or, more technically put, ‘the smallest component of written language that has semantic value.’ We don’t invent alphabets, we merely re-interpret existing ones. (Okay, sometimes, we do invent new alphabets, but that’s a different story. Keep calm and continue reading.)

In short: characters are what you type, glyphs are what you see.

One glyph usually corresponds to one character, be it a letter, a figure or a punctuation sign. Characters have Unicodes. The glyph-character relationship is expressed by the Unicode value associated with the glyph. Glyphs displays the Unicode value in various ways. For instance, in Font View, you’ll find it right next to the glyph name:

A glyph can also represent more than one character at once. Take an f_f_f ligature as an example. It represents three f characters in a row. Ligatures do not have Unicodes, because the separate characters already have codes and the fact that it’s a ligature does not change the meaning of its parts. (Well, actually, some ligatures do have legacy codes, but solely for backwards compatibility with outdated encodings from the long-gone, dark ages of eight-bit computing. E.g. f_f can have the U+FB00 LATIN SMALL LIGATURE FF code point. If ‘eight-bit’ does not tell you anything, please erase everything you read within these parentheses from your memory immediately, keep calm and continue reading.)
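If you are curious how such a legacy ligature code relates to its separate characters, here is a minimal Python sketch. It only uses the unicodedata module from Python's standard library, not anything from Glyphs, and the variable names are made up for illustration. It asks the Unicode database for the name of U+FB00 and then applies the compatibility decomposition (NFKD), which maps the ligature back to the characters it stands for.

```python
import unicodedata

ligature = "\uFB00"  # the legacy ligature code point mentioned above

# Official character name from the Unicode database bundled with Python.
name = unicodedata.name(ligature)

# NFKD applies compatibility decompositions, which map the ligature
# back to the separate characters it represents.
parts = unicodedata.normalize("NFKD", ligature)

print(name)
print([f"U+{ord(ch):04X}" for ch in parts])
```

The decomposition yields two plain f characters, which is exactly why the ligature glyph does not need a code of its own.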

Sometimes, a glyph only serves as a part for other glyphs. You can, for instance, use separate glyphs for your serifs. So, to sum up, a glyph is a picture, either of a character, of several characters, or of a part used in other glyphs. And type designers draw such pictures.

Unicode, planes and blocks

Over the centuries, the alphabet has become more complicated than the regular a to z. First, we have all sorts of diacritics, special letters, (almost) everything in both upper- and lowercase. Then, there’s much, much, much more than just the Latin script. Plus hundreds and thousands of symbols, figures, punctuation signs et cetera et cetera et cetera.

So, to cut a long story short, we need a way to keep track of everything. Something like a huge table for all characters of all scripts, essentially everything you could possibly ever want to enter as a part of a text anywhere. Well, this is called Unicode.

Unicode is a table that maps codes to symbols and a bunch of metadata. There are different ways to represent the individual codes (known as Unicode transformation formats or UTF), and Glyphs uses the hexadecimal UTF-16 convention. Hexadecimal, in case you did not know, is counting with 16 instead of the usual 10 digits. We just use the letters A through F as additional digits, so we count like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, and so on, you get the idea.
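If you would rather let a computer do the hexadecimal counting, a few lines of plain Python will do. This is just a sketch for illustration; the helper names to_hex and from_hex are made up here and have nothing to do with Glyphs itself.

```python
def to_hex(number: int) -> str:
    """Write a number with sixteen digits (0-9, A-F) instead of ten."""
    return format(number, "X")


def from_hex(digits: str) -> int:
    """Read a hexadecimal string back into an ordinary number."""
    return int(digits, 16)


# Count from zero up to 1B, just like in the paragraph above.
counting = [to_hex(n) for n in range(28)]
print(", ".join(counting))

# And the other way round: what is 1B in everyday decimal counting?
print(from_hex("1B"))
```

Reading it backwards, 1B means one sixteen plus eleven, so 27 in decimal.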

Unicode has seventeen planes of 65,536 code points each, the most important being the Basic Multilingual Plane (BMP or Plane 0) ranging from U+0000 to U+FFFF and the Supplementary Multilingual Plane (SMP or Plane 1) from U+10000 to U+1FFFF. Planes are subdivided into many character blocks, usually comprising a script, e.g. Tamil from U+0B80 to U+0BFF or something funny like Emoticons (starting at U+1F600).
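Since every plane holds exactly 65,536 (hexadecimal 10000) code points, you can work out the plane of any code by dividing by that number. The following Python sketch shows the arithmetic; plane_of and plane_range are hypothetical helper names invented for this example.

```python
PLANE_SIZE = 0x10000          # 65,536 code points per plane
LAST_CODE_POINT = 0x10FFFF    # end of the seventeenth plane (Plane 16)


def plane_of(code_point: int) -> int:
    """Return the number of the plane a code point belongs to."""
    if not 0 <= code_point <= LAST_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE


def plane_range(plane: int) -> tuple:
    """Return the first and last code point of a plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"there is no Plane {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


print(plane_of(0x0B80))    # start of the Tamil block
print(plane_of(0x1F600))   # start of the Emoticons block
first, last = plane_range(1)
print(f"U+{first:04X} to U+{last:04X}")
```

Tamil lands in Plane 0 (the BMP) and the Emoticons in Plane 1 (the SMP), in line with the ranges given above.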

Not all (17×65,536=) 1,114,112 code points of Unicode are currently in use. At the time of writing, the latest Unicode standard was version 14.0.0 from September 2021 with 144,697 characters. Some parts of Unicode are intentionally left free for private use. Unicode knows three such Private Use Areas, the most important one ranging from U+E000 to U+F8FF.
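The numbers above are easy to double-check in Python. The sketch below also tests whether a code falls into a Private Use Area. Note that the text only spells out the range of the first area; the two supplementary ranges in the list (in Planes 15 and 16) are added here from the Unicode standard, and is_private_use is a made-up helper name.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 65_536
TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE

# The first range is the one named in the text. The other two are the
# supplementary Private Use Areas, added here as background knowledge.
PRIVATE_USE_AREAS = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def is_private_use(code_point: int) -> bool:
    """True if the code point lies in one of the Private Use Areas."""
    return any(first <= code_point <= last
               for first, last in PRIVATE_USE_AREAS)


print(f"{TOTAL_CODE_POINTS:,}")
print(is_private_use(0xE000))   # first code of the main Private Use Area
print(is_private_use(0x0041))   # Latin capital A is a regular character
```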

For unofficial scripts that are unlikely to make it into official Unicode, like Klingon pIqaD, geeks have come up with the ConScript Unicode Registry which makes use of the Private Use Area.

Unicode and Glyphs

In Glyphs, you usually do not need to worry about Unicodes, since Glyphs determines the right code (or the lack of a code) by the glyph name. Open Window > Glyph Info (Cmd-Opt-I) and look for specific glyphs by typing (part of) a glyph name in the search field:

Glyphs has a built-in database of predefined glyph names. You can add your own definitions if you are so inclined. The Glyph Info window displays name, Unicode (if available) and some metadata about each glyph. Starting in Glyphs version 1.3.18, you can quickly add a glyph to your font by selecting it and pushing the Add to Font button in the bottom right corner.

If you resize your Glyph Info window, you can even see the very handy decomposition information for the individual glyphs:

For example, if you plan to add adieresismacron to your font, make sure you have a, dieresis and macron already in your font. Then Glyphs can use them as components for adieresismacron and build the complete glyph right away. Cool.

If you plan to add a glyph that has no entry in the Glyph Info window, you can use the uniXXXX scheme for relating it to a character in the Basic Multilingual Plane. The XXXX stands for the four-digit UTF-16 hexadecimal code. Use uXXXXX for any of the higher planes, i.e. just a u instead of uni. Glyphs will then automatically add the proper Unicode.
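To illustrate the naming scheme, here is a small Python sketch that builds such a name from a code and reads the code back from a name. It is a plain reimplementation of the rule as described above, not the code Glyphs runs internally, and it makes two assumptions: hex digits are uppercase, and names for the higher planes carry five or six digits. The function names are invented for this example.

```python
import re
from typing import Optional


def glyph_name_for(code_point: int) -> str:
    """Build a uniXXXX or uXXXXX style name for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point <= 0xFFFF:
        # Basic Multilingual Plane: 'uni' plus exactly four hex digits.
        return f"uni{code_point:04X}"
    # Higher planes: just 'u' plus five (or six) hex digits.
    return f"u{code_point:05X}"


# Assumption: uppercase hex digits only.
_NAME_PATTERN = re.compile(r"(?:uni([0-9A-F]{4})|u([0-9A-F]{5,6}))")


def code_point_from_name(glyph_name: str) -> Optional[int]:
    """Return the code encoded in a glyph name, or None for other names."""
    match = _NAME_PATTERN.fullmatch(glyph_name)
    if match is None:
        return None
    return int(match.group(1) or match.group(2), 16)


print(glyph_name_for(0x0B80))
print(glyph_name_for(0x1F600))
print(code_point_from_name("uni0394"))
print(code_point_from_name("adieresismacron"))
```

A descriptive name like adieresismacron carries no code in itself, so the sketch returns None for it; in Glyphs, such names are resolved through the built-in glyph database instead.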

Keep in mind that not every glyph needs a Unicode. Especially if the glyph is invoked via an OpenType feature, there is no need to encode the glyph. You only need a code if you need to type it directly. For example, you do not need to encode small caps or a stylistic variation of a letter, since the user will type his or her text in plain upper- and lowercase, and then activate the appropriate function of the software he or she uses.

Multiple Unicode values for the same shape

What if you want to share the same glyph shape between two Unicode values? There are a few situations where you would need that. E.g., the symbol increment U+2206 and the Greek letter Delta U+0394 should look the same. There is a similar issue with Ohm U+2126 and Omega U+03A9. Or, you are creating an all-caps font. Or you simply want to reuse the same space glyph for both the space U+0020 and non-breaking space U+00A0.
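To see that these pairs really are separate characters as far as Unicode is concerned, you can look up their official names. A minimal Python sketch, using only the standard unicodedata module; the list and the describe helper are made up for illustration.

```python
import unicodedata

# The pairs mentioned above: different characters, same shape.
SHARED_SHAPES = [
    (0x2206, 0x0394),   # increment and Greek Delta
    (0x2126, 0x03A9),   # Ohm and Greek Omega
    (0x0020, 0x00A0),   # space and non-breaking space
]


def describe(code_point: int) -> str:
    """Format a code point together with its official character name."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for first, second in SHARED_SHAPES:
    print(describe(first), "/", describe(second))
```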

You have two options: double mappings or double glyphs.

  1. Double mappings: You give the same glyph multiple Unicode values. To do this, simply click in the Unicode field and type the second Unicode value, e.g., add 00A0 next to 0020 in the space, and press Return:

This has the advantage of keeping the file size small. So, this may be a good idea for webfonts, which need to be as small as possible. On the downside, there is an issue with copying text out of a PDF that was set with a font containing double mappings. (If you want to know more about it, read the section ‘Problems with Double Encodings’ in the All Caps tutorial.) We still recommend this method though, because extracting text out of a PDF is broken on so many levels that it is a hopeless case anyway, and thus not worth shedding tears over.

  2. Double glyphs: You can duplicate a glyph as a component copy with a simple recipe. A component copy is a glyph that has exactly one other glyph as a component. If the component is auto-aligned, the original and the duplicate always stay in sync. It is easy to create a component copy with a recipe: Choose Glyph > Add Glyphs… (Cmd-Shift-G) and type the original glyph name, followed by an equals sign, followed by the name of the duplicate.

Example: you already have the Greek Delta, but want to reuse it as increment symbol (and you are concerned about the PDF text extraction issues mentioned above). You press Cmd-Shift-G to bring up the Add Glyphs… dialog, type Delta=increment, and press Generate:

If you want to keep space glyphs identical, you do not need components, of course. It is much smarter to simply use a Metrics Key for the width. For example, if you want the non-breaking space to be the same as the regular space, open your nbspace glyph for editing, and type =space in the Width field. Then choose Glyph > Update Metrics (Ctrl-Cmd-M) to make sure the glyph width is in sync. Or, if you have multiple masters, hold down the Option key, and the command will magically change into Update Metrics on All Masters. Cool. Read more about Metrics Keys in the Spacing tutorial.
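To make the trade-off between double mappings and double glyphs a bit more tangible, here is a toy model in Python. It is only a sketch with plain dictionaries, not the Glyphs API or a real font table, and all names are hypothetical. It builds a character-to-glyph table for each option and then tries to go backwards from a glyph to its characters, which is roughly the situation software is in when it has to recover text from glyphs. That this is what goes wrong in PDF text extraction is an assumption made for the sake of the example.

```python
# Option 1, double mapping: one glyph carries two Unicode values.
double_mapping = {"space": [0x0020, 0x00A0]}

# Option 2, double glyphs: two glyphs, one Unicode value each.
double_glyphs = {"Delta": [0x0394], "increment": [0x2206]}


def build_cmap(glyph_unicodes: dict) -> dict:
    """Turn glyph -> codes into a code -> glyph lookup table."""
    cmap = {}
    for glyph_name, code_points in glyph_unicodes.items():
        for code_point in code_points:
            if code_point in cmap:
                raise ValueError(f"U+{code_point:04X} is mapped twice")
            cmap[code_point] = glyph_name
    return cmap


def reverse_lookup(cmap: dict, glyph_name: str) -> list:
    """Find every code that leads to the given glyph."""
    return sorted(code for code, name in cmap.items() if name == glyph_name)


mapping_cmap = build_cmap(double_mapping)
glyphs_cmap = build_cmap(double_glyphs)

# Typing works in both cases: every code finds its glyph.
print(mapping_cmap[0x00A0], glyphs_cmap[0x2206])

# Going backwards is ambiguous only with the double mapping.
print([f"U+{c:04X}" for c in reverse_lookup(mapping_cmap, "space")])
print([f"U+{c:04X}" for c in reverse_lookup(glyphs_cmap, "increment")])
```

With the double mapping, the space glyph points back to two codes, so there is no way to tell which one was typed. With double glyphs, each glyph points back to exactly one code, at the price of an extra glyph in the font.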

UnicodeChecker

Glyphs can interoperate with a shareware tool called UnicodeChecker from the great people at Earthling Soft. UnicodeChecker helps you quickly find all the info you need about characters. To invoke UnicodeChecker, click on the little arrow next to the Unicode in the grey info area:

Glyphs will then send the code of the current glyph to UnicodeChecker, which in turn will tell you all sorts of useful information about the character in question:

UnicodeChecker is also useful for quickly finding all characters related to a script. For instance, if you really want to cover all Latin characters, switch to UnicodeChecker and in the Character Blocks > Alphabetically: Ideographic-Lydian submenu, you will find all Latin blocks:

If you haven’t installed UnicodeChecker yet, Glyphs will prompt you to do so when you click on the arrow. Downloading and installing UnicodeChecker is free. However, if you do use it on a regular basis, think about how much time and nerves this little tool saves you and consider a donation to the makers. Support shareware, it supports you.


Update 2019-02-13: changed for Unicode 11.0.0 and updated link, rewrite of Double Mappings.
Update 2022-08-03: changed for Unicode 14.0.0, updated screenshots, related articles, minor formatting.