Unicode

  • by Rainer Erich Scheichelbauer
  • Tutorial

If you want to make sure your font works properly, some of your glyphs need proper Unicodes. This is what you need to know about it as a type designer.

Characters versus glyphs

Type designers create new glyphs, i.e. pictures representing characters. Type designers usually do not create new characters, i.e. the meanings of those pictures, or, more technically put, ‘the smallest component of written language that has semantic value.’ We don’t invent alphabets, we merely re-interpret existing ones. (Okay, sometimes, we do invent new alphabets, but that’s a different story. Keep calm and continue reading.)

In short: characters are what you type, glyphs are what you see.

One glyph usually corresponds to one character, be it a letter, a figure or a punctuation mark. Characters have Unicodes. The glyph-character relationship is expressed by the Unicode value associated with the glyph. Glyphs displays the Unicode value in various ways. For instance, in Font View, you’ll find it right next to the glyphname:

A glyph can also represent more than one character at once. Take an f_f_f ligature as an example. It represents three f characters in a row. Ligatures do not have Unicodes, because the separate characters already have codes, and the fact that it’s a ligature does not change the meaning of its parts. (Well, actually, some ligatures do have legacy codes, but solely for backwards compatibility with outdated encodings from the long-gone, dark ages of eight-bit computing. E.g. f_f can have the U+FB00 LATIN SMALL LIGATURE FF code point. If ‘eight-bit’ does not tell you anything, please erase everything you read within these parentheses from your memory immediately, keep calm and continue reading.)
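If you want to see this for yourself, Python’s standard unicodedata module (plain Python, not part of Glyphs) can look up the legacy ligature code point mentioned above. Compatibility normalization (NFKC) turns U+FB00 back into two ordinary f characters, which illustrates that the ligature code carries no meaning beyond its parts. A small sketch:

```python
import unicodedata

# U+FB00 is one of the legacy ligature code points kept for old encodings.
ff_ligature = "\uFB00"
print(unicodedata.name(ff_ligature))  # LATIN SMALL LIGATURE FF

# NFKC normalization replaces the compatibility character with its parts:
# two ordinary f characters (U+0066 U+0066).
decomposed = unicodedata.normalize("NFKC", ff_ligature)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0066', 'U+0066']
```

In a text that uses the plain letters f f, the ligature should come from an OpenType feature instead, which is exactly why a modern ligature glyph does not need a code.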

Sometimes, a glyph only serves as a part of other glyphs. You can, for instance, use separate glyphs for your serifs. So, to sum up, a glyph is a picture: of a single character, of several characters, or of a part used in other glyphs. And type designers draw such pictures.

Unicode, planes and blocks

Over the centuries, the alphabet has become more complicated than the regular a to z. First, we have all sorts of diacritics, special letters, (almost) everything in both upper- and lowercase. Then, there’s much, much, much more than just the Latin script. Plus hundreds and thousands of symbols, figures, punctuation signs et cetera et cetera et cetera.

So, to cut a long story short, we need a way to keep track of everything. Something like a huge table for all characters of all scripts, essentially everything you could possibly ever want to enter as a part of a text anywhere. Well, this is called Unicode.

Unicode is a table that maps numeric codes, so-called code points, to characters, plus a bunch of metadata about each of them. There are different ways to store these codes as bytes (known as Unicode Transformation Formats, or UTFs, such as UTF-8 and UTF-16), but Glyphs simply displays the code point itself as a hexadecimal number, e.g. 0041 (usually written U+0041) for the capital letter A. Hexadecimal, in case you did not know, is counting with 16 instead of the usual 10 digits. We just use the letters A through F as additional digits, so we count like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, and so on, you get the idea.
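To play with hexadecimal numbers, any Python prompt will do. The sketch below reproduces the counting sequence from the paragraph above and formats code points in the familiar U+ notation; the helper name code_point_label is made up for this example.

```python
# Counting from 0 to 27 in hexadecimal, with A to F as extra digits.
hex_count = [format(n, "X") for n in range(28)]
print(", ".join(hex_count))  # prints 0 through 1B, separated by commas

def code_point_label(char: str) -> str:
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(char):04X}"  # at least four hex digits, zero-padded

print(code_point_label("A"))           # U+0041
print(code_point_label("\u0394"))      # U+0394 (Greek capital Delta)
print(code_point_label("\U0001F600"))  # U+1F600 (five digits above the BMP)
```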

Unicode has seventeen planes of 65,536 code points each, the most important being the Basic Multilingual Plane (BMP or Plane 0), ranging from U+0000 to U+FFFF, and the Supplementary Multilingual Plane (SMP or Plane 1), ranging from U+10000 to U+1FFFF. Planes are subdivided into many character blocks, each usually covering a script, e.g. Tamil from U+0B80 to U+0BFF, or something funny like Emoticons (starting at U+1F600).
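Because every plane is exactly 0x10000 code points wide, you can tell the plane of a code point from its leading hex digits. Here is a minimal sketch in Python; the block table is a hypothetical, deliberately tiny subset holding only the two blocks mentioned above (the complete list lives in the Unicode Character Database file Blocks.txt).

```python
from typing import Optional

def plane_of(cp: int) -> int:
    """Plane number = code point divided by 0x10000 (a right shift by 16 bits)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return cp >> 16

# Illustrative subset only: (first, last) code point of each block.
BLOCKS = {
    "Tamil": (0x0B80, 0x0BFF),
    "Emoticons": (0x1F600, 0x1F64F),
}

def block_of(cp: int) -> Optional[str]:
    """Return the name of the block containing cp, if it is listed in BLOCKS."""
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return None

print(plane_of(0x0B95), block_of(0x0B95))    # 0 Tamil
print(plane_of(0x1F600), block_of(0x1F600))  # 1 Emoticons
```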

Not all of the 1,114,112 (17 × 65,536) code points of Unicode are currently in use. At the time of writing, the latest Unicode standard was version 6.2 from September 2012, with 110,117 characters. Some parts of Unicode are intentionally left free for private use. Unicode defines three such Private Use Areas (PUA), the most important one ranging from U+E000 to U+F8FF.
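The arithmetic above, plus a check for the Private Use Areas, fits into a few lines of Python. Besides the BMP area, the sketch lists the two supplementary Private Use Areas, which occupy planes 15 and 16; the function name is_private_use is hypothetical.

```python
PLANE_SIZE = 0x10000                 # 65,536 code points per plane
TOTAL_CODE_POINTS = 17 * PLANE_SIZE  # U+0000 to U+10FFFF

# The three Private Use Areas as (first, last) code points.
PRIVATE_USE_AREAS = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP (6,400 code points)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(cp: int) -> bool:
    """True if cp falls into one of the three Private Use Areas."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_AREAS)

print(TOTAL_CODE_POINTS)                             # 1114112
print(is_private_use(0xE000), is_private_use(0x41))  # True False
```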

For unofficial scripts that are unlikely to make it into official Unicode, like Klingon pIqaD, geeks have come up with the ConScript Unicode Registry, which makes use of the Private Use Area.

Unicode and Glyphs

In Glyphs, you usually do not need to worry about Unicodes, since Glyphs determines the right code (or the lack of a code) by the glyphname. Open Window > Glyph Info (Cmd-Opt-I) and look for specific glyphs by typing (part of) a glyph name in the search field:

Glyphs has a built-in database of predefined glyphnames. You can add your own definitions if you are so inclined. The Glyph Info window displays name, Unicode (if available) and some metadata about each glyph. Starting in Glyphs version 1.3.18, you can quickly add a glyph to your font by selecting it and pushing the Add to Font button in the bottom right corner.

If you resize your Glyph Info window, you can even see the very handy decomposition information for the individual glyphs:

For example, if you plan to add adieresismacron to your font, make sure you have a, dieresis and macron already in your font. Then Glyphs can use them as components for adieresismacron and build the complete glyph right away. Cool.
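Glyphs keeps its own decomposition data, but the Unicode Character Database records a comparable decomposition for the character behind adieresismacron (U+01DF), and Python’s unicodedata module lets you inspect it. Note that Unicode decomposes into combining marks, whereas the Glyph Info window may name spacing accent glyphs as components, so treat this as a cross-check rather than a recipe:

```python
import unicodedata

adieresismacron = "\u01DF"  # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

# Single-step canonical decomposition: a-dieresis plus combining macron.
print(unicodedata.decomposition(adieresismacron))  # 00E4 0304

# Full canonical decomposition (NFD): base letter plus two combining marks.
for ch in unicodedata.normalize("NFD", adieresismacron):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0061 LATIN SMALL LETTER A
# U+0308 COMBINING DIAERESIS
# U+0304 COMBINING MACRON
```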

If you plan to add a glyph that has no entry in the Glyph Info window, you can use the uniXXXX scheme for relating it to a character in the Basic Multilingual Plane. The XXXX stands for the four-digit hexadecimal code point. For any of the higher planes, use uXXXXX, i.e. just a u instead of uni, followed by the longer hexadecimal code. Glyphs will then automatically add the proper Unicode.
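The naming scheme is simple enough to express in code. The following Python sketch builds and parses such names; the helper names glyph_name_for and code_point_for are hypothetical, and this is an illustration of the convention (which also appears in Adobe’s glyph naming guidelines), not Glyphs’ internal implementation:

```python
import re
from typing import Optional

def glyph_name_for(cp: int) -> str:
    """Build a uniXXXX / uXXXXX style glyph name for a code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:X} cannot be used in a glyph name")
    if cp <= 0xFFFF:
        return f"uni{cp:04X}"  # BMP: 'uni' plus exactly four hex digits
    return f"u{cp:05X}"        # higher planes: 'u' plus five (six for plane 16) digits

# Uppercase hex digits only, matching the names built above.
_NAME_PATTERN = re.compile(r"uni([0-9A-F]{4})|u([0-9A-F]{5,6})")

def code_point_for(name: str) -> Optional[int]:
    """Recover the code point from such a name, or None for other names."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1) or match.group(2), 16)

print(glyph_name_for(0x0B95))   # uni0B95
print(glyph_name_for(0x1F600))  # u1F600
print(code_point_for("Delta"))  # None
```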

Keep in mind that not every glyph needs a Unicode. Especially if the glyph is invoked via an OpenType feature, there is no need to encode the glyph. You only need a code if you need to type it directly. For example, you do not need to encode small caps or a stylistic variation of a letter, since the user will type his or her text in plain upper- and lowercase, and then activate the appropriate function of the software he or she uses.
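Why does a small cap get away without a code? A heavily simplified, hypothetical model in Python may help: the font’s character map (cmap) only translates typed characters into glyph names, and a feature then swaps glyphs by name. The dictionaries and the shape function below are made up for illustration and ignore everything a real layout engine does:

```python
from typing import Dict, List

# Hypothetical font: only encoded glyphs appear in the character map.
CMAP: Dict[int, str] = {0x0041: "A", 0x0061: "a"}
# Stand-in for a small-caps feature: substitution keyed by glyph name.
SMALL_CAPS: Dict[str, str] = {"a": "a.sc"}

def shape(text: str, small_caps: bool = False) -> List[str]:
    """Map typed characters to glyph names, then apply the substitution."""
    glyphs = [CMAP.get(ord(ch), ".notdef") for ch in text]
    if small_caps:
        glyphs = [SMALL_CAPS.get(g, g) for g in glyphs]
    return glyphs

print(shape("Aa"))                   # ['A', 'a']
print(shape("Aa", small_caps=True))  # ['A', 'a.sc']
```

The user types the same two characters in both cases; only the feature decides whether a.sc appears, so a.sc never needs an entry in the character map.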

Double mappings

If you have some experience in type design, chances are you have come across double-mapping of glyphs. That means that one glyph is associated with two different Unicodes. This is bad. Don’t do it. A whole lot of technical problems come with this practice, which is why Glyphs doesn’t let you do it. To tell you the truth, Adobe’s makeotf doesn’t let you do it either, and since Glyphs relies on makeotf under the hood, allowing it wouldn’t make any sense anyway.
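If you inherit a font from elsewhere and want to know whether it contains double mappings, it is enough to invert the character map and look for glyph names that occur more than once. A small Python sketch, assuming the mapping has already been read into a plain dictionary from code point to glyph name (how you extract it is up to your tools; the function name is hypothetical):

```python
from collections import defaultdict
from typing import Dict, List

def find_double_mappings(cmap: Dict[int, str]) -> Dict[str, List[int]]:
    """Return {glyph name: [code points]} for glyphs mapped more than once."""
    by_glyph: Dict[str, List[int]] = defaultdict(list)
    for cp, glyph in sorted(cmap.items()):
        by_glyph[glyph].append(cp)
    return {g: cps for g, cps in by_glyph.items() if len(cps) > 1}

# Hypothetical double-mapped cmap: one Delta glyph for two code points.
cmap = {0x0394: "Delta", 0x2206: "Delta", 0x0041: "A"}
for glyph, cps in find_double_mappings(cmap).items():
    print(glyph, ", ".join(f"U+{cp:04X}" for cp in cps))
# Delta U+0394, U+2206
```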

But what if you do want two Unicodes for the same glyph? Take, for instance, the Delta glyph, which is both a Greek capital letter (U+0394) and a mathematical symbol (U+2206). In cases like this, open Window > Glyph Info (Cmd-Opt-I) and look for the Unicodes you want to apply:

You will quickly find out that the mathematical symbol is called increment while the Greek letter is simply called Delta. If you want both to have the same shape, draw Delta first, and then choose Glyph > Add Glyphs… (Cmd-Shift-G), type Delta=increment and hit the Generate button. This will create a glyph called increment with a Delta component in it. Metrics are automatically linked unless you disable automatic alignment.

More generally put, if you want to reuse the same shape for two different Unicodes, add a component-based copy of the glyph with the firstGlyphName=secondGlyphName syntax in Glyph > Add Glyphs…, and make sure you give both glyphs their proper names.
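To double-check the two characters involved, and to make the firstGlyphName=secondGlyphName pattern explicit, here is a Python sketch. The parse_recipe helper and its dictionary format are hypothetical; they only mirror the result described above (a new glyph called increment containing a Delta component), not how Glyphs stores glyphs:

```python
import unicodedata

# The two characters from the Delta example.
for cp in (0x0394, 0x2206):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
# U+0394 GREEK CAPITAL LETTER DELTA
# U+2206 INCREMENT

def parse_recipe(recipe: str) -> dict:
    """Split 'firstGlyphName=secondGlyphName' into a new glyph with one component."""
    source, sep, new = (part.strip() for part in recipe.partition("="))
    if not sep or not source or not new:
        raise ValueError(f"expected 'first=second', got {recipe!r}")
    return {"name": new, "components": [source]}

print(parse_recipe("Delta=increment"))
# {'name': 'increment', 'components': ['Delta']}

# End result: two code points, two glyph names, one shared drawing.
cmap = {0x0394: "Delta", 0x2206: "increment"}
```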

UnicodeChecker

Glyphs can interoperate with a shareware tool called UnicodeChecker from the great people at Earthling Soft. UnicodeChecker helps you quickly find all the info you need about characters. To invoke UnicodeChecker, click on the little arrow next to the Unicode in the grey info area:

Glyphs will then send the code of the current glyph to UnicodeChecker, which in turn will tell you all sorts of useful information about the character in question:
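If you only need a quick look at a few basic properties from a script, Python’s unicodedata module covers some of the same ground. The sketch below is not a replacement for UnicodeChecker, just a hypothetical helper (describe) that reports name, general category and byte encodings. It also shows that UTF-16 needs two 16-bit units (a surrogate pair) for characters outside the BMP:

```python
import unicodedata
from typing import Dict

def describe(char: str) -> Dict[str, str]:
    """A few properties a character inspector typically reports."""
    return {
        "code point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char, "<unnamed>"),
        "category": unicodedata.category(char),  # e.g. 'Lu' = uppercase letter
        "utf-8": char.encode("utf-8").hex(" ").upper(),
        "utf-16": char.encode("utf-16-be").hex(" ").upper(),
    }

for key, value in describe("\u0394").items():
    print(f"{key}: {value}")
# code point: U+0394
# name: GREEK CAPITAL LETTER DELTA
# category: Lu
# utf-8: CE 94
# utf-16: 03 94
```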

UnicodeChecker is also useful for quickly finding all characters related to a script. For instance, if you really want to cover all Latin characters, switch to UnicodeChecker and open the Character Blocks > Alphabetically: Ideographic-Lydian submenu, where you will find all Latin blocks:
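Once you know the range of a block, a few lines of Python can list its assigned characters, for example to build a character set to check your font against. The sketch assumes Latin Extended-A (U+0100 to U+017F) as the example block; results for other blocks depend on the Unicode version that ships with your Python installation, and block_characters is a made-up helper name:

```python
import unicodedata
from typing import Iterator, Tuple

def block_characters(first: int, last: int) -> Iterator[Tuple[int, str]]:
    """Yield (code point, name) for every named character in a range."""
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:  # skip unassigned code points and unnamed controls
            yield cp, name

# Latin Extended-A, U+0100 to U+017F.
latin_ext_a = list(block_characters(0x0100, 0x017F))
print(len(latin_ext_a))  # 128
print(latin_ext_a[0])    # (256, 'LATIN CAPITAL LETTER A WITH MACRON')
```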

If you haven’t installed UnicodeChecker yet, Glyphs will prompt you to do so when you click on the arrow. Downloading and installing UnicodeChecker is free. However, if you do use it on a regular basis, think about how much time and nerves this little tool saves you and consider a donation to the makers. Support shareware, it supports you.