Roll your own glyph data

Tutorial
by Rainer Erich Scheichelbauer

8 August 2022 (published on 26 January 2013)

Each glyph name has a wealth of information associated with it, from its Unicode value to its categorization and sorting. All of this is stored in the internal glyph database, or ‘Glyph Data’ for short.

Ever wondered why Glyphs knows so much about the glyphs you create? For example, when you enter the name of a glyph, it sets the Unicode value, and when you choose Glyph > Set Anchors (Cmd-U) or Glyph > Reset Anchors (Cmd-Shift-U), it sets the correct diacritical anchors. Where does it store all that information?

GlyphData.xml

Tucked away inside the Glyphs application, there is a file called GlyphData.xml. You are not supposed to touch the one inside the app, but you can make a copy of it in the Application Support folder and keep your personal customizations there. Glyphs will use your glyph info to override the built-in settings.

Okay, so we have to dig into the application and fish out the XML file. To do that, you first make sure that your Glyphs application is called Glyphs 3.app and resides in your Applications folder. Then, copy this line into the clipboard:

/Applications/Glyphs 3.app/Contents/Frameworks/GlyphsCore.framework/Versions/A/Resources/GlyphData.xml

Then, in Finder, choose Go > Go to Folder… (Cmd-Shift-G) and paste the line we just copied into the dialog that appears.

Hit Go and the Finder will show you a file called GlyphData.xml, buried somewhere in the depths of the Glyphs application.

While the XML file is selected, choose Edit > Copy GlyphData.xml (Cmd-C). You can close the Finder window now.

Now, we have to navigate to the Application Support folder. The easiest way to get there is to choose Script > Open Scripts Folder (Cmd-Shift-Y) in Glyphs. This will take you to the Scripts folder inside the Application Support folder. All you need to do is go up one level to the enclosing folder. Next to the Scripts folder, you may see Temp and Plugins folders as well as a CustomFilter.plist file. If you do not see an Info folder there yet, it’s time to create it (Cmd-Shift-N). Inside that Info folder, paste the GlyphData.xml.
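If you would rather script the copy step, here is a minimal Python sketch. The source path is the one given above. The Application Support location in APP_SUPPORT is an assumption about a default Glyphs 3 installation, and copy_glyph_data is a hypothetical helper name, so adjust both to your setup.

```python
import shutil
from pathlib import Path

# Source path as given in this tutorial.
BUILT_IN = Path(
    "/Applications/Glyphs 3.app/Contents/Frameworks/GlyphsCore.framework"
    "/Versions/A/Resources/GlyphData.xml"
)
# Assumption: default location of the Glyphs 3 Application Support folder.
APP_SUPPORT = Path.home() / "Library/Application Support/Glyphs 3"


def copy_glyph_data(source: Path = BUILT_IN, app_support: Path = APP_SUPPORT) -> Path:
    """Copy GlyphData.xml into <app_support>/Info, creating Info if needed."""
    info_dir = app_support / "Info"
    info_dir.mkdir(parents=True, exist_ok=True)
    target = info_dir / "GlyphData.xml"
    if target.exists():
        # Never overwrite existing customizations silently.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)
    return target
```

Calling copy_glyph_data() without arguments uses the two default paths. The function refuses to overwrite an existing file, so earlier customizations are not lost by accident.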

Editing the XML

Open the XML file with your favorite XML or plain-text editor. Personally, I recommend TextMate. Many people also like Sublime Text and Atom, and you may also be happy with BBEdit.

Both XML files, the one buried in the app as well as the one we just copied into the Info folder, contain all relevant glyph info. They complement each other, so you can limit your copy of the XML file to just the glyphs you need. To do so, remove everything between <glyphData> (should be located around line 25) and </glyphData> (the last line). What remains is the file header followed by an empty glyphData element.
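The trimming step can also be done with a few lines of Python. This is only a sketch: strip_glyph_entries is a hypothetical helper, and it assumes the file contains exactly one glyphData element whose start tag may or may not carry attributes. Everything before the start tag (the XML declaration and the DOCTYPE header) is kept untouched.

```python
import re


def strip_glyph_entries(xml_text: str) -> str:
    """Remove everything between the <glyphData ...> start tag and </glyphData>."""
    # Match the root start tag (with or without attributes). The DOCTYPE lines
    # do not match because they do not contain the literal "<glyphData".
    start = re.search(r"<glyphData(\s[^>]*)?>", xml_text)
    end = xml_text.rfind("</glyphData>")
    if start is None or end == -1 or end < start.end():
        raise ValueError("no <glyphData> ... </glyphData> element found")
    return xml_text[: start.end()] + "\n" + xml_text[end:]
```

Read your copy of GlyphData.xml as text, pass it through the function, and write the result back. Your own glyph entries then go between the two remaining tags.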

Now, an entry for a letter must adhere to a certain form. Let’s take an example. Imagine you want to encode a letter of your favourite script, Tengwar. How about U+E000 TENGWAR LETTER TINCO, and maybe throw in U+E046 TENGWAR SIGN ACUTE as well? We would add these lines to our copy of GlyphData.xml:

<glyph unicode="E000" name="tinco-tengwar" decompose="longCarrier-tengwar, ooreStemless-tengwar" category="Letter" subCategory="Primary" script="tengwar" altNames="tincoTengwar, tengwarTinco" production="uniE000" description="TENGWAR LETTER TINCO" anchors="top, bottom" accents="threeDotsAbove-tengwar, threeDotsBelow-tengwar, twoDotsAbove-tengwar, twoDotsBelow-tengwar, dotAbove-tengwar, dotBelow-tengwar, acute-tengwar, doubleAcute-tengwar, rightCurl-tengwar, doubleRightCurl-tengwar, leftCurl-tengwar, doubleLeftCurl-tengwar, nasalizer-tengwar, doubler-tengwar, tilde-tengwar, breve-tengwar"  />
<glyph unicode="E046" name="acute-tengwar" category="Mark" subCategory="Nonspacing" script="tengwar" altNames="acuteTengwar, tengwarAcute, andaith" production="uniE046" description="TENGWAR SIGN ACUTE" />

Your file should now consist of the header, the opening <glyphData> tag, the two glyph elements above, and the closing </glyphData> tag.

Save it, restart Glyphs, take a look in Window > Glyph Info, and search for tengwar to see if Glyphs accepted your addition. If you did everything right, the two new glyphs will show up in the list.

Mission accomplished. Or wait a minute, not quite. It still lacks all the other Tengwar glyphs. But don’t strain yourself, Toshi Omagari has already beaten you to it.

XML specification

You see, if you want to add a new glyph to the database, you have to add an XML element called glyph. Its basic structure is:

<glyph attribute="value" />
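If you generate entries from a script rather than typing them, the standard library can take care of quoting and escaping. The following Python sketch builds one such element with xml.etree.ElementTree. The helper name make_glyph_element is hypothetical, and the attribute values are taken from the Tengwar example in this tutorial.

```python
import xml.etree.ElementTree as ET


def make_glyph_element(**attributes: str) -> str:
    """Serialize one <glyph ... /> entry; ElementTree handles the escaping."""
    element = ET.Element("glyph", attributes)
    return ET.tostring(element, encoding="unicode")


entry = make_glyph_element(
    name="acute-tengwar",
    category="Mark",
    subCategory="Nonspacing",
    script="tengwar",
    unicode="E046",
)
print(entry)
```

The printed line is a self-closing glyph element that you can paste between the glyphData tags of your copy of GlyphData.xml.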

Each glyph element can take various attributes of the structure attribute="value". And every glyph entry needs these required attributes:

  • name is the name of the glyph. Glyphs recognizes your glyph by its name, so this must be set to a valid glyph name, and it must be unique throughout your glyph data.
  • category is the category or group of the glyph. Possible values are:
    • Letter for letters like x or ä or ن or घ
    • Number for figures like 3 or ३ or ۳
    • Mark for both spacing and combining marks like the acute mark
    • Punctuation for things like question mark, period, comma, but also quotes, slashes and asterisks
    • Separator for the wordspace or things like the .notdef glyph
    • Symbol for symbols like ©@§& as well as currency signs, math operators (+−÷×= etc.), arrows, emojis, and the like.
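The two required attributes and the uniqueness rule for names are easy to check automatically. Here is a rough Python sketch. The function name check_glyph_data is hypothetical, and the set of categories is simply the list above; the built-in glyph data may use further values, so treat an "unknown category" message as a hint rather than a verdict.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED = ("name", "category")
# Category values as listed in this tutorial (possibly not exhaustive).
CATEGORIES = {"Letter", "Number", "Mark", "Punctuation", "Separator", "Symbol"}


def check_glyph_data(xml_text: str) -> list[str]:
    """Return a list of readable problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    names = Counter()
    for index, glyph in enumerate(root.iter("glyph"), start=1):
        label = glyph.get("name") or f"entry #{index}"
        for attribute in REQUIRED:
            if not glyph.get(attribute):
                problems.append(f"{label}: missing required attribute '{attribute}'")
        category = glyph.get("category")
        if category and category not in CATEGORIES:
            problems.append(f"{label}: unknown category '{category}'")
        if glyph.get("name"):
            names[glyph.get("name")] += 1
    # Names must be unique across the whole file.
    for name, count in names.items():
        if count > 1:
            problems.append(f"{name}: name used {count} times")
    return problems
```

Pass the text of your GlyphData.xml to check_glyph_data and print whatever it returns before you restart Glyphs.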

You can (and where possible, should) make use of these optional attributes:

  • description is the Unicode-style descriptive name of your glyph. If you have an encoded glyph, you can find the official name with Unicode Checker or the unofficial name from the ConScript Unicode Registry.

  • unicode is the hexadecimal Unicode value (code point) of the glyph, e.g. E000. Leave it out if you want to create an unencoded glyph like a ligature.

  • subCategory helps you further define the kind of the glyph. This, of course, depends on the category.

  • case defines the letter case for casefolding scripts like Latin, Greek, Cyrillic, Georgian and Armenian. Leave out if it does not apply. Possible values are:

    • upper for uppercase
    • lower for lowercase
    • smallCaps for small caps
    • minor for subscript and superscript letters as well as small figures, which includes scientific inferiors and superiors and fraction figures
  • script defines the writing system the glyph belongs to. Leave out if it doesn’t belong to any script (e.g. for math symbols). Possible values include latin, arabic, cyrillic, devanagari, ethiopic, greek, han etc. You get the idea.

  • anchors is a comma-separated list of possible diacritical anchors for the glyph. The usual suspects are top, bottom, center, ogonek, topleft, topright, bottomleft, bottomright, left, right. Corresponding mark anchors need preceding underscores, e.g. _top. Stackable combining marks can have both kinds of anchors. Omit this attribute if your glyph cannot be a base for a diacritic or vice versa.

  • marks (accents in Glyphs 2) defines the possible accents the glyph can take. This mainly helps Glyphs draw the mark cloud when you click on an anchor.

  • altNames is a comma-separated list of alternate glyph names that are recognised by the application, so the glyph can be sorted or renamed correctly. E.g., oslash was sometimes called ostroke. When you open a legacy font that uses this weird name, Glyphs can update it to oslash when you run Glyph > Update Glyph Info.

  • production is what the glyph is renamed to at export time. It usually corresponds to the legacy Adobe Glyph List name. You probably want to make use of this attribute wherever the AGL uses uni followed by the 4-digit Unicode or u and the 5-digit code. E.g. the glyph element for Romanian and Moldovan Tcommaaccent has both a name="Tcommaaccent" and a production="uni021A" attribute.

  • decompose defines the components of a composite glyph. In other words, the parts that make up the glyph. This information is used when you construct such a letter using the Glyph > Create Composite command (Ctrl-Cmd-C). Make sure the components are listed in the right order. E.g., the base letter comes first and all the accents follow. This is also useful for ligatures. In that case, you add the names of the glyphs that comprise the ligature, e.g. decompose="f, f, k" for an f_f_k ligature.

  • sortName: by default, Glyphs orders the glyphs alphabetically within their category. If you want to manipulate the display order, add this attribute. For instance, to make sure that AE comes after all the A diacritics instead of between Adieresis and Agrave, there’s a sortName="Az" attribute in it. This is very important for figures, where the sortName can look like Number.dnom.4 etc.

  • sortNameKeep: same as sortName, but is preferred if File > Font Info > Other > Keep alternates next to base glyph is turned on.

  • direction explicitly sets the writing direction for a glyph. It can influence component alignment and kerning behavior. To some extent, writing direction is already taken care of by an app-internal list that sets directions for scripts. E.g., if a glyph is defined to belong to Hebrew, you will usually not need to explicitly set its direction. Possible values are:

    • LTR: left to right.
    • RTL: right to left.
    • BIDI for bidirectional glyphs, i.e., glyphs that can be used in both LTR and RTL text. Typically applies to punctuation, quotes and dashes.
  • unicodeLegacy: same as unicode but will not be written into the font because the character is not supposed to be used anymore. Famous example: the Latin presentation forms (mostly ligatures like f_f) that exist in Unicode only for roundtrip conversions with deprecated encodings, but serve no purpose in fonts. (As the only exception, you can force-export legacy Unicode values for Arabic positional forms with the custom parameter Use Arabic Presentation Form Unicodes in your Font Info. However, do not ever do this for a shipping font, because it will break text processing. It only serves beta testing purposes, nothing else. You have been warned.)
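Two of the conventions in this list are mechanical enough to express in code: the uni/u production names and the comma-separated lists used by anchors, marks, altNames and decompose. The following Python sketch shows both. The helper names production_name and split_list are hypothetical, and production_name only covers the fallback pattern described above; it does not look up friendly names in the Adobe Glyph List.

```python
def production_name(codepoint: int) -> str:
    """Build a fallback production name: uniXXXX (4 digits) or uXXXXX (5 or more)."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if codepoint <= 0xFFFF:
        return f"uni{codepoint:04X}"
    return f"u{codepoint:05X}"


def split_list(value: str) -> list[str]:
    """Split a comma-separated attribute value such as anchors or decompose."""
    return [item.strip() for item in value.split(",") if item.strip()]


print(production_name(0x021A))  # uni021A
print(split_list("f, f, k"))    # ['f', 'f', 'k']
```

The order returned by split_list is the order in the attribute, which matters for decompose, where the base glyph comes first.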

EditGlyphData app

There is a smarter and more convenient way to edit your glyph data: it is called EditGlyphData and you will find it in the Tools section of our website. Use it also to merge the data from multiple XML files into one, and, maybe best of all, export the data to a tab-separated text file, so you can edit it in your favourite spreadsheet app. Plus, avoid all XML inconsistency problems (see below). Cool.

Potential pitfalls

Be careful and precise. If you mess up your glyph data, you will run into problems. Here are a few common problems, so you can avoid them right from the start:

First, make sure you always fill out the required attributes. Always.

Secondly, it is tempting to create your own naming schemes with this trick. But keep in mind that Glyphs expects certain names for automatically building OpenType features. So, you may have to write your own feature code as well, once you roll your own glyph data.

Thirdly, and this is important: Glyphs will ignore your custom glyph data if your GlyphData.xml contains broken XML. So, make sure you properly validate your XML from time to time. Many tools like TextMate sport a built-in validator. You can, of course, also copy and paste your XML into a web-based validator such as the W3C Markup Validation Service. Or, use the EditGlyphData app (see above).
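For a quick local check, Python’s built-in XML parser will tell you whether the file is well-formed. This is a minimal sketch with a hypothetical helper name, xml_error. It catches broken syntax such as unclosed tags or unquoted attribute values, but it does not validate the file against its DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_error(xml_text: str) -> Optional[str]:
    """Return None if the text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None
```

Read your GlyphData.xml as text and pass it to xml_error. If the result is not None, the message includes the line and column where the parser gave up.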

That’s all there is to injecting your custom info into Glyphs. If you feel that other people could also profit from your additions, you can share your GlyphData.xml on GitHub, ideally by contributing to the GlyphsInfo GitHub repository (instructions in the readme). If you are uncomfortable with git, post a suggestion in the Glyphs Forum instead. And if you think your changes should be the default for every Glyphs user, file a feature request in the forum or get in touch with us some other way.


Update 2013-02-03: added note on how to navigate to GlyphData.xml in 10.6 (thanks @typefacts); two minor text improvements.
Update 2013-02-12: added ‘Potential pitfalls’.
Update 2013-03-25: corrected XML file name typo (thanks George Thomas).
Update 2014-12-11: updated to new notation for dotless glyphs, changed the sortName example to AE.
Update 2015-07-08: Partial rewrite. Updated to Glyphs 2, new and updated links. Removed outdated passages. Added Tengwar and links (thanks @Tosche_E).
Update 2017-05-30: added reference to EditGlyphData app, removed deprecated bugreport link.
Update 2018-09-25: added link to GlyphsInfo repo. Thx Dave.
Update 2021-12-10: minor updates for Glyphs 3.
Update 2022-01-19: added missing attributes unicodeLegacy and direction. Thx Deluge.