What does a TrueType font look like on the inside? And how do you make one? (Part II)

In this post, I’m going to discuss a simple bitmap vectorization algorithm that’s compatible with TrueType. The goal is to convert a C64 character set to a usable TrueType font.

Most technical details about TrueType and character mapping (i.e. C64 character sets vs. PETSCII vs. ASCII vs. Unicode) can wait for the upcoming blog posts. However, there are already a few facts about C64 characters and TrueType we do have to keep in mind. These are:

  • Each C64 character glyph is 8×8 pixels, monochromatic, and monospaced. Because they are monospaced, we don’t have to concern ourselves with issues like kerning.
  • TrueType only has limited support for bitmap fonts. Therefore, we need to vectorize the 8×8 bitmaps. No need for fancy bezier curves. Just crisp retro-looking blocks with 90-degree corners.
  • A TrueType-style glyph in its most basic form (without hinting) consists of a set of contours, and a contour consists of a set of points. Points in opaque contours are specified in clock-wise order and points in transparent contours (e.g. the “hole” in the letter O) are specified in counter-clockwise order.
  • Some letters go a bit below the baseline. The vertical range above the baseline is called the ascent and the range below is called the descent. Most C64 character sets seem to have an ascent of 7 pixels and a descent of 1 pixel.
  • Coordinates in a TrueType glyph are normally scaled such that the glyph will fit inside a 2048×2048 box. If the font has a non-zero descent specified, the Y-coordinate will be negative below the baseline. Assume a typical 8×8 pixel C64 character with a 1 pixel descent. In TrueType-terms each pixel would then be 256 units by 256 units, the lower left corner would be at (0, -256), and the upper right corner would be at (2048, 1792).
  • A contour should consist of as few points as possible in order to reduce complexity. Consecutive points along an axis can be simplified to include only the endpoints without any differences in the rendering of the glyph. Also, the last edge in a contour is implicit. The renderer will simply assume a direct line from the last point back to the first.

This ought to be enough for now. Let’s now try to convert the default C64 ‘A’-glyph to a TrueType-compatible vectorized form. In ASCII ‘A’ is char #65, but on a C64 in upper-case mode the character set is laid out as follows. In this mode there are 128 characters in both their regular form as well as an reverse video version:

Dump of C64 character generator data (upper-case mode)

Dump of C64 character data (upper-case mode)

Each character consists of 8 bytes that each describe a row of 8 bits. The top left pixel of a given character is the most significant bit in the first byte and the bottom right pixel is the least significant bit in the 8th byte. Let’s look at the first 24 bytes of this data:

0x3C, 0x66, 0x6E, 0x6E, 0x60, 0x62, 0x3C, 0x00, 0x18, 0x3C, 0x66, 0x7E, 0x66,  0x66, 0x66, 0x00, 0x7C, 0x66, 0x66, 0x7C, 0x66, 0x66, 0x7C, 0x00

We are interested in the second character, which is at bytes 0x08…0x0F. If we try to show these as binary with one number per line, we get something like this:

0x08: 0x18 = 0b00011000
0x09: 0x3C = 0b00111100
0x0A: 0x66 = 0b01100110
0x0B: 0x7E = 0b01111110
0x0C: 0x66 = 0b01100110
0x0D: 0x66 = 0b01100110
0x0E: 0x66 = 0b01100110
0x0F: 0x00 = 0b00000000

Or, a bit more clear like this:

The letter 'A' on a C64.

The letter ‘A’ on a C64.

This looks like the letter ‘A’. Notice that it doesn’t go below the baseline (i.e. the lower-most row consists entirely of zeroes). But how do we make this into some nice vector contours? An easy way to vectorize a bitmap like this is to simply draw a clockwise box around each opaque pixel. This, however, is very wasteful:

Naive vectorization

We are already getting close. Notice how all edges on the outside are going in a clockwise direction around the letter, and how the edges in the “hole” are going counter-clockwise. This is exactly what we want. But we only want edges on the boundary of the contours. In the picture above we have 28 opaque pixels with 4 edges each, resulting in 112 edges in total. That’s a bit too much. Let’s change the algorithm to skip all edges where the adjacent pixel is opaque. Then it looks something like this:

Unncessary edges removed

Unncessary edges removed (well, most of them..)

Now we are down to 38 edges. But we can do better. Quite a few edges are consecutive along an axis. These can be merged to yield simpler contours.

Almost there...

Almost there…

Now we are down to 20 edges with 16 in the outer clockwise contour and 4 in the inner counter-clockwise contour. Now we only have to scale the coordinates of the edges to fit a TrueType glyph. Remember that everything is supposed to fit within 2048×2048 units, giving a pixel size of 256×256. Then we are done:

Vectorization done!

Vectorization done!

In the TTX format of FontTools, this would look something like the following:

<TTGlyph name="A" xMin="256" yMin="0" xMax="1792" yMax="1792">
  <contour>
    <pt x="256" y="0" on="1"/>
    <pt x="256" y="1280" on="1"/>
    <pt x="512" y="1280" on="1"/>
    <pt x="512" y="1536" on="1"/>
    <pt x="768" y="1536" on="1"/>
    <pt x="768" y="1792" on="1"/>
    <pt x="1280" y="1792" on="1"/>
    <pt x="1280" y="1536" on="1"/>
    <pt x="1536" y="1536" on="1"/>
    <pt x="1536" y="1280" on="1"/>
    <pt x="1792" y="1280" on="1"/>
    <pt x="1792" y="0" on="1"/>
    <pt x="1280" y="0" on="1"/>
    <pt x="1280" y="768" on="1"/>
    <pt x="768" y="768" on="1"/>
    <pt x="768" y="0" on="1"/>
  </contour>
  <contour>
    <pt x="768" y="1024" on="1"/>
    <pt x="1280" y="1024" on="1"/>
    <pt x="1280" y="1280" on="1"/>
    <pt x="768" y="1280" on="1"/>
  </contour>
  <instructions><assembly>
    </assembly></instructions>
</TTGlyph>

Finally, let me give you a sneak peek of what this character looks like in the TrueType font editor FontForge:

Screenshot of FontForge

Please have a look at the complete C64 to TTF python script if you want to know more about this. It can be found at my GitHub account.

Next time, I’ll talk a bit about how to map all the characters from the native C64 format into PETSCII, ASCII, and Unicode.

What does a TrueType font look like on the inside? And how do you make one? (Part I)

So, here the other day (actually a few months ago) I was looking for some retro-looking 8×8 pixel bitmap fonts for a project involving an old flip-dot display at My Friendly Local Hackerspace™. During this search I came across this website with more than 600 character sets from various old Commodore 64 games, applications, and demos. However, they are all in a very raw format where they are basically just partial memory dumps from running C64 games/demos. The author even mentions that he hasn’t found any easy way to convert them to a more useful format like, for example, TrueType. In fact, a Google search indicates that only very few C64 character sets are available for download as TrueType files. Most notable are the ones by Style64. However, they all seem to be more or less converted by hand using various font authoring tools. Well that’s not very elegant…

Somebody ought to do something!

Let’s explore what a TrueType font file looks like on the inside so we can create one ourselves. As I’ll cover later, it turns out to be quite possible to do with a bit of coding effort. In fact we don’t even have to do all the dirty work moving bits around and calculate checksums in a binary Truetype font file ourselves. There is (of course) a nice Python package called TTX/FontTools that can help us out. It encapsulates all the relevant data tables, as they are called, but it’s still fairly close to “the metal” and rather difficult to get started with.

The main use-case for TTX/FontTools seems to be TTX, which is a command-line tool for converting TrueType files into a corresponding XML-format and vice versa. Users can then manipulate a font in various ways using a basic text editor rather than sophisticated font editor programs, where it can be difficult to see what actually happens to the font behind the scene. But for our particular use-case we are not interested in the tool itself, but rather what makes it tick. It is really difficult to find any documentation or tutorials for this package. Especially if one wants to create a new font from scratch instead of just editing an existing one. I’ll get back to this later.

This post is meant as an introduction to the subject. All the dirty details will be fleshed out in upcoming posts. I have implemented a conversion tool in Python and if you can’t wait for the details, you can have a sneak-peek of the code on GitHub.

The problem in a nutshell

OK, so how do we get from a C64 memory dump containing some 8×8 pixel bitmapped characters to a usable TrueType file? The problem can be split into the following sub-problems:

  1. Vectorizing the 8×8 pixel bitmaps into glyph into a set of contours.
  2. Mapping from the native C64 format into ASCII (and Unicode).
  3. Generating the actual TrueType file using TTX/FontTools.

I’ll address these three sub-problems across the next three blog posts.

Useful software:

  • Python – A very nice programming language. Like Perl but less painful and much prettier.
  • TTX/FontTools – Without this package I probably wouldn’t have bothered with TrueType.
  • FontForge – A TrueType font editor. Nice for checking the sanity of font files.

If your operating system is a derivative of Debian Linux (like Ubuntu or Linux Mint) then Python is probably pre-installed and the other two packages are easily installable through the built-in package system. It’s also pretty easy to get it to work in Mac OS X. It probably also works in Windows, but then you’re on your own..

Useful links:

The TrueType file format at a glance

It seems that the file format specification varies a bit depending on who you ask. Traditionally, there are three big players in this field, Microsoft, Apple, and Adobe, and they don’t agree 100% on what a proper TrueType file should look like. I will primarily be using Microsoft’s definition, as Apple’s specification on required tables in the file lacks the “OS/2” table, which is used by Microsoft Windows itself as well as Microsoft Internet Explorer when loading fonts from a website. There are some licensing bits in here that tells Internet Explorer if it’s OK to download and use this font.

Basically, a TrueType file is a set of tables. Each of these tables describe a certain aspect of the font. For my particular purpose I can leave out features like kerning, hinting, ligatures, and all other fancyness that’s useful for working with proportional fonts. My goal here is to convert monospaced 8×8 pixel bitmaps into proper TrueType glyphs. The required tables for a bare-bones font are as follows:

Table nameTable description
cmapCharacter to glyph mapping
glyfGlyph data
headFont header
hheaHorizontal header
hmtxHorizontal metrics
locaIndex to location
maxpMaximum profile
nameNaming table
postPostscript information
OS/2OS/2 and Windows specific metrics

A detailed description of these tables are outside the scope of this post, but it’s all there in the specifications I linked above. As mentioned earlier, I didn’t succeed in finding a tutorial for creating font files from scratch using TTX/FontTools. It requires a deep knowledge of the TrueType format, but it can be done by reading the FontTools code (as well as the specifications) and doing some trial-and-error using the XML output feature. Basically you just need to fill in all the data mentioned in the specification. There are very few shortcuts in the process. But more on this in a later blog post…

In my next post, I’ll discuss the vectorization algorithm for converting a 8×8 pixel bitmap into a set of polygon contours. Stay tuned…