Things I Learned About Fonts While Making a Java Font Library (That You Didn’t Want to Know)

A few months ago I was spending some of my free time on Pdf2Dom, an open source Java PDF to HTML converter. At some point I bumped into PDFs that used old and obscure Adobe font formats we didn’t support yet. So I started Googling those formats and found out that if I wanted a pure Java implementation, I was on my own, kid.

I decided to split the new font features off into a project that goes by the catchy name of FontVerter, and I learned a few ugly things about how fonts work along the way.

So here are some things I learned, some ugly, some not. Among them is a small part of the real root cause of browser font rendering differences. (Apologies if the first few points are too basic for you; scroll down to the later ones, where I get deeper into the technical details.)

Font formats are nested up to four levels deep now

first_matryoshka_museum_doll_open
The WOFF formats are a web wrapper around formats like OpenType. OpenType, from Microsoft and Adobe, is an extension of Apple’s TrueType spec and can wrap Adobe’s CFF fonts, and CFF in turn is a compact repackaging of PostScript Type 1 fonts.

Yep, fonts are just like the above picture of cute little Russian nesting dolls, except usually more confusing and less cute.

I find it rather interesting that newer font specs have just kept piling onto formats originally created in the ’80s by multiple large tech corporations. I can’t think of another non-trivial file format that’s done something like this (though someone will likely tell me I’m wrong and there is one).

What .ttf and .otf file extensions really mean

confused

OpenType is just an extension of the TrueType spec. An OpenType font can use either TrueType outlines or an Adobe CFF font. When CFF is used the extension is .otf; when TrueType outlines are used it’s usually .ttf. TrueType itself only supports TrueType outlines, so its file extension is (almost) always .ttf. That said, I found that whether .ttf or .otf is used is irrelevant to all the major browsers, as they figure it out just by looking at the file.
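To make "looking at the file" concrete, here is a minimal sketch of sniffing a font by the four-byte tag at the start of the file. The class and method names (FontSniffer, sniff) are made up for illustration and are not FontVerter's API, and I'm only assuming browsers do something along these lines; the tag values are the ones I know from the OpenType and WOFF specs.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of FontVerter or any browser.
final class FontSniffer {
    /** Classifies a font by the big-endian 32-bit tag in its first four bytes. */
    static String sniff(byte[] data) {
        if (data == null || data.length < 4) {
            return "unknown";
        }
        // ByteBuffer reads big-endian by default, which matches font files.
        int tag = ByteBuffer.wrap(data, 0, 4).getInt();
        switch (tag) {
            case 0x00010000: return "TrueType outlines (sfnt version 1.0)";
            case 0x4F54544F: return "CFF outlines ('OTTO')";
            case 0x74727565: return "Apple TrueType ('true')";
            case 0x74746366: return "font collection ('ttcf')";
            case 0x774F4646: return "WOFF 1.0 ('wOFF')";
            case 0x774F4632: return "WOFF 2.0 ('wOF2')";
            default:         return "unknown";
        }
    }
}
```

Note that nothing here depends on the file name, which fits the observation that the .ttf/.otf extension doesn't matter to the browser.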

TrueType is the older parent of OpenType, so if you have an older .ttf font lying around that conforms to the actual TrueType spec, you’re more likely to run into issues getting it to render in browsers, as they’ve stopped caring. Remember the subtle difference between a TrueType font and an OpenType font using TrueType outlines.

Why is WOFF for web?

web_fonts

WOFF is just a compression wrapper around the old font formats. Before working on this library, all I knew about WOFF was that it was meant for the web. Occasionally I wondered if WOFF fonts were inferior to True/OpenType, since “web” usually means stripped down, with fewer features or more lossy, but I never had time to Google it.

Nope, WOFF is really just a lossless compression wrapper around standard font formats. WOFF 2.0 uses Google’s newfangled Brotli compression and WOFF 1.0 uses zlib (the spec names its compress2() function). Brotli compresses better than the usual gzip used for web server resources, but it was adopted much more quickly for fonts, via WOFF2, than for other web resources like images. No idea why that is, or why they felt it warranted another level of font nesting.
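As a rough illustration of how thin the WOFF 1.0 wrapper is, here is a sketch of per-table compression using the JDK's zlib classes. WoffTableCodec and its method names are hypothetical, not FontVerter's code. The one rule it relies on, as I read the WOFF 1.0 spec, is that a table whose compressed form is not smaller gets stored as-is, which a reader detects because the stored length equals the original length.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of WOFF 1.0 style table storage, not FontVerter's code.
final class WoffTableCodec {
    /** Returns zlib-compressed bytes, or the input itself if compression does not help. */
    static byte[] compress(byte[] table) {
        Deflater deflater = new Deflater();
        deflater.setInput(table);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        return compressed.length < table.length ? compressed : table;
    }

    /** Restores the original table; equal lengths mean it was stored uncompressed. */
    static byte[] decompress(byte[] stored, int origLength) {
        if (stored.length == origLength) {
            return stored;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        byte[] result = new byte[origLength];
        int off = 0;
        try {
            while (off < origLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, origLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // ran out of data before reaching origLength
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        if (off != origLength) {
            throw new IllegalArgumentException("table did not inflate to origLength");
        }
        return result;
    }
}
```

Round-tripping a table through compress and decompress gives back the exact original bytes, which is all "lossless" means here.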

Shout out to the amazing WOFF 1 and 2 specs though; their clarity and simplicity are a far cry from the True/OpenType and Adobe specs. Since Google’s sfntly isn’t on Maven Central (SHAME!), I decided to just write my own WOFF1 and WOFF2 code for FontVerter. The specs seemed so simple and clear, and indeed implementing them was pretty straightforward.

The specs are full of ‘neat’ optimizations.

magic_trick

Since the original font specs were written back when rendering fonts could still be a CPU-intensive task, a fair bit of effort went into optimizing various parts of the specs to shave a cycle or two off the renderer or a byte off the file size. Modern specs like WOFF still have a few tricks to save a byte or two in the uncompressed header area, since for web optimization we still care about every last bit in the file.

Here’s an example describing an “obscure indexing trick” for a format 4 cmap subtable:

If the idRangeOffset value for the segment is not 0, the mapping of character codes relies on glyphIdArray. The character code offset from startCode is added to the idRangeOffset value. This sum is used as an offset from the current location within idRangeOffset itself to index out the correct glyphIdArray value. This obscure indexing trick works because glyphIdArray immediately follows idRangeOffset in the font file. The C expression that yields the glyph index is:
*(idRangeOffset[i]/2
+ (c - startCount[i])
+ &idRangeOffset[i])
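Java has no pointer arithmetic, so the trick has to be unrolled into plain array indexing. The sketch below is my own reading of the quoted passage, not code from FontVerter: the class name and the idea of passing the already-parsed arrays as int[] are assumptions. Since glyphIdArray starts segCount 16-bit words after idRangeOffset[0], the pointer expression turns into an index of idRangeOffset[i]/2 + (c - startCode[i]) - (segCount - i).

```java
// Hypothetical sketch of a format 4 cmap lookup over already-parsed arrays.
final class CmapFormat4 {
    /** Maps character code c to a glyph index, or 0 (the missing glyph) if unmapped. */
    static int glyphIndex(int c, int[] endCode, int[] startCode,
                          int[] idDelta, int[] idRangeOffset, int[] glyphIdArray) {
        int segCount = endCode.length;
        for (int i = 0; i < segCount; i++) {
            if (c > endCode[i]) {
                continue; // segments are sorted by endCode; keep searching
            }
            if (c < startCode[i]) {
                return 0; // c falls in a gap between segments
            }
            if (idRangeOffset[i] == 0) {
                return (c + idDelta[i]) & 0xFFFF; // arithmetic is modulo 65536
            }
            // The C version adds to &idRangeOffset[i]; here we subtract the
            // distance (in 16-bit words) from that slot to glyphIdArray[0].
            int idx = idRangeOffset[i] / 2 + (c - startCode[i]) - (segCount - i);
            if (idx < 0 || idx >= glyphIdArray.length) {
                return 0; // malformed offset; treat as unmapped
            }
            int glyph = glyphIdArray[idx];
            return glyph == 0 ? 0 : (glyph + idDelta[i]) & 0xFFFF;
        }
        return 0;
    }
}
```

A linear scan keeps the sketch short; a real reader would more likely binary search the segments, which is what the precomputed search fields in the subtable header are for.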

And there are a few obscure data types used to shave off a byte here and there, like variable-length encoded integers in WOFF2:

UIntBase128 is a different variable length encoding of unsigned integers, suitable for values up to 2^32-1. A UIntBase128 encoded number is a sequence of bytes for which the most significant bit is set for all but the last byte, and clear for the last byte. The number itself is base 128 encoded in the lower 7 bits of each byte. Thus, a decoding procedure for a UIntBase128 is: start with value = 0. Consume a byte, setting value = old value times 128 + (byte bitwise-and 127). Repeat last step until the most significant bit of byte is false.

UIntBase128 encoding format allows a possibility of sub-optimal encoding…
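The decoding procedure quoted above translates almost directly into Java. This is a sketch rather than FontVerter's actual code, and the class name is made up. The three rejection cases (a leading 0x80 byte, more than 5 bytes, a value past 2^32-1) are how I remember the rest of that passage in the WOFF2 spec, so treat them as my reading. A long holds the result because Java has no unsigned 32-bit int.

```java
// Hypothetical sketch of a WOFF2 UIntBase128 decoder.
final class UIntBase128 {
    /** Decodes one UIntBase128 value starting at data[offset]. */
    static long decode(byte[] data, int offset) {
        long accum = 0;
        for (int i = 0; i < 5; i++) {
            if (offset + i >= data.length) {
                throw new IllegalArgumentException("truncated UIntBase128");
            }
            int b = data[offset + i] & 0xFF;
            if (i == 0 && b == 0x80) {
                throw new IllegalArgumentException("leading zeros are not allowed");
            }
            // If any of the top 7 bits are set, shifting by 7 would pass 2^32-1.
            if ((accum & 0xFE000000L) != 0) {
                throw new IllegalArgumentException("value exceeds 2^32-1");
            }
            accum = (accum << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return accum; // high bit clear marks the last byte
            }
        }
        throw new IllegalArgumentException("UIntBase128 longer than 5 bytes");
    }
}
```

For example, the single byte 0x3F decodes to 63, and the two bytes 0x81 0x00 decode to 128.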

There are also a few places where a font spec tells you to compute something from values already in the file and store the result in a separate table entry. My first thought was usually: why not just have the font renderer do that calculation at run time, so the file is slightly smaller? The answer is that it slightly lowers the load on the end user’s already strained CPU, which mattered much more than the programmer writing a few more lines of code and waiting a little longer to generate the font, especially at a time when processors like the Intel 8088 were still in use.
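One example that I think fits this description (my pick, the paragraph above doesn't name a specific field) is the trio of binary search helpers in the sfnt table directory: searchRange, entrySelector and rangeShift. All three follow from numTables alone, yet the font writer has to compute and store them. A sketch, with a hypothetical class name:

```java
// Hypothetical sketch: derived fields of the sfnt table directory header.
final class SfntSearchFields {
    /** Returns {searchRange, entrySelector, rangeShift} for the given table count. */
    static int[] compute(int numTables) {
        if (numTables <= 0) {
            throw new IllegalArgumentException("numTables must be positive");
        }
        int pow2 = Integer.highestOneBit(numTables);            // largest power of 2 <= numTables
        int searchRange = pow2 * 16;                            // each directory entry is 16 bytes
        int entrySelector = Integer.numberOfTrailingZeros(pow2); // log2 of that power of 2
        int rangeShift = numTables * 16 - searchRange;
        return new int[] {searchRange, entrySelector, rangeShift};
    }
}
```

For a font with 12 tables this gives 128, 3 and 64, which saves the renderer a handful of instructions per binary search at the cost of six bytes in the file.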

The font specs are complex and ambiguous.

complex_looking_math_equations
Anyone who has read parts of the major font specs (besides WOFF) usually agrees they are horribly written and full of potential ambiguity at every turn. Some, but not all, of the apparent ambiguity can be resolved by rereading that part of the spec extremely carefully and literally.

It’s like a 6th grade teacher doing a lesson on ambiguity in technical writing: you get a sheet of instructions to follow that turn out to be so ambiguous that everyone in the class ends up with a different final answer.

You also have a number of properties that are repeated in separate tables, with the spec varying in how clearly it says which one to read first or what to do if they don’t match.

Most people notice the font rendering differences between Windows and Linux, and if you’re really anal about fonts you’ve likely noticed that rendering even differs between browsers. A small part of the reason is that, if you look at the font code for Chrome and Firefox, their answers to ambiguity in the spec can differ. I partly enjoyed discovering some of the real reasons for font rendering differences through hands-on work, but I also partly hated it, as I just wanted my converted fonts to work already.


They have their own instruction set.

intel_4004

TrueType fonts contain glyphs that are made up of glyph outline paths and a separate glyph program. That program is written in the font format’s own special instruction set, with the usual pop (0x21), push (0xB0), add, etc. along with ones specific to font rendering. That means any TTF font renderer needs its own virtual machine/interpreter implementation.
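To give a feel for what that interpreter looks like, here is a toy stack machine that understands only four opcodes: PUSHB[n] (0xB0 to 0xB7), DUP (0x20), POP (0x21) and ADD (0x60). The opcode values are as I remember them from the TrueType instruction set reference. Everything else is simplified for illustration: the class name is made up, there is no graphics state, and ADD works on plain integers here rather than the fixed-point values real hinting uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration only; a real hinting interpreter has 100+ instructions.
final class ToyHintVm {
    /** Runs a byte-code program (one int per byte) and returns the final stack. */
    static Deque<Integer> run(int... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < program.length) {
            int op = program[pc++] & 0xFF;
            if (op >= 0xB0 && op <= 0xB7) {          // PUSHB[n]: push the next n+1 bytes
                int count = op - 0xB0 + 1;
                for (int k = 0; k < count; k++) {
                    if (pc >= program.length) {
                        throw new IllegalArgumentException("truncated PUSHB");
                    }
                    stack.push(program[pc++] & 0xFF);
                }
            } else if (op == 0x20) {                 // DUP: duplicate the top element
                int top = pop(stack);
                stack.push(top);
                stack.push(top);
            } else if (op == 0x21) {                 // POP: discard the top element
                pop(stack);
            } else if (op == 0x60) {                 // ADD: pop two, push their sum
                int a = pop(stack);
                int b = pop(stack);
                stack.push(a + b);
            } else {
                throw new UnsupportedOperationException("opcode 0x" + Integer.toHexString(op));
            }
        }
        return stack;
    }

    private static int pop(Deque<Integer> stack) {
        if (stack.isEmpty()) {
            throw new IllegalStateException("stack underflow");
        }
        return stack.pop();
    }
}
```

Calling ToyHintVm.run(0xB1, 2, 3, 0x60) pushes 2 and 3, adds them, and leaves 5 on the stack. Two interpreters that disagree on even one opcode's edge cases will hint the same glyph differently.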

Firefox and Chrome also very helpfully have differences between their font hint program interpreters. I had an old TTF font that rendered perfectly in Firefox but not at all in Chrome, which gave no errors in the console. I traced the issue down to Chrome processing one of the 100+ instructions differently for certain glyph programs in the font, and that’s when I got tired of fonts and decided to take a break from FontVerter.

Conclusion

If you’re like me, you likely fiddle constantly with which font to use for your text editor or IDE. Consolas or Inconsolata or maybe Source Code Pro? I could never decide. But after working with fonts at such a low level I no longer care; the default Consolas will do, just leave me alone, you fonts you. (I did write a blog article about programming fonts recently though, because I’m on a challenge to write a blog post every day and I had nothing to write about that day.)

Thank you for reading!!

Image Credits: commons.wikimedia.org/wiki/File:Web_fonts.png, pixabay.com/en/upset-sad-confused-figurine-534103/, pixabay.com/en/hand-playing-card-ace-pik-magic-998957/, commons.wikimedia.org/wiki/File:AMS_Euler_sample_math.svg, commons.wikimedia.org/wiki/File:Intel_4004.jpg
