Twenty-three years ago, Ken Lunde stood onstage at the Unicode conference–a gathering of people concerned with how every language is presented across the internet, from Swahili to Bengali. Amid a room full of fellow typographical scholars, he outlined a problem faced by a fifth of the world’s population–a fifth that his industry had overlooked. Chinese, Japanese, and Korean–commonly referred to as “CJK” languages–were all represented by separate fonts.
Maybe that doesn’t sound like a big deal, but that’s only because this article is being typed and read in lovingly supported Latin characters. I can shift between English, Spanish, and German words with brainless efficiency. If I know how to spell the word, it will simply appear as I type it, rendered perfectly on this screen like any other.
CJK languages are different. Their glyphs stand not for individual letters but for whole words. These languages share history but have vastly different visual identities, so each was built as a separate font in a series of events that began in the 1950s with the standardization that led into the computer age. That meant if you were typing in Japanese (kanji) and wanted to reference a word from traditional Chinese characters, you’d have to do the equivalent of switching from Times New Roman to Helvetica in mid-sentence. It would break spacing on the page and disturb the reader’s flow. It also required installing and switching between multiple fonts when typing, and it hurt typographic reliability in web browsers. And, frankly, in the austere world of digital typeface design, it just looked pretty tacky, too.
Fifteen years would pass between Lunde describing this problem onstage and the second time he would call it out–yet again, at the Unicode conference, this time in 2009. No one had successfully fixed it in a fully featured font.
But by this time, Lunde had built a lot of clout at Adobe–he was now a senior computer scientist. And this time, someone in the audience from Google was listening. And an idea was already fermenting in Mountain View: to build one free font to unite all languages on earth.
Over the next two years, Google and Adobe let their lawyers figure out how the deal could work. Together, the companies would develop the ultimate universal pan-CJK typeface. In 2014, they released the first joint effort under two names: Source Han Sans from Adobe and Noto Sans CJK from Google. These were the sans-serif versions of the fonts, intended for screens and more modern applications. And this month, they released the second chapter: the serif versions of the fonts, Source Han Serif and Noto Serif CJK, for print and more conventional uses.
The design, a follow-up to the original 2014 work, is almost unfathomable in scope: 30,000 ideographs (the complex shapes that make up CJK words) had to be produced as 450,000 separate glyphs, allowing Source Han Serif to be presented in seven weights and giving it the sort of real flexibility we’ve come to expect in fonts.
To build the system, the team started with Japanese glyphs. That might sound backwards, as Japanese writing actually stems from Chinese forms. The decision, however, was one of pure practicality. Adobe’s most expansive East Asian font was Japanese, which gave the team a baseline from which to build its other characters with minimal manpower.
“Korean effectively follows Japanese conventions for the most part. So by going through the available Japanese glyphs, I was able to map a number of Japanese glyphs for Korean use. We identified a little less than 200 characters that required separate forms for Korean,” Lunde, now a Project Architect at Adobe Type, explains. “It took me like two solid weeks of work to figure that out. It involved taking a fine-tooth comb to something like 7,500 characters.”
But Chinese forms were not so compatible. Ultimately, he learned that thousands of new Chinese glyphs would need to be created by hand. Adobe has a relatively efficient process to do so–watch the embedded video at around 24 seconds and you’ll get a peek at its Ideograph Element Library, which breaks down the strokes of ideographs into discrete parts, allowing them to be tweaked via slider to adjust measurements such as width, height, and curvature. “While this eases the designing process . . . the components still need to be arranged in the proper fashion,” says Lunde. “This is why the project took over two years to complete.”
Now, thanks to Adobe and Google’s joint efforts, the core problems of pan-CJK fonts have finally been solved, in a painstaking piece of history that most people will never even notice. You can find the fonts on Typekit, GitHub, Google’s Noto site, and soon Windows.
In turn, one would think that the problem that has haunted Lunde for over two decades has come to a close. However, as he tells Co.Design, that’s not the case. While over “99 point something percent” of commonly used pan-CJK glyphs are covered, he’d like to increase support for Hong Kong-based Chinese, and he imagines that regional written dialects–with their own distinct styles–could stand for more support, too.
But these are academic quibbles compared to a far bigger hole in the font. While this project covers 30,000 ideographs inside CJK, that’s only a small dent in what’s possible. Unicode has placeholders for 90,000 ideographs–in other words, by Unicode’s count, 60,000 are still missing before CJK coverage is truly complete. For modern usage, those ideographs are not much of a concern, in the same way that most of us don’t use words like “thou” in colloquial speech. But when it comes to digitizing historical documents from East Asia, for instance, they become a necessity: 60,000 new ideographs need to be made by someone.
“Doing this work–because it requires so many characters to be designed–the effort is in years, not months,” says Lunde. “How we approach this? We’re not sure yet.”