| 1 | +--- |
| 2 | +title: "GSoC'21: Final Report" |
| 3 | +date: 2021-08-17T17:36:40+05:30 |
| 4 | +draft: false |
| 5 | +categories: ["News", "GSoC"] |
| 6 | +description: "Google Summer of Code 2021: Final Report - Aitik Gupta" |
| 7 | +displayInList: true |
| 8 | +author: Aitik Gupta |
| 9 | + |
| 10 | +resources: |
| 11 | +- name: featuredImage |
| 12 | + src: "AitikGupta_GSoC.png" |
| 13 | + params: |
| 14 | + showOnTop: true |
| 15 | +--- |
| 16 | + |
| 17 | +**<ins>Matplotlib: Revisiting Text/Font Handling</ins>** |
| 18 | + |
| 19 | +Here's a [meme](https://user-images.githubusercontent.com/43996118/129448683-bc136398-afeb-40ac-bbb7-0576757baf3c.jpg) I created, to kick things off for this final report! |
| 20 | +## About Matplotlib |
| 21 | +Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, which has become the _de facto Python plotting library_.
| 22 | + |
| 23 | +Much of the implementation behind its font manager is inspired by [W3C](https://www.w3.org/) compliant algorithms, allowing users to interact with font properties like `font-size`, `font-weight`, `font-family`, etc. |
| 24 | + |
| 25 | +#### However, the way Matplotlib handled fonts and general text layout was not ideal, which is what Summer 2021 was all about. |
| 26 | + |
| 27 | +> By "not ideal", I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now _outdated_. |
| 28 | +
| 29 | +(..more on this later) |
| 30 | + |
| 31 | +### About the Project |
| 32 | +(PS: here's [the link](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/) to my GSoC proposal, if you're interested) |
| 33 | + |
| 34 | +Overall, the project was divided into two major subgoals: |
| 35 | +1. Font Subsetting |
| 36 | +2. Font Fallback |
| 37 | + |
| 38 | +But before we take each of them on, we should get an idea of some basic font terminology (there is a _lot_ of it, and it is rightly _confusing_).
| 39 | + |
| 40 | +The [PR: Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346/files) brings a bit of clarity to some of these terms. The next section has a linked PR which also explains the types of fonts and how they are relevant to Matplotlib.
| 41 | +## Font Subsetting |
| 42 | +An easy-to-read guide on Fonts and Matplotlib was created with [PR: [Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450), which is currently live at [Matplotlib's DevDocs](https://matplotlib.org/devdocs/users/fonts.html). |
| 43 | + |
| 44 | +Taking an excerpt from one of my previous blogs (and [the doc](https://matplotlib.org/devdocs/users/fonts.html#subsetting)): |
| 45 | + |
| 46 | +> Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are <ins>required</ins> for a certain array of characters, and embed <ins>only those</ins> within the output. |
| 47 | +
| 48 | +PDF, PS/EPS and SVG output document formats are special in that **the text within them can be <ins>editable</ins>**, i.e., one can copy/search text in the document (e.g., in a PDF file) if the text is editable.
| 49 | + |
| 50 | +### Matplotlib and Subsetting |
| 51 | +The PDF, PS/EPS and SVG backends used to support font subsetting, _only for a few types_. What that means is, before Summer '21, Matplotlib could generate Type 3 subsets for PDF, PS/EPS backends, but it <ins>*could not*</ins> generate Type 42 / TrueType subsets. |
| 52 | + |
| 53 | +With [PR: Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts.
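
To make the idea of subsetting concrete, here's a minimal sketch in plain Python. It is only a toy model, not Matplotlib's implementation: a "font" is assumed to be a simple mapping from character to glyph name, and `subset_font` and `toy_font` are hypothetical names made up for this illustration.

```python
# Toy model (an assumption for illustration): a "font" is a mapping from
# character to glyph name. Real font files are far more involved.
def subset_font(font, text):
    """Return only the glyphs of `font` that are required to render `text`."""
    needed = set(text)
    return {char: glyph for char, glyph in font.items() if char in needed}


toy_font = {"a": "glyph_a", "b": "glyph_b", "c": "glyph_c", "d": "glyph_d"}

# Only the glyphs for "a" and "b" would be embedded in the output.
subset = subset_font(toy_font, "abba")
```

In Matplotlib itself, the font type used by these backends is selected through the `pdf.fonttype` and `ps.fonttype` rcParams (`3` or `42`).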
| 54 | + |
| 55 | +This is especially beneficial for people who wish to use <ins>commercial</ins> (or [CJK](https://en.wikipedia.org/wiki/CJK_characters)) fonts. Licenses for many fonts ***require*** subsetting so that the fonts can’t be trivially copied from the output files generated by Matplotlib.
| 56 | + |
| 57 | +## Font Fallback |
| 58 | +Matplotlib was designed to work with a single font at runtime. A user _could_ specify a `font.family`, which was supposed to correspond to [CSS](https://www.w3schools.com/cssref/pr_font_font-family.asp) properties, but that was only used to find a _single_ font present on the user's system. |
| 59 | + |
| 60 | +Once that font was found (and one almost always was, since Matplotlib ships with a set of default fonts), all the user text was rendered only through that font, which produced "<ins>tofu</ins>" for any character the font did not contain.
| 61 | + |
| 62 | +--- |
| 63 | + |
| 64 | +It might seem like an _outdated_ approach for text rendering, now that we have concepts like font-fallback, <ins>but these concepts weren't very well discussed in the early 2000s</ins>. Even getting a single font to work _was considered a hard engineering problem_.
| 65 | + |
| 66 | +This was primarily because of the lack of **any standardization** for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.) |
| 67 | + |
| 68 | + |
| 69 | +|  |  | |
| 70 | +|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------| |
| 71 | +<p align="middle"> |
| 72 | + <ins>Previous</ins> (notice <i>Tofus</i>) VS <ins>After</ins> (CJK font as fallback) |
| 73 | +</p> |
| 74 | + |
| 75 | +To migrate from a font-first approach to a text-first approach, there are multiple steps involved: |
| 76 | + |
| 77 | +### Parsing the whole font family |
| 78 | +The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either: |
| 79 | +- [PR: [with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496), or |
| 80 | +- [PR: [without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) |
| 81 | + |
| 82 | +Quoting one of my [previous](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) blogs: |
| 83 | +> Don’t break, a lot at stake! |
| 84 | +
| 85 | +My first approach was to change the existing public `findfont` API to incorporate multiple filepaths. Since Matplotlib has a _huge_ userbase, there's a high chance that would break a chunk of people's workflows, hence the second PR, which leaves `findfont` untouched:
| 86 | + |
| 87 | +<p align="center"> |
| 88 | + <img src="https://user-images.githubusercontent.com/43996118/129636132-47b141b3-f149-49b7-b0c0-67c256bd6ee1.png" alt="FamilyParsingFlowChart" width="60%" /> |
| 89 | + First PR (left), Second PR (right) |
| 90 | +</p> |
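
The difference between the two approaches can be sketched roughly like this. Everything here is hypothetical and only for illustration: `INSTALLED_FONTS`, `find_font` and `find_font_paths` are stand-ins I made up, not Matplotlib's `font_manager` API, and the paths are invented.

```python
# Hypothetical registry of installed fonts: family name -> font file path.
INSTALLED_FONTS = {
    "DejaVu Sans": "/fonts/DejaVuSans.ttf",
    "Noto Sans CJK JP": "/fonts/NotoSansCJKjp-Regular.otf",
}
DEFAULT_FONT = "/fonts/DejaVuSans.ttf"


def find_font(families):
    """Single-font lookup: path of the first family that is installed."""
    for family in families:
        if family in INSTALLED_FONTS:
            return INSTALLED_FONTS[family]
    return DEFAULT_FONT


def find_font_paths(families):
    """Separate helper: one path per installed family, in the given order."""
    paths = [INSTALLED_FONTS[f] for f in families if f in INSTALLED_FONTS]
    return paths or [DEFAULT_FONT]


families = ["Missing Font", "DejaVu Sans", "Noto Sans CJK JP"]
single = find_font(families)          # unchanged behaviour for existing callers
multiple = find_font_paths(families)  # every usable font, ready for fallback
```

Keeping the single-path function as it is and adding a second one means existing callers never see a changed return type.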
| 91 | + |
| 92 | +### FT2Font Overhaul |
| 93 | +Once we get a list of font paths, we need to change the internal representation of a "font". Matplotlib has a utility called FT2Font, which is written in C++, and used with wrappers as a Python extension, which in turn is used throughout the backends. For all intents and purposes, it used to mean: ```FT2Font === SingleFont``` (if you're interested, here's a [meme](https://user-images.githubusercontent.com/43996118/128352387-76a3f52a-20fc-4853-b624-0c91844fc785.png) about how FT2Font was named!) |
| 94 | + |
| 95 | +But that is not the case anymore; here's a flowchart to explain what happens now:
| 96 | +<p align="center"> |
| 97 | + <img src="https://user-images.githubusercontent.com/43996118/129720023-14f5d67f-f279-433f-ad78-e5eccb6c784a.png" alt="FamilyParsingFlowChart" width="100%" /> |
| 98 | + Font-Fallback Algorithm |
| 99 | +</p> |
| 100 | + |
| 101 | +With [PR: Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740), every FT2Font object has a `std::vector<FT2Font *> fallback_list`, which is used for filling the parent cache, as can be seen in the self-explanatory flowchart. |
| 102 | + |
| 103 | +For simplicity, only one type of cache (<ins>character -> FT2Font</ins>) is shown, whereas the actual implementation has two types of caches: the one shown above, and another for glyphs (<ins>glyph_id -> FT2Font</ins>).
| 104 | + |
| 105 | +> Note: Only the parent's APIs are used in some backends, so for each of the individual public functions like `load_glyph`, `load_char`, `get_kerning`, etc., we find the FT2Font object which has that glyph from the parent FT2Font cache! |
| 106 | +
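
Here's a small Python sketch of the character cache idea, for readers who prefer code to flowcharts. It is a simplification under my own assumptions, not the actual C++ code: `ToyFont` is a hypothetical stand-in for FT2Font, a font is modelled as a plain set of supported characters, and the choice to cache a missing character against the parent (so that the parent draws the tofu) is an assumption of this sketch.

```python
class ToyFont:
    """Hypothetical stand-in for FT2Font: supported characters plus fallbacks."""

    def __init__(self, name, chars, fallbacks=()):
        self.name = name
        self.chars = set(chars)
        self.fallbacks = list(fallbacks)
        self.char_cache = {}  # character -> ToyFont that can draw it

    def font_for(self, char):
        """Return the font that should render `char`, filling the parent cache."""
        if char in self.char_cache:
            return self.char_cache[char]
        # Try the parent first, then each fallback in order.
        for font in [self, *self.fallbacks]:
            if char in font.chars:
                self.char_cache[char] = font
                return font
        # No font has the character: assume the parent renders "tofu" itself.
        self.char_cache[char] = self
        return self


cjk = ToyFont("ToyCJK", "你好")
latin = ToyFont("ToyLatin", "Helo ", fallbacks=[cjk])

# Every lookup goes through the parent, which delegates to the right font.
used = [latin.font_for(char).name for char in "Hello 你好"]
```

After the first pass over a string, repeated characters are answered straight from `char_cache` without walking the fallback list again.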
| 107 | +### Multi-Font embedding in PDF/PS/EPS |
| 108 | +Now that we have multiple fonts to render a string, we also need to embed them for those special backends (PDF, PS/EPS). This was done with some patches to specific backends:
| 109 | +- [PR: Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) |
| 110 | +- [PR: Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) |
| 111 | + |
| 112 | +With this, one could create a PDF or a PS/EPS document with multiple fonts which are embedded (and subsetted!). |
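
One way to picture what "embedded and subsetted" means with several fonts: each font only needs the characters it actually ends up drawing. The sketch below is an illustration under my own assumptions (`chars_per_font` is a hypothetical helper, fonts are modelled as sets of supported characters, and unsupported characters are assigned to the first font); it is not the backend code from the PRs above.

```python
from collections import defaultdict


def chars_per_font(text, fonts):
    """Group the characters of `text` by the first font that supports them.

    `fonts` is an ordered list of (name, supported_characters) pairs.
    Characters that no font supports are assigned to the first font.
    """
    groups = defaultdict(set)
    for char in text:
        for name, supported in fonts:
            if char in supported:
                groups[name].add(char)
                break
        else:
            groups[fonts[0][0]].add(char)
    return dict(groups)


fonts = [("ToyLatin", set("Helo ")), ("ToyCJK", set("你好"))]

# Each entry is the set of characters that font would need in its subset.
to_embed = chars_per_font("Hello 你好", fonts)
```

Each per-font set could then be handed to a subsetting step, so the output carries two small font subsets instead of two complete fonts.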
| 113 | + |
| 114 | +## Conclusion |
| 115 | +From small contributions to eventually working on a core module of such a huge library, the road was not what I had imagined, and I learnt a lot while designing solutions to these problems. |
| 116 | + |
| 117 | +#### The work I did would eventually end up affecting every single Matplotlib user. |
| 118 | +...since all plots will work their way through the new codepath! |
| 119 | + |
| 120 | +I think that single statement is worth the <ins>whole GSoC project</ins>. |
| 121 | + |
| 122 | +### Pull Request Statistics |
| 123 | +For the sake of statistics (and to make GSoC sound a bit less intimidating), here's a list of contributions I made to Matplotlib <ins>before Summer '21</ins>, most of which are only a few lines of diff: |
| 124 | + |
| 125 | +| Created At | PR Title | Diff | Status | |
| 126 | +|:------------: |------------------------------------------------------------------------------------------------------------------------- |:---------------: |:------: | |
| 127 | +| Nov 2, 2020 | [Expand ScalarMappable.set_array to accept array-like inputs](https://github.com/matplotlib/matplotlib/pull/18870) | (+28 −4) | MERGED | |
| 128 | +| Nov 8, 2020 | [Add overset and underset support for mathtext](https://github.com/matplotlib/matplotlib/pull/18916) | (+71 −0) | MERGED | |
| 129 | +| Nov 14, 2020 | [Strictly increasing check with test coverage for streamplot grid](https://github.com/matplotlib/matplotlib/pull/18947) | (+54 −2) | MERGED | |
| 130 | +| Jan 11, 2021 | [WIP: Add support to edit subplot configurations via textbox](https://github.com/matplotlib/matplotlib/pull/19271) | (+51 −11) | DRAFT | |
| 131 | +| Jan 18, 2021 | [Fix over/under mathtext symbols](https://github.com/matplotlib/matplotlib/pull/19314) | (+7,459 −4,169) | MERGED | |
| 132 | +| Feb 11, 2021 | [Add overset/underset whatsnew entry](https://github.com/matplotlib/matplotlib/pull/19497) | (+28 −17) | MERGED | |
| 133 | +| May 15, 2021 | [Warn user when mathtext font is used for ticks](https://github.com/matplotlib/matplotlib/pull/20235) | (+28 −0) | MERGED | |
| 134 | + |
| 135 | +Here's a list of PRs I opened <ins>during Summer'21</ins>: |
| 136 | +- [Status: ✅] [Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346) |
| 137 | +- [Status: ✅] [Add parse_math in Text and default it False for TextBox](https://github.com/matplotlib/matplotlib/pull/20367) |
| 138 | +- [Status: ✅] [Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) |
| 139 | +- [Status: ✅] [[Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450) |
| 140 | +- [Status: 🚧] [[with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496) |
| 141 | +- [Status: 🚧] [[without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) |
| 142 | +- [Status: 🚧] [Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740) |
| 143 | +- [Status: 🚧] [Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) |
| 144 | +- [Status: 🚧] [Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) |
| 145 | + |
| 146 | + |
| 147 | +## Acknowledgements |
| 148 | +From learning about software engineering fundamentals from [Tom](https://github.com/tacaswell) to learning about nitty-gritty details about font representations from [Jouni](https://github.com/jkseppan); |
| 149 | + |
| 150 | +From learning through [Antony](https://github.com/anntzer)'s patches and pointers to receiving amazing feedback on these blogs from [Hannah](https://github.com/story645), it has been an adventure! 💯 |
| 151 | + |
| 152 | +_Special Mentions: [Frank](https://github.com/sauerburger), [Srijan](https://github.com/srijan-paul) and [Atharva](https://github.com/tfidfwastaken) for their helping hands!_ |
| 153 | + |
| 154 | +And lastly, _you_, the reader; if you've been following my [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/), or if you've landed at this one directly, I thank you nevertheless. (one last [meme](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png), I promise!) |
| 155 | + |
| 156 | +I know I speak for every developer out there, when I say <ins>***it means a lot***</ins> when you choose to look at their journey or their work product; it could as well be a tiny website, or it could be as big as designing a complete library! |
| 157 | + |
| 158 | +<hr> |
| 159 | + |
| 160 | +> I'm grateful to [Matplotlib](https://matplotlib.org/) (under the parent organisation: [NumFOCUS](https://numfocus.org/)), and of course, [Google Summer of Code](https://summerofcode.withgoogle.com/) for this incredible learning opportunity.
| 161 | +
| 162 | +Farewell, reader! :') |
| 163 | + |
| 164 | +<p align="center"> |
| 165 | + <img src="https://user-images.githubusercontent.com/43996118/118876008-5e6dd580-b90a-11eb-96db-0abc930c6993.png" alt="MatplotlibGSoC" /> |
| 166 | + Consider contributing to Matplotlib (Open Source in general) ❤️ |
| 167 | +</p> |
| 168 | + |
| 169 | +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-final/). |