Commit 663aead

Merge pull request #62 from aitikgupta/aitikgupta/gsoc-final
GSoC'21 Final Report: Aitik Gupta
2 parents 2413d13 + e746bc8 commit 663aead

2 files changed

+169
-0
lines changed

@@ -0,0 +1,169 @@
1+
---
2+
title: "GSoC'21: Final Report"
3+
date: 2021-08-17T17:36:40+05:30
4+
draft: false
5+
categories: ["News", "GSoC"]
6+
description: "Google Summer of Code 2021: Final Report - Aitik Gupta"
7+
displayInList: true
8+
author: Aitik Gupta
9+
10+
resources:
11+
- name: featuredImage
12+
src: "AitikGupta_GSoC.png"
13+
params:
14+
showOnTop: true
15+
---
16+
17+
**<ins>Matplotlib: Revisiting Text/Font Handling</ins>**
18+
19+
Here's a [meme](https://user-images.githubusercontent.com/43996118/129448683-bc136398-afeb-40ac-bbb7-0576757baf3c.jpg) I created, to kick things off for this final report!
20+
## About Matplotlib
21+
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, and it has become the _de facto Python plotting library_.
22+
23+
Much of the implementation behind its font manager is inspired by [W3C](https://www.w3.org/) compliant algorithms, allowing users to interact with font properties like `font-size`, `font-weight`, `font-family`, etc.
24+
25+
#### However, the way Matplotlib handled fonts and general text layout was not ideal, which is what Summer 2021 was all about.
26+
27+
> By "not ideal", I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now _outdated_.
28+
29+
(..more on this later)
30+
31+
### About the Project
32+
(PS: here's [the link](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/) to my GSoC proposal, if you're interested)
33+
34+
Overall, the project was divided into two major subgoals:
35+
1. Font Subsetting
36+
2. Font Fallback
37+
38+
But before we take each of them on, we should get an idea of some basic font terminology (there is a _lot_ of it, and it is rightly _confusing_).
39+
40+
The [PR: Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346/files) brings about a bit of clarity about some of these terms. The next section has a linked PR which also explains the types of fonts and how that is relevant to Matplotlib.
41+
## Font Subsetting
42+
An easy-to-read guide on Fonts and Matplotlib was created with [PR: [Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450), which is currently live at [Matplotlib's DevDocs](https://matplotlib.org/devdocs/users/fonts.html).
43+
44+
Taking an excerpt from one of my previous blogs (and [the doc](https://matplotlib.org/devdocs/users/fonts.html#subsetting)):
45+
46+
> Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are <ins>required</ins> for a certain array of characters, and embed <ins>only those</ins> within the output.
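
To make the idea concrete, here is a minimal sketch of glyph selection. It is not Matplotlib's implementation: `TOY_CMAP`, `TOY_GLYPHS` and `subset_glyphs` are hypothetical names, and plain dictionaries stand in for the binary tables a real font file would carry.

```python
# Hypothetical toy "font": a character -> glyph id map, plus glyph id -> outline data.
# Real fonts store this in binary tables; these dicts are stand-ins for illustration.
TOY_CMAP = {"H": 1, "e": 2, "l": 3, "o": 4, "W": 5, "r": 6, "d": 7, " ": 8}
TOY_GLYPHS = {gid: f"<outline {gid}>" for gid in range(9)}  # glyph 0 is ".notdef"


def subset_glyphs(text, cmap, glyphs):
    """Return only the glyphs needed to render `text`, keyed by glyph id."""
    # Always keep glyph 0 (".notdef"), which is drawn for unmapped characters.
    needed = {0}
    for char in set(text):
        # Characters missing from the font map to glyph 0.
        needed.add(cmap.get(char, 0))
    return {gid: glyphs[gid] for gid in sorted(needed)}


subset = subset_glyphs("Hello", TOY_CMAP, TOY_GLYPHS)
print(sorted(subset))  # glyph ids kept for "Hello"
```

Only the glyphs reachable from the characters in the text survive, which is why the embedded font ends up smaller than the original.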
47+
48+
PDF, PS/EPS and SVG output document formats are special, in that **the text within them can be <ins>editable</ins>**, i.e., one can copy/search text from documents (e.g., from a PDF file) if the text is editable.
49+
50+
### Matplotlib and Subsetting
51+
The PDF, PS/EPS and SVG backends used to support font subsetting, _only for a few types_. What that means is, before Summer '21, Matplotlib could generate Type 3 subsets for PDF, PS/EPS backends, but it <ins>*could not*</ins> generate Type 42 / TrueType subsets.
52+
53+
With [PR: Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts.
54+
55+
This is especially beneficial for people who wish to use <ins>commercial</ins> (or [CJK](https://en.wikipedia.org/wiki/CJK_characters)) fonts. Licenses for many fonts ***require*** subsetting such that they can’t be trivially copied from the output files generated from Matplotlib.
56+
57+
## Font Fallback
58+
Matplotlib was designed to work with a single font at runtime. A user _could_ specify a `font.family`, which was supposed to correspond to [CSS](https://www.w3schools.com/cssref/pr_font_font-family.asp) properties, but that was only used to find a _single_ font present on the user's system.
59+
60+
Once that font was found (and one almost always is, since Matplotlib ships with a set of default fonts), all the user text was rendered only through that font, which used to give out "<ins>tofu</ins>" whenever a character wasn't found in it.
61+
62+
---
63+
64+
It might seem like an _outdated_ approach for text rendering, now that we have concepts like font-fallback, <ins>but these concepts weren't very well discussed in the early 2000s</ins>. Even getting a single font to work _was considered a hard engineering problem_.
65+
66+
This was primarily because of the lack of **any standardization** for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.)
67+
68+
69+
| ![Previous](https://user-images.githubusercontent.com/43996118/128605750-9d76fa4a-ce57-45c6-af23-761334d48ef7.png) | ![After](https://user-images.githubusercontent.com/43996118/128605746-9f79ebeb-c03d-407e-9e27-c3203a210908.png) |
70+
|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
71+
<p align="middle">
72+
<ins>Previous</ins> (notice <i>Tofus</i>) VS <ins>After</ins> (CJK font as fallback)
73+
</p>
74+
75+
To migrate from a font-first approach to a text-first approach, there are multiple steps involved:
76+
77+
### Parsing the whole font family
78+
The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either:
79+
- [PR: [with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496), or
80+
- [PR: [without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549)
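
As a rough picture of the difference between resolving one font and resolving the whole family list, here is a minimal sketch. The registry, the paths and both function names are hypothetical; they are not the `font_manager` API and are not taken from either PR.

```python
# Hypothetical registry of installed fonts: family name -> font file path.
INSTALLED_FONTS = {
    "DejaVu Sans": "/fonts/DejaVuSans.ttf",
    "Noto Sans CJK JP": "/fonts/NotoSansCJKjp-Regular.otf",
}
DEFAULT_FONT = "/fonts/DejaVuSans.ttf"  # assumed bundled default


def find_single_font(families):
    """Single-font lookup: return the path of the first installed family."""
    for family in families:
        if family in INSTALLED_FONTS:
            return INSTALLED_FONTS[family]
    return DEFAULT_FONT


def find_all_fonts(families):
    """Multi-font lookup: return a path for every installed family, in order."""
    paths = [INSTALLED_FONTS[f] for f in families if f in INSTALLED_FONTS]
    # Fall back to the default so callers always get at least one path.
    return paths or [DEFAULT_FONT]


requested = ["DejaVu Sans", "Missing Font", "Noto Sans CJK JP"]
single = find_single_font(requested)
many = find_all_fonts(requested)
```

Keeping the single-path function untouched and adding a separate list-returning one is one way to avoid changing the return type that existing callers rely on.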
81+
82+
Quoting one of my [previous](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) blogs:
83+
> Don’t break, a lot at stake!
84+
85+
My first approach was to change the existing public `findfont` API to incorporate multiple filepaths. Since Matplotlib has a _huge_ userbase, there was a high chance that this would break many people's workflows:
86+
87+
<p align="center">
88+
<img src="https://user-images.githubusercontent.com/43996118/129636132-47b141b3-f149-49b7-b0c0-67c256bd6ee1.png" alt="FamilyParsingFlowChart" width="60%" />
89+
First PR (left), Second PR (right)
90+
</p>
91+
92+
### FT2Font Overhaul
93+
Once we get a list of font paths, we need to change the internal representation of a "font". Matplotlib has a utility called FT2Font, which is written in C++, and used with wrappers as a Python extension, which in turn is used throughout the backends. For all intents and purposes, it used to mean: ```FT2Font === SingleFont``` (if you're interested, here's a [meme](https://user-images.githubusercontent.com/43996118/128352387-76a3f52a-20fc-4853-b624-0c91844fc785.png) about how FT2Font was named!)
94+
95+
But that is not the case anymore. Here's a flowchart to explain what happens now:
96+
<p align="center">
97+
<img src="https://user-images.githubusercontent.com/43996118/129720023-14f5d67f-f279-433f-ad78-e5eccb6c784a.png" alt="FamilyParsingFlowChart" width="100%" />
98+
Font-Fallback Algorithm
99+
</p>
100+
101+
With [PR: Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740), every FT2Font object has a `std::vector<FT2Font *> fallback_list`, which is used for filling the parent cache, as can be seen in the self-explanatory flowchart.
102+
103+
For simplicity, only one type of cache (<ins>character -> FT2Font</ins>) is shown, whereas the actual implementation has two types of caches: the one shown above, and another for glyphs (<ins>glyph_id -> FT2Font</ins>).
104+
105+
> Note: Only the parent's APIs are used in some backends, so for each of the individual public functions like `load_glyph`, `load_char`, `get_kerning`, etc., we find the FT2Font object which has that glyph from the parent FT2Font cache!
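
The character cache can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the C++ code from the PR: `ToyFont`, `supported_chars` and `font_for_char` are hypothetical names, and a set of characters stands in for a real font's character map.

```python
class ToyFont:
    """Hypothetical stand-in for a font object; not Matplotlib's FT2Font."""

    def __init__(self, name, supported_chars, fallbacks=()):
        self.name = name
        self.supported_chars = set(supported_chars)
        self.fallbacks = list(fallbacks)
        self.char_cache = {}  # character -> ToyFont that can render it

    def font_for_char(self, char):
        """Return the first font (self, then fallbacks) that has `char`."""
        if char in self.char_cache:
            return self.char_cache[char]
        for font in [self, *self.fallbacks]:
            if char in font.supported_chars:
                # Remember the answer on the parent so later lookups are cheap.
                self.char_cache[char] = font
                return font
        # No font has the character: the parent is used and would render "tofu".
        self.char_cache[char] = self
        return self


cjk = ToyFont("ToyCJK", "你好")
latin = ToyFont("ToyLatin", "Helo ", fallbacks=[cjk])
used = [latin.font_for_char(c).name for c in "Hello 你好"]
```

Callers only ever talk to the parent (`latin` here); the parent's cache decides which underlying font actually serves each character.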
106+
107+
### Multi-Font embedding in PDF/PS/EPS
108+
Now that we have multiple fonts to render a string, we also need to embed them for those special backends (i.e., PDF/PS, etc.). This was done with some patches to specific backends:
109+
- [PR: Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804)
110+
- [PR: Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832)
111+
112+
With this, one could create a PDF or a PS/EPS document with multiple fonts which are embedded (and subsetted!).
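
One way to think about multi-font embedding is as bookkeeping: for each font, record which characters it actually rendered, then subset and embed each font separately. The sketch below illustrates only that bookkeeping; `FONT_COVERAGE` and `chars_per_font` are hypothetical and do not mirror the backend code.

```python
from collections import defaultdict

# Hypothetical coverage tables in fallback order (first match wins):
# font name -> characters that font can render.
FONT_COVERAGE = [
    ("ToyLatin", set("abcdefghijklmnopqrstuvwxyz ")),
    ("ToyCJK", set("你好世界")),
]


def chars_per_font(text):
    """Map each font name to the set of characters from `text` it renders."""
    used = defaultdict(set)
    for char in text:
        for name, coverage in FONT_COVERAGE:
            if char in coverage:
                used[name].add(char)
                break
        # Characters covered by no font are skipped in this sketch.
    return dict(used)


to_embed = chars_per_font("hello 你好")
```

Each entry of `to_embed` is then the input for a per-font subset, so the output file carries only the glyphs each font contributed.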
113+
114+
## Conclusion
115+
From small contributions to eventually working on a core module of such a huge library, the road was not what I had imagined, and I learnt a lot while designing solutions to these problems.
116+
117+
#### The work I did would eventually end up affecting every single Matplotlib user.
118+
...since all plots will work their way through the new codepath!
119+
120+
I think that single statement is worth the <ins>whole GSoC project</ins>.
121+
122+
### Pull Request Statistics
123+
For the sake of statistics (and to make GSoC sound a bit less intimidating), here's a list of contributions I made to Matplotlib <ins>before Summer '21</ins>, most of which are only a few lines of diff:
124+
125+
| Created At | PR Title | Diff | Status |
126+
|:------------: |------------------------------------------------------------------------------------------------------------------------- |:---------------: |:------: |
127+
| Nov 2, 2020 | [Expand ScalarMappable.set_array to accept array-like inputs](https://github.com/matplotlib/matplotlib/pull/18870) | (+28 −4) | MERGED |
128+
| Nov 8, 2020 | [Add overset and underset support for mathtext](https://github.com/matplotlib/matplotlib/pull/18916) | (+71 −0) | MERGED |
129+
| Nov 14, 2020 | [Strictly increasing check with test coverage for streamplot grid](https://github.com/matplotlib/matplotlib/pull/18947) | (+54 −2) | MERGED |
130+
| Jan 11, 2021 | [WIP: Add support to edit subplot configurations via textbox](https://github.com/matplotlib/matplotlib/pull/19271) | (+51 −11) | DRAFT |
131+
| Jan 18, 2021 | [Fix over/under mathtext symbols](https://github.com/matplotlib/matplotlib/pull/19314) | (+7,459 −4,169) | MERGED |
132+
| Feb 11, 2021 | [Add overset/underset whatsnew entry](https://github.com/matplotlib/matplotlib/pull/19497) | (+28 −17) | MERGED |
133+
| May 15, 2021 | [Warn user when mathtext font is used for ticks](https://github.com/matplotlib/matplotlib/pull/20235) | (+28 −0) | MERGED |
134+
135+
Here's a list of PRs I opened <ins>during Summer'21</ins>:
136+
- [Status: ✅] [Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346)
137+
- [Status: ✅] [Add parse_math in Text and default it False for TextBox](https://github.com/matplotlib/matplotlib/pull/20367)
138+
- [Status: ✅] [Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391)
139+
- [Status: ✅] [[Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450)
140+
- [Status: 🚧] [[with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496)
141+
- [Status: 🚧] [[without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549)
142+
- [Status: 🚧] [Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740)
143+
- [Status: 🚧] [Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804)
144+
- [Status: 🚧] [Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832)
145+
146+
147+
## Acknowledgements
148+
From learning about software engineering fundamentals from [Tom](https://github.com/tacaswell) to learning about nitty-gritty details about font representations from [Jouni](https://github.com/jkseppan);
149+
150+
From learning through [Antony](https://github.com/anntzer)'s patches and pointers to receiving amazing feedback on these blogs from [Hannah](https://github.com/story645), it has been an adventure! 💯
151+
152+
_Special Mentions: [Frank](https://github.com/sauerburger), [Srijan](https://github.com/srijan-paul) and [Atharva](https://github.com/tfidfwastaken) for their helping hands!_
153+
154+
And lastly, _you_, the reader; if you've been following my [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/), or if you've landed at this one directly, I thank you nevertheless. (one last [meme](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png), I promise!)
155+
156+
I know I speak for every developer out there, when I say <ins>***it means a lot***</ins> when you choose to look at their journey or their work product; it could as well be a tiny website, or it could be as big as designing a complete library!
157+
158+
<hr>
159+
160+
> I'm grateful to [Matplotlib](https://matplotlib.org/) (under the parent organisation: [NumFOCUS](https://numfocus.org/)), and of course, [Google Summer of Code](https://summerofcode.withgoogle.com/) for this incredible learning opportunity.
161+
162+
Farewell, reader! :')
163+
164+
<p align="center">
165+
<img src="https://user-images.githubusercontent.com/43996118/118876008-5e6dd580-b90a-11eb-96db-0abc930c6993.png" alt="MatplotlibGSoC" />
166+
Consider contributing to Matplotlib (Open Source in general) ❤️
167+
</p>
168+
169+
#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-final/).
