diff --git a/config.toml b/config.toml index 8883ce1..aabfbc4 100644 --- a/config.toml +++ b/config.toml @@ -3,7 +3,7 @@ languageCode = "en-us" title = "Matplotblog" theme = "aether" canonifyurls = true -paginate = 3 +paginate = 4 [params] head_img = "/mpl_logo.png" @@ -14,3 +14,6 @@ link1_description = "About" [taxonomies] category = "categories" + +[markup.goldmark.renderer] + unsafe = true diff --git a/content/posts/GSoC_2020_Final_Work_Product/index.md b/content/posts/GSoC_2020_Final_Work_Product/index.md new file mode 100644 index 0000000..7922e58 --- /dev/null +++ b/content/posts/GSoC_2020_Final_Work_Product/index.md @@ -0,0 +1,55 @@ +--- +title: "GSoC 2020 Work Product - Baseline Images Problem" +date: 2020-08-16T09:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Final Work Product Report for the Google Summer of Code 2020 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020 is completed. Hurray!! This post discusses about the progress so far in the three months of the coding period from 1 June to 24 August 2020 regarding the project `Baseline Images Problem` under `matplotlib` organisation under the umbrella of `NumFOCUS` organization. + +## Project Details: + +This project helps with the difficulty in adding/modifying tests which require a baseline image. Baseline images are problematic because +- Baseline images cause the repo size to grow rather quickly. +- Baseline images force matplotlib contributors to pin to a somewhat old version of FreeType because nearly every release of FreeType causes tiny rasterization changes that would entail regenerating all baseline images (and thus cause even more repo size growth). + +So, the idea is to not store the baseline images in the repository, instead to create them from the existing tests. + +## Creation of the matplotlib_baseline_images package + +We had created the `matplotlib_baseline_images` package. 
This package lives in the sub-wheels directory so that more packages can be added in the same directory, if needed in future. The `matplotlib_baseline_images` package contains baseline images for both `matplotlib` and `mpl_toolkits`. +The package can be installed by using `python3 -m pip install matplotlib_baseline_images`. + +## Creation of the matplotlib baseline image generation flag + +We successfully created the `generate_missing` command line flag for baseline image generation for `matplotlib` and `mpl_toolkits` in the previous months. It was generating the `matplotlib` and the `mpl_toolkits` baseline images initially. Now, we have also modified the existing flow to generate any missing baseline images, which would be fetched from the `master` branch on doing `git pull` or `git checkout -b feature_branch`. + +Now, the image generation at the time of a fresh install of matplotlib and the generation of missing baseline images work with `python3 -m pytest lib/matplotlib matplotlib_baseline_image_generation` for the `lib/matplotlib` folder and `python3 -m pytest lib/mpl_toolkits matplotlib_baseline_image_generation` for the `lib/mpl_toolkits` folder. + +## Documentation + +We have written documentation explaining the following scenarios: +1. How to generate the baseline images on a fresh install of matplotlib? +2. How to generate the missing baseline images on fetching changes from master? +3. How to install the `matplotlib_baseline_images_package` to be used for testing by the developer? +4. How to intentionally change an image? + +## Links to the work done + +- [Issue](https://github.com/matplotlib/matplotlib/issues/16447) +- [Pull Request](https://github.com/matplotlib/matplotlib/pull/17793) +- [Blog Posts](https://matplotlib.org/matplotblog/categories/gsoc/) + +## Mentors + +- Thomas A Caswell +- Hannah +- Antony Lee + +I am grateful to be part of such a great community. 
Project is really interesting and challenging :) + +Thanks Thomas, Antony and Hannah for helping me to complete this project. + diff --git a/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Final/index.md b/content/posts/GSoC_2021_Final/index.md new file mode 100644 index 0000000..c956f6e --- /dev/null +++ b/content/posts/GSoC_2021_Final/index.md @@ -0,0 +1,169 @@ +--- +title: "GSoC'21: Final Report" +date: 2021-08-17T17:36:40+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Google Summer of Code 2021: Final Report - Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**Matplotlib: Revisiting Text/Font Handling** + +To kick things off for the final report, here's a [meme](https://user-images.githubusercontent.com/43996118/129448683-bc136398-afeb-40ac-bbb7-0576757baf3c.jpg) to nudge about the [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/). +## About Matplotlib +Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, which has become a _de-facto Python plotting library_. + +Much of the implementation behind its font manager is inspired by [W3C](https://www.w3.org/) compliant algorithms, allowing users to interact with font properties like `font-size`, `font-weight`, `font-family`, etc. + +#### However, the way Matplotlib handled fonts and general text layout was not ideal, which is what Summer 2021 was all about. + +> By "not ideal", I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now _outdated_. 
+ +(..more on this later) + +### About the Project +(PS: here's [the link](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/view#heading=h.feg5pv3x59u2) to my GSoC proposal, if you're interested) + +Overall, the project was divided into two major subgoals: +1. Font Subsetting +2. Font Fallback + +But before we take each of them on, we should get an idea about some basic terminology for fonts (which are a _lot_, and are rightly _confusing_) + +The [PR: Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346/files) brings about a bit of clarity about some of these terms. The next section has a linked PR which also explains the types of fonts and how that is relevant to Matplotlib. +## Font Subsetting +An easy-to-read guide on Fonts and Matplotlib was created with [PR: [Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450), which is currently live at [Matplotlib's DevDocs](https://matplotlib.org/devdocs/users/fonts.html). + +Taking an excerpt from one of my previous blogs (and [the doc](https://matplotlib.org/devdocs/users/fonts.html#subsetting)): + +> Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are required for a certain array of characters, and embed only those within the output. + +PDF, PS/EPS and SVG output document formats are special, as in **the text within them can be editable**, i.e, one can copy/search text from documents (for eg, from a PDF file) if the text is editable. + +### Matplotlib and Subsetting +The PDF, PS/EPS and SVG backends used to support font subsetting, _only for a few types_. What that means is, before Summer '21, Matplotlib could generate Type 3 subsets for PDF, PS/EPS backends, but it *could not* generate Type 42 / TrueType subsets. 
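To make the idea of subsetting concrete, here is a toy sketch in plain Python. It is only an illustration of "keep the glyphs a string needs", not how Matplotlib or fontTools actually do it: `TOY_FONT` is made-up data mapping characters to glyph names, and `subset_glyphs` is a hypothetical helper.

```python
# Toy "font": maps characters to glyph names (made-up data for illustration).
TOY_FONT = {
    "a": "glyph_a",
    "b": "glyph_b",
    "c": "glyph_c",
    " ": "glyph_space",
}


def subset_glyphs(font: dict, text: str) -> dict:
    """Return only the entries of `font` that are needed to render `text`."""
    needed = set(text)
    return {char: glyph for char, glyph in font.items() if char in needed}


subset = subset_glyphs(TOY_FONT, "abba")
print(sorted(subset))  # ['a', 'b']
```

A real subsetter works on glyph programs inside a font file rather than on a dictionary, but the goal is the same: the output carries two glyphs here instead of the whole font.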
+ +With [PR: Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts. + +This is especially beneficial for people who wish to use commercial (or [CJK](https://en.wikipedia.org/wiki/CJK_characters)) fonts. Licenses for many fonts ***require*** subsetting such that they can’t be trivially copied from the output files generated from Matplotlib. + +## Font Fallback +Matplotlib was designed to work with a single font at runtime. A user _could_ specify a `font.family`, which was supposed to correspond to [CSS](https://www.w3schools.com/cssref/pr_font_font-family.asp) properties, but that was only used to find a _single_ font present on the user's system. + +Once that font was found (and one almost always is, since Matplotlib ships with a set of default fonts), all the user text was rendered only through that font, which used to give out "tofu" if a character wasn't found. + +--- + +It might seem like an _outdated_ approach for text rendering, now that we have concepts like font-fallback, but these concepts weren't very well discussed in the early 2000s. Even getting a single font to work _was considered a hard engineering problem_. + +This was primarily because of the lack of **any standardization** for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.) + + +| ![Previous](https://user-images.githubusercontent.com/43996118/128605750-9d76fa4a-ce57-45c6-af23-761334d48ef7.png) | ![After](https://user-images.githubusercontent.com/43996118/128605746-9f79ebeb-c03d-407e-9e27-c3203a210908.png) | +|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------| +

+ Previous (notice Tofus) VS After (CJK font as fallback) +

+ +To migrate from a font-first approach to a text-first approach, there are multiple steps involved: + +### Parsing the whole font family +The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either: +- [PR: [with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496), or +- [PR: [without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) + +Quoting one of my [previous](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) blogs: +> Don’t break, a lot at stake! + +My first approach was to change the existing public `findfont` API to incorporate multiple filepaths. Since Matplotlib has a _very huge_ userbase, there's a high chance it would break a chunk of people's workflow: + +

+ FamilyParsingFlowChart + First PR (left), Second PR (right) +

+ +### FT2Font Overhaul +Once we get a list of font paths, we need to change the internal representation of a "font". Matplotlib has a utility called FT2Font, which is written in C++, and used with wrappers as a Python extension, which in turn is used throughout the backends. For all intents and purposes, it used to mean: ```FT2Font === SingleFont``` (if you're interested, here's a [meme](https://user-images.githubusercontent.com/43996118/128352387-76a3f52a-20fc-4853-b624-0c91844fc785.png) about how FT2Font was named!) + +But that is not the case anymore, here's a flowchart to explain what happens now: +

+ FamilyParsingFlowChart + Font-Fallback Algorithm +

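One rough way to read the font-fallback flowchart above is the toy Python sketch below. It is not Matplotlib's C++ `FT2Font` code: `ToyFont`, `find_font_for`, and the font names are hypothetical, and a "font" here is just a set of characters. The parent font is tried first, then each fallback in order, and the result is remembered in a character-to-font cache.

```python
class ToyFont:
    """Toy stand-in for a font object; not Matplotlib's FT2Font."""

    def __init__(self, name, chars, fallbacks=()):
        self.name = name
        self.chars = set(chars)           # characters this font has glyphs for
        self.fallbacks = list(fallbacks)  # fonts to try, in order, on a miss
        self.char_cache = {}              # character -> font that renders it

    def find_font_for(self, char):
        """Return the first font (self, then fallbacks) that has `char`."""
        if char in self.char_cache:
            return self.char_cache[char]
        for font in [self, *self.fallbacks]:
            if char in font.chars:
                self.char_cache[char] = font
                return font
        return None  # no font has the glyph: this is where "tofu" shows up


cjk = ToyFont("ToyCJK", "多个汉字")
parent = ToyFont("ToyLatin", "abc ", fallbacks=[cjk])

fonts_used = [parent.find_font_for(c).name for c in "a 多"]
print(fonts_used)  # ['ToyLatin', 'ToyLatin', 'ToyCJK']
```

Keeping the cache on the parent means callers only ever talk to one object, while each character is still routed to whichever font can draw it.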
+ +With [PR: Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740), every FT2Font object has a `std::vector fallback_list`, which is used for filling the parent cache, as can be seen in the self-explanatory flowchart. + +For simplicity, only one type of cache (character -> FT2Font) is shown, whereas in actual implementation there's 2 types of caches, one shown above, and another for glyphs (glyph_id -> FT2Font). + +> Note: Only the parent's APIs are used in some backends, so for each of the individual public functions like `load_glyph`, `load_char`, `get_kerning`, etc., we find the FT2Font object which has that glyph from the parent FT2Font cache! + +### Multi-Font embedding in PDF/PS/EPS +Now that we have multiple fonts to render a string, we also need to embed them for those special backends (i.e., PDF/PS, etc.). This was done with some patches to specific backends: +- [PR: Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) +- [PR: Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) + +With this, one could create a PDF or a PS/EPS document with multiple fonts which are embedded (and subsetted!). + +## Conclusion +From small contributions to eventually working on a core module of such a huge library, the road was not what I had imagined, and I learnt a lot while designing solutions to these problems. + +#### The work I did would eventually end up affecting every single Matplotlib user. +...since all plots will work their way through the new codepath! + +I think that single statement is worth the whole GSoC project. 
+ +### Pull Request Statistics +For the sake of statistics (and to make GSoC sound a bit less intimidating), here's a list of contributions I made to Matplotlib before Summer '21, most of which are only a few lines of diff: + +| Created At | PR Title | Diff | Status | +|:------------: |------------------------------------------------------------------------------------------------------------------------- |:---------------: |:------: | +| Nov 2, 2020 | [Expand ScalarMappable.set_array to accept array-like inputs](https://github.com/matplotlib/matplotlib/pull/18870) | (+28 −4) | MERGED | +| Nov 8, 2020 | [Add overset and underset support for mathtext](https://github.com/matplotlib/matplotlib/pull/18916) | (+71 −0) | MERGED | +| Nov 14, 2020 | [Strictly increasing check with test coverage for streamplot grid](https://github.com/matplotlib/matplotlib/pull/18947) | (+54 −2) | MERGED | +| Jan 11, 2021 | [WIP: Add support to edit subplot configurations via textbox](https://github.com/matplotlib/matplotlib/pull/19271) | (+51 −11) | DRAFT | +| Jan 18, 2021 | [Fix over/under mathtext symbols](https://github.com/matplotlib/matplotlib/pull/19314) | (+7,459 −4,169) | MERGED | +| Feb 11, 2021 | [Add overset/underset whatsnew entry](https://github.com/matplotlib/matplotlib/pull/19497) | (+28 −17) | MERGED | +| May 15, 2021 | [Warn user when mathtext font is used for ticks](https://github.com/matplotlib/matplotlib/pull/20235) | (+28 −0) | MERGED | + +Here's a list of PRs I opened during Summer'21: +- [Status: ✅] [Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346) +- [Status: ✅] [Add parse_math in Text and default it False for TextBox](https://github.com/matplotlib/matplotlib/pull/20367) +- [Status: ✅] [Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) +- [Status: ✅] [[Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450) +- [Status: 🚧] [[with findfont 
diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496) +- [Status: 🚧] [[without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) +- [Status: 🚧] [Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740) +- [Status: 🚧] [Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) +- [Status: 🚧] [Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) + + +## Acknowledgements +From learning about software engineering fundamentals from [Tom](https://github.com/tacaswell) to learning about nitty-gritty details about font representations from [Jouni](https://github.com/jkseppan); + +From learning through [Antony](https://github.com/anntzer)'s patches and pointers to receiving amazing feedback on these blogs from [Hannah](https://github.com/story645), it has been an adventure! 💯 + +_Special Mentions: [Frank](https://github.com/sauerburger), [Srijan](https://github.com/srijan-paul) and [Atharva](https://github.com/tfidfwastaken) for their helping hands!_ + +And lastly, _you_, the reader; if you've been following my [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/), or if you've landed at this one directly, I thank you nevertheless. (one last [meme](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png), I promise!) + +I know I speak for every developer out there, when I say ***it means a lot*** when you choose to look at their journey or their work product; it could as well be a tiny website, or it could be as big as designing a complete library! + +
+ +> I'm grateful to [Matplotlib](https://matplotlib.org/) (under the parent organisation: [NumFOCUS](https://numfocus.org/)), and of course, [Google Summer of Code](https://summerofcode.withgoogle.com/) for this incredible learning opportunity. + +Farewell, reader! :') + +

+ MatplotlibGSoC + Consider contributing to Matplotlib (Open Source in general) ❤️ +

+ +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-final/). diff --git a/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Introduction/index.md b/content/posts/GSoC_2021_Introduction/index.md new file mode 100644 index 0000000..dcc586d --- /dev/null +++ b/content/posts/GSoC_2021_Introduction/index.md @@ -0,0 +1,92 @@ +--- +title: "Aitik Gupta joins as a Student Developer under GSoC'21" +date: 2021-05-19T20:03:57+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Introduction about Aitik Gupta, Google Summer of Code 2021 Intern under the parent organisation: NumFOCUS" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**The day of result, was a very, very long day.** + +With this small writeup, I intend to talk about everything before _that day_, my experiences, my journey, and the role of Matplotlib throughout! + +## About Me +I am a third-year undergraduate student currently pursuing a Dual Degree (B.Tech + M.Tech) in Information Technology at Indian Institute of Information Technology, Gwalior. + +During my sophomore year, my interests started expanding in the domain of Machine Learning, where I learnt about various amazing open-source libraries like *NumPy*, *SciPy*, *pandas*, and *Matplotlib*! Gradually, in my third year, I explored the field of Computer Vision during my internship at a startup, where a big chunk of my work was to integrate their native C++ codebase to Android via JNI calls. + +To actuate my learnings from the internship, I worked upon my own research along with a [friend from my university](https://linkedin.com/in/aaditagarwal). 
The paper was accepted in CoDS-COMAD’21 and is published at ACM Digital Library. ([Link](https://dl.acm.org/doi/abs/10.1145/3430984.3430986), if anyone's interested) + +During this period, I also picked up the knack for open-source and started glaring at various issues (and pull requests) in libraries, including OpenCV [[contributions](https://github.com/opencv/opencv/issues?q=author%3Aaitikgupta+)] and NumPy [[contributions](https://github.com/numpy/numpy/issues?q=author%3Aaitikgupta+)]. + +I quickly got involved in Matplotlib’s community; it was very welcoming and beginner-friendly. + +**Fun fact: Its dev call was the very first I attended with people from all around the world!** + +## First Contributions +We all mess up, my [very first PR](https://github.com/opencv/opencv/pull/18440) to an organisation like OpenCV went horrible, till date, it looks like this: +![OpenCV_PR](https://user-images.githubusercontent.com/43996118/118848259-35d6e300-b8ec-11eb-8cdc-387e9f5a37a3.png) + +In all honesty, I added a single commit with only a few lines of diff. +> However, I pulled all the changes from upstream `master` to my working branch, whereas the PR was to be made on `3.4` branch. + +I'm sure I could've done tons of things to solve it, but at that time I couldn't do anything - imagine the anxiety! + +At this point when I look back at those fumbled PRs, I feel like they were important for my learning process. + +**Fun Fact: Because of one of these initial contributions, I got a shiny little badge [[Mars 2020 Helicopter Contributor](https://github.com/readme/nasa-ingenuity-helicopter)] on GitHub!** + + + + +## Getting started with Matplotlib +It was around initial weeks of November last year, I was scanning through `Good First Issue` and `New Feature` labels, I realised a pattern - most Mathtext related issues were unattended. 
+ +To make it simple, Mathtext is a part of Matplotlib which parses mathematical expressions and provides TeX-like outputs, for example: + + +I scanned the related source code to try to figure out how to solve those Mathtext issues. Eventually, with the help of maintainers reviewing the PRs and a lot of verbose discussions on GitHub issues/pull requests and on the [Gitter](https://gitter.im/matplotlib/matplotlib) channel, I was able to get my initial PRs merged! + +## Learning throughout the process +Most of us use libraries without understanding their underlying structure, which can sometimes cause downstream bugs! + +While I was studying Matplotlib's architecture, I figured that I could use the same ideology for one of my [own projects](https://aitikgupta.github.io/swi-ml/)! + +Matplotlib uses a global dictionary-like object named `rcParams`. I used a smaller interface, similar to rcParams, in [swi-ml](https://pypi.org/project/swi-ml/) - a small Python library I wrote, implementing a subset of ML algorithms, with a switchable backend. + + +## Where does GSoC fit? +Around January, I had a conversation with one of the maintainers (hey [Antony](https://github.com/anntzer)!) about the long list of issues with the current ways of handling texts/fonts in the library. + +After compiling them into an order, and after a few tweaks from maintainers, the [GSoC Idea-List](https://github.com/matplotlib/matplotlib/wiki/GSOC-2021-ideas) for Matplotlib was born. And so began my journey of building a strong proposal! 
+ +## About the Project +#### Proposal Link: [Google Docs](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/edit?usp=sharing) (will stay alive after GSoC), [GSoC Website](https://storage.googleapis.com/summerofcode-prod.appspot.com/gsoc/core_project/doc/6319153410998272_1617936740_GSoC_Proposal_-_Matplotlib.pdf?Expires=1621539234&GoogleAccessId=summerofcode-prod%40appspot.gserviceaccount.com&Signature=QU8uSdPnXpa%2FooDtzVnzclz809LHjh9eU7Y7iR%2FH1NM32CBgzBO4%2FFbMeDmMsoic91B%2BKrPZEljzGt%2Fx9jtQeCR9X4O53JJLPVjw9Bg%2Fzb2YKjGzDk0oFMRPXjg9ct%2BV58PD6f4De1ucqARLtHGjis5jhK1W08LNiHAo88NB6BaL8Q5hqcTBgunLytTNBJh5lW2kD8eR2WeENnW9HdIe53aCdyxJkYpkgILJRoNLCvp111AJGC3RLYba9VKeU6w2CdrumPfRP45FX6fJlrKnClvxyf5VHo3uIjA3fGNWIQKwGgcd1ocGuFN3YnDTS4xkX3uiNplwTM4aGLQNhtrMqA%3D%3D) (not so sure) + +### Revisiting Text/Font Handling +The aim of the project is divided into 3 subgoals: + +1. **Font-Fallback**: A redesigned text-first font interface - essentially parsing all family before rendering a "tofu". + + *(similar to specifying font-family in CSS!)* +2. **Font Subsetting**: Every exported PS/PDF would contain embedded glyphs subsetted from the whole font. + + *(imagine a plot with just a single letter "a", would you like it if the PDF you exported from Matplotlib to embed the whole font file within it?)* + +3. Most mpl backends would use the unified TeX exporting mechanism + +**Mentors** [Thomas A Caswell](https://github.com/tacaswell), [Antony Lee](https://github.com/anntzer), [Hannah](https://github.com/story645). + +Thanks a lot for spending time reading the blog! I'll be back with my progress in subsequent posts. + + +##### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-intro/)! 
+ diff --git a/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png b/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_MidTerm/index.md b/content/posts/GSoC_2021_MidTerm/index.md new file mode 100644 index 0000000..dece87c --- /dev/null +++ b/content/posts/GSoC_2021_MidTerm/index.md @@ -0,0 +1,88 @@ +--- +title: "GSoC'21: Mid-Term Progress" +date: 2021-07-02T08:32:05+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Mid-Term Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**"Aitik, how is your GSoC going?"** + +Well, it's been a while since I last wrote. But I wasn't spending time watching _Loki_ either! (that's a lie.) + +During this period the project took on some interesting (and stressful) curves, which I intend to talk about in this small writeup. +## New Mentor! +The first week of coding period, and I met one of my new mentors, [Jouni](https://github.com/jkseppan). Without him, along with [Tom](https://github.com/tacaswell) and [Antony](https://github.com/anntzer), the project wouldn't have moved _an inch_. + +It was initially Jouni's [PR](https://github.com/matplotlib/matplotlib/pull/18143) which was my starting point of the first milestone in my proposal, Font Subsetting. + +## What is Font Subsetting anyway? +As was proposed by Tom, a good way to understand something is to document your journey along the way! (well, that's what GSoC wants us to follow anyway right?) 
+ +Taking an excerpt from one of the paragraphs I wrote [here](https://github.com/matplotlib/matplotlib/blob/a94f52121cea4194a5d6f6fc94eafdfb03394628/doc/users/fonts.rst#subsetting): +> Font Subsetting can be used before generating documents, to embed only the _required_ glyphs within the documents. Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are required for a certain array of characters, and embed only those within the output. + +Now this may seem straightforward, right? +#### Wrong. +The glyph programs can call their own subprograms, for example, characters like `ä` could be composed by calling subprograms for `a` and `¨`; or `→` could be composed by a program that changes the display matrix and calls the subprogram for `←`. + +Since the subsetter has to find out _all such subprograms_ being called by _every glyph_ included in the subset, this is a generally difficult problem! + +Something which one of my mentors said which _really_ stuck with me: +> Matplotlib isn't a font library, and shouldn't try to be one. + +It's really easy to fall into the trap of trying to do _everything_ within your own project, which ends up rather _hurting_ itself. + +Since this holds true even for Matplotlib, it uses external dependencies like [FreeType](https://www.freetype.org/), [ttconv](https://github.com/sandflow/ttconv), and newly proposed [fontTools](https://github.com/fonttools/fonttools) to handle font subsetting, embedding, rendering, and related stuff. + +PS: If that font stuff didn't make sense, I would recommend going through a friendly tutorial I wrote, which is all about [Matplotlib and Fonts](https://matplotlib.org/stable/users/fonts.html)! +## Unexpected Complications +Matplotlib uses an external dependency `ttconv` which was initially forked into Matplotlib's repository **in 2003**! 
+> ttconv was a standalone commandline utility for converting TrueType fonts to subsetted Type 3 fonts (among other features) written in 1995, which Matplotlib forked in order to make it work as a library. + +Over the time, there were a lot of issues with it which were either hard to fix, or didn't attract a lot of attention. (See the above paragraph for a valid reason) + +One major utility which is still used is `convert_ttf_to_ps`, which takes a _font path_ as input and converts it into a Type 3 or Type 42 PostScript font, which can be embedded within PS/EPS output documents. The guide I wrote ([link](https://matplotlib.org/stable/users/fonts.html)) contains decent descriptions, the differences between these type of fonts, etc. + +#### So we need to convert that _font path_ input to a _font buffer_ input. +Why do we need to? Type 42 subsetting isn't really supported by ttconv, so we use a new dependency called fontTools, whose 'full-time job' is to subset Type 42 fonts for us (among other things). + +> It provides us with a font buffer, however ttconv expects a font path to embed that font + +Easily enough, this can be done by Python's `tempfile.NamedTemporaryFile`: +```python +with tempfile.NamedTemporaryFile(suffix=".ttf") as tmp: + # fontdata is the subsetted buffer + # returned from fontTools + tmp.write(fontdata.getvalue()) + + # TODO: allow convert_ttf_to_ps + # to input file objects (BytesIO) + convert_ttf_to_ps( + os.fsencode(tmp.name), + fh, + fonttype, + glyph_ids, + ) +``` + +***But this is far from a clean API; in terms of separation of \*reading\* the file from \*parsing\* the data.*** + +What we _ideally_ want is to pass the buffer down to `convert_ttf_to_ps`, and modify the embedding code of `ttconv` (written in C++). And _here_ we come across a lot of unexplored codebase, _which wasn't touched a lot ever since it was forked_. 
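The workaround described above (write the in-memory buffer to a temporary file, then hand its path to a path-only API) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `consume_font_path` is a hypothetical stand-in for a path-only function such as `convert_ttf_to_ps`, and the bytes are placeholder data rather than a real font.

```python
import io
import os
import tempfile


def consume_font_path(path: bytes) -> int:
    """Hypothetical stand-in for a path-only API: returns the file size."""
    with open(path, "rb") as f:
        return len(f.read())


def embed_from_buffer(fontdata: io.BytesIO) -> int:
    # Write the buffer into a temporary directory so the path-based
    # consumer can reopen the file by name on any platform.
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp_path = os.path.join(tmpdir, "subset.ttf")
        with open(tmp_path, "wb") as tmp:
            tmp.write(fontdata.getvalue())
        # The file is closed (and therefore flushed) before it is consumed.
        return consume_font_path(os.fsencode(tmp_path))


size = embed_from_buffer(io.BytesIO(b"\x00\x01\x00\x00placeholder"))
print(size)  # 15
```

Closing the file before passing its path along matters: data still sitting in a write buffer would not be visible to a consumer that reopens the file by name.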
+ +Funnily enough, just yesterday, after spending a lot of quality time, me and my mentors figured out that the **whole logging system of ttconv was broken**, all because of a single debugging function. 🥲 + +
+ +This is still an ongoing problem that we need to tackle over the coming weeks, hopefully by the next time I write one of these blogs, it gets resolved! + +Again, thanks a ton for spending time reading these blogs. :D +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-mid/). diff --git a/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png b/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_PreQuarter/index.md b/content/posts/GSoC_2021_PreQuarter/index.md new file mode 100644 index 0000000..292495a --- /dev/null +++ b/content/posts/GSoC_2021_PreQuarter/index.md @@ -0,0 +1,92 @@ +--- +title: "GSoC'21: Pre-Quarter Progress" +date: 2021-07-19T07:32:05+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Pre-Quarter Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**“Well? Did you get it working?!”** + +Before I answer that question, if you're missing the context, check out my [previous blog](https://matplotlib.org/matplotblog/posts/gsoc_2021_midterm/)'s last few lines.. promise it won't take you more than 30 seconds to get the whole problem! + +With this short writeup, I intend to talk about _what_ we did and _why_ we did, what we did. XD + +## Ostrich Algorithm +Ring any bells? Remember OS (Operating Systems)? It's one of the core CS subjects which I bunked then and regret now. (╥﹏╥) + +The [wikipedia page](https://en.wikipedia.org/wiki/Ostrich_algorithm) has a 2-liner explaination if you have no idea what's an Ostrich Algorithm.. 
but I know most of y'all won't bother clicking it XD, so here goes: +> Ostrich algorithm is a strategy of ignoring potential problems by "sticking one's head in the sand and pretending there is no problem" + +An important thing to note: it is used when it is more **cost-effective** to _allow the problem to occur than to attempt its prevention_. + +As you might've guessed by now, we ultimately ended up with the *not-so-clean* API (more on this later). + +## What was the problem? +The highest level overview of the problem was: + +``` +❌ fontTools -> buffer -> ttconv_with_buffer +✅ fontTools -> buffer -> tempfile -> ttconv_with_file +``` +The first approach created corrupted outputs, however the second approach worked fine. A point to note here would be that *Method 1* is better in terms of separation of *reading* the file from *parsing* the data. + +1. [fontTools](https://github.com/fonttools/fonttools) handles the Type42 subsetting for us, whereas [ttconv](https://github.com/matplotlib/matplotlib/tree/master/extern/ttconv) handles the embedding. +2. `ttconv_with_buffer` is a modification to the original `ttconv_with_file`; that allows it to input a file buffer instead of a file-path + +You might be tempted to say: +> "Well, `ttconv_with_buffer` must be wrongly modified, duh." + +Logically, yes. `ttconv` was designed to work with a file-path and not a file-object (buffer), and modifying a codebase **written in 1998** turned out to be a larger pain than we anticipated. +#### It came to a point where one of my mentors decided to implement everything in Python! +He even did, but the efforts to get it to production / or to fix `ttconv` embedding were ⋙ to just get on with the second method. That damn ostrich really helped us get out of that debugging hell. 🙃 +## Font Fallback - initial steps +Finally, we're onto the second subgoal for the summer: [Font Fallback](https://www.w3schools.com/css/css_font_fallbacks.asp)! + +To give an idea about how things work right now: +1. 
User asks Matplotlib to use certain font families, specified by:
+```python
+matplotlib.rcParams["font.family"] = ["list", "of", "font", "families"]
+```
+2. This list is used to search for available fonts on a user's system.
+3. However, in current (and previous) versions of Matplotlib:
+> As soon as a font is found by iterating the font-family, **all text** is rendered by that _and only that_ font.
+
+You can immediately see the problem with this approach: using the same font for every character means that any glyph which isn't present in that font won't be rendered, and you'll instead get a hollow rectangle called "tofu" (read the first line [here](https://www.google.com/get/noto/)).
+
+And that is exactly the first milestone! That is, parsing the _entire list_ of font families to get an intermediate representation of a multi-font interface.
+## Don't break, a lot at stake!
+Imagine if you had the superpower to change the Python standard library's internal functions, _without_ consulting anybody. Say you wanted to write a solution by hooking in and changing the implementation of, say, `str("dumb")` to return:
+```ipython
+>>> str("dumb")
+["d", "u", "m", "b"]
+```
+Pretty "dumb", right? xD
+
+For your usecase it might work fine, but it would also mean breaking the _entire_ Python userbase's workflow, not to mention the 1000000+ libraries that depend on the original functionality.
+
+On a similar note, Matplotlib has a public API known as `findfont(prop: str)`, which when given a string (or [FontProperties](https://matplotlib.org/stable/api/font_manager_api.html#matplotlib.font_manager.FontProperties)) finds you the font on your system that best matches the given properties.
+
+It is used throughout the library, as well as in multiple other places, including downstream libraries. Being naive as I was, I changed this function signature and submitted the [PR](https://github.com/matplotlib/matplotlib/pull/20496).
🥲 + +Had an insightful discussion about this with my mentors, and soon enough raised the [other PR](https://github.com/matplotlib/matplotlib/pull/20549), which didn't touch the `findfont` API at all. + +--- + +One last thing to note: Even if we do complete the first milestone, we wouldn't be done yet, since this is just parsing the entire list to get multiple fonts.. + +We still need to migrate the library's internal implementation from **font-first** to **text-first**! + + +But that's for later, for now: +![OnceAgainThankingYou](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png) + +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-pre-quarter/). diff --git a/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png new file mode 100644 index 0000000..6a0fb71 Binary files /dev/null and b/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Quarter/index.md b/content/posts/GSoC_2021_Quarter/index.md new file mode 100644 index 0000000..128779e --- /dev/null +++ b/content/posts/GSoC_2021_Quarter/index.md @@ -0,0 +1,144 @@ +--- +title: "GSoC'21: Quarter Progress" +date: 2021-08-03T18:48:00+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Quarter Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**“Matplotlib, I want 多个汉字 in between my text.”** + +Let's say you asked Matplotlib to render a plot with some label containing 多个汉字 (multiple Chinese characters) in between your English text. + +Or conversely, let's say you use a Chinese font with Matplotlib, but you had English text in between (which is quite common). 
+
+> Assumption: the Chinese font doesn't have those English glyphs, and vice versa
+
+With this short writeup, I'll talk about what a migration from a font-first to a text-first approach in Matplotlib looks like, which ideally solves the above problem.
+### Have the fonts?
+Logically, the very first step to solving this would be to ask whether you _have_ multiple fonts, right?
+
+Matplotlib doesn't ship [CJK](https://en.wikipedia.org/wiki/List_of_CJK_fonts) (Chinese Japanese Korean) fonts, which ideally contain these Chinese glyphs. It does try to cover most ground with the [default font](https://matplotlib.org/stable/users/dflt_style_changes.html#normal-text) it ships with, however.
+
+So if you don't have a font to render your Chinese characters, go ahead and install one! Matplotlib will find your installed fonts (after rebuilding the cache, that is).
+### Parse the fonts
+This is where things get interesting, and what my [previous writeup](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) was all about..
+
+> Parsing the whole family to get multiple fonts for given font properties
+
+## FT2Font Magic!
+To give you an idea about how things used to work for Matplotlib:
+1. A single font was chosen _at draw time_
+   (fixed: re [previous writeup](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/))
+2. Every character displayed in your document was rendered by only that font
+   (partially fixed: re _this writeup_)
+
+> FT2Font is a matplotlib-to-font module, which provides a high-level Python API to interact with a _single font's operations_ like read/draw/extract/etc.
+
+Being written in C++, the module needs wrappers around it to be converted into a [Python extension](https://docs.python.org/3/extending/extending.html) using Python's C-API.
+
+> It allows us to use C++ functions directly from Python!
+
+So wherever you see a use of font within the library (by library I mean the readable Python codebase XD), you could have derived that:
+```
+FT2Font === SingleFont
+```
+
+Things are a bit different now, however..
+## Designing a multi-font system
+FT2Font is basically itself a wrapper around a library called [FreeType](https://www.freetype.org/), which is a freely available software library to render fonts.
+

+*[Figure: FT2Font Naming. Caption: How FT2Font was named]*

+
+In my initial proposal.. while looking around how FT2Font is structured, I figured:
+```
+Oh, looks like all we need are Faces!
+```
+> If you don't know what faces/glyphs/ligatures are, head over to why [Text Hates You](https://gankra.github.io/blah/text-hates-you/). I can guarantee you'll definitely enjoy some real life examples of why text rendering is hard. 🥲
+
+Anyway, if you already know what Faces are, it might strike you:
+
+If we already have all the faces we need from multiple fonts (let's say we created a child of FT2Font.. which only tracks the faces for its families), we should be able to render everything from that parent FT2Font, right?
+
+As I later figured out while chasing segfaults in implementing this design:
+```
+Each FT2Font is linked to a single FT_Library object!
+```
+
+If you try to load the face/glyph/character (basically anything) from a different FT2Font object.. you'll run into serious segfaults (because one object linked to an `FT_Library` can't really access another object which has its own `FT_Library`).
+```cpp
+// face is linked to FT2Font; which is
+// linked to a single FT_Library object
+FT_Face face = this->get_face();
+FT_Get_Glyph(face->glyph, &placeholder); // works like a charm
+
+// somehow get another FT2Font's face
+FT_Face family_face = this->get_family_member()->get_face();
+FT_Get_Glyph(family_face->glyph, &placeholder); // segfaults!
+```
+
+Realizing this took a good amount of time! After this I quickly came up with a recursive approach, wherein we:
+1. Create a list of FT2Font objects within Python, and pass it down to FT2Font
+2. FT2Font will hold pointers to its families via a \
+   `std::vector<FT2Font *> fallback_list`
+3. Find if the character we want is available in the current font
+   1. If the character is available, use that FT2Font to render that character
+   2. If the character isn't found, go to step 3 again, but now iterate through the `fallback_list`
+4. That's it!
+
+A quick overhaul of the above piece of code^
+```cpp
+bool ft_get_glyph(FT_Glyph &placeholder) {
+    // FT_Get_Glyph reads from the face's glyph slot and returns 0 on success
+    FT_Error not_found = FT_Get_Glyph(this->get_face()->glyph, &placeholder);
+    if (not_found) return false;
+    else return true;
+}
+
+// within driver code
+for (uint i=0; i<fallback_list.size(); i++) {
+    // try each fallback font in order until one has the glyph
+    bool was_found = fallback_list[i]->ft_get_glyph(placeholder);
+    if (was_found) break;
+}
+```
+
+With the idea surrounding this implementation, the [Agg backend](https://matplotlib.org/stable/api/backend_agg_api.html) is able to render a document (either through GUI, or a PNG) with multiple fonts!
+
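To make the fallback idea above a bit more concrete, here's a tiny pure-Python model of it. This is only a sketch and not Matplotlib's actual code: `FontSketch`, `find_font_for` and `assign_fonts` are made-up names for illustration, and a real FT2Font asks FreeType for a glyph instead of checking a Python set.

```python
class FontSketch:
    """Hypothetical stand-in for FT2Font: a set of known characters plus fallbacks."""

    def __init__(self, name, chars, fallback_list=()):
        self.name = name
        self.chars = set(chars)
        self.fallback_list = list(fallback_list)

    def find_font_for(self, char):
        """Return the first font (self, then fallbacks in order) that has `char`."""
        if char in self.chars:
            return self
        # Character not found here: recurse into each fallback font in order.
        for fallback in self.fallback_list:
            found = fallback.find_font_for(char)
            if found is not None:
                return found
        return None


def assign_fonts(text, primary):
    """Pair every character with the name of the font that has it (None means tofu)."""
    pairs = []
    for char in text:
        font = primary.find_font_for(char)
        pairs.append((char, font.name if font is not None else None))
    return pairs


# Example: a Latin-only primary font with a CJK fallback.
cjk = FontSketch("CJK", "多个汉字")
latin = FontSketch("Latin", "abcdefghijklmnopqrstuvwxyz ", fallback_list=[cjk])
for char, font_name in assign_fonts("hi 汉字", latin):
    print(repr(char), "->", font_name)
```

The point of the design is that the text drives the lookup: each character gets whichever font in the chain can render it, instead of one font being picked up front for everything.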

+*[Figure: ChineseInBetween. Caption: PNG straight outta Matplotlib!]*

+ +## Python C-API is hard, at first! +I've spent days at Python C-API's [argument doc](https://docs.python.org/3/c-api/arg.html), and it's hard to get what you need at first, ngl. + +But, with the help of some amazing people in the GSoC community ([@srijan-paul](https://srijan-paul.github.io/), [@atharvaraykar](https://atharvaraykar.me/)) and amazing mentors, blockers begone! + +## So are we done? +Oh no. XD + +Things work just fine for the Agg backend, but to generate a PDF/PS/SVG with multiple fonts is another story altogether! I think I'll save that for later. + +

+*[Figure: ThankYouDwight. Caption: If you've been following the progress so far, mayn you're awesome!]*

+ +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-quarter/). diff --git a/content/posts/GSoC_Coding_Phase_Blog_4/index.md b/content/posts/GSoC_Coding_Phase_Blog_4/index.md new file mode 100644 index 0000000..cb4d8f5 --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_4/index.md @@ -0,0 +1,37 @@ +--- +title: "GSoC Coding Phase 2 Blog 2" +date: 2020-07-23T19:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the second half of the Google Summer of Code 2020 Phase 2 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's second evaluation is about to complete. Now we are about to start with the final coding phase. This post discusses about the progress so far in the last two weeks of the second coding period from 13 July to 26 July 2020. + +## Modular approach towards removal of matplotlib baseline images + +We have divided the work in two parts as discussed in the [previous blog](https://matplotlib.org/matplotblog/posts/gsoc_coding_phase_blog_3/). The first part is the generation of the baseline images discussed below. The second part is the modification of the baseline images. The modification part will be implemented in the last phase of the Google Summer of Code 2020. + +## Generation of the matplotlib baseline images + + Now, we have started removing the use of the `matplotlib_baseline_images` package. After the changes proposed in the [previous PR](https://github.com/matplotlib/matplotlib/pull/17557), the developer will have no baseline images on fresh install of matplotlib. So, the developer would need to generate matplotlib baseline images locally to get started with the testing part of the mpl. +The images can be generated by the image comparison tests with use of `matplotlib_baseline_image_generation` flag from the command line. 
Once these images are generated for the first time, then they can be used as the baseline images for the later times for comparison. This is the main principle adopted. + +## Completion of the generation of images for the matplotlib directory + +We successfully created the `matplotlib_baseline_image_generation` flag in the beginning of the second evaluation but images were not created in the `baseline images` directory inside the `matplotlib` and `mpl_toolkits` directories, instead they were created in the `result_images` directory. So, we implemented this functionality. The images are created in the `lib/matplotlib/tests/baseline_images` directory directly now in the baseline image generation step. The baseline image generation step uses `python3 -mpytest lib/matplotlib --matplotlib_baseline_image_generation` command. Later on, running the pytests with `python3 -mpytest lib/matplotlib` will start the image comparison. + +Right now, the matplotlib_baseline_image_generation flag works for the matplotlib directory. We are trying to achieve the same functionality for the mpl_toolkits directory. + +## Future Goals + +Once the generation of the baseline images for `mpl_toolkits` directory is completed in the [current PR](https://github.com/matplotlib/matplotlib/pull/17793), we will move to the modification of the baseline images in the third coding phase. The addition of new baseline image and deletion of the old baseline image will also be implemented in the last phase of GSoC. Modification of baseline images will be further divided into two sub tasks: addition of new baseline image and the deletion of the previous baseline image. + + +## Daily Meet-ups + +Monday to Thursday meeting initiated at [11:00pm IST](https://everytimezone.com/) via Zoom. Meeting notes are present at HackMD. + +I am grateful to be part of such a great community. Project is really interesting and challenging :) Thanks Thomas, Antony and Hannah for helping me so far. 
\ No newline at end of file diff --git a/content/posts/GSoC_Coding_Phase_Blog_5/index.md b/content/posts/GSoC_Coding_Phase_Blog_5/index.md new file mode 100644 index 0000000..af5ee3a --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_5/index.md @@ -0,0 +1,40 @@ +--- +title: "GSoC Coding Phase 3 Blog 1" +date: 2020-08-08T09:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the first half of the Google Summer of Code 2020 Phase 3 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's second evaluation is completed. I passed!!! Hurray! Now we are in the mid way of the last evaluation. This post discusses about the progress so far in the first two weeks of the third coding period from 26 July to 9 August 2020. + +## Completion of the modification logic for the matplotlib_baseline_images package + +We successfully created the `matplotlib_baseline_image_generation` command line flag for baseline image generation for `matplotlib` and `mpl_toolkits` in the previous months. It was generating the matplotlib and the matplotlib toolkit baseline images successfully. Now, we modified the existing flow to generate any missing baseline images, which would be fetched from the `master` branch on doing `git pull` or `git checkout -b feature_branch`. + +We initially thought of creating a command line flag `generate_baseline_images_for_test "test_a,test_b"`, but later on analysis of the approach, we came to the conclusion that the developer will not know about the test names to be given along with the flag. So, we tried to generate the missing images by `generate_missing` without the test names. This worked successfully. 
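
The idea of generating only the missing baseline images, without asking the developer for test names, can be sketched roughly as below. This is a minimal illustration under stated assumptions and not the code from the PR: `find_missing_baselines`, `generate_missing` and the `render` callback are hypothetical names, and in the real flow the images are produced by the image comparison tests themselves.

```python
from pathlib import Path


def find_missing_baselines(expected_names, baseline_dir):
    """Return the expected image names that have no file in baseline_dir yet."""
    baseline_dir = Path(baseline_dir)
    return [name for name in expected_names
            if not (baseline_dir / name).exists()]


def generate_missing(expected_names, baseline_dir, render):
    """Create only the missing baseline images.

    `render` is a hypothetical callback that writes one image to the given path.
    Existing baseline images are left untouched.
    """
    baseline_dir = Path(baseline_dir)
    baseline_dir.mkdir(parents=True, exist_ok=True)
    generated = []
    for name in find_missing_baselines(expected_names, baseline_dir):
        render(baseline_dir / name)
        generated.append(name)
    return generated
```

Because the set of missing images is computed from the directory contents, the developer does not need to know or pass any test names.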
+
+## Adopting reusability and Do not Repeat Yourself (DRY) Principles
+
+Later, we refactored the `matplotlib_baseline_image_generation` and `generate_missing` command line flags into a single command line flag, `matplotlib_baseline_image_generation`, as the logic was similar for both of them. Now, both the image generation on a fresh install of matplotlib and the generation of missing baseline images work with `python3 -mpytest lib/matplotlib --matplotlib_baseline_image_generation` for the `lib/matplotlib` folder and `python3 -mpytest lib/mpl_toolkits --matplotlib_baseline_image_generation` for the `lib/mpl_toolkits` folder.
+
+## Writing the documentation
+
+We have written documentation explaining the following scenarios:
+1. How to generate the baseline images on a fresh install of matplotlib?
+2. How to generate the missing baseline images on fetching changes from master?
+3. How to install the `matplotlib_baseline_images` package to be used for testing by the developer?
+4. How to intentionally change an image?
+
+## Refactoring and improving the code quality before merging
+
+Right now, we are refactoring the code and keeping the git history clean. The [current PR](https://github.com/matplotlib/matplotlib/pull/17793) is under review. I am working on the suggested changes. We are trying to merge this :)
+
+## Daily Meet-ups
+
+Meetings are held Monday to Thursday at [11:00pm IST](https://everytimezone.com/) via Zoom. Meeting notes are kept on HackMD.
+
+I am grateful to be part of such a great community. The project is really interesting and challenging :) Thanks Thomas, Antony and Hannah for helping me so far.
+ \ No newline at end of file diff --git a/content/posts/book/book-cover.png b/content/posts/book/book-cover.png new file mode 100644 index 0000000..443f56a Binary files /dev/null and b/content/posts/book/book-cover.png differ diff --git a/content/posts/book/book-gallery.png b/content/posts/book/book-gallery.png new file mode 100644 index 0000000..18b0255 Binary files /dev/null and b/content/posts/book/book-gallery.png differ diff --git a/content/posts/book/book.png b/content/posts/book/book.png new file mode 100644 index 0000000..74983e2 Binary files /dev/null and b/content/posts/book/book.png differ diff --git a/content/posts/book/index.md b/content/posts/book/index.md new file mode 100644 index 0000000..f04aefe --- /dev/null +++ b/content/posts/book/index.md @@ -0,0 +1,26 @@ +--- +title: "Newly released open access book" +date: 2021-11-15T14:26:51+01:00 +draft: false +description: "New open access book released" +categories: ["News"] +displayInList: true +author: Nicolas P. Rougier +resources: +- name: featuredImage + src: "book-cover.png" + params: + description: "Book cover" + showOnTop: true +--- + +It's my great pleasure to announce that I've finished my book on matplotlib and it is now freely available at [www.labri.fr/perso/nrougier/scientific-visualization.html](https://www.labri.fr/perso/nrougier/scientific-visualization.html) while sources for the book are hosted at [github.com/rougier/scientific-visualization-book](https://github.com/rougier/scientific-visualization-book). + +## Abstract + +The Python scientific visualisation landscape is huge. It is composed of a myriad of tools, ranging from the most versatile and widely used down to the more specialised and confidential. Some of these tools are community based while others are developed by companies. Some are made specifically for the web, others are for the desktop only, some deal with 3D and large data, while others target flawless 2D rendering. 
In this landscape, Matplotlib has a very special place. It is a versatile and powerful library that allows you to design very high quality figures, suitable for scientific publishing. It also offers a simple and intuitive interface as well as an object oriented architecture that allows you to tweak anything within a figure. Finally, it can be used as a regular graphic library in order to design non‐scientific figures. This book is organized into four parts. The first part considers the fundamental principles of the Matplotlib library. This includes reviewing the different parts that constitute a figure, the different coordinate systems, the available scales and projections, and we’ll also introduce a few concepts related to typography and colors. The second part is dedicated to the actual design of a figure. After introducing some simple rules for generating better figures, we’ll then go on to explain the Matplotlib defaults and styling system before diving on into figure layout organization. We’ll then explore the different types of plot available and see how a figure can be ornamented with different elements. The third part is dedicated to more advanced concepts, namely 3D figures, optimization & animation. The fourth and final part is a collection of showcases. + +### Book gallery + +![](book-gallery.png) + diff --git a/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md b/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md new file mode 100644 index 0000000..040f522 --- /dev/null +++ b/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md @@ -0,0 +1,153 @@ +--- +title: "Visualizing Code-Switching with Step Charts in Matplotlib" +date: 2020-08-25T12:33:20-07:00 +description: "Learn how to easily create step charts through examining the multilingualism of pop group WayV" +categories: ["tutorials", "graphs"] +author: J (a.k.a. 
WayV Subs & Translations) +displayInList: true +draft: false + +resources: +- name: featuredImage + src: "Image1.png" + params: + showOnTop: false + +--- + +![](Image1.png) + +# Introduction + +Code-switching is the practice of alternating between two or more languages in the context of a single conversation, either consciously or unconsciously. As someone who grew up bilingual and is currently learning other languages, I find code-switching a fascinating facet of communication from not only a purely linguistic perspective, but also a social one. In particular, I've personally found that code-switching often helps build a sense of community and familiarity in a group and that the unique ways in which speakers code-switch with each other greatly contribute to shaping group dynamics. + +This is something that's evident in seven-member pop boy group WayV. Aside from their discography, artistry, and group chemistry, WayV is well-known among fans and many non-fans alike for their multilingualism and code-switching, which many fans have affectionately coined as "WayV language." Every member in the group is fluent in both Mandarin and Korean, and at least one member in the group is fluent in one or more of the following: English, Cantonese, Thai, Wenzhounese, and German. It's an impressive trait that's become a trademark of WayV as they've quickly drawn a global audience since their debut in January 2019. Their multilingualism is reflected in their music as well. On top of their regular album releases in Mandarin, WayV has also released singles in Korean and English, with their latest single "Bad Alive (English Ver.)" being a mix of English, Korean, and Mandarin. + +As an independent translator who translates WayV content into English, I've become keenly aware of the true extent and rate of WayV's code-switching when communicating with each other. 
In a lot of their content, WayV frequently switches between three or more languages every couple of seconds, a phenomenon that can make translating quite challenging at times, but also extremely rewarding and fun. I wanted to be able to present this aspect of WayV in a way that would both highlight their linguistic skills and present this dimension of their group dynamic in a more concrete, quantitative, and visually intuitive manner, beyond just stating that "they code-switch a lot." This prompted me to make step charts - perfect for displaying data that changes at irregular intervals but remains constant between the changes - in hopes of enriching the viewer's experience and helping make a potentially abstract concept more understandable and readily consumable. With a step chart, it becomes more apparent to the viewer the extent of how a group communicates, and cross-sections of the graph allow a rudimentary look into how multilinguals influence each other in code-switching. + +# Tutorial +This tutorial on creating step charts uses one of WayV's livestreams as an example. There were four members in this livestream and a total of eight languages/dialects spoken. I will go through the basic steps of creating a step chart that depicts the frequency of code-switching for just one member. A full code chunk that shows how to layer two or more step chart lines in one graph to depict code-switching for multiple members can be found near the end. + +## Dataset +First, we import the required libaries and load the data into a Pandas dataframe. + + import pandas as pd + import matplotlib.pyplot as plt + import seaborn as sns + +This dataset includes the timestamp of every switch (in seconds) and the language of switch for one speaker. 
+ + df_h = pd.read_csv("WayVHendery.csv") + HENDERY = df_h.reset_index() + HENDERY.head() + + +| index | time | lang | +| ---- |----|----| +| 0 | 2 | ENG | +| 1 | 3 | KOR | +| 2 | 10 | ENG | +| 3 | 13 | MAND| +| 4 | 15 | ENG | + + +## Plotting +With the dataset loaded, we can now set up our graph in terms of determining the size of the figure, dpi, font size, and axes limits. We can also play around with the aesthetics, such as modifying the colors of our plot. These few simple steps easily transform the default all-white graph into a more visually appealing one. + +### Without Customization + fig, ax = plt.subplots(figsize = (20,12)) + +![](fig1.png) + +### With Customization + + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + fig, ax = plt.subplots(figsize = (20,12), dpi = 300) + + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + +![](fig2.png) + + + + +Following this, we can make our step chart line easily with matplotlib.pyplot.step, in which we plot the x and y values and determine the text of the legend, color of the step chart line, and width of the step chart line. + + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + +![](fig3.png) + +## Labeling +Of course, we want to know not only how many switches there were and when they occurred, but also to what language the member switched. For this, we can write a for loop that labels each switch with its respective language as recorded in our dataset. 
+ + for x,y,z in zip(HENDERY["time"], HENDERY["index"], HENDERY["lang"]): + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + +![](fig4.png) + +## Final Touches +Now add a title, save the graph, and there you have it! + + plt.title("WayV Livestream Code-Switching", fontsize = 35) + + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +Below is the complete code for layering step chart lines for multiple speakers in one graph. You can see how easy it is to take the code for visualizing the code-switching of one speaker and adapt it to visualizing that of multiple speakers. In addition, you can see that I've intentionally left the title blank so I can incorporate external graphic adjustments after I created the chart in Matplotlib, such as the addition of my social media handle and the use of a specific font I wanted, which you can see in the final graph. With visualizations being all about communicating information, I believe using Matplotlib in conjunction with simple elements of graphic design can be another way to make whatever you're presenting that little bit more effective and personal, especially when you're doing so on social media platforms. 
+ +## Complete Code for Step Chart of Multiple Speakers + + + # Initialize graph color and size + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + + fig, ax = plt.subplots(figsize = (20,12), dpi = 120) + + # Set up axes and labels + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + + # Layer step charts for each speaker + ax.step(YANGYANG.time, YANGYANG.index, label = "YANGYANG", color = "firebrick", linewidth = 4) + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + ax.step(TEN.time, TEN.index, label = "TEN", color = "mediumpurple", linewidth = 4) + ax.step(KUN.time, KUN.index, label = "KUN", color = "mediumblue", linewidth = 4) + + # Add legend + ax.legend(fontsize = 17) + + # Label each data point with the language switch + for i in (KUN, TEN, HENDERY, YANGYANG): #for each dataset + for x,y,z in zip(i["time"], i["index"], i["lang"]): #looping within the dataset + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + + # Add title (blank to leave room for external graphics) + plt.title("\n\n", fontsize = 35) + + # Save figure + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +![](Image1.png) +Languages/dialects: Korean (KOR), English (ENG), Mandarin (MAND), German (GER), Cantonese (CANT), Hokkien (HOKK), Teochew (TEO), Thai (THAI) + +186 total switches! That's approximately one code-switch in the group every 2.95 seconds. + +And voilà! There you have it: a brief guide on how to make step charts. While I utilized step charts here to visualize code-switching, you can use them to visualize whatever data you would like. 
Please feel free to contact me [here](https://twitter.com/WayVSubs2019) if you have any questions or comments. I hope you enjoyed this tutorial, and thank you so much for reading! \ No newline at end of file diff --git a/content/posts/codeswitching-visualization/Image1.png b/content/posts/codeswitching-visualization/Image1.png new file mode 100644 index 0000000..9329c0e Binary files /dev/null and b/content/posts/codeswitching-visualization/Image1.png differ diff --git a/content/posts/codeswitching-visualization/Image3.png b/content/posts/codeswitching-visualization/Image3.png new file mode 100644 index 0000000..9329c0e Binary files /dev/null and b/content/posts/codeswitching-visualization/Image3.png differ diff --git a/content/posts/codeswitching-visualization/fig1.png b/content/posts/codeswitching-visualization/fig1.png new file mode 100644 index 0000000..4fc9754 Binary files /dev/null and b/content/posts/codeswitching-visualization/fig1.png differ diff --git a/content/posts/codeswitching-visualization/fig2.png b/content/posts/codeswitching-visualization/fig2.png new file mode 100644 index 0000000..124f26e Binary files /dev/null and b/content/posts/codeswitching-visualization/fig2.png differ diff --git a/content/posts/codeswitching-visualization/fig3.png b/content/posts/codeswitching-visualization/fig3.png new file mode 100644 index 0000000..f4848f1 Binary files /dev/null and b/content/posts/codeswitching-visualization/fig3.png differ diff --git a/content/posts/codeswitching-visualization/fig4.png b/content/posts/codeswitching-visualization/fig4.png new file mode 100644 index 0000000..d5026b3 Binary files /dev/null and b/content/posts/codeswitching-visualization/fig4.png differ diff --git a/content/posts/codeswitching-visualization/fig5.png b/content/posts/codeswitching-visualization/fig5.png new file mode 100644 index 0000000..d0d5d5a Binary files /dev/null and b/content/posts/codeswitching-visualization/fig5.png differ diff --git 
a/content/posts/codeswitching-visualization/index.md b/content/posts/codeswitching-visualization/index.md new file mode 100644 index 0000000..5c91817 --- /dev/null +++ b/content/posts/codeswitching-visualization/index.md @@ -0,0 +1,153 @@ +--- +title: "Visualizing Code-Switching with Step Charts" +date: 2020-09-26T19:41:21-07:00 +description: "Learn how to easily create step charts through examining the multilingualism of pop group WayV" +categories: ["tutorials", "graphs"] +author: J (a.k.a. WayV Subs & Translations) +displayInList: true +draft: false + +resources: +- name: featuredImage + src: "Image1.png" + params: + showOnTop: false + +--- + +![](Image1.png) + +# Introduction + +Code-switching is the practice of alternating between two or more languages in the context of a single conversation, either consciously or unconsciously. As someone who grew up bilingual and is currently learning other languages, I find code-switching a fascinating facet of communication from not only a purely linguistic perspective, but also a social one. In particular, I've personally found that code-switching often helps build a sense of community and familiarity in a group and that the unique ways in which speakers code-switch with each other greatly contribute to shaping group dynamics. + +This is something that's evident in seven-member pop boy group WayV. Aside from their discography, artistry, and group chemistry, WayV is well-known among fans and many non-fans alike for their multilingualism and code-switching, which many fans have affectionately coined as "WayV language." Every member in the group is fluent in both Mandarin and Korean, and at least one member in the group is fluent in one or more of the following: English, Cantonese, Thai, Wenzhounese, and German. It's an impressive trait that's become a trademark of WayV as they've quickly drawn a global audience since their debut in January 2019. Their multilingualism is reflected in their music as well. 
On top of their regular album releases in Mandarin, WayV has also released singles in Korean and English, with their latest single "Bad Alive (English Ver.)" being a mix of English, Korean, and Mandarin. + +As an independent translator who translates WayV content into English, I've become keenly aware of the true extent and rate of WayV's code-switching when communicating with each other. In a lot of their content, WayV frequently switches between three or more languages every couple of seconds, a phenomenon that can make translating quite challenging at times, but also extremely rewarding and fun. I wanted to be able to present this aspect of WayV in a way that would both highlight their linguistic skills and present this dimension of their group dynamic in a more concrete, quantitative, and visually intuitive manner, beyond just stating that "they code-switch a lot." This prompted me to make step charts - perfect for displaying data that changes at irregular intervals but remains constant between the changes - in hopes of enriching the viewer's experience and helping make a potentially abstract concept more understandable and readily consumable. With a step chart, it becomes more apparent to the viewer the extent of how a group communicates, and cross-sections of the graph allow a rudimentary look into how multilinguals influence each other in code-switching. + +# Tutorial +This tutorial on creating step charts uses one of WayV's livestreams as an example. There were four members in this livestream and a total of eight languages/dialects spoken. I will go through the basic steps of creating a step chart that depicts the frequency of code-switching for just one member. A full code chunk that shows how to layer two or more step chart lines in one graph to depict code-switching for multiple members can be found near the end. + +## Dataset +First, we import the required libraries and load the data into a Pandas dataframe. 
+ + import pandas as pd + import matplotlib.pyplot as plt + import seaborn as sns + +This dataset includes the timestamp of every switch (in seconds) and the language of switch for one speaker. + + df_h = pd.read_csv("WayVHendery.csv") + HENDERY = df_h.reset_index() + HENDERY.head() + + +| index | time | lang | +| ---- |----|----| +| 0 | 2 | ENG | +| 1 | 3 | KOR | +| 2 | 10 | ENG | +| 3 | 13 | MAND| +| 4 | 15 | ENG | + + +## Plotting +With the dataset loaded, we can now set up our graph in terms of determining the size of the figure, dpi, font size, and axes limits. We can also play around with the aesthetics, such as modifying the colors of our plot. These few simple steps easily transform the default all-white graph into a more visually appealing one. + +### Without Customization + fig, ax = plt.subplots(figsize = (20,12)) + +![](fig1.png) + +### With Customization + + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + fig, ax = plt.subplots(figsize = (20,12), dpi = 300) + + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + +![](fig2.png) + + + + +Following this, we can make our step chart line easily with matplotlib.pyplot.step, in which we plot the x and y values and determine the text of the legend, color of the step chart line, and width of the step chart line. + + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + +![](fig3.png) + +## Labeling +Of course, we want to know not only how many switches there were and when they occurred, but also to what language the member switched. For this, we can write a for loop that labels each switch with its respective language as recorded in our dataset. 
+ + for x,y,z in zip(HENDERY["time"], HENDERY["index"], HENDERY["lang"]): + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + +![](fig4.png) + +## Final Touches +Now add a title, save the graph, and there you have it! + + plt.title("WayV Livestream Code-Switching", fontsize = 35) + + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +Below is the complete code for layering step chart lines for multiple speakers in one graph. You can see how easy it is to take the code for visualizing the code-switching of one speaker and adapt it to visualizing that of multiple speakers. In addition, you can see that I've intentionally left the title blank so I can incorporate external graphic adjustments after I created the chart in Matplotlib, such as the addition of my social media handle and the use of a specific font I wanted, which you can see in the final graph. With visualizations being all about communicating information, I believe using Matplotlib in conjunction with simple elements of graphic design can be another way to make whatever you're presenting that little bit more effective and personal, especially when you're doing so on social media platforms. 
+ +## Complete Code for Step Chart of Multiple Speakers + + + # Initialize graph color and size + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + + fig, ax = plt.subplots(figsize = (20,12), dpi = 120) + + # Set up axes and labels + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + + # Layer step charts for each speaker + ax.step(YANGYANG.time, YANGYANG.index, label = "YANGYANG", color = "firebrick", linewidth = 4) + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + ax.step(TEN.time, TEN.index, label = "TEN", color = "mediumpurple", linewidth = 4) + ax.step(KUN.time, KUN.index, label = "KUN", color = "mediumblue", linewidth = 4) + + # Add legend + ax.legend(fontsize = 17) + + # Label each data point with the language switch + for i in (KUN, TEN, HENDERY, YANGYANG): #for each dataset + for x,y,z in zip(i["time"], i["index"], i["lang"]): #looping within the dataset + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + + # Add title (blank to leave room for external graphics) + plt.title("\n\n", fontsize = 35) + + # Save figure + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +![](Image1.png) +Languages/dialects: Korean (KOR), English (ENG), Mandarin (MAND), German (GER), Cantonese (CANT), Hokkien (HOKK), Teochew (TEO), Thai (THAI) + +186 total switches! That's approximately one code-switch in the group every 2.95 seconds. + +And voilà! There you have it: a brief guide on how to make step charts. While I utilized step charts here to visualize code-switching, you can use them to visualize whatever data you would like. 
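As a rough sketch of that last point, here is the same recipe applied to made-up event data. The numbers, labels, and variable names below are placeholders and do not come from the livestream dataset. I've left out the Seaborn styling and the CSV loading so that the snippet can run on its own, and the `Agg` backend line is only there so it works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: the time (in seconds) of each event and a label for it
event_times = [2, 3, 10, 13, 15, 21, 30]
event_labels = ["A", "B", "A", "C", "A", "B", "C"]

# Cumulative count on the y-axis, starting at 0 like the dataframe index above
cumulative = list(range(len(event_times)))

fig, ax = plt.subplots(figsize=(8, 5))

# ax.step returns a list containing the Line2D that was drawn
lines = ax.step(event_times, cumulative, label="events", linewidth=2)

# Label every step with its event, using the same offsets as in the tutorial
for x, y, text in zip(event_times, cumulative, event_labels):
    ax.annotate(text, (x, y), textcoords="offset points",
                xytext=(15, -5), ha="center", fontsize=8.5)

ax.set_xlabel("Time (seconds)")
ax.set_ylabel("Cumulative number of events")
ax.legend()
```

From here, `fig.savefig(...)` works exactly as in the tutorial above.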
Please feel free to contact me [here](https://twitter.com/WayVSubs2019) if you have any questions or comments. I hope you enjoyed this tutorial, and thank you so much for reading! diff --git a/content/posts/gsod-developing-matplotlib-entry-paths/index.md b/content/posts/gsod-developing-matplotlib-entry-paths/index.md new file mode 100644 index 0000000..3294ffe --- /dev/null +++ b/content/posts/gsod-developing-matplotlib-entry-paths/index.md @@ -0,0 +1,61 @@ +--- +title: "GSoD: Developing Matplotlib Entry Paths" +date: 2020-12-08T08:16:42-08:00 +draft: false +description: "This is my first post contribution to Matplotlib." +categories: ["GSoD"] +displayInList: true +author: Jerome Villegas +--- + +# Introduction + +This year’s Google Season of Docs (GSoD) provided me the opportunity to work with the open source organization, Matplotlib. In early summer, I submitted my proposal of Developing Matplotlib Entry Paths with the goal of improving the documentation with an alternative approach to writing. + +I had set out to identify with users more by providing real world contexts to examples and programming. My purpose was to lower the barrier of entry for others to begin using the Python library with an expository approach. I focused on aligning with users based on consistent derived purposes and a foundation of task-based empathy. + +The project began during the community bonding phase with learning the fundamentals of building documentation and working with open source code. I later generated usability testing surveys to the community and consolidated findings. From these results, I developed two new documents for merging into the Matplotlib repository, a Getting Started introductory tutorial and a lean Style Guide for the documentation. 
+ +# Project Report + +Throughout this year’s Season of Docs with Matplotlib, I learned a great deal about working on open source projects, provided contributions of surveying communities and interviewing subject matter experts in documentation usability testing, and produced a comprehensive introductory guide for improving entry-level content with an initiative style guide section. + +As a new user to Git and GitHub, I had a learning curve in getting started with building documentation locally on my machine. Working with cloning repositories and familiarizing myself with commits and pull requests took the bulk of the first few weeks on this project. However, with experiencing errors and troubleshooting broken branches, it was excellent to be able to lean on my mentors for resolving these issues. Platforms like Gitter, Zoom, and HackMD were key in keeping communication timely and concise. I was fortunate to be able to get in touch with the team to help me as soon as I had problems. + +With programming, I was not a completely fresh face to Python and Matplotlib. However, installing the library from the source and breaking down functionality to core essentials helped me grow in my understanding of not only the fundamentals, but also the terminology. Tackling everything through my own experience of using Python and then also having suggestions and advice from the development team accelerated the ideas and implementations I aimed to work towards. + +New formats and standards with reStructuredText files and Sphinx compatibility were unfamiliar avenues to me at first. In building documentation and reading through already written content, I adapted to making the most of the features available with the ideas I had for writing material suited for users new to Matplotlib. Making use of tables and code examples embedded allowed me to be more flexible in visual layout and navigation. 
+ +During the beginning stages of the project, I was able to incorporate usability testing for the current documentation. By reaching out to communities on Twitter, Reddit, and various Slack channels, I compiled and consolidated findings that helped shape the language and focus of the new content. I summarized and shared the community’s responses in addition to separate informational interviews conducted with subject matter experts in my location. These data points helped in justifying and supporting decisions for the scope and direction of the language and content. + +At the end of the project, I completed our agreed-upon expectations for the documentation. The focused goal consisted of a Getting Started tutorial to introduce and give context to Matplotlib for new users. In addition, through the documentation as well as the meetings with the community, we acknowledged a missing element of a Style Guide. Though a comprehensive document for the entire library was out of the scope of the project, I put together, in conjunction with the featured task, a lean version that serves as a foundational resource for writing Matplotlib documentation. + +The two sections are part of a current pull request to merge into Matplotlib’s repository. I have already worked through smaller changes to the content and am working with the community in moving forward with the process. + +# Conclusion + +This Season of Docs proposal began as a vision of ideals I hoped to share and work towards with an organization and has become a technical writing experience full of growth and camaraderie. I am pleased with the progress I have made and cannot thank the team enough for the leadership and mentorship they provided. It is fulfilling and rewarding to both appreciate and be appreciated within a team. + +In addition, the opportunity put together by the team at Google to foster collaboration among skilled contributors cannot be overstated. 
Highlighting the accomplishments of these new teams raises the bar for the open source community. + +# Details + +## Acknowledgements + +Special thanks to Emily Hsu, Joe McEwen, and Smriti Singh for their time and responses, fellow Matplotlib Season of Docs writer Bruno Beltran for his insight and guidance, and the Matplotlib development team mentors Tim, Tom, and Hannah for their patience, support, and approachability for helping a new technical writer like me with my own Getting Started. + +## External Links + +- [Getting Started GSoD Pull Request](https://github.com/matplotlib/matplotlib/pull/18873) +- [Matplotlib User Survey](https://docs.google.com/forms/d/e/1FAIpQLSfPX13wXNOV5LM4OoHUYT3xtSZzVQ6I3ZA4cvz5P6DKuph4aw/viewform?usp=sf_link) +- [User Survey Responses](https://docs.google.com/spreadsheets/d/1z_bAu7hG-IgtFkM5uPezkUHQvi6gsWKxoDnh0Hz1K5U/edit?usp=sharing) +- [User Survey Open Questions](https://docs.google.com/spreadsheets/d/15EzVNmWVn2SjCUBc-Kt5Y0_entLgvWRMRYy8syt_-Xg/edit?usp=sharing) +- [HackMD GSoD Meeting Agenda](https://hackmd.io/cSNb2JhrSo26zJGag3bvLg) + +## About Me + +My name is [Jerome Villegas](https://www.linkedin.com/in/jeromefuertevillegas/) and I'm a technical writer based in Seattle. I've been in education and education-adjacent fields for several years before transitioning to the industry of technical communication. My career has taken me to Taiwan to teach English and work in publishing, then to New York City to work in higher education, and back to Seattle where I worked at a private school. + +Since leaving my job, I've taken to supporting my family while studying technical writing at the University of Washington and supplementing the knowledge with learning programming on the side. Along with a former classmate, the two of us have worked with the UX writing community in the Pacific Northwest. We host interview sessions, moderate sessions at conferences, and generate content analyzing trends and patterns in UX/tech writing. 
+ +In telling people what I've got going on in my life, you can find work I've done at my [personal site](https://jeromefvillegas.wordpress.com) and see what we're up to at [shift J](https://teamshiftj.wordpress.com). Thanks for reading! \ No newline at end of file diff --git a/content/posts/how-to-contribute/index.md b/content/posts/how-to-contribute/index.md index f50d8b9..ee06082 100644 --- a/content/posts/how-to-contribute/index.md +++ b/content/posts/how-to-contribute/index.md @@ -16,12 +16,22 @@ resources: Matplotblog relies on your contributions to it. We want to showcase all the amazing projects that make use of Matplotlib. In this post, we will see which steps you have to follow to add a post to our blog. -To manage your contributions, we will use [Git pull requests](https://yangsu.github.io/pull-request-tutorial/). So, if you have not done it already, you first need to clone [our Git repository](https://github.com/matplotlib/matplotblog), by typing the following in a terminal window: +To manage your contributions, we will use [Git pull requests](https://yangsu.github.io/pull-request-tutorial/). So, if you have not done it already, you first need to fork and clone [our Git repository](https://github.com/matplotlib/matplotblog), by clicking on the Fork button on the top right corner of the Github page, and then type the following in a terminal window: ``` -git clone https://github.com/matplotlib/matplotblog.git +git clone git@github.com:[USERNAME]/matplotblog.git +``` +where [USERNAME] should be replaced by your Github username. You now have to make sure that if you reuse this forked repository, it is up to date with the main Matplotblog repository. To do so, type the following: +``` +git remote add upstream https://github.com/matplotlib/matplotblog.git +``` + +You should now create a new branch, which will contain your changes. 
First, fetch the latest upstream changes with `git fetch upstream`, then check out master and merge them: +``` +git checkout master +git merge upstream/master ``` -Then, you should create a new branch, which will contain your changes. +and then create a new branch and check it out: ``` cd matplotblog @@ -83,11 +93,18 @@ hugo server ``` Then open the browser and visit [http://localhost:1313/matplotblog](http://localhost:1313/matplotblog) to make sure your post appears in the homepage. If you spot errors or something that you want to tune, go back to your index.md file and modify it. -When your post is ready to go, you can add it to the repository, commit and push the changes to your branch: +When your post is ready to go, you can add it to your local repository, commit and push the changes to your branch: ``` git add content/posts/my-fancy-title git commit -m "Added new blog post" git push ``` -Finally, submit a pull request to have our admins review your contribution and merge it to the master repository. That is it folks! +Finally, submit a **pull request** to have our admins review your contribution and merge it to the master repository. To do so, type the following: +``` +git checkout post-my-fancy-title +git rebase master +``` +and then go to the page for your fork on GitHub, select your development branch, and click the pull request button. Your pull request will automatically track the changes on your development branch and update. Further information on the pull request process is available [here](https://docs.github.com/en/enterprise/2.16/user/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork). + +That is it folks! 
diff --git a/content/posts/how-to-create-custom-tables/0_example.png b/content/posts/how-to-create-custom-tables/0_example.png new file mode 100644 index 0000000..e3434de Binary files /dev/null and b/content/posts/how-to-create-custom-tables/0_example.png differ diff --git a/content/posts/how-to-create-custom-tables/1_coordinate_space.png b/content/posts/how-to-create-custom-tables/1_coordinate_space.png new file mode 100644 index 0000000..d96312a Binary files /dev/null and b/content/posts/how-to-create-custom-tables/1_coordinate_space.png differ diff --git a/content/posts/how-to-create-custom-tables/2_adding_data.png b/content/posts/how-to-create-custom-tables/2_adding_data.png new file mode 100644 index 0000000..07af6a5 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/2_adding_data.png differ diff --git a/content/posts/how-to-create-custom-tables/3_headers.png b/content/posts/how-to-create-custom-tables/3_headers.png new file mode 100644 index 0000000..1ba7039 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/3_headers.png differ diff --git a/content/posts/how-to-create-custom-tables/4_gridlines.png b/content/posts/how-to-create-custom-tables/4_gridlines.png new file mode 100644 index 0000000..c6f3a99 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/4_gridlines.png differ diff --git a/content/posts/how-to-create-custom-tables/5_highlight_column.png b/content/posts/how-to-create-custom-tables/5_highlight_column.png new file mode 100644 index 0000000..f01d64b Binary files /dev/null and b/content/posts/how-to-create-custom-tables/5_highlight_column.png differ diff --git a/content/posts/how-to-create-custom-tables/6_hide_axis.png b/content/posts/how-to-create-custom-tables/6_hide_axis.png new file mode 100644 index 0000000..d0db672 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/6_hide_axis.png differ diff --git a/content/posts/how-to-create-custom-tables/6_title.png 
b/content/posts/how-to-create-custom-tables/6_title.png new file mode 100644 index 0000000..36a15ee Binary files /dev/null and b/content/posts/how-to-create-custom-tables/6_title.png differ diff --git a/content/posts/how-to-create-custom-tables/7_floating_axes.png b/content/posts/how-to-create-custom-tables/7_floating_axes.png new file mode 100644 index 0000000..4500f3c Binary files /dev/null and b/content/posts/how-to-create-custom-tables/7_floating_axes.png differ diff --git a/content/posts/how-to-create-custom-tables/8_sparklines.png b/content/posts/how-to-create-custom-tables/8_sparklines.png new file mode 100644 index 0000000..4830d7a Binary files /dev/null and b/content/posts/how-to-create-custom-tables/8_sparklines.png differ diff --git a/content/posts/how-to-create-custom-tables/header.jpeg b/content/posts/how-to-create-custom-tables/header.jpeg new file mode 100644 index 0000000..e21ee70 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/header.jpeg differ diff --git a/content/posts/how-to-create-custom-tables/index.md b/content/posts/how-to-create-custom-tables/index.md new file mode 100644 index 0000000..5fa935f --- /dev/null +++ b/content/posts/how-to-create-custom-tables/index.md @@ -0,0 +1,235 @@ +--- +title: "How to create custom tables" +date: 2022-03-11T11:10:06Z +draft: false +description: A tutorial on how to create custom tables in Matplotlib which allow for flexible design and customization. +categories: ["tutorials"] +displayInList: true +author: Tim Bayer +resources: +- name: featuredImage + src: "header.jpeg" + params: + description: "header pic" + showOnTop: true +--- + +# Introduction + +This tutorial will teach you how to create custom tables in Matplotlib, which are extremely flexible in terms of the design and layout. You’ll hopefully see that the code is very straightforward! In fact, the main methods we will be using are `ax.text()` and `ax.plot()`. 
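To give a feel for those two methods before we build the full table, here is a minimal sketch. The strings, coordinates, and variable names are arbitrary placeholders, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 2))
ax.set_xlim(0, 3)
ax.set_ylim(-1, 2)

# ax.text() places a string at an (x, y) position in data coordinates,
# which is how each "cell" of a table can be drawn
top_cell = ax.text(x=1, y=1, s="row 1", ha="left", va="center")
bottom_cell = ax.text(x=1, y=0, s="row 0", ha="left", va="center")

# ax.plot() with two points draws a straight line,
# here a horizontal divider halfway between the two rows
divider = ax.plot([0, 3], [0.5, 0.5], lw=0.5, c="grey")
```

Everything else in this tutorial is a variation on these two calls, repeated in a loop.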
+ +I want to give a lot of credit to [Todd Whitehead](https://twitter.com/CrumpledJumper) who has created these types of tables for various Basketball teams and players. His approach to tables is nothing short of fantastic due to the simplicity in design and how he manages to effectively communicate data to his audience. I was very much inspired by his approach and wanted to be able to achieve something similar in Matplotlib. + +Before I begin with the tutorial, I wanted to go through the logic behind my approach as I think it's valuable and transferable to other visualizations (and tools!). + +With that, I would like you to **think of tables as highly structured and organized scatterplots**. Let me explain why: for me, scatterplots are the most fundamental chart type (regardless of tool). + +![Scatterplots](scatterplots.png) + +For example `ax.plot()` automatically "connects the dots" to form a line chart or `ax.bar()` automatically "draws rectangles" across a set of coordinates. Very often (again regardless of tool) we may not always see this process happening. The point is, it is useful to think of any chart as a scatterplot or simply as a collection of shapes based on xy coordinates. This logic / thought process can unlock a ton of *custom* charts as the only thing you need are the coordinates (which can be mathematically computed). + +With that in mind, we can move on to tables! So rather than plotting rectangles or circles we want to plot text and gridlines in a highly organized manner. + +We will aim to create a table like this, which I have posted on Twitter [here](https://twitter.com/TimBayer93/status/1476926897850359809). Note, the only elements added outside of Matplotlib are the fancy arrows and their descriptions. + +![Example](0_example.png) + + +# Creating a custom table + +Importing required libraries. 
+ +```python +import matplotlib as mpl +import matplotlib.patches as patches +from matplotlib import pyplot as plt +``` + +First, we will need to set up a coordinate space - I like two approaches: +1. working with the standard Matplotlib 0-1 scale (on both the x- and y-axis) or +2. an index system based on row / column numbers (this is what I will use here) + +I want to create a coordinate space for a table containing 6 columns and 10 rows - this means (similar to pandas row/column indices) each row will have an index between 0-9 and each column will have an index between 0-6 (this is technically 1 more column than what we defined but one of the columns with a lot of text will span two column “indices”) + +```python +# first, we'll create a new figure and axis object +fig, ax = plt.subplots(figsize=(8,6)) + +# set the number of rows and cols for our table +rows = 10 +cols = 6 + +# create a coordinate system based on the number of rows/columns +# adding a bit of padding on bottom (-1), top (1), right (0.5) +ax.set_ylim(-1, rows + 1) +ax.set_xlim(0, cols + .5) +``` + +![Empty Coordinate Space](1_coordinate_space.png) + +Now, the data we want to plot is sports (football) data. We have information about 10 players and some values against a number of different metrics (which will form our columns) such as goals, shots, passes etc. 
+ +```python +# sample data +data = [ + {'id': 'player10', 'shots': 1, 'passes': 79, 'goals': 0, 'assists': 1}, + {'id': 'player9', 'shots': 2, 'passes': 72, 'goals': 0, 'assists': 1}, + {'id': 'player8', 'shots': 3, 'passes': 47, 'goals': 0, 'assists': 0}, + {'id': 'player7', 'shots': 4, 'passes': 99, 'goals': 0, 'assists': 5}, + {'id': 'player6', 'shots': 5, 'passes': 84, 'goals': 1, 'assists': 4}, + {'id': 'player5', 'shots': 6, 'passes': 56, 'goals': 2, 'assists': 0}, + {'id': 'player4', 'shots': 7, 'passes': 67, 'goals': 0, 'assists': 3}, + {'id': 'player3', 'shots': 8, 'passes': 91, 'goals': 1, 'assists': 1}, + {'id': 'player2', 'shots': 9, 'passes': 75, 'goals': 3, 'assists': 2}, + {'id': 'player1', 'shots': 10, 'passes': 70, 'goals': 4, 'assists': 0} +] +``` + +Next, we will start plotting the table (as a structured scatterplot). I did promise that the code will be very simple, less than 10 lines really, here it is: + + +```python +# from the sample data, each dict in the list represents one row +# each key in the dict represents a column +for row in range(rows): + # extract the row data from the list + d = data[row] + + # the y (row) coordinate is based on the row index (loop) + # the x (column) coordinate is defined based on the order I want to display the data in + + # player name column + ax.text(x=.5, y=row, s=d['id'], va='center', ha='left') + # shots column - this is my "main" column, hence bold text + ax.text(x=2, y=row, s=d['shots'], va='center', ha='right', weight='bold') + # passes column + ax.text(x=3, y=row, s=d['passes'], va='center', ha='right') + # goals column + ax.text(x=4, y=row, s=d['goals'], va='center', ha='right') + # assists column + ax.text(x=5, y=row, s=d['assists'], va='center', ha='right') +``` + +![Adding data](2_adding_data.png) + +As you can see, we are starting to get a basic wireframe of our table. Let's add column headers to further make this *scatterplot* look like a table. 
+ +```python +# Add column headers +# plot them at height y=9.75 to decrease the space to the +# first data row (you'll see why later) +ax.text(.5, 9.75, 'Player', weight='bold', ha='left') +ax.text(2, 9.75, 'Shots', weight='bold', ha='right') +ax.text(3, 9.75, 'Passes', weight='bold', ha='right') +ax.text(4, 9.75, 'Goals', weight='bold', ha='right') +ax.text(5, 9.75, 'Assists', weight='bold', ha='right') +ax.text(6, 9.75, 'Special\nColumn', weight='bold', ha='right', va='bottom') +``` + +![Adding Headers](3_headers.png) + + +# Formatting our table + +The rows and columns of our table are now done. The only thing that is left to do is formatting - much of this is personal choice. The following elements I think are generally useful when it comes to good table design (more research [here](https://www.storytellingwithdata.com/blog/2019/10/29/how-i-improved-the-table)): + +Gridlines: Some gridlines are useful (less is more). Generally some guidance to help the audience trace their eyes or fingers across the screen can be helpful (this way we can *group* items too by drawing gridlines around them). + +```python +for row in range(rows): + ax.plot( + [0, cols + 1], + [row - .5, row - .5], + ls=':', + lw=.5, + c='grey' + ) + +# add a main header divider +# remember that we plotted the header row slightly closer to the first data row +# this helps to visually separate the header row from the data rows +# each data row is 1 unit in height, thus bringing the header closer to our +# gridline gives it a distinctive difference. +ax.plot([0, cols + 1], [9.5, 9.5], lw=.5, c='black') +``` + +![Adding Gridlines](4_gridlines.png) + +Another important element for tables in my opinion is highlighting the *key* data points. We already bolded the values that are in the "Shots" column but we can also shade this column to give it more emphasis for our readers. 
+ +```python +# highlight the column we are sorting by +# using a rectangle patch +rect = patches.Rectangle( + (1.5, -.5), # bottom left starting position (x,y) + .65, # width + 10, # height + ec='none', + fc='grey', + alpha=.2, + zorder=-1 +) +ax.add_patch(rect) +``` + +![Highlight column](5_highlight_column.png) + +We're almost there. The magic piece is `ax.axis('off')`. This hides the axis, axis ticks, labels and everything “attached” to the axes, which means our table now looks like a clean table! + +```python +ax.axis('off') +``` + +![Hide axis](6_hide_axis.png) + +Adding a title is also straightforward. + +```python +ax.set_title( + 'A title for our table!', + loc='left', + fontsize=18, + weight='bold' +) +``` + +![Title](6_title.png) + +# Bonus: Adding special columns + +Finally, if you wish to add images, sparklines, or other custom shapes and patterns then we can do this too. + +To achieve this we will use `fig.add_axes()` to create a new set of floating axes positioned in figure coordinates (this is different from our axes coordinate system!). + +Remember that figure coordinates by default are between 0 and 1. [0,0] is the bottom left corner of the entire figure. If you’re unfamiliar with the differences between a figure and axes then check out [Matplotlib's Anatomy of a Figure](https://matplotlib.org/stable/gallery/showcase/anatomy.html) for further details. + +```python +newaxes = [] +for row in range(rows): + # offset each new axes by a set amount depending on the row + # this is probably the most fiddly aspect (TODO: some neater way to automate this) + newaxes.append( + fig.add_axes([.75, .725 - (row*.063), .12, .06]) + ) +``` + +You can see below what these *floating* axes will look like (I say floating because they’re on top of our main axis object). The only tricky thing is figuring out the xy (figure) coordinates for these. + +These *floating* axes behave like any other Matplotlib axes. 
Therefore, we have access to the same methods such as ax.bar(), ax.plot(), patches, etc. Importantly, each axis has its own independent coordinate system. We can format them as we wish. + +![Floating axes](7_floating_axes.png) + +```python +# plot dummy data as a sparkline for illustration purposes +# you can plot _anything_ here, images, patches, etc. +newaxes[0].plot([0, 1, 2, 3], [1, 2, 0, 2], c='black') +newaxes[0].set_ylim(-1, 3) + +# once again, the key is to hide the axis! +newaxes[0].axis('off') +``` + +![Sparklines](8_sparklines.png) + +That’s it, custom tables in Matplotlib. I did promise very simple code and an ultra-flexible design in terms of what you want / need. You can adjust sizes, colors and pretty much anything with this approach and all you need is simply a loop that plots text in a structured and organized manner. I hope you found it useful. Link to a Google Colab notebook with the code is [here](https://colab.research.google.com/drive/1JshATKxjs7NWz2U8Oy6xOJaLgjldC1CW) + diff --git a/content/posts/how-to-create-custom-tables/scatterplots.png b/content/posts/how-to-create-custom-tables/scatterplots.png new file mode 100644 index 0000000..5e3da1e Binary files /dev/null and b/content/posts/how-to-create-custom-tables/scatterplots.png differ diff --git a/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg b/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg new file mode 100644 index 0000000..56d2092 Binary files /dev/null and b/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg differ diff --git a/content/posts/ipcc-sr15/index.md b/content/posts/ipcc-sr15/index.md new file mode 100644 index 0000000..4d1df3f --- /dev/null +++ b/content/posts/ipcc-sr15/index.md @@ -0,0 +1,96 @@ +--- +title: "Figures in the IPCC Special Report on Global Warming of 1.5°C (SR15)" +date: 2020-12-31T08:32:45+01:00 +draft: false +description: | + Many figures in the IPCC SR15 were generated using Matplotlib. 
+ The data and open-source notebooks were published to increase the transparency and reproducibility of the analysis. +categories: ["academia", "tutorials"] +displayInList: true +author: Daniel Huppmann + +resources: +- name: featuredImage + src: "IPCC-SR15-cover.jpg" + params: + description: "Cover page of the IPCC SR15" + showOnTop: false + +--- + +## Background + +
+ + +
+ Cover of the IPCC SR15
+
+ +The IPCC's *Special Report on Global Warming of 1.5°C* (SR15), published in October 2018, +presented the latest research on anthropogenic climate change. +It was written in response to the goal, set in the UNFCCC's 2015 "Paris Agreement", of + +> holding the increase in the global average temperature to well below 2 °C +> above pre-industrial levels and to pursue efforts to limit the temperature increase to 1.5 °C [...]. + +cf. [Article 2.1.a of the Paris Agreement](https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement) + +As part of the SR15 assessment, an ensemble of quantitative, model-based scenarios +was compiled to underpin the scientific analysis. +Many of the headline statements widely reported by the media +are based on this scenario ensemble, including the finding that + +> global net anthropogenic CO2 emissions decline by ~45% from 2010 levels by 2030 + +in all pathways limiting global warming to 1.5°C +(cf. [statement C.1](https://www.ipcc.ch/sr15/chapter/spm/) in the *Summary For Policymakers*). + +## Open-source notebooks for transparency and reproducibility of the assessment + +When preparing the SR15, the authors wanted to go beyond previous reports +not just regarding the scientific rigor and scope of the analysis, +but also to establish new standards in terms of openness, transparency and reproducibility. + +The scenario ensemble was made accessible via an interactive *IAMC 1.5°C Scenario Explorer* +([link](http://data.ene.iiasa.ac.at/iamc-1.5c-explorer/#/workspaces)) in line with the +[FAIR principles for scientific data management and stewardship](https://www.go-fair.org/fair-principles/). +The process for compiling, validating and analyzing the scenario ensemble +was described in an open-access manuscript published in *Nature Climate Change* +(doi: [10.1038/s41558-018-0317-4](https://doi.org/10.1038/s41558-018-0317-4)). 
+ +In addition, the Jupyter notebooks generating many of the headline statements, +tables and figures (using Matplotlib) were released under an open-source license +to facilitate a better understanding of the analysis +and enable reuse for subsequent research. +The notebooks are available in [rendered format](https://data.ene.iiasa.ac.at/sr15_scenario_analysis) +and on [GitHub](https://github.com/iiasa/ipcc_sr15_scenario_analysis). + +
+ +
+ Figure 2.4 of the IPCC SR15, showing the range of assumptions of socio-economic drivers
+ across the IAMC 1.5°C Scenario Ensemble
+ Drawn with Matplotlib, source code available here +
+
+ +
+ +
+ Figure 2.15 of the IPCC SR15, showing the primary energy development in illustrative pathways
+ Drawn with Matplotlib, source code available here +
+
+ +## A package for scenario analysis & visualization + +To facilitate reusability of the scripts and plotting utilities +developed for the SR15 analysis, we started the open-source Python package **pyam** +as a toolbox for working with scenarios from integrated-assessment and energy system models. + +The package is a wrapper for [pandas](https://pandas.pydata.org) and Matplotlib +geared for several data formats commonly used in energy modelling. +[Read the docs!](https://pyam-iamc.readthedocs.io) + + diff --git a/content/posts/ipcc-sr15/pyam-header.png b/content/posts/ipcc-sr15/pyam-header.png new file mode 100644 index 0000000..e1a67a7 Binary files /dev/null and b/content/posts/ipcc-sr15/pyam-header.png differ diff --git a/content/posts/ipcc-sr15/sr15-fig2.15.png b/content/posts/ipcc-sr15/sr15-fig2.15.png new file mode 100644 index 0000000..1e52d6f Binary files /dev/null and b/content/posts/ipcc-sr15/sr15-fig2.15.png differ diff --git a/content/posts/ipcc-sr15/sr15-fig2.4.png b/content/posts/ipcc-sr15/sr15-fig2.4.png new file mode 100644 index 0000000..4634846 Binary files /dev/null and b/content/posts/ipcc-sr15/sr15-fig2.4.png differ diff --git a/content/posts/pyplot-vs-object-oriented-interface/index.md b/content/posts/pyplot-vs-object-oriented-interface/index.md index 5dc7dd2..290fc0f 100644 --- a/content/posts/pyplot-vs-object-oriented-interface/index.md +++ b/content/posts/pyplot-vs-object-oriented-interface/index.md @@ -60,7 +60,7 @@ This interface shares a lot of similarities in syntax and methodology with MATLA import matplotlib.pyplot as plt plt.figure(figsize=(9,7), dpi=100) -plt.plot(distance,'bo-') +plt.plot(time,distance,'bo-') plt.xlabel("Time") plt.ylabel("Distance") plt.legend(["Distance"]) @@ -76,7 +76,7 @@ The plot shows how much distance was covered by the free-falling object with eac ```python plt.figure(figsize=(9,7), dpi=100) -plt.plot(velocity,'go-') +plt.plot(time, velocity,'go-') plt.xlabel("Time") plt.ylabel("Velocity") 
plt.legend(["Velocity"]) @@ -94,8 +94,8 @@ Let's try to see what kind of plot we get when we plot both distance and velocit ```python plt.figure(figsize=(9,7), dpi=100) -plt.plot(velocity,'g-') -plt.plot(distance,'b-') +plt.plot(time, velocity,'g-') +plt.plot(time, distance,'b-') plt.ylabel("Distance and Velocity") plt.xlabel("Time") plt.legend(["Distance", "Velocity"]) diff --git a/content/posts/python-graph-gallery.com/.DS_Store b/content/posts/python-graph-gallery.com/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/content/posts/python-graph-gallery.com/.DS_Store differ diff --git a/content/posts/python-graph-gallery.com/annotations.png b/content/posts/python-graph-gallery.com/annotations.png new file mode 100644 index 0000000..8c22959 Binary files /dev/null and b/content/posts/python-graph-gallery.com/annotations.png differ diff --git a/content/posts/python-graph-gallery.com/boxplot.png b/content/posts/python-graph-gallery.com/boxplot.png new file mode 100644 index 0000000..59e0051 Binary files /dev/null and b/content/posts/python-graph-gallery.com/boxplot.png differ diff --git a/content/posts/python-graph-gallery.com/home-page-overview.png b/content/posts/python-graph-gallery.com/home-page-overview.png new file mode 100644 index 0000000..6616f9b Binary files /dev/null and b/content/posts/python-graph-gallery.com/home-page-overview.png differ diff --git a/content/posts/python-graph-gallery.com/index.md b/content/posts/python-graph-gallery.com/index.md new file mode 100644 index 0000000..a902975 --- /dev/null +++ b/content/posts/python-graph-gallery.com/index.md @@ -0,0 +1,70 @@ +--- +title: "The Python Graph Gallery: hundreds of python charts with reproducible code." +date: 2021-07-24T14:06:57+02:00 +draft: false +description: "The Python Graph Gallery is a website that displays hundreds of chart examples made with python. 
It goes from very basic to highly customized examples and is based on common viz libraries like matplotlib, seaborn or plotly." +categories: ["tutorials", "graphs"] +displayInList: true +author: Yan Holtz +resources: +- name: featuredImage + src: "home-page-overview.png" + params: + description: "An overview of the gallery homepage" + showOnTop: false +--- + +Data visualization is a key step in a data science pipeline. [Python](https://www.python.org) offers great possibilities when it comes to representing some data graphically, but it can be hard and time-consuming to create the appropriate chart. + +The [Python Graph Gallery](https://www.python-graph-gallery.com) is here to help. It displays many examples, always providing the reproducible code. It lets you build the desired chart in minutes. + +# About 400 charts in 40 sections + +The gallery currently provides more than [400 chart examples](https://www.python-graph-gallery.com/all-charts/). Those examples are organized in 40 sections, one for each chart type: [scatterplot](https://www.python-graph-gallery.com/scatter-plot/), [boxplot](https://www.python-graph-gallery.com/boxplot/), [barplot](https://www.python-graph-gallery.com/barplot/), [treemap](https://www.python-graph-gallery.com/treemap/) and so on. Those chart types are organized in 7 big families as suggested by [data-to-viz.com](https://www.data-to-viz.com): one for each visualization purpose. + +It is important to note that not only the most common chart types are covered. Lesser-known charts like [chord diagrams](https://www.python-graph-gallery.com/chord-diagram/), [streamgraphs](https://www.python-graph-gallery.com/streamchart/) or [bubble maps](https://www.python-graph-gallery.com/bubble-map/) are also available. + +![overview of the python graph gallery sections](sections-overview.png) + +# Master the basics + +Each section always starts with some very basic examples. They let you understand how to build a chart type in a few seconds. 
Applying the same technique to another dataset should then be very quick. + +For instance, the [scatterplot section](https://www.python-graph-gallery.com/scatter-plot/) starts with this [matplotlib](https://matplotlib.org/) example. It shows how to create a dataset with [pandas](https://pandas.pydata.org/) and plot it with the `plot()` function. The main plot arguments, such as `linestyle` and `marker`, are described to make sure the code is understandable. + +[_blogpost overview_:](https://www.python-graph-gallery.com/130-basic-matplotlib-scatterplot) + +![a basic scatterplot example](scatterplot-example.png) + +# Matplotlib customization + +The gallery uses several libraries like [seaborn](https://www.python-graph-gallery.com/seaborn/) or [plotly](https://www.python-graph-gallery.com/plotly/) to produce its charts, but is mainly focused on matplotlib. Matplotlib comes with great flexibility and lets you build almost any kind of chart. + +A [whole page](https://www.python-graph-gallery.com/matplotlib/) is dedicated to matplotlib. It describes how to solve recurring issues like customizing [axes](https://www.python-graph-gallery.com/191-custom-axis-on-matplotlib-chart) or [titles](https://www.python-graph-gallery.com/190-custom-matplotlib-title), adding [annotations](https://www.python-graph-gallery.com/193-annotate-matplotlib-chart) (see below) or even using [custom fonts](https://www.python-graph-gallery.com/custom-fonts-in-matplotlib). + +![annotation examples](annotations.png) + +The gallery is also full of non-straightforward examples. For instance, it has a [tutorial](https://www.python-graph-gallery.com/streamchart-basic-matplotlib) explaining how to build a streamchart with matplotlib. It is based on the `stackplot()` function and adds some smoothing to it: + +![stream chart with python and matplotlib](streamchart.png) + +Last but not least, the gallery also displays some publication-ready charts. 
They usually involve a lot of matplotlib code, but showcase the fine-grained control one has over a plot. + +Here is an example with a post inspired by [Tuo Wang](https://www.r-graph-gallery.com/web-violinplot-with-ggstatsplot.html)'s work for the tidyTuesday project. (Code translated from R available [here](https://www.python-graph-gallery.com/web-ggbetweenstats-with-matplotlib)) + +![python violin and boxplot example](boxplot.png) + + +# Contributing + +The python graph gallery is an ever-growing project. It is open-source, with all its related code hosted on [github](https://github.com/holtzy/The-Python-Graph-Gallery). + +Contributions are very welcome to the gallery. Each blog post is just a Jupyter notebook, so suggestions are easy to make through issues or pull requests! + +# Conclusion + +The [python graph gallery](https://www.python-graph-gallery.com) is a project developed by [Yan Holtz](https://www.yan-holtz.com) in his free time. It can help you improve your technical skills when it comes to visualizing data with python. + +The gallery belongs to an ecosystem of educational websites. [Data to viz](https://www.data-to-viz.com) describes best practices in data visualization, while the [R](https://www.r-graph-gallery.com), [python](https://www.python-graph-gallery.com) and [d3.js](https://www.d3-graph-gallery.com) graph galleries provide technical help to build charts with the 3 most common tools. + +For any questions regarding the project, please say hi on twitter at [@R_Graph_Gallery](https://twitter.com/R_Graph_Gallery)! 
diff --git a/content/posts/python-graph-gallery.com/scatterplot-example.png b/content/posts/python-graph-gallery.com/scatterplot-example.png new file mode 100644 index 0000000..99d0869 Binary files /dev/null and b/content/posts/python-graph-gallery.com/scatterplot-example.png differ diff --git a/content/posts/python-graph-gallery.com/sections-overview.png b/content/posts/python-graph-gallery.com/sections-overview.png new file mode 100644 index 0000000..7a0da60 Binary files /dev/null and b/content/posts/python-graph-gallery.com/sections-overview.png differ diff --git a/content/posts/python-graph-gallery.com/streamchart.png b/content/posts/python-graph-gallery.com/streamchart.png new file mode 100644 index 0000000..1990a51 Binary files /dev/null and b/content/posts/python-graph-gallery.com/streamchart.png differ diff --git a/content/posts/stellar-chart-alternative-radar-chart/index.md b/content/posts/stellar-chart-alternative-radar-chart/index.md new file mode 100644 index 0000000..55e9cd2 --- /dev/null +++ b/content/posts/stellar-chart-alternative-radar-chart/index.md @@ -0,0 +1,182 @@ +--- +title: "Stellar Chart, a Type of Chart to Be on Your Radar" +date: 2021-01-10T20:29:40Z +draft: false +description: "Learn how to create a simple stellar chart, an alternative to the radar chart." +categories: ["tutorials"] +displayInList: true +author: João Palmeiro +resources: + - name: featuredImage + src: "stellar_chart.png" + params: + description: "example of a stellar chart" + showOnTop: false +--- + +In May 2020, Alexandre Morin-Chassé published a blog post about the **stellar chart**. This type of chart is an (approximately) direct alternative to the **radar chart** (also known as web, spider, star, or cobweb chart) — you can read more about this chart [here](https://medium.com/nightingale/the-stellar-chart-an-elegant-alternative-to-radar-charts-ae6a6931a28e). 
+ +![Comparison of a radar chart and a stellar chart](radar_stellar_chart.png) + +In this tutorial, we will see how we can create a quick-and-dirty stellar chart. First of all, let's get the necessary modules/libraries, as well as prepare a dummy dataset (with just a single record). + +```python +from itertools import chain, zip_longest +from math import ceil, pi + +import matplotlib.pyplot as plt + +data = [ + ("V1", 8), + ("V2", 10), + ("V3", 9), + ("V4", 12), + ("V5", 6), + ("V6", 14), + ("V7", 15), + ("V8", 25), +] +``` + +We will also need some helper functions, namely a function to round up to the nearest 10 (`round_up()`) and a function to join two sequences (`even_odd_merge()`). In the latter, the values of the first sequence (a list or a tuple, basically) will fill the even positions and the values of the second the odd ones. + +```python +def round_up(value): + """ + >>> round_up(25) + 30 + """ + return int(ceil(value / 10.0)) * 10 + + +def even_odd_merge(even, odd, filter_none=True): + """ + >>> list(even_odd_merge([1,3], [2,4])) + [1, 2, 3, 4] + """ + merged = chain.from_iterable(zip_longest(even, odd)) + if filter_none: + # Drop the None padding that zip_longest adds to the shorter sequence + return (item for item in merged if item is not None) + + return merged +``` + +That said, to plot `data` on a stellar chart, we need to apply some transformations, as well as calculate some auxiliary values. So, let's start by creating a function (`prepare_angles()`) to calculate the angle of each axis on the chart (`N` corresponds to the number of variables to be plotted). + +```python +def prepare_angles(N): + angles = [n / N * 2 * pi for n in range(N)] + + # Repeat the first angle to close the circle + angles += angles[:1] + + return angles +``` + +Next, we need a function (`prepare_data()`) responsible for adjusting the original data (`data`) and separating it into several easy-to-use objects. 
+ +```python +def prepare_data(data): + labels = [d[0] for d in data] # Variable names + values = [d[1] for d in data] + + # Repeat the first value to close the circle + values += values[:1] + + N = len(labels) + angles = prepare_angles(N) + + return labels, values, angles, N +``` + +Lastly, for this specific type of chart, we require a function (`prepare_stellar_aux_data()`) that, from the previously calculated angles, prepares two lists of auxiliary values: a list of **intermediate angles** for each pair of angles (`stellar_angles`) and a list of small **constant values** (`stellar_values`), which will act as the values of the variables to be plotted in order to achieve the **star-like shape** intended for the stellar chart. + +```python +def prepare_stellar_aux_data(angles, ymax, N): + angle_midpoint = pi / N + + stellar_angles = [angle + angle_midpoint for angle in angles[:-1]] + stellar_values = [0.05 * ymax] * N + + return stellar_angles, stellar_values +``` + +At this point, we already have all the necessary _ingredients_ for the stellar chart, so let's move on to the Matplotlib side of this tutorial. In terms of **aesthetics**, we can rely on a function (`draw_peripherals()`) designed for this specific purpose (feel free to customize it!). 
+ +```python +def draw_peripherals(ax, labels, angles, ymax, outer_color, inner_color): + # X-axis + ax.set_xticks(angles[:-1]) + ax.set_xticklabels(labels, color=outer_color, size=8) + + # Y-axis + ax.set_yticks(range(10, ymax, 10)) + ax.set_yticklabels(range(10, ymax, 10), color=inner_color, size=7) + ax.set_ylim(0, ymax) + ax.set_rlabel_position(0) + + # Both axes + ax.set_axisbelow(True) + + # Boundary line + ax.spines["polar"].set_color(outer_color) + + # Grid lines + ax.xaxis.grid(True, color=inner_color, linestyle="-") + ax.yaxis.grid(True, color=inner_color, linestyle="-") +``` + +To **plot the data** and orchestrate (almost) all the steps necessary to have a stellar chart, we just need one last function: `draw_stellar()`. + +```python +def draw_stellar( + ax, + labels, + values, + angles, + N, + shape_color="tab:blue", + outer_color="slategrey", + inner_color="lightgrey", +): + # Limit the Y-axis according to the data to be plotted + ymax = round_up(max(values)) + + # Get the lists of angles and variable values + # with the necessary auxiliary values injected + stellar_angles, stellar_values = prepare_stellar_aux_data(angles, ymax, N) + all_angles = list(even_odd_merge(angles, stellar_angles)) + all_values = list(even_odd_merge(values, stellar_values)) + + # Apply the desired style to the figure elements + draw_peripherals(ax, labels, angles, ymax, outer_color, inner_color) + + # Draw (and fill) the star-shaped outer line/area + ax.plot( + all_angles, + all_values, + linewidth=1, + linestyle="solid", + solid_joinstyle="round", + color=shape_color, + ) + + ax.fill(all_angles, all_values, shape_color) + + # Add a small hole in the center of the chart + ax.plot(0, 0, marker="o", color="white", markersize=3) +``` + +Finally, let's get our chart on a _blank canvas_ (figure). + +```python +fig = plt.figure(dpi=100) +ax = fig.add_subplot(111, polar=True) # Don't forget the projection! 
+ +draw_stellar(ax, *prepare_data(data)) + +plt.show() +``` + +![Example of a stellar chart](stellar_chart.png) + +It's done! Right now, you have an example of a stellar chart and the boilerplate code to add this type of chart to your _repertoire_. If you end up creating your own stellar charts, feel free to share them with the _world_ (and [me](https://twitter.com/joaompalmeiro)!). I hope this tutorial was useful and interesting for you! diff --git a/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png b/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png new file mode 100644 index 0000000..6699cc7 Binary files /dev/null and b/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png differ diff --git a/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png b/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png new file mode 100644 index 0000000..1f73871 Binary files /dev/null and b/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png differ diff --git a/content/posts/unc-biol222/fox.png b/content/posts/unc-biol222/fox.png new file mode 100644 index 0000000..5d8307d Binary files /dev/null and b/content/posts/unc-biol222/fox.png differ diff --git a/content/posts/unc-biol222/index.md b/content/posts/unc-biol222/index.md new file mode 100644 index 0000000..24640da --- /dev/null +++ b/content/posts/unc-biol222/index.md @@ -0,0 +1,218 @@ +--- +title: "Art from UNC BIOL222" +date: 2021-11-19T08:46:00-08:00 +draft: false +description: "UNC BIOL222: Art created with Matplotlib" +categories: ["art", "academia"] +displayInList: true +author: Joseph Lucas +resources: +- name: featuredImage + src: "fox.png" + params: + description: "Emily Foster's Fox" + showOnTop: true +--- + +As part of the University of North Carolina BIOL222 class, [Dr. Catherine Kehl](https://twitter.com/tylikcat) asked her students to "use `matplotlib.pyplot` to make art." 
BIOL222 is Introduction to Programming, aimed at students with no programming background. The emphasis is on practical, hands-on active learning. + +The students completed the assignment with festive enthusiasm around Halloween. Here are some great examples: + +Harris Davis showed an affinity for pumpkins, opting to go 3D! +![3D Pumpkin](pumpkin.png) +```python +import numpy as np +import matplotlib.pyplot as plt +# get library for 3d plotting +from mpl_toolkits.mplot3d import Axes3D + +# make a pumpkin :) +rho = np.linspace(0, 3*np.pi,32) +theta, phi = np.meshgrid(rho, rho) +r, R = .5, .5 +X = (R + r * np.cos(phi)) * np.cos(theta) +Y = (R + r * np.cos(phi)) * np.sin(theta) +Z = r * np.sin(phi) + +# make the stem +theta1 = np.linspace(0,2*np.pi,90) +r1 = np.linspace(0,3,50) +T1, R1 = np.meshgrid(theta1, r1) +X1 = R1 * .5*np.sin(T1) +Y1 = R1 * .5*np.cos(T1) +Z1 = -(np.sqrt(X1**2 + Y1**2) - .7) +Z1[Z1 < .3] = np.nan +Z1[Z1 > .7] = np.nan + +# Display the pumpkin & stem +fig = plt.figure() +# add_subplot replaces fig.gca(projection=...), which newer Matplotlib no longer accepts +ax = fig.add_subplot(projection = '3d') +ax.set_xlim3d(-1, 1) +ax.set_ylim3d(-1, 1) +ax.set_zlim3d(-1, 1) +ax.plot_surface(X, Y, Z, color = 'tab:orange', rstride = 1, cstride = 1) +ax.plot_surface(X1, Y1, Z1, color = 'tab:green', rstride = 1, cstride = 1) +plt.show() +``` + +Bryce Desantis stuck to the biological theme and demonstrated [fractal](https://en.wikipedia.org/wiki/Fractal) art. 
+![Bryce Fern](leaf.png) +```python +import numpy as np +import matplotlib.pyplot as plt + +#Barnsley's Fern - Fractal; en.wikipedia.org/wiki/Barnsley_… + +#functions for each part of fern: +#stem +def stem(x,y): + return (0, 0.16*y) +#smaller leaflets +def smallLeaf(x,y): + return (0.85*x + 0.04*y, -0.04*x + 0.85*y + 1.6) +#large left leaflets +def leftLarge(x,y): + return (0.2*x - 0.26*y, 0.23*x + 0.22*y + 1.6) +#large right leaflets +def rightLarge(x,y): + return (-0.15*x + 0.28*y, 0.26*x + 0.24*y + 0.44) +componentFunctions = [stem, smallLeaf, leftLarge, rightLarge] + +# number of data points and frequencies for parts of fern generated: +#lists with all 75000 datapoints +datapoints = 75000 +x, y = 0, 0 +datapointsX = [] +datapointsY = [] +#For 75,000 datapoints +for n in range(datapoints): + FrequencyFunction = np.random.choice(componentFunctions, p=[0.01, 0.85, 0.07, 0.07]) + x, y = FrequencyFunction(x,y) + datapointsX.append(x) + datapointsY.append(y) + +#Scatter plot & scaled down to 0.1 to show more definition: +plt.scatter(datapointsX,datapointsY,s=0.1, color='g') +#Title of Figure +plt.title("Barnsley's Fern - Assignment 3") +#Changing background color (gca reuses the axes holding the scatter plot) +ax = plt.gca() +ax.set_facecolor("#d8d7bf") +``` + +Grace Bell got a little trippy with this rotationally symmetric art. It's pretty cool how she captured mouse events. It reminds us of a flower. What do you see? +![Rotations](rotations.png) +```python +import matplotlib.pyplot as plt +from matplotlib.tri import Triangulation +from matplotlib.patches import Polygon +import numpy as np + +#I found this sample code online and manipulated it to make the art piece! 
+#was interested in because it combined what we used for functions as well as what we used for plotting with (x,y) +def update_polygon(tri): + if tri == -1: + points = [0, 0, 0] + else: + points = triang.triangles[tri] + xs = triang.x[points] + ys = triang.y[points] + polygon.set_xy(np.column_stack([xs, ys])) + +def on_mouse_move(event): + if event.inaxes is None: + tri = -1 + else: + tri = trifinder(event.xdata, event.ydata) + update_polygon(tri) + ax.set_title(f'In triangle {tri}') + event.canvas.draw() +#this is the info that creates the angles +n_angles = 14 +n_radii = 7 +min_radius = 0.1 #the radius of the middle circle can move with this variable +radii = np.linspace(min_radius, 0.95, n_radii) +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +triang = Triangulation(x, y) +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +trifinder = triang.get_trifinder() + +fig, ax = plt.subplots(subplot_kw={'aspect': 'equal'}) +ax.triplot(triang, 'y+-') #made the color of the plot yellow and there are "+" for the data points but you can't really see them because of the lines crossing +polygon = Polygon([[0, 0], [0, 0]], facecolor='y') +update_polygon(-1) +ax.add_patch(polygon) +fig.canvas.mpl_connect('motion_notify_event', on_mouse_move) +plt.show() +``` + +As a bonus, did you like that fox in the banner? That was created (and well documented) by Emily Foster! 
+```python +import numpy as np +import matplotlib.pyplot as plt + +plt.axis('off') + +#head +xhead = np.arange(-50,50,0.1) +yhead = -0.007*(xhead*xhead) + 100 + +plt.plot(xhead, yhead, 'darkorange') + +#outer ears +xearL = np.arange(-45.8,-9,0.1) +yearL = -0.08*(xearL*xearL) -4*xearL + 70 + +xearR = np.arange(9,45.8,0.1) +yearR = -0.08*(xearR*xearR) + 4*xearR + 70 + +plt.plot(xearL, yearL, 'black') +plt.plot(xearR, yearR, 'black') + +#inner ears +xinL = np.arange(-41.1,-13.7,0.1) +yinL = -0.08*(xinL*xinL) -4*xinL + 59 + +xinR = np.arange(13.7,41.1,0.1) +yinR = -0.08*(xinR*xinR) + 4*xinR + 59 + +plt.plot(xinL, yinL, 'salmon') +plt.plot(xinR, yinR, 'salmon') + +# bottom of face +xfaceL = np.arange(-49.6,-14,0.1) +xfaceR = np.arange(14,49.3,0.1) +xfaceM = np.arange(-14,14,0.1) + +plt.plot(xfaceL, abs(xfaceL), 'darkorange') +plt.plot(xfaceR, abs(xfaceR), 'darkorange') +plt.plot(xfaceM, abs(xfaceM), 'black') + +#nose +xnose = np.arange(-14,14,0.1) +ynose = -0.03*(xnose*xnose) + 20 + +plt.plot(xnose, ynose, 'black') + +#whiskers +xwhiskR = [50, 70, 55, 70, 55, 70, 49.3] +xwhiskL = [-50, -70, -55, -70, -55, -70, -49.3] +ywhisk = [82.6, 85, 70, 65, 60, 45, 49.3] + +plt.plot(xwhiskR, ywhisk, 'darkorange') +plt.plot(xwhiskL, ywhisk, 'darkorange') + +#eyes +plt.plot(20,60, color = 'black', marker = 'o', markersize = 15) +plt.plot(-20,60,color = 'black', marker = 'o', markersize = 15) + +plt.plot(22,62, color = 'white', marker = 'o', markersize = 6) +plt.plot(-18,62,color = 'white', marker = 'o', markersize = 6) +``` + +We look forward to seeing these students continue in their plotting and scientific adventures! 
\ No newline at end of file diff --git a/content/posts/unc-biol222/leaf.png b/content/posts/unc-biol222/leaf.png new file mode 100644 index 0000000..448b82d Binary files /dev/null and b/content/posts/unc-biol222/leaf.png differ diff --git a/content/posts/unc-biol222/pumpkin.png b/content/posts/unc-biol222/pumpkin.png new file mode 100644 index 0000000..76eeaf7 Binary files /dev/null and b/content/posts/unc-biol222/pumpkin.png differ diff --git a/content/posts/unc-biol222/rotations.png b/content/posts/unc-biol222/rotations.png new file mode 100644 index 0000000..dd9c045 Binary files /dev/null and b/content/posts/unc-biol222/rotations.png differ diff --git a/content/posts/visualising-usage-using-batteries/Liverpool.png b/content/posts/visualising-usage-using-batteries/Liverpool.png new file mode 100644 index 0000000..4114444 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/Liverpool.png differ diff --git a/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png b/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png new file mode 100644 index 0000000..b7586e4 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png differ diff --git a/content/posts/visualising-usage-using-batteries/battery.png b/content/posts/visualising-usage-using-batteries/battery.png new file mode 100644 index 0000000..4a131c1 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/battery.png differ diff --git a/content/posts/visualising-usage-using-batteries/data.PNG b/content/posts/visualising-usage-using-batteries/data.PNG new file mode 100644 index 0000000..89f472e Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/data.PNG differ diff --git a/content/posts/visualising-usage-using-batteries/head_data.PNG b/content/posts/visualising-usage-using-batteries/head_data.PNG new file mode 100644 index 0000000..3f6d6cb Binary files /dev/null 
and b/content/posts/visualising-usage-using-batteries/head_data.PNG differ diff --git a/content/posts/visualising-usage-using-batteries/index.md b/content/posts/visualising-usage-using-batteries/index.md new file mode 100644 index 0000000..b056466 --- /dev/null +++ b/content/posts/visualising-usage-using-batteries/index.md @@ -0,0 +1,220 @@ +--- +title: "Battery Charts - Visualise usage rates & more" +date: 2021-08-19T16:52:58+05:30 +draft: false +description: A tutorial on how to show usage rates and more using batteries +categories: ["tutorials"] +displayInList: true +author: Rithwik Rajendran +resources: +- name: featuredImage + src: "Liverpool_Usage_Chart.png" + params: + description: "my image description" + showOnTop: true + +--- + +# Introduction + +I have been creating common visualisations like scatter plots, bar charts, beeswarms etc. for a while and thought about doing something different. Since I'm an avid football fan, I thought of ideas to represent players' usage or involvement over a period (a season, a couple of seasons). I have seen some cool visualisations like donuts which depict usage and I wanted to make something different and simple to understand. I thought about representing batteries as a form of player usage and it made a lot of sense. + +For players who have been barely used (played fewer minutes) show a ***large amount of battery*** present since they have enough energy left in the tank. And for heavily used players, do the opposite i.e. show ***drained or less amount of battery*** + +So, what is the purpose of a battery chart? You can use it to show usage, consumption, involvement, fatigue etc. (anything usage related). + +The image below is a sample view of how a battery would look in our figure, although a single battery isn't exactly what we are going to recreate in this tutorial. 
+ +![A sample visualisation](battery.png) + +# Tutorial + +Before jumping into the tutorial, note that the function can be tweaked to suit the number of subplots or any other size parameter. The figure we are going to plot takes a series of steps, which we will follow one by one: + +1. Outline what we are going to plot +2. Import the necessary libraries +3. Write a function to draw the battery + - This is the function that will be called to plot the battery chart +4. Read the data and plot the chart accordingly + - We will demonstrate it with an example + + +## Plot Outline + +What is our use case? + +- We are given a dataset of Liverpool's players and the minutes they played in the last 2 seasons (for whichever club they played for in that time period). We will use this data for our visualisation. +- The final visualisation is the featured image of this blog post. We will go step by step through how to create it. + +## Importing Libraries + +The first step is to import the libraries whose functions and classes we will rely on. + +```python +import pandas as pd +import matplotlib.pyplot as plt +from matplotlib.path import Path +from matplotlib.patches import FancyBboxPatch, PathPatch, Wedge +``` + +The classes imported from `matplotlib.path` and `matplotlib.patches` will be used to draw the lines, rectangles, boxes and so on that make up the battery. + +## Drawing the Battery - A function + +The next part is to define a function named `draw_battery()`, which will be used to draw the battery. Later on, we will call this function with specific parameters to build the figure we require. 
The code below builds the battery: + +```python +def draw_battery(fig, ax, percentage=0, bat_ec="grey", + tip_fc="none", tip_ec="grey", + bol_fc="#fdfdfd", bol_ec="grey", invert_perc=False): + ''' + Parameters + ---------- + fig : figure + The figure object for the plot + ax : axes + The axes/axis variable of the figure. + percentage : int, optional + This is the battery percentage - size of the fill. The default is 0. + bat_ec : str, optional + The edge color of the battery/cell. The default is "grey". + tip_fc : str, optional + The fill/face color of the tip of battery. The default is "none". + tip_ec : str, optional + The edge color of the tip of battery. The default is "grey". + bol_fc : str, optional + The fill/face color of the lightning bolt. The default is "#fdfdfd". + bol_ec : str, optional + The edge color of the lightning bolt. The default is "grey". + invert_perc : bool, optional + A flag to invert the percentage shown inside the battery. The default is False. + + Returns + ------- + None. 
+ + ''' + try: + fig.set_size_inches((15,15)) + ax.set(xlim=(0, 20), ylim=(0, 5)) + ax.axis("off") + if invert_perc: + percentage = 100 - percentage + # color options - #fc3d2e red & #53d069 green & #f5c54e yellow + bat_fc = "#fc3d2e" if percentage <= 20 else "#53d069" if percentage >= 80 else "#f5c54e" + + ''' + Static battery and tip of battery + ''' + battery = FancyBboxPatch((5, 2.1), 10, 0.8, + "round, pad=0.2, rounding_size=0.5", + fc="none", ec=bat_ec, fill=True, + ls="-", lw=1.5) + tip = Wedge((15.35, 2.5), 0.2, 270, 90, fc=tip_fc, + ec=tip_ec, fill=True, + ls="-", lw=3) + ax.add_artist(battery) + ax.add_artist(tip) + + ''' + Filling the battery cell with the data + ''' + filler = FancyBboxPatch((5.1, 2.13), (percentage/10)-0.2, 0.74, + "round, pad=0.2, rounding_size=0.5", + fc=bat_fc, ec=bat_fc, fill=True, + ls="-", lw=0) + ax.add_artist(filler) + + ''' + Adding a lightning bolt in the centre of the cell + ''' + verts = [ + (10.5, 3.1), #top + (8.5, 2.4), #left + (9.5, 2.4), #left mid + (9, 1.9), #bottom + (11, 2.6), #right + (10, 2.6), #right mid + (10.5, 3.1), #top + ] + + codes = [ + Path.MOVETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.CLOSEPOLY, + ] + path = Path(verts, codes) + bolt = PathPatch(path, fc=bol_fc, + ec=bol_ec, lw=1.5) + ax.add_artist(bolt) + except Exception: + import traceback + print("EXCEPTION FOUND!!! SAFELY EXITING!!! Find the details below:") + traceback.print_exc() + +``` + +## Reading the Data + +Once the function is written, we can put it to use, and for that we need to feed it the required data. In our example, the dataset lists Liverpool players and the minutes they have played in the past two seasons. The data was collected from Football Reference, aka FBRef. + +We use the `read_excel` function in the pandas library to read our dataset, which is stored as an Excel file. 
+ +```python +data = pd.read_excel("Liverpool Minutes Played.xlsx") +``` + +Now, let us look at the data by listing the first five rows of our dataset: + +```python +data.head() +``` +![The first 5 rows of our dataset](head_data.PNG) + +## Plotting our data + +Now that everything is ready, we go ahead and plot the data. We have 25 players in our dataset, so a 5 x 5 grid of subplots is the one to go for. We'll also add some headers and set the colors accordingly. + +```python +fig, ax = plt.subplots(5, 5, figsize=(5, 5)) +facecolor = "#00001a" +fig.set_facecolor(facecolor) +fig.text(0.35, 0.95, "Liverpool: Player Usage/Involvement", color="white", size=18, fontname="Libre Baskerville", fontweight="bold") +fig.text(0.25, 0.92, "Data from 19/20 and 20/21 | Battery percentage indicate usage | less battery = played more/ more involved", color="white", size=12, fontname="Libre Baskerville") +``` + +We have now filled in the appropriate headers, figure size etc. The next step is to plot all the axes, i.e. a battery for each player. `p` is the row index used to step through the dataframe and fetch each player's data. The `draw_battery()` function call plots the battery. We also add the required labels along with that - player name and usage rate/percentage in this case. + +```python +p = 0 # Row index into the dataframe (one row per player) +for i in range(0, 5): + for j in range(0, 5): + ax[i, j].text(10, 4, str(data.iloc[p, 0]), color="white", size=14, fontname="Lora", va='center', ha='center') + ax[i, j].set_facecolor(facecolor) + draw_battery(fig, ax[i, j], round(data.iloc[p, 8]), invert_perc=True) + ''' + Add the battery percentage as text if a label is required + ''' + ax[i, j].text(5, 0.9, "Usage - "+ str(int(100 - round(data.iloc[p, 8]))) + "%", fontsize=12, color="white") + p += 1 +``` + +With the chart almost done, we add some final touches, although this part is entirely optional. 
Since the visualisation focuses on Liverpool players, I add Liverpool's logo and my watermark. Crediting the data source/provider is also the ethical thing to do, so we do that as well before displaying the plot. + +```python +from PIL import Image # Pillow, used to load and resize the logo +import numpy as np +liv = np.array(Image.open('Liverpool.png').resize((80, 80))).astype(float) / 255 # scale pixel values to 0-1 +fig.figimage(liv, 30, 890) +fig.text(0.11, 0.08, "viz: Rithwik Rajendran/@rithwikrajendra", color="lightgrey", size=14, fontname="Lora") +fig.text(0.8, 0.08, "data: FBRef/Statsbomb", color="lightgrey", size=14, fontname="Lora") +plt.show() +``` + +So, we have the plot below. You can customise the design as you want in the `draw_battery()` function: change the size, colours, shapes etc. + +![Usage_Chart_Liverpool](Liverpool_Usage_Chart.png) diff --git a/make_logo.py b/make_logo.py index 1ba67a5..1f62aa7 100644 --- a/make_logo.py +++ b/make_logo.py @@ -1,9 +1,8 @@ import numpy as np -import matplotlib as mpl import matplotlib.pyplot as plt import matplotlib.cm as cm import matplotlib.font_manager -from matplotlib.patches import Circle, Rectangle, PathPatch +from matplotlib.patches import Rectangle, PathPatch from matplotlib.textpath import TextPath import matplotlib.transforms as mtrans @@ -131,6 +130,7 @@ def make_logo(height_px, lw_bars, lw_grid, lw_border, rgrid, with_text=False): return fig, ax + make_logo(height_px=110, lw_bars=0.7, lw_grid=0.5, lw_border=1, rgrid=[1, 3, 5, 7], with_text=True) plt.savefig("mpl_logo.png")