Commit 970f3e4

Added new blog post
Author: Dariusz
1 parent 505cd04

6 files changed, +185 -0 lines changed

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
---
title: "Matplotlib in Data Driven SEO"
date: 2019-12-04T17:23:24+01:00
description: "At Whites Agency we analyze big unstructured data to increase our clients' online visibility. We share our story of how we used Matplotlib to present complicated data in a simple and reader-friendly way."
categories: ["industry"]
draft: false
displayInMenu: false
displayInList: true
author: Whites Agency

resources:
- name: featuredImage
  src: "featureImage.png"
  params:
    showOnTop: false
---
![Other visualization projects at Whites Agency.](fig4.jpg)

[Whites Agency](https://whites.agency/) is a digital marketing partner with a strong focus on data-driven SEO. What makes us unique is our own technology and a team of self-driven specialists. Our data scientists are passionate about using AI solutions that maximize SEO success for our clients. We run many Big Data analyses, which enable us to find the most accurate optimization opportunities leading to higher positions in Google. The research we've carried out so far has concentrated on Google ranking factors in [eCommerce](https://whites.agency/blog/seo-analysis-in-ecommerce-with-the-use-of-big-data/), the [football category](https://whites.agency/blog/a-70-increase-in-conversion-rate-within-a-month-due-to-data-driven-seo-a-case-study-of-the-football-website-sporticos/) and the [clothing market](https://whites.agency/blog/what-everyone-in-the-fashion-industry-should-know-about-seo-in-new-zealand-big-data-analysis/). We also perform comparative analyses of large-scale [website speed performance](https://whites.agency/blog/google-lighthouse-study-seo-ranking-factors-in-ecommerce-vs-news/).

[White Dog Technology](https://whites.agency/our-technology/) is our response to frequent changes in Google's algorithm and the development of AI solutions. Our work has been recognized, and we are a beneficiary of a research and industrial grant that allows us to develop new functions of our technology. Moreover, we are the authors of White Crow Technology, an automated tool for researching and analyzing Google's paid campaigns. Our data-driven SEO, based on real data rather than old-fashioned practices, ensures an optimal decision-making process for the clients we work with.

# Data Visualization in Python using matplotlib
The majority of the cases we currently deal with focus on data harvesting and analysis. In blogging, data presentation plays an important part, and from the beginning we needed a tool that would allow us to experiment with different forms of visualization. Because our organization is Python-driven, matplotlib was a straightforward choice for us. It is a mature project that offers flexibility and control. Among other features, matplotlib figures can be easily exported not only to raster graphic formats (PNG, JPG) but also to vector ones (SVG, PDF, EPS), creating high-quality images that can be embedded in HTML code or LaTeX, or used by graphic designers. In one of our projects, matplotlib was part of a Python processing pipeline that automatically generated PDF summaries from an HTML template for individual clients.
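
To make the export step concrete, here is a minimal sketch of rendering one figure to both raster and vector formats. The helper name `export_figure` and the plotted values are hypothetical and are not taken from our pipeline; only `Figure.savefig` and its `format` argument come from matplotlib itself.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt


def export_figure(fig, formats=("png", "svg", "pdf")):
    """Render `fig` once per format and return a {format: bytes} mapping."""
    rendered = {}
    for fmt in formats:
        buffer = io.BytesIO()
        # bbox_inches="tight" trims the empty margin around the axes
        fig.savefig(buffer, format=fmt, bbox_inches="tight")
        rendered[fmt] = buffer.getvalue()
    return rendered


fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3], [2, 1, 3], marker="o")  # placeholder data
outputs = export_figure(fig)
plt.close(fig)
```

Writing to in-memory buffers is only one option; passing a file name such as `"figure.svg"` to `savefig` selects the format from the extension instead.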

![Data Visualization Pipeline at Whites Agency](fig1.png)

We chose two cases that demonstrate how matplotlib is used in our organization. Every data visualization project has the same core, presented in the figure above: data is loaded from the database, processed in pandas or PySpark, and finally visualized with matplotlib. In each case we set up a global style, which is the basis for all figures (and is overridden where necessary):
```
import matplotlib.pyplot as plt
from cycler import cycler

# Palette used for the default color cycle
colors = ['#00b2b8', '#fa5e00', '#404040', '#78A3B3', '#008F8F', '#ADC9D6']

# Axes: grid on, thin gray frame, only the bottom spine visible
plt.rc('axes', grid=True, labelcolor='k', linewidth=0.8, edgecolor='#696969',
       labelweight='medium', labelsize=18)
plt.rc('axes.spines', left=False, right=False, top=False, bottom=True)
plt.rc('axes.formatter', use_mathtext=True)

plt.rcParams['axes.prop_cycle'] = cycler('color', colors)

# Grid, ticks, fonts and lines (the Montserrat font must be installed)
plt.rc('grid', alpha=1.0, color='#B2B2B2', linestyle='dotted', linewidth=1.0)
plt.rc('xtick.major', top=False, width=0.8, size=8.0)
plt.rc('ytick', left=False, color='k')
plt.rcParams['xtick.color'] = 'k'
plt.rc('font', family='Montserrat')
plt.rcParams['font.weight'] = 'medium'
plt.rcParams['xtick.labelsize'] = 13
plt.rcParams['ytick.labelsize'] = 13
plt.rcParams['lines.linewidth'] = 2.0
```
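
One way to override the global style for a single figure is matplotlib's `rc_context` context manager. The sketch below is an illustration under that assumption, not the exact mechanism used in our projects; the two parameters and the plotted values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# Global defaults, in the spirit of the style block above
plt.rcParams["lines.linewidth"] = 2.0
plt.rcParams["axes.grid"] = True

# Temporary override: applies only to artists created inside the block
with plt.rc_context({"lines.linewidth": 4.0, "axes.grid": False}):
    fig, ax = plt.subplots()
    (thick_line,) = ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data

# Outside the block the global defaults are in force again
fig2, ax2 = plt.subplots()
(default_line,) = ax2.plot([0, 1, 2], [0, 1, 0])
restored_width = plt.rcParams["lines.linewidth"]
plt.close("all")
```

The override is scoped to the `with` block, so one non-standard figure does not leak its settings into figures created later.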
## Case 1: Website Speed Performance
Our R&D department analyzed a set of 10,000 potential customer intent phrases from the “Electronics” (eCommerce) and “News” domains (5,000 phrases each). Using our own White Dog technology, we scraped data from the Google ranking in a specific location (London, United Kingdom) for both mobile and desktop results.
Based on these data, we identified the TOP 20 results that appeared in the SERPs. Each page was then audited with the [Google Lighthouse tool](https://developers.google.com/web/tools/lighthouse), an open-source, automated tool for improving the quality of web pages. A single sample from our analysis, showing how *Time to First Byte* (TTFB) varies as a function of Google position (grouped in threes), is presented below. TTFB measures the time it takes for a user's browser to receive the first byte of page content. Regardless of the device, TTFB is lowest for websites that appeared in the TOP 3 positions. The difference is significant, especially between the TOP 3 and the 4-6 results.

![Time to first byte from Lighthouse study performed at Whites Agency.](fig2.png)

The figure above uses the `fill_between` function from the matplotlib library to draw a colored band that represents the 40th-60th percentile range. A simple line plot with circle markers denotes the median (50th percentile). X-axis labels were assigned manually. The whole style is wrapped in a custom function that allows us to reproduce the figure in a single line of code. A sample of our code is presented below:

```
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Placeholder data, just for demonstration (not the study's measurements):
# x - position groups, y - median, yl/yu - 40th/60th percentiles
x = np.arange(7)
y = 0.5 + 0.05 * x
yl, yu = y - 0.1, y + 0.1

# --------------------------------------------
# Set double column layout
# --------------------------------------------
fig, axx = plt.subplots(figsize=(20, 6), ncols=2)

# --------------------------------------------
# Plot 50th percentile
# --------------------------------------------
line_kws = {
    'lw': 4.0,
    'marker': 'o',
    'ms': 9,
    'markerfacecolor': 'w',
    'markeredgewidth': 2,
    'c': '#00b2b8'
}

# just demonstration
axx[0].plot(x, y, label='Electronics', **line_kws)

# --------------------------------------------
# Plot 40-60th percentile
# --------------------------------------------
# make color lighter
cmap = LinearSegmentedColormap.from_list('whites', ['#FFFFFF', '#00b2b8'])

# just demonstration
axx[0].fill_between(
    x, yl, yu,
    color=cmap(0.5),
    label='_nolegend_'
)

# ---------------------------------------------
# Add x-axis labels
# ---------------------------------------------
# hard-coded for this demonstration; normally done automatically
xtick_labels = ['1-3', '4-6', '7-9', '10-12', '13-15', '16-18', '19-20']
for ax in axx:
    ax.set_xticks(x)  # fix tick positions so the labels line up
    ax.set_xticklabels(xtick_labels)

# ----------------------------------------------
# Export figure
# ----------------------------------------------
fig.savefig("lighthouse.png", bbox_inches='tight', dpi=250)
```
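
The snippet above starts from precomputed arrays. As a rough sketch of how the median and the 40th-60th percentile band could be derived from raw audit rows and drawn by one reusable function, consider the following. The function name `plot_percentile_band`, the column names `position` and `ttfb`, and the random data are all assumptions made for illustration; they are not our production code or the study's measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_percentile_band(ax, df, value_col, color="#00b2b8", label=None):
    """Plot the median of `value_col` per position group with a 40-60th percentile band."""
    # Group positions in threes: 1-3 -> 0, 4-6 -> 1, ..., 19-20 -> 6
    groups = (df["position"] - 1) // 3
    stats = df.groupby(groups)[value_col].quantile([0.4, 0.5, 0.6]).unstack()
    x = stats.index.to_numpy()

    ax.fill_between(x, stats[0.4], stats[0.6], color=color, alpha=0.3,
                    label="_nolegend_")
    ax.plot(x, stats[0.5], marker="o", color=color, label=label)

    # Build labels such as "1-3" or "19-20" from the group numbers
    max_pos = int(df["position"].max())
    ax.set_xticks(x)
    ax.set_xticklabels([f"{3 * g + 1}-{min(3 * g + 3, max_pos)}" for g in x])
    return stats


# Synthetic demonstration data: 50 audited pages per position 1..20
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "position": np.tile(np.arange(1, 21), 50),
    "ttfb": rng.gamma(shape=2.0, scale=0.3, size=1000),
})

fig, ax = plt.subplots(figsize=(10, 6))
stats = plot_percentile_band(ax, demo, "ttfb", label="synthetic demo")
ax.legend()
plt.close(fig)
```

With a helper like this, each panel of the figure reduces to a single call per data set, which is the idea behind wrapping the style in a custom function.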

## Case 2: Google Ads ranking
Our R&D department looked into paid campaigns (Ads) in Google Search for more than 7600 queries focused on the travel category in Poland (the report is available only in [Polish](https://agencjawhites.pl/aktualnosci/ponad-1000-graczy-walczy-o-polskiego-turyste-w-wyszukiwarce-google/)). We scraped the first page of Google results and analyzed the ads that were present. At the time of writing, each results page can have up to 4 ads at the top and up to 3 ads at the bottom. Each ad belongs to a domain and has a headline, a description, and optional extensions. Below we present the TOP 25 domains with the highest visibility on desktop computers. The Y-axis shows the name of a domain and the X-axis indicates how many times a domain appeared in an ad. We repeated the study 3 times and aggregated the counts, which is why the scale is much larger than 7600. In this project, the type of plot below allows us to summarize different brands' ad campaign strategies and their advertising market shares. For example, *itaka* and *wakacje* have the strongest presence on both mobile and desktop, and most of their ads appear at the top. *neckermann* also ranks very high, but most of its ads appear at the bottom of the search results.

![TOP 25 domains with the highest visibility on desktop computers.](fig3.png)

The figure above is a standard horizontal bar plot that can be reproduced with the `barh` function in matplotlib. Each y-tick holds 4 bar segments (see the legend). We also added automatically generated count numbers at the end of each bar for better readability. The code snippet is shown below:

```
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

# -----------------------------
# Set default colors
# -----------------------------
blues = LinearSegmentedColormap.from_list(name='WhitesBlues', colors=['#FFFFFF', '#00B3B8'], gamma=1.0)
oranges = LinearSegmentedColormap.from_list(name='WhitesOranges', colors=['#FFFFFF', '#FB5E01'], gamma=1.0)

# colors
desktop_top = blues(1.0)
desktop_bottom = oranges(1.0)
mobile_top = blues(0.5)
mobile_bottom = oranges(0.5)

# -----------------------------
# Placeholder data, just for demonstration (not the study's counts)
# -----------------------------
yticklabels = ['domain-a', 'domain-b', 'domain-c']
desktop = pd.DataFrame({'top': [3000, 1500, 500], 'bottom': [500, 1000, 2000]}, index=yticklabels)
mobile = pd.DataFrame({'top': [2500, 1200, 400], 'bottom': [300, 900, 1500]}, index=yticklabels)
desktop['all'] = desktop['top'] + desktop['bottom']
mobile['all'] = mobile['top'] + mobile['bottom']

# -----------------------------
# Prepare Figure
# -----------------------------
fig, ax = plt.subplots(figsize=(10, 15))
ax.grid(False)

# -----------------------------
# Plot bars
# -----------------------------
yticks = []
cnt = 0.0  # y position of the current desktop bar

for name in yticklabels:
    tmp_desktop = desktop.loc[name]  # desktop counts for this domain
    tmp_mobile = mobile.loc[name]    # mobile counts for this domain

    ax.barh(cnt, tmp_desktop['top'], color=desktop_top, height=0.9)
    ax.barh(cnt, tmp_desktop['bottom'], left=tmp_desktop['top'], color=desktop_bottom, height=0.9)
    # text counter
    ax.text(tmp_desktop['all'] + 100, cnt, "%d" % tmp_desktop['all'], horizontalalignment='left',
            verticalalignment='center', fontsize=10)

    ax.barh(cnt - 1, tmp_mobile['top'], color=mobile_top, height=0.9)
    ax.barh(cnt - 1, tmp_mobile['bottom'], left=tmp_mobile['top'], color=mobile_bottom, height=0.9)
    ax.text(tmp_mobile['all'] + 100, cnt - 1, "%d" % tmp_mobile['all'], horizontalalignment='left',
            verticalalignment='center', fontsize=10)

    yticks.append(cnt)

    # leave a gap before the next domain's pair of bars
    cnt = cnt - 2.5

# -----------------------------
# set labels
# -----------------------------
ax.set_yticks(yticks)
ax.set_yticklabels(yticklabels)

# -----------------------------
# Add legend manually
# -----------------------------
legend_elements = [
    mpatches.Patch(color=desktop_top, label='desktop top'),
    mpatches.Patch(color=desktop_bottom, label='desktop bottom'),
    mpatches.Patch(color=mobile_top, label='mobile top'),
    mpatches.Patch(color=mobile_bottom, label='mobile bottom')
]

ax.legend(handles=legend_elements, fontsize=15)
```
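
The plotting loop expects per-domain `top`, `bottom` and `all` counts. As a hedged sketch of how such a table could be aggregated from raw ad records collected over several runs, the example below uses pandas. The record layout (`run`, `device`, `domain`, `position`), the helper name `count_ads` and the example rows are hypothetical; they only illustrate the aggregation and do not come from the study.

```python
import pandas as pd

# Hypothetical raw records: one row per ad seen in one scraping run
ads = pd.DataFrame({
    "run":      [1, 1, 1, 2, 2, 3, 3, 3],
    "device":   ["desktop"] * 5 + ["mobile"] * 3,
    "domain":   ["a.example", "a.example", "b.example", "a.example",
                 "b.example", "a.example", "b.example", "b.example"],
    "position": ["top", "bottom", "top", "top",
                 "bottom", "top", "top", "bottom"],
})


def count_ads(records, device):
    """Return a per-domain table with 'top', 'bottom' and 'all' ad counts."""
    subset = records[records["device"] == device]
    counts = (
        subset.groupby(["domain", "position"])
        .size()                                  # ads per (domain, position), summed over runs
        .unstack(fill_value=0)                   # positions become columns
        .reindex(columns=["top", "bottom"], fill_value=0)
    )
    counts["all"] = counts["top"] + counts["bottom"]
    # Most visible domains first
    return counts.sort_values("all", ascending=False)


desktop_counts = count_ads(ads, "desktop")
mobile_counts = count_ads(ads, "mobile")
```

Each row of `desktop_counts` or `mobile_counts` then has the same shape as the per-domain values read inside the plotting loop.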
# Summary
The matplotlib library meets our needs in terms of visual capabilities and flexibility. It allows us to create standard plots in a single line of code, as well as to experiment with different forms of graphs thanks to its lower-level features. Thanks to matplotlib, we can present complicated data in a simple and reader-friendly way.
