Space and Control in Soccer
In many team sports, the ability to control and generate space in dangerous areas on
the pitch is crucial for the success of a team. This holds, in particular, for soccer. In
this study, we revisit ideas from Fernandez and Bornn (2018) who introduced interesting
space-related quantities including pitch control (PC) and pitch value. We identify influence
of the player on the pitch with the movements of the player and turn their concepts into
data-driven quantities that give rise to a variety of different applications. Furthermore,
we devise a novel space generation measure to visualize the strategies of the team
and player. We provide empirical evidence for the usefulness of our contribution and
showcase our approach in the context of game analyses.
Keywords: soccer (football), movement model, motion model, pitch control, soccer analytics
Edited by: Matthias Kempe, University of Groningen, Netherlands
Reviewed by: Robert Rein, German Sport University Cologne, Germany; Arnold Baca, University of Vienna, Austria
*Correspondence: Ulf Brefeld, [email protected]
Specialty section: This article was submitted to Sports Science, Technology and Engineering, a section of the journal Frontiers in Sports and Active Living
Received: 04 March 2021
Accepted: 26 May 2021
Published: 16 July 2021
Citation: Martens F, Dick U and Brefeld U (2021) Space and Control in Soccer. Front. Sports Act. Living 3:676179. doi: 10.3389/fspor.2021.676179

1. INTRODUCTION
An important aspect when analyzing soccer games is how much space on the soccer pitch is controlled by teams and players at any point during a game. While, in general, control is a rather flexible term in soccer and includes the ability of a player to control the ball or the ability of a team to control possession, we focus on spatial control, that is, control of areas on the pitch. This concept was introduced by Taki et al. (1996), who developed the notion of a dominant region of a player that defines the area on the pitch that is controlled by that player. That is, a player is expected to reach any point in her dominant region before any other player. These regions are derived from so-called motion or movement models that are able to predict whether a player can reach a certain point on the pitch in a given time.

Dominant regions have the advantage that they can be visualized by partitioning a soccer pitch into areas around players that they have control over, and they can thereby be easily interpreted. Interpretability is a key factor to empower non-technical staff, such as coaches or game analysts, to understand data-driven results and turn them into actionable insights. Therefore, dominant regions have frequently been used as the basis for higher-level research questions, such as the evaluation of passes or spatial pressure (Taki and Hasegawa, 2000; Gudmundsson and Wolle, 2014; Ueda et al., 2014; Horton et al., 2017; Brefeld et al., 2019). In this line of work, Fernandez and Bornn (2018) understand control on the pitch as a continuous spatial quantity. That is, instead of assigning every point on the pitch to exactly one team, they compute a value that measures how much control a team has over a position. Their concepts are intuitive and interpretable but suffer from an overly coarse player influence model. Our first contribution is to remedy this limitation by incorporating data-driven movement models as the underlying motion model (Brefeld et al., 2019). Secondly, we provide empirical results showing that the data-driven approach leads to realistic measurements of space. Thirdly, we propose new metrics for passers and pass receivers on the basis of data-driven quantification of space.

Empirical results are computed on positional data from 54 Bundesliga games from season 2017/18. We show that identifying the influence with movements of the player leads to high correlations with quantifiable outcomes such as shots on target, expected goals, and the market
Frontiers in Sports and Active Living | www.frontiersin.org 1 July 2021 | Volume 3 | Article 676179
Martens et al. Space and Control in Soccer
value of players. Finally, we showcase the usefulness of our approach on the example of opponent analysis.

The remainder is structured as follows. Section 2 reviews related work, and section 3 introduces basic player influence models. Section 4 details our approaches to quantify space, and section 5 presents a novel space generation metric together with empirical findings. Section 6 concludes.

2. RELATED WORK
Dominant regions are studied in many publications. A general definition refers to the dominant region of a player as the region on a pitch which can be reached by this very player before any other one (Gudmundsson and Horton, 2017). Taki et al. (1996) first introduced this concept based on a simple motion model that incorporates the acceleration and direction of a player. Their approach constitutes a significant improvement over simple Voronoi region models (Taki et al., 1996; Taki and Hasegawa, 2000), which simply credit space to the closest available player, ignoring running direction or speed. Further improvements to this basic model are presented by Fujimura and Sugihara (2005), who include a resistive force to bound the otherwise infinite acceleration in Taki et al. (1996). By contrast, Brefeld et al. (2019) introduce a purely data-driven probabilistic movement model using sampled trajectories of each individual player. The model can be used to derive densities of player locations and convex hulls for all reachable points on the pitch for a predefined time window, which again can be translated to dominant regions.

The previously mentioned approaches treat control as a binary variable such that every location is controlled by either one or the other team. Fernandez and Bornn (2018) also rate controlled areas on the field but propose a continuous measure of control that is based on the influence of each player on a given point on the pitch at a given time. They use a general Gaussian influence model, in which the covariance matrix of each bivariate Gaussian is defined by the velocity vector of a player and her distance to the ball. Further, the authors value space on the pitch itself. Clearly, occupied zones that are close to the goal of the opponent are of higher value than open and unoccupied space in the center of the pitch (Link et al., 2016). The authors rate areas that are usually controlled by defensive players given a certain location of the ball. They use this concept to measure how well players are able to occupy and gain space during a game. In fact, they empirically show, albeit using only data from a single game, that top players such as Lionel Messi or Andres Iniesta are able to actively occupy higher valued space than others (Fernandez and Bornn, 2018). However, the analysis does not involve movement models or movement characteristics of an individual player; individual differences such as maximum speed, acceleration, and agility are ignored. Similarly to the approaches mentioned above, the proposed model is not quantitatively evaluated.

Dominant regions are used to analyze different aspects of soccer. Some studies use dominant regions to evaluate passes. Taki and Hasegawa (2000) and Nakanishi et al. (2010) estimate the success of a pass along a straight line by measuring whether it ends in the dominant region of the receiver. Horton et al. (2017) estimate the quality of a pass by using a prediction model that, among other features, uses features based on dominant regions to learn a human rating of observed passes. Some of those features also use a measure of defensive pressure that, based on dominant regions, estimates whether defending players are able to put the passing player under enough spatial pressure to influence the outcome of the pass. A similar concept was used in Taki and Hasegawa (2000), who also measure spatial pressure based on dominant regions. Ueda et al. (2014) analyze defensive and offensive positioning depending on the location where the ball was acquired. For pitch control (PC) introduced by Fernandez and Bornn (2018), however, such evaluations are missing so far.

Several other approaches that model the movements of the player exist, however. Recently, models that make use of reinforcement learning and deep learning techniques led to impressive results such as the study of Le et al. (2017) on deep imitation learning, who show that the movements of the player can be predicted over time periods up to several seconds. Dick and Brefeld (2019) use reinforcement learning in combination with deep convolutional networks to predict the dangerousness of an offensive situation. Their model is purely data-driven and works without any expert or prior knowledge. The drawback of such methods, however, is their lack of interpretability, which makes it hard for experts to take actions on these insights. This is an issue that, for example, Mortensen and Bornn (2019) attempt to tackle by modeling the movements of the player in basketball with Markov transitions as Poisson point processes.

Other studies are based on similar ideas. For basketball, Franks et al. (2015) take a similar approach to rate shots based on spatiotemporal features of defending players. Link et al. (2016) also include distances of the players to the goal to quantify the dangerousness of offensive actions, and Hobbs et al. (2018) use the notion of defensive disruption as a measure of how far defenders deviate from their preferred positions in similar situations and compute transition values for the offensive team.

3. INFLUENCE OF PLAYER ON THE PITCH

3.1. Data
The data that are used in this study are provided by a European top-flight soccer league. The data include 54 Bundesliga games from season 2017/18. The data stem from two main sources: (i) tracking of the player and ball positions and (ii) event data. The former is automatically captured from video footage at 25 frames per second by the data provider. At each frame, the (x, y) coordinates of all 22 players plus the ball are listed. The event data consist of manually recorded in-game events such as passes, shots, and tackles. Such events are collected by human observers who tag each event and enrich it with additional, event-specific information such as passing player, pass receiver, and shot success. For both data sources, (x, y) coordinates relate to a pitch size of 105×68 m. The center of the pitch is always at the origin (0, 0), and positions are scaled to a [−52.5, 52.5] range on the x-axis and to a [−34, 34] range on the y-axis. The timestamps of the two data sets need to be aligned so that instances from both sources can be processed together.
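The coordinate convention and the alignment of event timestamps with tracking frames described in section 3.1 can be illustrated with a minimal Python sketch. The raw coordinate system with its origin in a pitch corner, the kickoff offset, and the function names `to_centered` and `nearest_frame` are assumptions made for illustration only; the data provider's actual format is not described in the text.

```python
PITCH_LENGTH = 105.0  # metres, pitch size stated in section 3.1
PITCH_WIDTH = 68.0
FPS = 25              # tracking frames per second, stated in section 3.1


def to_centered(x_raw, y_raw):
    """Map raw coordinates to the centred system used in the paper.

    Assumption: raw coordinates lie in [0, 105] x [0, 68] with the origin
    in a pitch corner. The result lies in [-52.5, 52.5] x [-34, 34].
    """
    return x_raw - PITCH_LENGTH / 2.0, y_raw - PITCH_WIDTH / 2.0


def nearest_frame(event_time_s, kickoff_time_s=0.0):
    """Index of the tracking frame closest to an event timestamp.

    Assumption: both timestamps are given in seconds on the same clock,
    and frame 0 is recorded at kickoff.
    """
    return int(round((event_time_s - kickoff_time_s) * FPS))


if __name__ == "__main__":
    print(to_centered(52.5, 34.0))    # centre spot of the raw system
    print(nearest_frame(10.0, 8.0))   # event two seconds after kickoff
```

Rounding to the nearest frame is only one possible alignment strategy; manually tagged events may also need a tolerance window around the matched frame.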
FIGURE 1 | Player influence models according to Fernandez and Bornn (2018). The ball is visualized in green. (Left) Player standing with the ball. (Right) Player
moving away from the ball.
3.2. Gaussian Influence Models
An analysis of space and control requires a model of the influence of a player on the current situation of the game, that is, the spatial and temporal configuration on the pitch. Fernandez and Bornn (2018) model the influence of a player by a bivariate normal distribution to quantify the amount of control at a position $p \in \mathbb{R}^2$ for a player $i$ at position $p_t^i$ and time $t$,

$$f_t^i(p) = \frac{1}{\sqrt{(2\pi)^2 \, |\Sigma_t^i|}} \exp\left( -\frac{1}{2} (p - \mu_t^i)^T (\Sigma_t^i)^{-1} (p - \mu_t^i) \right).$$

The mean $\mu_t^i$ of $f_t^i$ is given by the position of the player and his velocity vector $v_t^i$ using

$$\mu_t^i = p_t^i + \frac{1}{2} \cdot v_t^i$$

[...] with $\theta = \operatorname{atan2}(y_t^i - y_{t_\delta}^i, \; x_t^i - x_{t_\delta}^i)$. Finally, the scaling matrix $V_t^i$ determines the area of the distribution by

$$V_t^i = \begin{pmatrix} \dfrac{r_t^i + r_t^i \left( \frac{v_t^i}{v_{\max}} \right)^2}{2} & 0 \\ 0 & \dfrac{r_t^i - r_t^i \left( \frac{v_t^i}{v_{\max}} \right)^2}{2} \end{pmatrix}$$

where the radius $r^i$ depends on the Euclidean distance between a player $p_t^i$ and the ball $p_t^b$. By referring to expert knowledge, the authors restrict $r^i$ to be in a range of [4, 10] meters. This function^1 is shown in Figure 2. The quantity $v_{\max}$ is the maximum speed of all the players. We refer to Fernandez and Bornn (2018) for more details on $r$ and $v_{\max}$. Figure 1 shows two exemplary situations to illustrate how velocity and distance to the ball affect the shape of the Gaussian. Note that this approach ignores movement capabilities of an individual, e.g., agility and acceleration.
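The Gaussian influence model of section 3.2 can be sketched in a few lines of Python. The text above does not show how the covariance is assembled from the rotation angle θ and the scaling matrix V; the sketch assumes Σ = R V V Rᵀ with R the rotation by θ, which is our reading of Fernandez and Bornn (2018) and should be checked against that paper. The influence radius and maximum speed are passed in as plain parameters, and the cap on the speed ratio is an assumption added only to keep Σ invertible.

```python
import math


def gaussian_influence(p, pos, prev_pos, vel, radius, v_max):
    """Density of the bivariate normal influence model at point p.

    pos, prev_pos: player position at time t and at the earlier time t_delta
    vel:           velocity vector (vx, vy) of the player
    radius:        influence radius r, expected in [4, 10] metres
    v_max:         maximum speed over all players
    Assumption: covariance Sigma = R V V R^T (not shown in the text above).
    """
    speed = math.hypot(vel[0], vel[1])
    # Cap is an assumption of this sketch so that V stays non-singular.
    ratio = min((speed / v_max) ** 2, 0.99)
    v1 = (radius + radius * ratio) / 2.0   # scaling along the movement
    v2 = (radius - radius * ratio) / 2.0   # scaling across the movement

    theta = math.atan2(pos[1] - prev_pos[1], pos[0] - prev_pos[0])
    c, s = math.cos(theta), math.sin(theta)
    a, b = v1 * v1, v2 * v2

    # Entries of Sigma = R diag(a, b) R^T
    s11 = a * c * c + b * s * s
    s12 = (a - b) * c * s
    s22 = a * s * s + b * c * c
    det = s11 * s22 - s12 * s12

    # Mean: position shifted by half the velocity vector
    mu = (pos[0] + 0.5 * vel[0], pos[1] + 0.5 * vel[1])
    dx, dy = p[0] - mu[0], p[1] - mu[1]

    # Quadratic form d^T Sigma^{-1} d for a 2x2 symmetric matrix
    quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 2 * det)


if __name__ == "__main__":
    standing = gaussian_influence((0, 0), (0, 0), (0, 0), (0, 0), 4.0, 13.0)
    print(standing)
```

For a standing player the density is circular around the player; for a moving player it is shifted and stretched along the direction of movement, which matches the two situations in Figure 1.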
FIGURE 2 | A function that maps the distance d to the influence radius $r^i$.

([7, 14)), running ([14, 20)), and sprinting (≥ 20). Every triplet in the same bin is then summarized by a non-parametric kernel density estimation (KDE)^2 with Gaussian kernel as it seems to be a good fit for the resulting endpoint distributions. The bandwidths of the kernels are optimized using Bayesian optimization (Brochu et al., 2010; Snoek et al., 2012; Srinivas et al., 2012). We denote the resulting probability density by $P_\Delta^i(p \mid p_{t_\delta}^i, p_t^i, v_t^i)$. The measure $P_\Delta^i$ computes the probability density that player $i$ can reach position $p$ in time $\Delta$ from position $p_{t_\delta}^i$ with initial velocity $v_t^i$.

Figure 3 shows an example: Trajectories of players are projected into a new coordinate system such that every trajectory starts in the origin with an initial movement along the x-axis. The endpoints of the trajectories are then stored for the actual initial velocity and time window. Depending on the application, the point distribution can be either used directly or approximated by its convex hull. We refer to Brefeld et al. (2019) for details on the computation of data-driven movement models.
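The projection of trajectories into a player-centred frame and the Gaussian kernel density estimate over the resulting endpoints can be sketched as follows. This is a simplified illustration, not the implementation of Brefeld et al. (2019): the function names are hypothetical, a single isotropic bandwidth is passed in by hand (the text optimizes bandwidths with Bayesian optimization), and the binning by speed and time window is left out.

```python
import math


def to_player_frame(p_prev, p_now, p_end):
    """Express the trajectory endpoint p_end in a player-centred frame.

    The frame has its origin at p_now and its x-axis along the initial
    direction of movement, estimated from p_prev to p_now.
    """
    theta = math.atan2(p_now[1] - p_prev[1], p_now[0] - p_prev[0])
    rx, ry = p_end[0] - p_now[0], p_end[1] - p_now[1]
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the relative endpoint by -theta
    return rx * c + ry * s, -rx * s + ry * c


def kde_density(p, endpoints, bandwidth):
    """Gaussian KDE over 2-D endpoints with one isotropic bandwidth.

    Assumption: a fixed, hand-chosen bandwidth replaces the optimized
    bandwidths described in the text.
    """
    h2 = bandwidth * bandwidth
    total = 0.0
    for ex, ey in endpoints:
        d2 = (p[0] - ex) ** 2 + (p[1] - ey) ** 2
        total += math.exp(-d2 / (2.0 * h2))
    return total / (len(endpoints) * 2.0 * math.pi * h2)


if __name__ == "__main__":
    # Three toy trajectories, all starting with a movement along +y
    raw = [((0, 0), (0, 1), (0, 3)), ((0, 0), (0, 1), (1, 3)), ((0, 0), (0, 1), (-1, 2))]
    cloud = [to_player_frame(a, b, c) for a, b, c in raw]
    print(cloud)
    print(kde_density((2.0, 0.0), cloud, bandwidth=1.0))
```

In practice one such density would be estimated per player, speed bin, and time horizon, so that the model reflects individual movement capabilities.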
FIGURE 3 | Data-driven movement models. (Left) The initial position $p_t$ is mapped to the origin such that the initial direction of movement $v_t$ follows the x-axis. Position $p_{t_\Delta}$ marks the end of the trajectory. Position $p_{t_\delta}$ is required to estimate initial velocity. (Center) Resulting point cloud. (Right) Smoothed movement model using density estimation.

predictive accuracies, we suggest to learn more sophisticated (possibly non-linear) functions to estimate the time horizons using additional features like actual player/ball positions and/or velocity vectors.

^5 The influence of the player also depends on the position of the ball $p_b$. For notational simplicity, this is omitted and is implicitly included by the time $t$ and the positions of all actors at that time.
FIGURE 5 | Pitch control (PC) for the proposed approach (Left) and baseline (Right).
time $t$ whereas $PC_t(p) = 1$ means that the offensive team controls that area.

Figure 5 compares the resulting PC with the original formulation in Fernandez and Bornn (2018) (baseline) on an exemplary situation. The red team plays from left to right. The ball is currently at the green cross and being passed to the purple cross where the red striker scores with a volley shot. The figure reveals the main differences between the two approaches. The influence areas of the baseline are much larger and cover a great deal of the pitch. By contrast, influence areas computed with the data-driven movement model are much smaller, especially when a player is close to the ball. Note that the location of the purple cross has a PC value of −0.21 for the baseline, while the proposed approach clearly reflects the known outcome of this scoring possession by a PC value of 0.57. Since the red player is already in possession of the ball and moreover able to pass it on to the striker, the data-driven model delivers a more realistic interpretation of control on the pitch.

To confirm this impression, we conduct an experiment on all 289 successful ball possession phases in the data. Throughout this analysis, we define a possession to be successful if it ends with a shot at the goal. We focus on sequences with at least three successful passes because the vast majority of possessions with fewer passes are rather chaotic and, e.g., consist of a series of headers after a goal-kick. Analogous to the example above, we collected the pitch control $PC_t^{p_d}$ for the attacking team at the pass destination $p_d$ and at the time $t$ the final pass was made before the attacker shoots at the goal. Figure 6 compares the results of both models. For our data-driven approach, in nearly 75% of the cases the attacking team has a positive PC before the pass receiver is able to take a shot. This follows the intuition that the attacking team must have created some space to realize the shot at the target. Using the baseline model, however, the observed PC values do not allow for an informed guess on the known outcome of these situations.

FIGURE 6 | PC for final passes before a shot was made.

[...] concepts from Fernandez and Bornn (2018) to compute the value of a position. The underlying idea is that defensive players intuitively cover highly valuable space. Obviously, defensive players do not position themselves perfectly in every situation, e.g., to prevent through-passes. But we argue that such individual mistakes are exceptions and that defenders usually cover the important areas on the pitch in similar situations; hence, with a sufficiently high number of training situations, a model should be able to generalize well and predict the high valued space.

We thus aim to learn influence areas for a defending team from historic data given the ball position at that time $p_t^b$. This will be referred to as defensive influence (DI). The observed DI on point $p_j$ is the sum of influences of all players in the defensive team $A$ at time $t$,
FIGURE 7 | Pitch value using the learned function fnθ . The ball is located at the green cross, and the attacking team plays from left to right. (Left) data-driven.
(Right) baseline.
FIGURE 8 | PC (left column), pitch value (center column), and space quality (right column) for baseline (top row) and proposed approach (bottom row).
Analogous to PC, we define the maximum amount of DI to be one. Using this definition, the pitch value is defined as

$$PV_t^{p_j}(p_t^b) = \left( 1 - \frac{\|\overrightarrow{p_j p_g}\|_2}{\|\overrightarrow{p_c p_g}\|_2} \right) \cdot DI_t^{p_j}(p_t^b). \quad (7)$$

Here, $p_c$ denotes the point at the opposite corner such that the denominator marks the longest possible distance to the goal. Hence, pitch value equals the defensive influence scaled by the distance to the goal that is in the range [0, 1], following the idea that points on the pitch are generally more valuable the closer they are to the goal of the opponent.

Though $DI_t^{p_j}$ can be extracted from historic games, we need a function $fn_\theta(p_t^b, p_j)$ that approximates $DI_t^{p_j}$ well and that can be applied to new and unseen situations for generality. Following Fernandez and Bornn (2018), we propose to learn $fn_\theta$ with a feed-forward neural network (FNN) by minimizing the mean squared error,

$$\min_\theta \sum_{t, p_j} \left( DI_t^{p_j}(p_t^b) - fn_\theta(p_t^b, p_j) \right)^2.$$

The training data contain game situations from 54 Bundesliga games (about 34 million observations) where goalkeepers are ignored. To render this training task computationally feasible, we choose points as features that lie on an equally spaced 21 × 16 grid $G$ such that $p_j \in G$. This results in a $(|T| \times |G|) \times 4$ feature matrix $X$ where each row contains the (x, y) coordinates of one $p_j \in G$ and the ball position $p_t^b$ at time $t$ for all available timestamps $T$ in the data set. Dropout (Srivastava et al., 2014) is applied to all hidden layers to prevent over-fitting, and all hyper-parameters (# layers, # units per layer, dropout rate, learning rate, and batch size) of the network are optimized with Bayesian
optimization. In our experiments, the network with the best performance had two hidden layers and 64 units in each layer. The optimization was carried out using the Adam optimization algorithm (Kingma and Ba, 2015).

Figure 7 shows an example of the data-driven approach and the baseline (Fernandez and Bornn, 2018). The ball is on the left wing just outside the box visualized by the green cross. The attacking team plays from left to right. In the data-driven model, the last defending line forms up right behind the center-line with the intention to use the offside rule to limit the space in which the attacking team can operate. The right defender covers space slightly deeper than his peers on the left side. This is a useful tactic to prevent straight and long passes in the back of the last defending line. It also discloses the habit of defending teams to prevent crosses from one side to the other. Strikers and offensive midfielders position themselves in a way that their opponent is forced to play the long passes mostly along the sideline. The influence area reaches far out to the left penalty box to isolate the ball-possessing player on his side. Such insights are hidden in the results of the baseline that considers about half of the pitch important.

4.4. Space Quality
As shown in the previous sections, pitch control measures the amount of dominance that a player or team has on a certain location. Pitch value, by contrast, relates to the value that a location has at that very moment. Space quality (SQ) for the jth location at time $t$ is now simply defined as the product of pitch control and pitch value (Fernandez and Bornn, 2018),

$$SQ_t^{p_j} = PC_t^{p_j} \cdot PV_t^{p_j}(p_t^b). \quad (8)$$

Figure 8 shows all three parts of the equation for the same situation. The red team stages an attack that, later on, ends up in a shot at goal. The player with the ball (green cross) plays a deep forward pass to the red player on the left wing. The pass receiver generates pitch control in a highly valuable area that results in space of high quality. Note the differences of the two models around the left winger. The baseline credits much space in her back to her team due to an excessively large influence of the player. However, given the velocity vector of a player in this situation the player can hardly control the areas behind her, especially given the rather short distance to the passer. Moreover, the baseline estimates the area directly in front of her as a neutral zone (white). With the data-driven model, however, that particular area turns dark red as one would expect in this situation. Also note that the defensive team is pretty disorganized; they are not doing well in covering the important space because their defensive line and especially their right back moved up too far that allows the aforementioned left attacker to run into the exposed region.

FIGURE 9 | The resulting area under the curve (AUC) values.

5. SPACE GENERATION
While the previous section suggests that identifying the impact of the player with individual movement models actually makes sense, we now turn toward establishing an empirical basis for this insight. Since the devised quantities are difficult to evaluate quantitatively, we resort to proxies and study space generation and measurable outcomes of ball possession phases.

To connect to the previous section, we first test the hypothesis that passes into areas of high space quality are more likely to result in a positive outcome than passes into zones with small space quality. We follow a simple setup: For each pass in the event log, the resulting space quality is computed at equidistant points $p_j \in F$ lying on a 50 cm-spaced grid over the pitch^6. We use only two predictors: (i) the average space quality at the location of the passer $p_o$ and (ii) the average space quality at the position of the pass destination $p_d$ for every possession. To take the distance between a position (grid cell) and the pass origin and destination, respectively, as well as some smaller inaccuracies in the pass event data into account, we weigh space quality with exponential decaying factors $\lambda_o$ and $\lambda_d$, so that positions far away from the pass origin and destination, respectively, do not impact the results. The magnitude of the exponential decay is controlled by parameters that are found by model selection. So, the features for the kth possession with $n_p$ pass timestamps $T_k$ are defined as:

$$x_k^o = \frac{1}{n_p} \sum_{t \in T_k} \sum_{p_j \in F} SQ_t^{p_j} \cdot \exp\left( -\lambda_o \cdot \|\overrightarrow{p_j p_o}\|_2 \right) \quad (9)$$

$$x_k^d = \frac{1}{n_p} \sum_{t \in T_k} \sum_{p_j \in F} SQ_t^{p_j} \cdot \exp\left( -\lambda_d \cdot \|\overrightarrow{p_j p_d}\|_2 \right) \quad (10)$$

We use 5,277 ball possession phases in 54 Bundesliga matches containing 31,824 passes where episodes with fewer than three passes are discarded. In sum, 5.5% of the remaining data constitute successful ball possession phases that end with a shot at goal. These form the positive class. We use a linear support vector machine (SVM) to learn a model that predicts whether an attack is successful or not, based on the two input features.

^6 The proposed grid size trades off accuracy and computation time. Other values are certainly possible.
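To make the chain from pitch value (Equation 7) over space quality (Equation 8) to the possession features (Equations 9 and 10) concrete, the following Python sketch computes them on toy inputs. It is an illustration under assumptions: pitch control and defensive influence values are supplied as plain numbers rather than computed from movement models or a trained network, and the function and argument names are hypothetical.

```python
import math


def pitch_value(p, goal, far_corner, defensive_influence):
    """Eq. (7): defensive influence scaled by closeness to the goal."""
    d = math.dist(p, goal)
    d_max = math.dist(far_corner, goal)  # longest possible distance to the goal
    return (1.0 - d / d_max) * defensive_influence


def space_quality(pitch_control, pv):
    """Eq. (8): product of pitch control and pitch value."""
    return pitch_control * pv


def decayed_feature(sq_grids, grid_points, anchors, lam):
    """Eqs. (9)/(10): decay-weighted space quality, averaged over passes.

    sq_grids[t][j] is the space quality at grid point j for pass t,
    anchors[t] is the pass origin p_o (Eq. 9) or destination p_d (Eq. 10),
    lam is the decay rate lambda_o or lambda_d.
    """
    n_p = len(sq_grids)
    total = 0.0
    for sq_t, anchor in zip(sq_grids, anchors):
        for sq, pj in zip(sq_t, grid_points):
            total += sq * math.exp(-lam * math.dist(pj, anchor))
    return total / n_p


if __name__ == "__main__":
    goal, corner = (52.5, 0.0), (-52.5, 34.0)
    grid = [(40.0, 0.0), (41.0, 0.0)]
    # Toy pitch control 0.57 and defensive influence 0.8 at both grid points
    sq = [space_quality(0.57, pitch_value(p, goal, corner, 0.8)) for p in grid]
    print(decayed_feature([sq], grid, [(40.0, 0.0)], lam=1.0))
```

A larger decay rate concentrates the feature on the immediate surroundings of the pass origin or destination, which is the knob the text tunes by model selection.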
FIGURE 10 | Correlation between SGrec and non-penalty xG per 90 min (Left) and between SGpas and xA per 90 min (Right).
FIGURE 11 | Comparison of average space quality of the player at the time of all passes (left column) and at the time each player is the actual pass recipient (right
column). Playing direction is from left to right. The SG values in the color bar relate to the average space quality a player generates at a certain field position during all
considered passing events.
For each experiment, we randomly choose 80% of the data for training and 20% for model evaluation using area under the curve (AUC). For every combination of parameter and model, we repeat the experiment 1,000 times. To analyze the effect of adding pitch value to the space quality equation, we repeat the same setup but replace space quality with pitch control in Equations (9) and (10). The results are shown in Figure 9. Using only pitch control does not lead to significant differences between data-driven and baseline approaches. However, adding pitch value and thus focusing on space quality leads to a much better predictive accuracy for the data-driven approach.

For the data-driven approach, the classifiers perform even better: A very fine-grained focus on the pass destination increases the ability to predict the outcome of the ball possession. Translated to the situation in Figure 5, an area in a radius of 1.5 m around the shot position is considered as sufficient for the classifier. This area is largely controlled by the red team. The detailed focus on a small area around the pass destination
is possible because the data-driven model is able to approximate The following analysis is based on data from six teams playing
pitch control more accurately than the baseline does. This is against each other leading to a subset of 30 games with a total
pd of 16,631 passes. Space generation is again computed on a 50 cm-
reflected by pitch control values at the shot location (PCt = 0.57
pd equidistant grid on the pitch. In our analysis, we only consider the
for the data-driven model vs. PCt = −0.21 for the baseline 98 players who were involved in at least 30 passes (either as passer
model) and in Figure 6. In fact, for the baseline the classification or pass receiver) during these games for a robust comparison.
results show a very different behavior: the smaller the considered In the remainder, SGrec denotes the amount of average space
space, the worse the performance. Overall, the classifiers based created by a pass receiver and SGpas credits this amount to
on the data-driven model significantly outperform the ones that the passing player. SGrec thus corresponds to a player creating
ground on features from the baseline model. Often, large average space for herself by positioning in areas where she can get the
space quality values in ball possession phases are caused by only ball. Similarly, SGpas describes the ability of a passer to identify
a few high-quality passes. valuable spaces and to pass the ball into valuable areas that were
Unsurprisingly, these experiments show that it is beneficial generated by her teammates. SGtotal simply defines the sum of
for a soccer team to create valuable space during a possession both measurements.
through passing in order to get in promising situations to score We focus on possible relationships between our space
a goal. Our analysis confirms that this can actually be measured generation metrics and existing player metrics and valuations.
with the proposed approach. Our approach turns out accurate Prominent concepts are the expected goal (xG) and expected
and allows to derive meaningful metrics for individual players. assists (xA) metrics that measure the probability that whether
a shot will result in a goal and credit this likelihood either to
5.1. Measuring the Generation of Space the shooter (xG) or the pass giver (xA), respectively. Although
We now leverage space quality to off-ball movements and space
implementations differ in details, the basic idea is to compare
generation. A simple way to measure the off-ball movement is to
i,p shots with similar characteristics (e.g., shot position and body
compute space quality SQt for player i at time t and location p part the attacker made the shot with) and calculate how many
and subtract the space quality of all other players j ∈ P \ {i} at of these shots actually resulted in a goal (Lucey et al., 2015; Le
that point and time, et al., 2017; Rathke, 2017). Besides its popularity, we choose these
X X n
i,p j,p
o measures because, compared to the actual number of goals, it
SGit = max (SQt − SQt ), 0 . (11) leaves aside factors such as luck and rather aims at the ability of
p∈F j∈P \{i}
the players to bring herself into situations to score7 . From that
Hence, the resulting space generation is the sum of individual point of view, xG and SGrec pursue similar goals as the latter
space quality over an equally spaced grid F , i.e., the amount values the ability of a player to bring herself into a position
of control that this player actually has on certain areas on the to receive passes in high-quality areas that ultimately (for the
pitch weighted by the pitch value. Note that this measurement final pass in a possession) results in a position to shoot at
approach differs from the space generation gain concept in the goal.
(Fernandez and Bornn, 2018), which quantifies the space that an Figure 10 (left) clearly shows a significant positive correlation
attacker frees up by dragging the opponents into his direction. between both metrics [Pearson’s r = 0.66 with p-value = 8.85e −
To compute the rating of an individual player for space
generation, SGit is evaluated for all timestamps at which an 7 Comparing to traditional measures like the number of shots leads to similar
offensive player controls the ball and attempts to make a pass. outcomes with slightly lower correlations since data are more noisy.
Frontiers in Sports and Active Living | www.frontiersin.org 10 July 2021 | Volume 3 | Article 676179
Martens et al. Space and Control in Soccer
14 and CI = (0.54, 0.76)]. For a more meaningful comparison, the xG value is standardized per 90 min and penalty kicks are excluded.⁸ Note that the result is almost unaffected by the three players with xG > 0.6 and the one with SGrec > 3 [r = 0.66, p-value = 7.49e−13, CI = (0.52, 0.76)]. These four players are strikers with very high SGrec values, so all of them are able to create high-valued space. In addition, the three players with a superior xG > 0.6 are exceptionally good at converting shots into goals. For the player with SGrec = 3.52, the story is quite different. Despite the outstanding ability to create high-valued space, this player is often unable to convert these situations.
Figure 10 (right) shows the results for SGpas and xA. Although their relation is not as strong as in the previous comparison, their correlation is still positive and significant [Pearson's r = 0.21, p-value = 0.03, CI = (0.02, 0.4)]. This confirms our initial intuition that both concepts describe similar aspects of the game. Space generation metrics are not limited to shot or scoring events but also allow for useful insights on preceding actions in ball possessions and game analyses, as we will see in section 5.2. This becomes clear, in particular, for the SGpas and xA comparison. On one hand, xA only accounts for the direct pass before a shot even though the more important pass might have been the one to initiate the attack. As mentioned above, the receiver metric SGrec does not give any insights on how well the controlled space is used, i.e., the decision-making or the cognitive and physical skills after receiving the ball. On the other hand, xG neglects the amount of defensive pressure; hence, shots can have a high value even though the attacker is well covered by the defenders.
understand where these passes come from and, optimally, from which locations and/or players. Figure 12 (left) shows aggregates of all passes to player 6 in that game, summarized by his teammates. Displayed are also the number of passes by arrow width and the average SGrec values by color. The color legend ranges from light blue (low SGrec) to dark red (high SGrec). The figure clearly singles out player 13 as the teammate who creates space of high value by his passes to player 6. Although the overall SGpas metric for player 13 is only average, his passes to player 6 are exceptional.
Figure 12 (right) zooms in on this particular connection between the two players. All ten passes from player 13 to 6 are shown by arrows, with colors drawn from the same legend as before. In particular, two long passes along the sideline result in very high SG values. The third long ball also generates above-average space. Based on this brief analysis, long passes from 13 to 6 must be prevented by the opposing team to decrease the dangerousness of striker 6. Particularly when both players are acting on the right side of the pitch, the other team needs to prevent long balls along the sideline.
Using the proposed concepts, analyses like this one could be automated and run before a game. By doing so, dangerous opponent players can be easily identified and, together with video footage, dangerous episodes shown to the team. The system also proposes a way to decrease the dangerousness of these players by preventing the right passes, and these passes could be automatically retrieved from videos for a team briefing.
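As a rough illustration of Equation (11), the following Python sketch computes a space generation value for one player at one timestamp. It assumes the equation is read as a sum over grid cells of the positive part of the player's own space quality minus the summed space quality of all other players. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def space_generation(sq: Dict[str, List[float]], player: str) -> float:
    """Sum over grid cells of max(own SQ - summed SQ of all other players, 0).

    `sq` maps a player id to that player's space quality per grid cell
    (hypothetical layout; all lists must have the same length).
    """
    n_cells = len(sq[player])
    total = 0.0
    for p in range(n_cells):
        # Space quality of every other player at the same grid cell.
        others = sum(vals[p] for name, vals in sq.items() if name != player)
        # Only cells where the player dominates contribute.
        total += max(sq[player][p] - others, 0.0)
    return total


# Hypothetical space-quality values on a 4-cell grid for three players.
sq_example = {
    "i": [0.50, 0.10, 0.30, 0.00],
    "j": [0.10, 0.20, 0.05, 0.40],
    "k": [0.05, 0.10, 0.05, 0.10],
}
sg_i = space_generation(sq_example, "i")
```

In this toy grid only the first and third cells contribute, because the other players jointly hold more space quality in the remaining cells.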
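The correlation figures reported for Figure 10 consist of a Pearson coefficient and a confidence interval. The sketch below shows one common way to obtain such values, using the Fisher z-transformation for the interval. Whether the study used this particular interval construction is an assumption, the p-value is omitted, and the data are invented for illustration.

```python
import math
from statistics import NormalDist, mean
from typing import Sequence, Tuple


def pearson_with_ci(
    x: Sequence[float], y: Sequence[float], level: float = 0.95
) -> Tuple[float, Tuple[float, float]]:
    """Pearson's r with a Fisher-z confidence interval (needs n > 3, |r| < 1)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Fisher transformation: atanh(r) is roughly normal with sd 1/sqrt(n - 3).
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return r, (math.tanh(z - half_width), math.tanh(z + half_width))


# Invented per-player values: xG per 90 min and a receiver rating.
xg_per_90 = [0.05, 0.12, 0.20, 0.31, 0.45, 0.62]
sg_rec = [0.8, 1.1, 1.0, 1.9, 2.4, 3.5]
r, (ci_low, ci_high) = pearson_with_ci(xg_per_90, sg_rec)
```

With only a handful of players the interval is wide. The narrow intervals reported above reflect a much larger sample.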
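The per-teammate aggregation described for Figure 12 (number of passes and average SGrec per passer) can be sketched in a few lines of Python. The record layout, the function name, and the values are hypothetical. The study's actual data format is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Pass(NamedTuple):
    passer: int    # jersey number of the passing player
    receiver: int  # jersey number of the receiving player
    sg_rec: float  # receiver space generation value of this pass


def aggregate_passes_to(
    passes: Iterable[Pass], receiver: int
) -> Dict[int, Tuple[int, float]]:
    """Return {passer: (number of passes, mean sg_rec)} for one receiver."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for p in passes:
        if p.receiver == receiver:
            grouped[p.passer].append(p.sg_rec)
    return {k: (len(v), sum(v) / len(v)) for k, v in grouped.items()}


# Invented passes; only those received by player 6 are aggregated.
example_passes = [
    Pass(13, 6, 0.9),
    Pass(13, 6, 0.7),
    Pass(13, 6, 0.2),
    Pass(8, 6, 0.1),
    Pass(13, 9, 0.5),
]
to_player_6 = aggregate_passes_to(example_passes, receiver=6)
```

In a plot like Figure 12, the pass count would drive the arrow width and the mean value the arrow color.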
REFERENCES
Brefeld, U., Lasek, J., and Mair, S. (2019). Probabilistic movement models and zones of control. Mach. Learn. 108, 127–147. doi: 10.1007/s10994-018-5725-1
Brefeld, U., Lasek, J., and Mair, S. (2020). "Analyzing positional data," in Science Meets Sports – When Statistics Are More Than Numbers, eds C. Ley and Y. Dominicy (Cambridge Scholars Publishing), 81–94.
Brochu, E., Cora, V. M., and de Freitas, N. (2010). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR, abs/1012.2599.
Bryson, A., Frick, B., and Simmons, R. (2013). The returns to scarce talent: footedness and player remuneration in European soccer. J. Sports Econ. 14, 606–628. doi: 10.1177/1527002511435118
Comaniciu, D., and Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619. doi: 10.1109/34.1000236
Dick, U., and Brefeld, U. (2019). Learning to rate player positioning in soccer. Big Data 7, 71–82. doi: 10.1089/big.2018.0054
Fernandez, J., and Bornn, L. (2018). "Wide Open Spaces: a statistical technique for measuring space creation in professional soccer," in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA).
Franck, E., and Nüesch, S. (2012). Talent and/or popularity: what does it take to be a superstar? Econ. Inquiry 50, 202–216. doi: 10.1111/j.1465-7295.2010.00360.x
Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2015). "Counterpoints: advanced defensive metrics for NBA Basketball," in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA).
Fujimura, A., and Sugihara, K. (2005). Geometric analysis and quantitative evaluation of sport teamwork. Syst. Comput. Jpn 36, 49–58. doi: 10.1002/scj.20254
Gerhards, J., Mutz, M., and Wagner, G. G. (2014). Die Berechnung des Siegers: Marktwert, Ungleichheit, Diversität und Routine als Einflussfaktoren auf die Leistung professioneller Fußballteams / Predictable Winners. Market Value, Inequality, Diversity, and Routine as Predictors of Success in European Soccer Leagues. Z. Soziol. 43, 231–250. doi: 10.1515/zfsoz-2014-0305
Gudmundsson, J., and Horton, M. (2017). Spatio-temporal analysis of team sports – A survey. ACM Comput. Surv. 50, 1–34. doi: 10.1145/3054132
Gudmundsson, J., and Wolle, T. (2014). Football analysis using spatio-temporal tools. Comput. Environ. Urban Syst. 47, 16–27. doi: 10.1016/j.compenvurbsys.2013.09.004
Hobbs, J., Power, P., Sha, L., Ruiz, H., and Lucey, P. (2018). "Quantifying the value of transitions in soccer via spatiotemporal trajectory clustering," in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA), 11.
Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67. doi: 10.1080/00401706.1970.10488634
Horton, M., Gudmundsson, J., Chawla, S., and Estephan, J. (2017). Classification of passes in football matches using spatiotemporal data. ACM Trans. Spatial Algorithms Syst. 3, 1–30. doi: 10.1145/3105576
Kingma, D. P., and Ba, J. (2015). "Adam: a method for stochastic optimization," in International Conference on Learning Representations (ICLR2015) (San Diego, CA).
Le, H. M., Carr, P., Yue, Y., and Lucey, P. (2017). "Data-driven ghosting using deep imitation learning," in Proceedings of the Sports Analytics Conference (Boston, MA), 15.
Link, D., Lang, S., and Seidenschwarz, P. (2016). Real time quantification of dangerousity in football using spatiotemporal tracking data. PLOS ONE 11:e0168768. doi: 10.1371/journal.pone.0168768
Lucey, P., Bialkowski, A., Monfort, M., Carr, P., Matthews, I., and Research, D. (2015). "Quality vs Quantity": Improved Shot Prediction in Soccer using," in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA), 9.
Mortensen, J., and Bornn, L. (2019). "From Markov models to Poisson point processes: modeling movement in the NBA," in Proceedings of the MIT Sloan Sports Analytics Conference 2015, 10.
Nakanishi, R., Maeno, J., Murakami, K., and Naruse, T. (2010). "An approximate computation of the dominant region diagram for the real-time analysis of group behaviors," in RoboCup 2009: Robot Soccer World Cup XIII, Lecture Notes in Computer Science, eds J. Baltes, M. G. Lagoudakis, T. Naruse, and S. S. Ghidary (Berlin; Heidelberg: Springer), 228–239.
Rathke, A. (2017). An examination of expected goals and shot efficiency in soccer. J. Hum. Sport Exerc. 12. doi: 10.14198/jhse.2017.12.Proc2.05
Snoek, J., Larochelle, H., and Adams, R. P. (2012). "Practical Bayesian optimization of machine learning algorithms," in Proceedings of the 25th International Conference on Neural Information Processing Systems (Lake Tahoe).
Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. (2012). Gaussian process optimization in the bandit setting: no regret and experimental design. IEEE Trans. Inform. Theor. 58, 3250–3265. doi: 10.1109/TIT.2011.2182033
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958. doi: 10.5555/2627435.2670313
Taki, T., and Hasegawa, J.-I. (2000). "Visualization of dominant region in team games and its application to teamwork analysis," in Proceedings of the IEEE International Conference on Computer Graphics (Washington, DC).
Taki, T., Hasegawa, J.-I., and Fukumura, T. (1996). "Development of motion analysis system for quantitative evaluation of teamwork in soccer games," in Proceedings of 3rd IEEE International Conference on Image Processing (Lausanne).
Ueda, F., Masaaki, H., and Hiroyuki, H. (2014). The causal relationship between dominant region and offense-defense performance - focusing on the time of ball acquisition. Football Sci. 11, 1–17.
Yeo, I.-K., and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959. doi: 10.1093/biomet/87.4.954

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Martens, Dick and Brefeld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.