ORIGINAL RESEARCH

published: 16 July 2021


doi: 10.3389/fspor.2021.676179

Space and Control in Soccer


Florian Martens, Uwe Dick and Ulf Brefeld*
Machine Learning Group, Leuphana University of Lüneburg, Lüneburg, Germany

In many team sports, the ability to control and generate space in dangerous areas on
the pitch is crucial for the success of a team. This holds, in particular, for soccer. In
this study, we revisit ideas from Fernandez and Bornn (2018) who introduced interesting
space-related quantities including pitch control (PC) and pitch value. We identify influence
of the player on the pitch with the movements of the player and turn their concepts into
data-driven quantities that give rise to a variety of different applications. Furthermore,
we devise a novel space generation measure to visualize the strategies of the team
and player. We provide empirical evidence for the usefulness of our contribution and
showcase our approach in the context of game analyses.
Keywords: soccer (football), movement model, motion model, pitch control, soccer analytics

Edited by: Matthias Kempe, University of Groningen, Netherlands
Reviewed by: Robert Rein, German Sport University Cologne, Germany; Arnold Baca, University of Vienna, Austria
*Correspondence: Ulf Brefeld, [email protected]
Specialty section: This article was submitted to Sports Science, Technology and Engineering, a section of the journal Frontiers in Sports and Active Living
Received: 04 March 2021
Accepted: 26 May 2021
Published: 16 July 2021
Citation: Martens F, Dick U and Brefeld U (2021) Space and Control in Soccer. Front. Sports Act. Living 3:676179. doi: 10.3389/fspor.2021.676179
Frontiers in Sports and Active Living | www.frontiersin.org | July 2021 | Volume 3 | Article 676179

1. INTRODUCTION
An important aspect when analyzing soccer games is how much space on the soccer pitch is controlled by teams and players at any point during a game. While, in general, control is a rather flexible term in soccer and includes the ability of a player to control the ball or the ability of a team to control possession, we focus on spatial control, that is, control of areas on the pitch. This concept was introduced by Taki et al. (1996), who developed the notion of a dominant region of a player that defines the area on the pitch that is controlled by that player. That is, a player is expected to reach any point in her dominant region before any other player. These regions are derived from so-called motion or movement models that are able to predict whether a player can reach a certain point on the pitch in a given time.

Dominant regions have the advantage that they can be visualized by partitioning a soccer pitch into areas around players that they have control over and can thereby be easily interpreted. Interpretability is a key factor to empower non-technical staff, such as coaches or game analysts, to understand data-driven results and turn them into actionable insights. Therefore, dominant regions have frequently been used as the basis for higher-level research questions, such as the evaluation of passes or spatial pressure (Taki and Hasegawa, 2000; Gudmundsson and Wolle, 2014; Ueda et al., 2014; Horton et al., 2017; Brefeld et al., 2019). In this line of work, Fernandez and Bornn (2018) understand control on the pitch as a continuous spatial quantity. That is, instead of assigning every point on the pitch to exactly one team, they compute a value that measures how much control a team has over a position. Their concepts are intuitive and interpretable but suffer from an overly coarse player influence model. Our first contribution is to remedy this limitation by incorporating data-driven movement models as the underlying motion model (Brefeld et al., 2019). Secondly, we provide empirical results showing that the data-driven approach leads to realistic measurements of space. Thirdly, we propose new metrics for passers and pass receivers on the basis of data-driven quantification of space.

Empirical results are computed on positional data from 54 Bundesliga games from season 2017/18. We show that identifying the influence with movements of the player leads to high correlations with quantifiable outcomes such as shots on target, expected goals, and the market
value of players. Finally, we showcase the usefulness of our approach using opponent analysis as an example.

The remainder is structured as follows. Section 2 reviews related work, and section 3 introduces basic player influence models. Section 4 details our approaches to quantify space, and section 5 presents a novel space generation metric together with empirical findings. Section 6 concludes.

2. RELATED WORK

Dominant regions are studied in many publications. A general definition refers to dominant regions of a player as the region on a pitch which can be reached by this very player before any other one (Gudmundsson and Horton, 2017). Taki et al. (1996) first introduced this concept based on a simple motion model that incorporates the acceleration and direction of a player. Their approach constitutes a significant improvement to simple Voronoi region models (Taki et al., 1996; Taki and Hasegawa, 2000), which simply credit space to the closest available player, ignoring running direction or speed. Further improvements to this basic model are presented by Fujimura and Sugihara (2005) who include a resistive force to bound the, otherwise infinite, acceleration as in Taki et al. (1996). By contrast, Brefeld et al. (2019) introduce a purely data-driven probabilistic movement model using sampled trajectories of each individual player. The model can be used to derive densities of player locations and convex hulls for all reachable points on the pitch for a predefined time window that again can be translated to dominant regions.

The previously mentioned approaches treat control as a binary variable such that every location is either controlled by one or the other team. Fernandez and Bornn (2018) also rate controlled areas on the field but propose a continuous measure of control that is based on the influence of each player on a given point on the pitch at a given time. They use a general Gaussian influence model, in which the covariance matrix of each bivariate Gaussian is defined by a player's velocity vector and her distance to the ball. Further, the authors value space on the pitch itself. Clearly, occupied zones that are close to the goal of the opponent are of higher value than open and unoccupied space in the center of the pitch (Link et al., 2016). The authors rate areas that are usually controlled by defensive players given a certain location of the ball. They use this concept to measure how well players are able to occupy and gain space during a game. In fact, they empirically show, albeit using only data from a single game, that top players such as Lionel Messi or Andres Iniesta are able to actively occupy higher valued space than others (Fernandez and Bornn, 2018). However, the analysis does not involve movement models or movement characteristics of an individual player; individual differences such as maximum speed, acceleration, and agility are ignored. Similarly to the approaches mentioned above, the proposed model is not quantitatively evaluated.

Dominant regions are used to analyze different aspects of soccer. Some studies use dominant regions to evaluate passes. Taki and Hasegawa (2000) and Nakanishi et al. (2010) estimate the success of a pass along a straight line by measuring whether it ends in the dominant region of the receiver. Horton et al. (2017) estimate the quality of a pass by using a prediction model that, among other features, uses features based on dominant regions to learn a human rating of observed passes. Some of those features also use a measure of defensive pressure that, based on dominant regions, estimates whether defending players are able to put the passing player under enough spatial pressure to influence the outcome of the pass. A similar concept was used in Taki and Hasegawa (2000) who also measure spatial pressure based on dominant regions. Ueda et al. (2014) analyze defensive and offensive positioning depending on the location where the ball was acquired. For pitch control (PC) introduced by Fernandez and Bornn (2018), however, such evaluations are missing so far.

Several other approaches that model the movements of the player exist, however. Recently, models that make use of reinforcement learning and deep learning techniques led to impressive results such as the study of Le et al. (2017) on deep imitation learning, who show that the movements of the player can be predicted over time periods up to several seconds. Dick and Brefeld (2019) use reinforcement learning in combination with deep convolution networks to predict the dangerousness of an offensive situation. Their model is purely data-driven and works without any expert or prior knowledge. The drawback of such methods, however, is their lack of interpretability that makes it hard for experts to take actions on these insights. This is an issue that, for example, Mortensen and Bornn (2019) attempt to tackle by modeling the movements of the player in basketball with Markov transitions as Poisson point processes.

Other studies are based on similar ideas. For basketball, Franks et al. (2015) take a similar approach to rate shots based on spatiotemporal features of defending players. Link et al. (2016) also include distances of the players to the goal to quantify the dangerousness of offensive actions, and Hobbs et al. (2018) use the notion of defensive disruption as a measure of how far defenders deviate from their preferred positions in similar situations and compute transition values for the offensive team.

3. INFLUENCE OF PLAYER ON THE PITCH

3.1. Data
The data that are used in this study are provided by a European top-flight soccer league. The data include 54 Bundesliga games from season 2017/18. The data stem from two main sources: (i) tracking data of player and ball positions and (ii) event data. The former is automatically captured from video footage at 25 frames per second by the data provider. At each frame, the (x, y) coordinates of all 22 players plus the ball are listed. The event data consist of manually recorded in-game events such as passes, shots, and tackles. Such events are collected by human observers who tag each event and enrich them with additional, event-specific information such as passing player, pass receiver, and shot success. For both data sources, (x, y) coordinates relate to a pitch size of 105 × 68 m. The center of the pitch is always at the origin (0, 0), and positions are scaled to a [−52.5, 52.5] range on the x-axis and to a [−34, 34] range on the y-axis. The timestamps of the two data sets need to be aligned so that instances from both sources can be processed together.
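
As a rough illustration of the coordinate convention and the alignment step described above, the following sketch maps raw positions onto the centered 105 × 68 m pitch and assigns an event to its nearest tracking frame. The raw format (positions normalized to [0, 1], event times in seconds since kick-off) and both function names are assumptions made for this example only; the actual format delivered by the data provider is not specified in the text.

```python
PITCH_LENGTH_M = 105.0   # x-axis spans [-52.5, 52.5]
PITCH_WIDTH_M = 68.0     # y-axis spans [-34, 34]
FRAMES_PER_SECOND = 25   # tracking rate stated in the text


def to_pitch_coords(x_norm, y_norm):
    """Map hypothetical normalized coordinates in [0, 1] to centered metres."""
    x = (x_norm - 0.5) * PITCH_LENGTH_M
    y = (y_norm - 0.5) * PITCH_WIDTH_M
    return x, y


def nearest_frame(event_time_s, fps=FRAMES_PER_SECOND):
    """Index of the tracking frame closest to an event timestamp in seconds."""
    return int(round(event_time_s * fps))


print(to_pitch_coords(0.5, 0.5))  # centre spot of the pitch
print(nearest_frame(2.0))         # an event two seconds after kick-off
```

In practice the alignment may need more than rounding, for instance a correction for clock offsets between the two sources, which this sketch does not attempt.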


FIGURE 1 | Player influence models according to Fernandez and Bornn (2018). The ball is visualized in green. (Left) Player standing with the ball. (Right) Player
moving away from the ball.
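
The influence model illustrated in Figure 1, which Section 3.2 defines formally, can be sketched numerically as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name is hypothetical, the influence radius is passed in directly because only a plot of the radius function is available, the speed is taken as the Euclidean norm of the velocity vector, and the speed is assumed to stay strictly below the maximum speed so that the covariance remains invertible.

```python
import math


def gaussian_influence(p, pos, vel, radius, v_max):
    """Density of the bivariate normal influence model at point p.

    p, pos and vel are (x, y) tuples, radius is the influence radius in
    metres, and v_max is the maximum speed in the same units as vel.
    """
    speed = math.hypot(vel[0], vel[1])
    if speed >= v_max:
        raise ValueError("speed must be below v_max (assumption of this sketch)")
    ratio = (speed / v_max) ** 2
    # Diagonal entries of the scaling matrix V (along / across the heading).
    v_along = (radius + radius * ratio) / 2.0
    v_across = (radius - radius * ratio) / 2.0
    # The mean lies half a velocity vector ahead of the player.
    mu = (pos[0] + 0.5 * vel[0], pos[1] + 0.5 * vel[1])
    # Rotate the offset p - mu by -theta, i.e. apply R^{-1} = R^T.
    theta = math.atan2(vel[1], vel[0])
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    u = math.cos(theta) * dx + math.sin(theta) * dy
    w = -math.sin(theta) * dx + math.cos(theta) * dy
    # Sigma = R V V R^{-1}, so in the rotated frame it is diag(v_along^2, v_across^2).
    quad = (u / v_along) ** 2 + (w / v_across) ** 2
    det = (v_along * v_across) ** 2
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 2 * det)


# A player moving along the x-axis has more influence ahead than behind.
ahead = gaussian_influence((3.0, 0.0), (0.0, 0.0), (4.0, 0.0), 4.0, 13.0)
behind = gaussian_influence((-3.0, 0.0), (0.0, 0.0), (4.0, 0.0), 4.0, 13.0)
print(ahead > behind)
```

Working in the rotated frame avoids an explicit 2 × 2 matrix inversion; a library such as NumPy would be the more usual choice when evaluating the density over a whole grid.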

3.2. Gaussian Influence Models
An analysis of space and control requires a model of the influence of a player on the current situation of the game, that is, the spatial and temporal configuration on the pitch. Fernandez and Bornn (2018) model the influence of a player by a bivariate normal distribution to quantify the amount of control at a position $p \in \mathbb{R}^2$ for a player $i$ at position $p_t^i$ and time $t$,

$$f_t^i(p) = \frac{1}{\sqrt{(2\pi)^2 |\Sigma_t^i|}} \exp\left(-\frac{1}{2} (p - \mu_t^i)^T (\Sigma_t^i)^{-1} (p - \mu_t^i)\right).$$

The mean $\mu_t^i$ of $f_t^i$ is given by the position of the player and her velocity vector $v_t^i$ using

$$\mu_t^i = p_t^i + \frac{1}{2} \cdot v_t^i$$

where $v_t^i$ is defined by

$$v_t^i = p_t^i - p_{t_\delta}^i = (x_t - x_{t_\delta}, y_t - y_{t_\delta})$$

with $t_\delta = t - \delta$ for an arbitrary time difference $\delta > 0$. The covariance matrix $\Sigma_t^i \in \mathbb{R}^{2 \times 2}$ is a function of the velocity of a player and her distance to the ball, as shown in Figure 1. Its computation resembles an eigendecomposition and is given by

$$\Sigma_t^i = R_t^i V_t^i V_t^i (R_t^i)^{-1}$$

where $R$ is the rotation matrix that twists the bivariate normal counterclockwise according to the direction of the velocity vector

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

with $\theta = \operatorname{atan2}(y_t^i - y_{t_\delta}^i, x_t^i - x_{t_\delta}^i)$. Finally, the scaling matrix $V_t^i$ determines the area of the distribution by

$$V_t^i = \begin{pmatrix} \dfrac{r_t^i + r_t^i \left(\frac{v_t^i}{v_{max}}\right)^2}{2} & 0 \\ 0 & \dfrac{r_t^i - r_t^i \left(\frac{v_t^i}{v_{max}}\right)^2}{2} \end{pmatrix}$$

where radius $r^i$ depends on the Euclidean distance between a player $p_t^i$ and the ball $p_t^b$. By referring to expert knowledge, the authors restrict $r^i$ to be in a range of [4, 10] meters. This function¹ is shown in Figure 2. The quantity $v_{max}$ is the maximum speed of all the players. We refer to Fernandez and Bornn (2018) for more details on $r$ and $v_{max}$. Figure 1 shows two exemplary situations to illustrate how velocity and distance to the ball affect the shape of the Gaussian. Note that this approach ignores movement capabilities of an individual, e.g., agility and acceleration.

¹ Fernandez and Bornn (2018) only provide a graph without any formula for the function they used in their study. We reproduced this function by capturing some coordinates from the plot and transformed these points into a feature matrix that contains 3-degree polynomial combinations for each data point. This matrix is learned using the ridge regression model (Hoerl and Kennard, 1970), and hyper-parameter selection is based on a leave-one-out cross-validation on the negative mean squared error.

3.3. Data-Driven Movement Models
Influence on the pitch can also be determined directly by possible movements of players in the near future. One could argue that a player can only influence the area she can actually reach in a given time window. While many movement models have been proposed by approximating equations from physics, Brefeld et al. (2019) present a data-driven movement model by computing frequency statistics from historic games. Their approach leads to individual player movement models that capture characteristic traits of the respective player.

The approach is based on triplets $(p_{t_\delta}^i, p_t^i, p_{t_\Delta}^i)$ generated by the $i$-th player, with $t_\delta = t - \delta$ for a time horizon $t_\Delta = t + \Delta$


such that $\delta, \Delta > 0$ and $t_\delta < t < t_\Delta$ holds. Each triplet is a subset that represents the trajectory of a player with past, current, and final position. Hence, $p_{t_\delta}^i$ and $p_t^i$ can be used to estimate the velocity vector $v_t^i$ including the direction a player is heading to at time $t$. Given this initial velocity, $p_{t_\Delta}^i$ represents a point that a player is able to reach in $\Delta$ time steps. To this end, all triplets of the same player are mapped (and rotated) into a new coordinate system such that the first part realizes a movement along the x-axis and the final endpoints of the triplets indicate points that are reached by the player in time $\Delta$ with initial velocity $v$ given by the Euclidean norm of the velocity vector $\|v\|_2$ [see Brefeld et al. (2020) for details on how to estimate $v$ from tracking data]. To be concrete, $p'_{t_\Delta}$ is given by

$$(x'_{t_\Delta}, y'_{t_\Delta}) = (d \cdot \cos\theta, d \cdot \sin\theta) \tag{1}$$

where the rotation angle $\theta$ is computed as above,

$$\theta = \angle(\overrightarrow{p_{t_\delta} p_t}, \overrightarrow{p_t p_{t_\Delta}}) = \operatorname{atan2}(y_t - y_{t_\delta}, x_t - x_{t_\delta}) - \operatorname{atan2}(y_{t_\Delta} - y_t, x_{t_\Delta} - x_t), \tag{2}$$

and distance $d$ is defined by

$$d = \|\overrightarrow{p_t p_{t_\Delta}}\|_2. \tag{3}$$

To obtain an individual movement model for player $i$, all available triplets $(p_{t_\delta}^i, p_t^i, p_{t_\Delta}^i)$ are extracted from historic games and transformed according to the above procedure. The resulting endpoints are collected together with the initial velocities in a set. This can be carried out for each $\Delta$ in a finite set of time horizons $T$ such that the result is $S_{\Delta \in T}^i = \{(p_{t_\Delta}^i, v_t^i)\}$. The time window $\delta$ to obtain the initial velocity vector remains fixed for all combinations. For practical reasons, similar velocities are often aggregated into bins of similar ranges. Since all passes are completed within 5 s, we use the time horizons $T = \{0.2, 0.4, \ldots, 5\}$. The initial velocity is estimated in the preceding $\delta = 0.2$ s. Following Brefeld et al. (2019), we group velocity ranges into standing ([0, 1) km/h), walking ([1, 7)), jogging ([7, 14)), running ([14, 20)), and sprinting (≥ 20). Every triplet in the same bin is then summarized by a non-parametric kernel density estimation (KDE)² with Gaussian kernel as it seems to be a good fit for the resulting endpoint distributions. The bandwidths of the kernels are optimized using Bayesian optimization (Brochu et al., 2010; Snoek et al., 2012; Srinivas et al., 2012). We denote the resulting probability density by $P_\Delta^i(p \mid p_{t_\delta}^i, p_t^i, v_t^i)$. The measure $P_\Delta^i$ computes the probability density that player $i$ can reach position $p$ in time $\Delta$ from position $p_{t_\delta}^i$ with initial velocity $v_t^i$.

Figure 3 shows an example: Trajectories of players are projected into a new coordinate system such that every trajectory starts in the origin with an initial movement along the x-axis. The endpoints of the trajectories are then stored for the actual initial velocity and time window. Depending on the application, the point distribution can be either used directly or approximated by its convex hull. We refer to Brefeld et al. (2019) for details on the computation of data-driven movement models.

² Alternatively, parametric approaches like adapting a Gaussian with maximum likelihood or a Gaussian mixture using expectation maximization could be pursued. However, the former cannot appropriately represent the multi-modal player distributions (cmp. Brefeld et al., 2019) and there remains the problem of choosing the number of mixing components in the latter. We simply circumnavigate these issues by staying non-parametric.

FIGURE 2 | A function that maps the distance d to the influence radius $r^i$.

4. QUANTIFYING SPACE

4.1. Influence of the Player
Fernandez and Bornn (2018) introduce PC to measure the dominance of players and teams in certain areas on the pitch. In that sense, PC is similar to dominant regions (Taki et al., 1996) or zones of control (Brefeld et al., 2019). We aim to study data-driven movement instead of Gaussian approximations together with PC.

For the data-driven approach, we need to map the distance between player and ball to a time horizon $\Delta$. Since our analyses will focus on passing events, the amount of time a player can move around is upper bounded by the time it would take to pass the ball to her. Because of the binning into discrete time intervals $\Delta \in T$, this can directly be translated into the time horizon that is needed to select the best-suited probability density $P_\Delta^i$ of the player. This function can also be learned from historic data using pass data as an approximation. The idea is to learn a predictor of the time a player usually has to reach the ball given the initial distance between her and the passer at the time $t_p$ the pass was initiated. For example, for short distances the receiving player has less time to react and can therefore cover less ground to get herself into an open position to receive a pass. The distance is then defined as the Euclidean norm of the vector between passer $p^b$ and receiver $p^r$ at time $t_p$:

$$d = \|\overrightarrow{p_{t_p}^b p_{t_p}^r}\|_2.$$

The time window $\Delta$ is derived from the duration of the pass, i.e., the traveling time of the ball from passer to receiver

$$\Delta = t_r - t_p$$


FIGURE 3 | Data-driven movement models. (Left) The initial position $p_t$ is mapped to the origin such that the initial direction of movement $v_t$ follows the x-axis. Position $p_{t_\Delta}$ marks the end of the trajectory. Position $p_{t_\delta}$ is required to estimate initial velocity. (Center) Resulting point cloud. (Right) Smoothed movement model using density estimation.

with $t_r$ being the point in time at which the receiving player actually receives the ball³. The mapping function can be phrased as a regression task with $\Delta$ as response and $d$ as an explanatory variable. We use 25,663 passes to calculate the distance $d$ and pass duration $\Delta$. In short, 80% of the passes are used as the training set. Hyper-parameters are optimized using cross-validation. All models are finally validated against the remaining 20% of the passes, and the best model is chosen by selecting the one with the minimal mean squared error on the validation set. The resulting linear regression⁴ uses a power feature transformation (Yeo and Johnson, 2000) to better fit the underlying assumptions for linear regression models (e.g., homoscedasticity in errors). The learned relationship between distance and time is shown in Figure 4.
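
The regression from pass distance to time window can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the pass data are synthetic, the helper names are hypothetical, the transform parameter is picked from a small grid by held-out error instead of the cross-validation described above, and the Yeo-Johnson transform is implemented only for non-negative inputs, which suffices for distances.

```python
import math
import random


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform for non-negative x (distances are >= 0)."""
    if abs(lam) < 1e-12:
        return math.log1p(x)
    return ((x + 1.0) ** lam - 1.0) / lam


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, lam, data):
    """Mean squared error of a fitted (intercept, slope) pair on data."""
    a, b = model
    return sum((a + b * yeo_johnson(d, lam) - t) ** 2 for d, t in data) / len(data)


# Synthetic (distance in m, duration in s) pairs standing in for real passes.
rng = random.Random(0)
passes = []
for _ in range(500):
    d = rng.uniform(1.0, 40.0)
    passes.append((d, 0.2 + 0.5 * math.log1p(d) + rng.gauss(0.0, 0.05)))

train, valid = passes[:400], passes[400:]  # 80% / 20% split

# Keep the transform parameter with the lowest error on the held-out passes.
best = None
for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
    model = fit_line([yeo_johnson(d, lam) for d, _ in train], [t for _, t in train])
    err = mse(model, lam, valid)
    if best is None or err < best[0]:
        best = (err, lam, model)

best_err, best_lam, (intercept, slope) = best


def predict_time_window(distance_m):
    """Estimated time horizon in seconds for a receiver at the given distance."""
    return intercept + slope * yeo_johnson(distance_m, best_lam)


print(best_lam, predict_time_window(10.0))
```

The predicted value would then be rounded to the nearest available time horizon in $T$ to select the matching movement model.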
Finally, influence likelihoods of the players are normalized such that a player's degree of control for each point on the field lies in the interval [0, 1] by dividing the likelihood of each point $p_j$ by the likelihood at the main mode of the underlying distribution. This will further be referred to as the player influence area (PI). For data-driven movement models, the main mode is computed with mean shift (Comaniciu and Meer, 2002), and the PI is given by

$$PI_t^i(p) = \frac{P_\Delta^i(p \mid p_{t_\delta}^i, p_t^i, v_t^i)}{P_\Delta^i(\text{mode} \mid p_{t_\delta}^i, p_t^i, v_t^i)} \tag{4}$$

As a result, the influence value of the player at the main mode (the highest peak) of each movement distribution has the value $PI_t^i = 1$⁵.

³ The actual timestamp of the ball reception is difficult to determine due to noise in the data. In this study, we use a heuristic to select the point in time when the ball position is in a radius of 1.5 m around the receiving player. This heuristic is a trade-off between accuracy and the amount of successful passes that are actually detected in the tracking data.

⁴ For simplicity, we choose a simple model with only a single feature. For higher predictive accuracies, we suggest to learn more sophisticated (possibly non-linear) functions to estimate the time horizons using additional features like actual player/ball positions and/or velocity vectors.

⁵ The influence of the player also depends on the position of the ball $p^b$. For notational simplicity, this is omitted and is implicitly included by the time $t$ and the positions of all actors at that time.

FIGURE 4 | The function that maps the distance d to time window $\Delta$.

4.2. Pitch Control
Pitch control (PC) for a team is defined as the sum over all influence areas of players. Hence, with all players belonging to team $a$ collected in set $A$ and their opponents in set $B$, the summed team influences can be subtracted to obtain the pitch control at point $p$ at time $t$,

$$PC_t(p) = \sigma\left(\sum_{a \in A} PI_t^a(p) - \sum_{b \in B} PI_t^b(p)\right), \tag{5}$$

where $\sigma$ maps PC into an appropriate interval. In the remainder, we make use of $\tanh : \mathbb{R} \mapsto [-1, 1]$, i.e., the value $PC_t(p) = -1$ indicates that the defensive team has full control at point $p$ and


FIGURE 5 | Pitch control (PC) for the proposed approach (Left) and baseline (Right).

time $t$ whereas $PC_t(p) = 1$ means that the offensive team controls that area.
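
Equations (4) and (5) can be sketched for a single point on the pitch as follows. The function names and the example values are hypothetical, and the sketch assumes that the player influence values have already been obtained from a movement model; it reads set $A$ as the attacking team so that positive values indicate offensive control, in line with the interpretation given above.

```python
import math


def player_influence(density_at_p, density_at_mode):
    """Eq. (4): density at p relative to the density at the main mode."""
    return density_at_p / density_at_mode


def pitch_control(pi_attack, pi_defense):
    """Eq. (5) with sigma = tanh, for one point p and one time t.

    pi_attack and pi_defense hold the PI values of the attacking and the
    defending players at p.
    """
    return math.tanh(sum(pi_attack) - sum(pi_defense))


# Hypothetical PI values of three attackers and two defenders at one point.
print(pitch_control([0.9, 0.3, 0.0], [0.4, 0.1]))
```

Note that tanh only approaches −1 and 1 asymptotically, so full control in the above sense corresponds to values very close to, but never exactly at, the interval bounds.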
Figure 5 compares the resulting PC with the original formulation in Fernandez and Bornn (2018) (baseline) on an exemplary situation. The red team plays from left to right. The ball is currently at the green cross and being passed to the purple cross where the red striker scores with a volley shot. The figure reveals the main differences between the two approaches. The influence areas of the baseline are much larger and cover a great deal of the pitch. By contrast, influence areas computed with the data-driven movement model are much smaller, especially when a player is close to the ball. Note that the location of the purple cross has a PC value of −0.21 for the baseline, while the proposed approach clearly reflects the known outcome of this scoring possession by a PC value of 0.57. Since the red player is already in possession of the ball and moreover able to pass it on to the striker, the data-driven model delivers a more realistic interpretation of control on the pitch.

FIGURE 6 | PC for final passes before a shot was made.
To confirm this impression, we conduct an experiment on all 289 successful ball possession phases in the data. Throughout this analysis, we define a possession to be successful if it ends with a shot at the goal. We focus on sequences with at least three successful passes because the vast majority of possessions with fewer passes are rather chaotic and, e.g., consist of a series of headers after a goal-kick. Analogous to the example above, we collected the pitch control $PC_t^{p_d}$ for the attacking team at the pass destination $p_d$ and at the time $t$ the final pass was made before the attacker shoots at the goal. Figure 6 compares the results of both models. For our data-driven approach, in nearly 75% of the cases the attacking team has a positive PC before the pass receiver is able to take a shot. This follows the intuition that the attacking team must have created some space to realize the shot at the target. Using the baseline model, however, the observed PC values do not allow for an informed guess on the known outcome of these situations.

4.3. Pitch Value
While PC provides interesting insights, many of the colored regions in Figure 5 are irrelevant for the shown situation (e.g., space controlled by the red goal-keeper). Again, we borrow concepts from Fernandez and Bornn (2018) to compute the value of a position. The underlying idea is that defensive players intuitively cover highly valuable space. Obviously, defensive players do not position themselves perfectly in every situation, e.g., to prevent through-passes. But we argue that such individual mistakes are exceptions and that defenders usually cover the important areas on the pitch in similar situations. Hence, with a sufficiently high number of training situations, a model should be able to generalize well and predict the high valued space.

We thus aim to learn influence areas for a defending team from historic data given the ball position at that time $p_t^b$. This will be referred to as defensive influence (DI). The observed DI on point $p_j$ is the sum of influences of all players in the defensive team $A$ at time $t$,

$$DI_t^{p_j}(p_t^b) = \min\left\{\sum_{a \in A} PI_t^a(p_j),\ 1\right\}. \tag{6}$$
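
For a single grid point, the observed defensive influence of Equation (6) reduces to a capped sum, as the following short sketch shows. The function name and the PI values are hypothetical; in the study the PI values would come from the data-driven movement models of the defending players.

```python
def defensive_influence(pi_defenders):
    """Eq. (6): summed defender influence at one point, capped at 1."""
    return min(sum(pi_defenders), 1.0)


# Hypothetical PI values of the defenders at two grid points.
print(defensive_influence([0.2, 0.3]))       # sparsely covered point
print(defensive_influence([0.8, 0.7, 0.1]))  # crowded point, capped at 1
```

The cap keeps points that are covered by several defenders at once from dominating the learning target.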


FIGURE 7 | Pitch value using the learned function fnθ . The ball is located at the green cross, and the attacking team plays from left to right. (Left) data-driven.
(Right) baseline.

FIGURE 8 | PC (left column), pitch value (center column), and space quality (right column) for baseline (top row) and proposed approach (bottom row).

Analogous to PC, we define the maximum amount of DI to be one. Using this definition, the pitch value is defined as

$$PV_t^{p_j}(p_t^b) = \left(1 - \frac{\|\overrightarrow{p_j p_g}\|_2}{\|\overrightarrow{p_c p_g}\|_2}\right) \cdot DI_t^{p_j}(p_t^b). \tag{7}$$

Here, $p_c$ denotes the point at the opposite corner such that the denominator marks the longest possible distance to the goal. Hence, pitch value equals the defensive influence scaled by the distance to the goal that is in the range [0, 1] following the idea that points on the pitch are generally more valuable the closer they are to the goal of the opponent.

Though $DI_t^{p_j}$ can be extracted from historic games, we need a function $f_\theta^n(p_t^b, p_j)$ that approximates $DI_t^{p_j}$ well and that can be applied to new and unseen situations for generality. Following Fernandez and Bornn (2018), we propose to learn $f_\theta^n$ with a feed-forward neural network (FNN) by minimizing the mean squared error,

$$\min_\theta \sum_{t, p_j} \left(DI_t^{p_j}(p_t^b) - f_\theta^n(p_t^b, p_j)\right)^2.$$

The training data contain game situations from 54 Bundesliga games (about 34 million observations) where goalkeepers are ignored. To render this training task computationally feasible, we choose points as features that lie on an equally spaced 21 × 16 grid $G$ such that $p_j \in G$. This results in a $(|T| \times |G|) \times 4$ feature matrix $X$ where each row contains the (x, y) coordinates of one $p_j \in G$ and the ball position $p_t^b$ at time $t$ for all available timestamps $T$ in the data set. Dropout (Srivastava et al., 2014) is applied to all hidden layers to prevent over-fitting, and all hyper-parameters (# layers, # units per layer, dropout rate, learning rate, and batch size) of the network are optimized with Bayesian


optimization. In our experiments, the network with the best performance had two hidden layers and 64 units in each layer. The optimization was carried out using the Adam optimization algorithm (Kingma and Ba, 2015).

Figure 7 shows an example of the data-driven approach and the baseline (Fernandez and Bornn, 2018). The ball is on the left wing just outside the box visualized by the green cross. The attacking team plays from left to right. In the data-driven model, the last defending line forms up right behind the center-line with the intention to use the offside rule to limit the space in which the attacking team can operate. The right defender covers space slightly deeper than his peers on the left side. This is a useful tactic to prevent straight and long passes in the back of the last defending line. It also discloses the habit of defending teams to prevent crosses from one side to the other. Strikers and offensive midfielders position themselves in a way that their opponent is forced to play the long passes mostly along the sideline. The influence area reaches far out to the left penalty box to isolate the ball-possessing player on his side. Such insights are hidden in the results of the baseline that considers about half of the pitch important.

of the two models around the left winger. The baseline credits much space in her back to her team due to an excessively large influence of the player. However, given her velocity vector in this situation, the player can hardly control the areas behind her, especially given the rather short distance to the passer. Moreover, the baseline estimates the area directly in front of her as a neutral zone (white). With the data-driven model, however, that particular area turns dark red as one would expect in this situation. Also note that the defensive team is pretty disorganized; they are not doing well in covering the important space because their defensive line and especially their right back moved up too far, which allows the aforementioned left attacker to run into the exposed region.

FIGURE 9 | The resulting area under the curve (AUC) values.

5. SPACE GENERATION

While the previous section suggests that identifying the impact of the player with individual movement models actually makes sense, we now turn toward establishing an empirical basis for this insight. Since the devised quantities are difficult to evaluate quantitatively, we resort to proxies and study space generation and measurable outcomes of ball possession phases.

To connect to the previous section, we first test the hypothesis that passes into areas of high space quality are more likely to result in a positive outcome than passes into zones with small space quality. We follow a simple setup: For each pass in the event log, the resulting space quality is computed at equidistant points $p_j \in F$ lying on a 50 cm-spaced grid over the pitch⁶. We use only two predictors: (i) the average space quality at the location of the passer $p_o$ and (ii) the average space quality at the position of the pass destination $p_d$ for every possession. To take the distance between a position (grid cell) and the pass origin and destination, respectively, as well as some smaller inaccuracies in the pass event data into account, we weight space quality with exponentially decaying factors $\lambda_o$ and $\lambda_d$, so that positions far away from the pass origin and destination, respectively, do not impact the results. The magnitude of the exponential decay is controlled by parameters that are found by model selection. So, the features for the $k$th possession with $n_p$ pass timestamps $T_k$ are defined as:
4.4. Space Quality 1 X X pj −−→
As shown in the previous sections, pitch control measures the xok = SQt · exp(−λo · ||pj po ||2 ) (9)
np j
t∈Tk p ∈F
amount of dominance that a player or team has on a certain
location. Pitch value, by contrast, relates to the value that a 1 X X pj
−−→
xdk = SQt · exp(−λd · ||pj pd ||2 ) (10)
location has at that very moment. Space quality (SQ) for the jth np j
t∈Tk p ∈F
location at time t is now simply defined as the product of pitch
control and pitch value (Fernandez and Bornn, 2018),
We use 5,277 ball possession phases in 54 Bundesliga matches
pj pj pj containing 31,824 passes where episodes with fewer than three
SQt = PCt · PVt (pbt ). (8) passes are discarded. In sum, 5.5% of the remaining data
constitute successful ball possession phases that end with a shot at
Figure 8 shows all three parts of the equation for the same goal. These form the positive class. We use a linear support vector
situation. The red team stages an attack that, later on, ends machine (SVM) to learn a model that predicts whether an attack
up in a shot at goal. The player with the ball (green cross) is successful or not, based on the two input features.
plays a deep forward pass to the red player on the left wing.
The pass receiver generates pitch control in a highly valuable 6 The proposed grid size trades-off accuracy and computation time. Other values

area that results in space of high quality. Note the differences are certainly possible.
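As a rough illustration of Equations (9) and (10), the following sketch computes one decay-weighted feature for a single possession. It assumes that space quality values are already available for every grid point and pass timestamp, and that each pass has its own anchor (origin or destination). The function name `decayed_feature`, the data layout, and the toy numbers are illustrative assumptions and not taken from the original implementation.

```python
import math

def decayed_feature(sq_per_pass, anchors, grid, lam):
    """Mean over passes of sum_j SQ_t[j] * exp(-lam * ||p_j - anchor_t||_2).

    sq_per_pass: one list per pass with the space quality at each grid point
    anchors:     (x, y) per pass: pass origins for Eq. (9), destinations for Eq. (10)
    grid:        (x, y) grid points, in the same order as the SQ lists
    lam:         decay parameter (lambda_o or lambda_d), found by model selection
    """
    n_p = len(sq_per_pass)
    total = 0.0
    for sq, (ax, ay) in zip(sq_per_pass, anchors):
        for value, (gx, gy) in zip(sq, grid):
            dist = math.hypot(gx - ax, gy - ay)  # Euclidean distance to the anchor
            total += value * math.exp(-lam * dist)
    return total / n_p

# Toy possession (assumed values): three grid points spaced 0.5 m, two passes.
grid = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
sq = [[0.2, 0.4, 0.6], [0.1, 0.1, 0.1]]
origins = [(0.0, 0.0), (1.0, 0.0)]
x_o = decayed_feature(sq, origins, grid, lam=2.0)
```

With `lam=0` every grid point counts equally, whereas a large `lam` restricts the feature to the immediate surroundings of the pass origin or destination, which is the trade-off that model selection resolves.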


FIGURE 10 | Correlation between SGrec and non-penalty xG per 90 min (Left) and between SGpas and xA per 90 min (Right).

FIGURE 11 | Comparison of average space quality of the player at the time of all passes (left column) and at the time each player is the actual pass recipient (right
column). Playing direction is from left to right. The SG values in the color bar relate to the average space quality a player generates at a certain field position during all
considered passing events.

For each experiment, we randomly choose 80% of the data for training and 20% for model evaluation using area under the curve (AUC). For every combination of parameter and model, we repeat the experiment 1,000 times. To analyze the effect of adding pitch value to the space quality equation, we repeat the same setup but replace space quality with pitch control in Equations (9) and (10). The results are shown in Figure 9. Using only pitch control does not lead to significant differences between data-driven and baseline approaches. However, adding pitch value and thus focusing on space quality leads to a much better predictive accuracy for the data-driven approach.

For the data-driven approach, the classifiers perform even better: A very fine-grained focus on the pass destination increases the ability to predict the outcome of the ball possession. Translated to the situation in Figure 5, an area in a radius of 1.5 m around the shot position is considered as sufficient for the classifier. This area is largely controlled by the red team. The detailed focus on a small area around the pass destination


FIGURE 12 | (Left) Average passes to player 6. (Right) Passes from 13 to 6.

is possible because the data-driven model is able to approximate pitch control more accurately than the baseline does. This is reflected by pitch control values at the shot location ($PC_t^{p^d} = 0.57$ for the data-driven model vs. $PC_t^{p^d} = -0.21$ for the baseline model) and in Figure 6. In fact, for the baseline the classification results show a very different behavior: the smaller the considered space, the worse the performance. Overall, the classifiers based on the data-driven model significantly outperform the ones that are based on features from the baseline model. Often, large average space quality values in ball possession phases are caused by only a few high-quality passes.

Unsurprisingly, these experiments show that it is beneficial for a soccer team to create valuable space during a possession through passing in order to get into promising situations to score a goal. Our analysis confirms that this can actually be measured with the proposed approach. Our approach turns out to be accurate and allows us to derive meaningful metrics for individual players.

5.1. Measuring the Generation of Space
We now leverage space quality to measure off-ball movements and space generation. A simple way to measure the off-ball movement is to compute space quality $SQ_t^{i,p}$ for player i at time t and location p and subtract the space quality of all other players $j \in P \setminus \{i\}$ at that point and time,

$$SG_t^i = \sum_{p \in F} \sum_{j \in P \setminus \{i\}} \max\left\{ \left(SQ_t^{i,p} - SQ_t^{j,p}\right), 0 \right\}. \tag{11}$$

Hence, the resulting space generation is the sum of individual space quality over an equally spaced grid $F$, i.e., the amount of control that this player actually has on certain areas on the pitch weighted by the pitch value. Note that this measurement approach differs from the space generation gain concept in (Fernandez and Bornn, 2018), which quantifies the space that an attacker frees up by dragging the opponents into his direction.

To compute the rating of an individual player for space generation, $SG_t^i$ is evaluated for all timestamps at which an offensive player controls the ball and attempts to make a pass.

The following analysis is based on data from six teams playing against each other, leading to a subset of 30 games with a total of 16,631 passes. Space generation is again computed on a 50 cm-equidistant grid on the pitch. In our analysis, we only consider the 98 players who were involved in at least 30 passes (either as passer or pass receiver) during these games for a robust comparison.

In the remainder, SGrec denotes the amount of average space created by a pass receiver and SGpas credits this amount to the passing player. SGrec thus corresponds to a player creating space for herself by positioning in areas where she can get the ball. Similarly, SGpas describes the ability of a passer to identify valuable spaces and to pass the ball into valuable areas that were generated by her teammates. SGtotal simply defines the sum of both measurements.

We focus on possible relationships between our space generation metrics and existing player metrics and valuations. Prominent concepts are the expected goal (xG) and expected assists (xA) metrics that measure the probability that a shot will result in a goal and credit this likelihood either to the shooter (xG) or the pass giver (xA), respectively. Although implementations differ in details, the basic idea is to compare shots with similar characteristics (e.g., shot position and body part the attacker made the shot with) and calculate how many of these shots actually resulted in a goal (Lucey et al., 2015; Le et al., 2017; Rathke, 2017). Besides their popularity, we choose these measures because, compared to the actual number of goals, they leave aside factors such as luck and rather aim at the ability of a player to bring herself into situations to score.⁷ From that point of view, xG and SGrec pursue similar goals as the latter values the ability of a player to bring herself into a position to receive passes in high-quality areas that ultimately (for the final pass in a possession) results in a position to shoot at the goal.

⁷ Comparing to traditional measures like the number of shots leads to similar outcomes with slightly lower correlations since data are more noisy.

Figure 10 (left) clearly shows a significant positive correlation between both metrics [Pearson’s r = 0.66 with p-value = 8.85e−


14 and CI = (0.54, 0.76)]. For a more meaningful comparison, the xG value is standardized per 90 min and penalty kicks are excluded.⁸ Note that the result is almost unaffected by the three players with xG > 0.6 and the one with SGrec > 3 [r = 0.66, p-value = 7.49e−13, CI = (0.52, 0.76)]. These four players are strikers with very high SGrec values, so all of them are able to create high valued space. In addition, the three players with a superior xG > 0.6 are exceptionally good in converting shots into goals. For the player with SGrec = 3.52, the story is quite different. Despite the outstanding ability to create high valued space, this player is often unable to convert these situations.

⁸ We use xG and xA values from https://fbref.com, provided by StatsBomb.

Figure 10 (right) shows the results for SGpas and xA. Although their relation is not as strong as in the previous comparison, their correlation is still positive and significant [Pearson’s r = 0.21, p-value = 0.03, CI = (0.02, 0.4)]. This confirms our initial intuition that both concepts describe similar aspects of the game. Space generation metrics are not limited to shot or scoring events but also allow for useful insights on preceding actions in ball possessions and game analyses, as we will see in section 5.2. This becomes clear, in particular, for the SGpas and xA comparison. On one hand, xA only accounts for the direct pass before a shot even though the more important pass might have been the one to initiate the attack. As mentioned above, the receiver metric SGrec does not give any insights on how well the controlled space is used, i.e., the decision-making or the cognitive and physical skills after receiving the ball. On the other hand, xG neglects the amount of defensive pressure; hence, shots can have a high value even though the attacker is well covered by the defenders.

5.2. Game Analyses
In this section, we aim to sketch an application of our contribution to the data-driven analysis of games. The central idea is to identify dangerous passes and the corresponding pass givers and receivers and to aggregate this information over historic data. Clearly, there are additional factors for players to decide where to pass the ball, such as technical skills and crowded passing lanes. Hence, as a pass receiver, it is important to not only generate space but also ensure a positioning that actually allows receiving the ball. Figure 11 compares the average space quality created by three players for all their passes (left) and received balls (right). The midfielders in the first two rows show clear areas of high quality on the left and right wings, respectively. In particular, player 19 has the highest average space quality as a pass receiver. When receiving the ball, he creates space much closer to the area in front of the penalty box than player 11 although both usually control space next to the center-line. As a pass receiver, he creates space everywhere in the opponent’s half and is particularly difficult to defend.

For a more detailed view, we choose this player number 6 (cmp. Figure 11 bottom row) because of his widely distributed space generation pattern and his high SGpas = 0.71 and SGrec = 1.315 scores. We focus on his receiving qualities and aim to understand where these passes come from and, optimally, from which locations and/or players. Figure 12 (left) shows aggregates of all passes to player 6 in that game, summarized by his teammates. Displayed are also the number of passes by arrow width and the average SGrec values by color. The color legend ranges from light blue (low SGrec) to dark red (high SGrec). The figure clearly singles out player 13 as the teammate who creates space of high value by his passes to player 6. Although the overall SGpas metric for player 13 is only average, his passes to player 6 are exceptional.

Figure 12 (right) zooms in on this particular connection between the two players. All ten passes from player 13 to 6 are shown by arrows where the color is drawn from the legend before; especially two long passes along the sideline result in very high SG values. Also, the third long ball generates space above average. Based on this brief analysis, long passes from 13 to 6 must be prevented by the opposing team to decrease the dangerousness of striker 6. Particularly when both players are acting on the right side of the pitch, the other team needs to prevent long balls along the sideline.

Using the proposed concepts, analyses like this one could be automated and computed before a game. By doing so, dangerous opponent players can be easily identified and, together with video footage, dangerous episodes shown to the team. The system also proposes a way to decrease the dangerousness of these players by preventing the right passes, and also, these could be automatically retrieved from videos for a team briefing.

6. CONCLUSIONS
We incorporated data-driven movement models into measures of space and control that have been originally proposed by Fernandez and Bornn (2018). We highlighted differences between their original and our proposed approach and provided empirical evidence for the usefulness of our approach: using player movement models as the underlying influence of the player, it is distinguished from the original by spatially clearly confined areas and by significant correlations with quantifiable metrics such as xG. On this basis, we devised a novel space generation measure that allowed us to credit generated space to either the pass giver or pass receiver. Both could play an important role when it comes to opponent analysis and analyzing games. As an example, we showed that the new measure can be used to automatically identify key players and to provide insights on how key passes to these players could be prevented.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: Data is owned by the German league (DFL) and must not be disclosed. Requests to access these datasets should be directed to www.dfl.de.


AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

ACKNOWLEDGMENTS
We would like to thank Hendrik Weber and DFL/Sportec Solutions for providing the data for the analyses.

REFERENCES
Brefeld, U., Lasek, J., and Mair, S. (2019). Probabilistic movement models and zones of control. Mach. Learn. 108, 127–147. doi: 10.1007/s10994-018-5725-1
Brefeld, U., Lasek, J., and Mair, S. (2020). “Analyzing positional data,” in Science Meets Sports – When Statistics Are More Than Numbers, eds C. Ley and Y. Dominicy (Cambridge Scholars Publishing), 81–94.
Brochu, E., Cora, V. M., and de Freitas, N. (2010). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR, abs/1012.2599.
Bryson, A., Frick, B., and Simmons, R. (2013). The returns to scarce talent: footedness and player remuneration in European soccer. J. Sports Econ. 14, 606–628. doi: 10.1177/1527002511435118
Comaniciu, D., and Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619. doi: 10.1109/34.1000236
Dick, U., and Brefeld, U. (2019). Learning to rate player positioning in soccer. Big Data 7, 71–82. doi: 10.1089/big.2018.0054
Fernandez, J., and Bornn, L. (2018). “Wide Open Spaces: a statistical technique for measuring space creation in professional soccer,” in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA).
Franck, E., and Nüesch, S. (2012). Talent and/or popularity: what does it take to be a superstar? Econ. Inquiry 50, 202–216. doi: 10.1111/j.1465-7295.2010.00360.x
Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2015). “Counterpoints: advanced defensive metrics for NBA Basketball,” in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA).
Fujimura, A., and Sugihara, K. (2005). Geometric analysis and quantitative evaluation of sport teamwork. Syst. Comput. Jpn 36, 49–58. doi: 10.1002/scj.20254
Gerhards, J., Mutz, M., and Wagner, G. G. (2014). Die berechnung des Siegers: Marktwert, Ungleichheit, Diversität und Routine als Einflussfaktoren auf die Leistung professioneller Fußballteams / Predictable Winners. Market Value, Inequality, Diversity, and Routine as Predictors of Success in European Soccer Leagues. Z. Soziol. 43, 231–250. doi: 10.1515/zfsoz-2014-0305
Gudmundsson, J., and Horton, M. (2017). Spatio-temporal analysis of team sports – A survey. ACM Comput. Surv. 50, 1–34. doi: 10.1145/3054132
Gudmundsson, J., and Wolle, T. (2014). Football analysis using spatio-temporal tools. Comput. Environ. Urban Syst. 47, 16–27. doi: 10.1016/j.compenvurbsys.2013.09.004
Hobbs, J., Power, P., Sha, L., Ruiz, H., and Lucey, P. (2018). “Quantifying the value of transitions in soccer via spatiotemporal trajectory clustering,” in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA), 11.
Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67. doi: 10.1080/00401706.1970.10488634
Horton, M., Gudmundsson, J., Chawla, S., and Estephan, J. (2017). Classification of passes in football matches using spatiotemporal data. ACM Trans. Spatial Algorithms Syst. 3, 1–30. doi: 10.1145/3105576
Kingma, D. P., and Ba, J. (2015). “Adam: a method for stochastic optimization,” in International Conference on Learning Representations (ICLR2015) (San Diego, CA).
Le, H. M., Carr, P., Yue, Y., and Lucey, P. (2017). “Data-driven ghosting using deep imitation learning,” in Proceedings of the Sports Analytics Conference (Boston, MA), 15.
Link, D., Lang, S., and Seidenschwarz, P. (2016). Real time quantification of dangerousity in football using spatiotemporal tracking data. PLOS ONE 11:e0168768. doi: 10.1371/journal.pone.0168768
Lucey, P., Bialkowski, A., Monfort, M., Carr, P., Matthews, I., and Research, D. (2015). “Quality vs Quantity”: Improved Shot Prediction in Soccer using,” in Proceedings of the MIT Sloan Sports Analytics Conference (Boston, MA), 9.
Mortensen, J., and Bornn, L. (2019). “From Markov models to Poisson point processes: modeling movement in the NBA,” in Proceedings of the MIT Sloan Sports Analytics Conference 2015, 10.
Nakanishi, R., Maeno, J., Murakami, K., and Naruse, T. (2010). “An approximate computation of the dominant region diagram for the real-time analysis of group behaviors,” in RoboCup 2009: Robot Soccer World Cup XIII, Lecture Notes in Computer Science, eds J. Baltes, M. G. Lagoudakis, T. Naruse, and S. S. Ghidary (Berlin; Heidelberg: Springer), 228–239.
Rathke, A. (2017). An examination of expected goals and shot efficiency in soccer. J. Hum. Sport Exerc. 12. doi: 10.14198/jhse.2017.12.Proc2.05
Snoek, J., Larochelle, H., and Adams, R. P. (2012). “Practical Bayesian optimization of machine learning algorithms,” in Proceedings of the 25th International Conference on Neural Information Processing Systems (Lake Tahoe).
Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. (2012). Gaussian process optimization in the bandit setting: no regret and experimental design. IEEE Trans. Inform. Theor. 58, 3250–3265. doi: 10.1109/TIT.2011.2182033
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958. doi: 10.5555/2627435.2670313
Taki, T., and Hasegawa, J.-I. (2000). “Visualization of dominant region in team games and its application to teamwork analysis,” in Proceedings of the IEEE International Conference on Computer Graphics (Washington, DC).
Taki, T., Hasegawa, J.-i., and Fukumura, T. (1996). “Development of motion analysis system for quantitative evaluation of teamwork in soccer games,” in Proceedings of 3rd IEEE International Conference on Image Processing (Lausanne).
Ueda, F., Masaaki, H., and Hiroyuki, H. (2014). The causal relationship between dominant region and offense-defense performance - focusing on the time of ball acquisition. Football Sci. 11, 1–17.
Yeo, I.-K., and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959. doi: 10.1093/biomet/87.4.954

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Martens, Dick and Brefeld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


A. APPENDIX
Last but not least, we study the impact of space generation on the market value of players. As a quality indicator for the players, we use market values of the 2017/18 season gathered from an online platform;⁹ such values have been shown to correlate with actual transfer fees and even with the outcome of soccer tournaments (Franck and Nüesch, 2012; Bryson et al., 2013; Gerhards et al., 2014). We use a standard ordinary least squares (OLS) linear regression analysis to understand the relationship between market values (response variable) and space generation measurements (independent variables). Usually in soccer, teams within the same league have very different financial resources, and therefore, some teams can afford buying and paying players with higher market values than others. So, we factor in the team name as fixed effects into the model. As both response and independent variables are exponentially distributed, we need to log-transform them before fitting the models to meet the basic assumptions of the OLS model.

Table A1 summarizes the results. For the summed up passing and receiving space generation metrics SGtotal, the model coefficient suggests that the market value of a player is 6.5% higher if the player is able to increase his performance in this category by 10%, i.e., the player with the median value of EUR 7.5 m would be worth almost EUR 8 m. We observe a similar effect when considering SGrec alone. Here, a 10% increase in the receiver metric would account for a 3.4% increase in market value. For the passing metric SGpas, we find a significant relationship only for midfielders. Nevertheless, the influence this metric has on the market value is the highest as market value would be 9.3% higher if the SGpas metric increases by 10%. This observation also matches the traditional role of midfield players who usually need to have good play-making abilities. Although many other parameters are factored into the estimation of market value, the ability to create high-quality space as passer or pass receiver is something that is of great interest for soccer teams and clearly results in corresponding market values.

⁹ www.transfermarkt.de, accessed on February 2nd, 2021.

TABLE A1 | Linear model results for relationship between market values and SG metrics.

Formula                                                          Coefficient   p-value
log(market value) ∼ log(SGtotal) + team                          0.65          0.013
log(market value) ∼ log(SGrec) + team                            0.35          0.029
log(market value) ∼ log(SGpas) + team (only midfield players)    0.94          0.035
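Because both sides of the regressions in Table A1 are log-transformed, a coefficient β can be read as an elasticity: multiplying an SG metric by a factor c multiplies the predicted market value by c^β, all else being equal. The sketch below applies this standard reading to the reported coefficients. The helper name `pct_change_in_value` is hypothetical, and because the table lists rounded coefficients, the results may differ slightly from the percentages quoted in the text.

```python
def pct_change_in_value(beta, pct_increase=10.0):
    """Percentage change in market value implied by a log-log coefficient."""
    factor = 1.0 + pct_increase / 100.0
    return (factor ** beta - 1.0) * 100.0

# Coefficients as listed in Table A1 (rounded to two decimals).
coefficients = {"SGtotal": 0.65, "SGrec": 0.35, "SGpas (midfielders)": 0.94}
effects = {name: pct_change_in_value(b) for name, b in coefficients.items()}

for name, effect in effects.items():
    print(f"{name}: {effect:.1f}% higher market value for a 10% higher metric")
```

For small changes, the exact factor c^β is close to the linear approximation β times the percentage increase, which is why both readings give similar numbers for a 10% step.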

