Input-Output Network Analysis
of the Italian Economy Using Graph
Databases: A Case Study
Francesco Cambria
Politecnico di Milano, via Ponzio 34/5, Milan, Italy
[email protected]
Abstract. Graph databases and graph networks have gathered significant
attention for their success in enriching economic analysis by enabling
the visualization and exploration of complex relationships among enti-
ties, such as industries and products. In this study, the features of graph
networks are exploited to model and analyze the network derived from
the input-output system of the Italian economy. Thanks to graph-based
algorithms, including community detection and centrality algorithms, we
analyzed how the structure of the Italian economy evolved from 2000 to
2014, highlighting the major changes among the key Italian industry
sectors and their interconnections.
Keywords: Graph networks · Neo4j · Input-output analysis · Italian
economy
1 Introduction
Graph databases and graph networks are increasingly pivotal in data analysis
due to their ability to naturally model complex and interconnected data in the
real world [15]. Graph databases excel in traversing relationships and uncov-
ering hidden patterns with remarkable efficiency [1] and they offer an intuitive
approach to solving and explaining data problems that grow in complexity
and volume [14]. One of the most suitable applications of networks in
economics is input-output analysis, which is an economic model that examines
the flow of goods and services between different sectors within a region, a coun-
try, or even the whole world. This model uses an input-output matrix, where
each row represents the distribution of a sector’s output across other sectors,
and each column represents the inputs required by a sector from other sectors
[9]. Essentially, it captures how the output from one sector becomes an input
for another, illustrating the intricate web of economic dependencies. The complexity
in input-output analysis arises from these inter-dependencies between
sectors [18]. Changes in one sector can have cascading effects throughout the
economy. Understanding and predicting these ripple effects requires modeling
the entire network of relationships, which is where graph theory excels.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025
H. Cherifi et al. (Eds.): COMPLEX NETWORKS 2024, SCI 1188, pp. 163–174, 2025.
https://doi.org/10.1007/978-3-031-82431-9_14
In this work, Neo4j, one of the most widely used graph databases [7], is
used to translate the complex interconnections modeled by input-output
matrices into a graph network whose edges can be traversed to identify sectors
that have significant influence over others. As in traditional input-output
analysis, multiple economic scenarios can then be simulated in order to assess
the impact of changes in one sector on the overall network. This graph-based
approach could not only simplify the visualization of economic data but also
enhance the ability to perform sophisticated analyses: for example, the ability
to easily traverse connections and compute metrics like centrality or cluster-
ing coefficients can prove to be extremely useful in understanding the systemic
impacts and optimizing economic policies.
2 Related Work
In the context of economic analysis, several studies explored the application
of graph theory and graph databases. For instance, in trade networks,
graphs are used to model the flow of goods between countries. In particular,
in [16] the structure and evolution of the trade networks are analyzed with
random graph models, and in [3] centrality algorithms are used to rank the
trading relevance of countries. These examples prove that applying graph-based
algorithms for analyzing economic systems leads to meaningful results, pushing
us to test them on input-output analysis.
The study of input-output matrices has a rich history: in the 1930s,
Wassily Leontief introduced the concept in [10] to analyze the economic
interdependencies between different sectors of an economy. Traditional approaches
to input-output analysis have largely relied on matrix algebra and linear pro-
gramming techniques [8] to solve the complex equations that describe sector
interactions, but recent studies have utilized graph networks to achieve a more
nuanced analysis of the structure of input-output systems. In [12] for instance,
input-output networks are introduced as the translation into a graph model of
all the fundamental concepts behind the study of economic systems through
input-output analysis; in addition in this work, some of the most common graph
algorithms are also studied in order to understand the meaning of their results
in this context. From a more practical perspective, in [5] a directed graph is
constructed based on the technical coefficients of the matrix reflecting the sec-
tor dependencies of the Andalusian economy, and this procedure is exploited to
better explain and visualize all the major concepts of the input-output analysis.
In both [2] and [19], instead, community detection and centrality algorithms
are applied to the global I-O network to investigate the inter-dependencies
of the industry sectors of each country, highlighting the strength of these
connections. These examples show that graph models successfully stretch the
boundaries of input-output analysis, making it more flexible by studying the data from
different perspectives.
3 Methods and Results
3.1 Construction of Italian Input-Output Network
To construct the Italian Input-Output network the World Input-Output
Database (WIOD) is taken as a reference. The WIOD provides time-series
data on national input-output tables, harmonized across countries and sectors,
facilitating the examination of international trade flows, production structures,
and consumption patterns. The World Input-Output Tables (WIOTs) are con-
structed from published statistics from each country’s national statistical insti-
tutes and various international statistical sources like UN National Accounts
and OECD. Since this work focuses on Italy, only its national input-output
table (NIOT) is used, in which all the global imports and exports are
grouped for each sector. Data pre-processing is then required: the input-output
table is cleaned and structured to match the loading format of Neo4j.
This involves re-organizing the matrix data vertically, so that every industry
and product of the economic system becomes a node and every flow of goods and
services becomes a relationship between nodes. In addition, all the values in the
WIOD matrices are given in millions of US dollars and need to be normalized with
respect to each sector’s domestic gross output to obtain the technical coefficients
following Leontief’s algebra. The resulting graph schema of the Italian NIOT is
reported in Fig. 1.
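The pre-processing step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name niot_to_edges, the three sector codes, and all numeric values are hypothetical, and the sketch assumes the standard Leontief convention in which the flow from sector i to sector j is divided by the gross output of the receiving sector j.

```python
def niot_to_edges(sectors, flows, gross_output, year):
    """Reshape a square flow matrix into one row per non-zero technical coefficient.

    flows[i][j] is the flow from sector i to sector j (millions of US dollars);
    gross_output[j] is the gross output of sector j in the same unit.
    """
    edges = []
    for i, source in enumerate(sectors):
        for j, target in enumerate(sectors):
            if gross_output[j] == 0:
                continue  # no output to normalize against
            coefficient = flows[i][j] / gross_output[j]
            if coefficient != 0:  # zero coefficients do not become relationships
                edges.append(
                    {"source": source, "target": target, "year": year, "value": coefficient}
                )
    return edges


# Toy data (hypothetical values, not taken from WIOD).
sectors = ["A01", "C10", "H49"]
flows = [
    [10.0, 30.0, 0.0],
    [5.0, 20.0, 5.0],
    [0.0, 10.0, 10.0],
]
gross_output = [100.0, 200.0, 50.0]

edges = niot_to_edges(sectors, flows, gross_output, 2000)
for edge in edges:
    print(edge)
```

Each resulting row corresponds to one candidate TO relationship for the given year, including self-loops, which is the long ("vertical") layout that a bulk loader typically expects.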
Fig. 1. Graph Schema of the Italian input-output Network
Nodes labeled as Industry represent each economic sector of the input-
output table, and they are uniquely identified by the property Code (the same
code used in WIOD, defined in the 2-digit ISIC revision 4 level [4]). Each pair
of Industry nodes (including self-loops, where a node is paired with
itself) has multiple relationships of type TO, one for each year, storing the value
of the yearly technical coefficient calculated from the input-output matrix; if
the value of the technical coefficient is equal to 0, no relationship is created in
the graph. The nodes labeled Import and Total Output, instead, are unique nodes
that act as placeholders: all the IMPORT and OUTPUT relationships of each Industry
node are directed from or to them, storing the yearly values of total imports
and total final demand in millions of US dollars.
3.2 Leontief Model in Graphs
The Leontief Model [9,10] is a foundational framework in economic input-output
analysis that depicts the interdependencies between different sectors of an econ-
omy. This model uses a matrix representation to describe how the output from
one sector can become an input to another, thereby facilitating the analysis
of both direct and indirect economic contributions. By representing sectors as
nodes and economic dependencies as directed edges, Neo4j allows for the efficient
computation of indirect contributions through its graph traversal algorithms,
enhancing the explainability and visualization of the origin of these indirect
coefficients.
Central to this model is the input-output matrix $A$, where each element $a_{ij}$
represents the input from sector $i$ required to produce one unit of output in
sector $j$. The model also uses the vector of total outputs $X$ and the vector of
final demands $Y$. The solution of the Leontief Model equation,
$X = (I - A)^{-1}(Y - J)$, involves calculating the Leontief inverse $(I - A)^{-1}$,
which captures both direct and indirect effects of changes in final demand on
total output. Considering the dimensions of the matrix $A$, however, this
calculation can prove to be computationally challenging, and thus $(I - A)^{-1}$ is
usually approximated by the truncated series $I + A + A^2 + A^3 + A^4 + \dots + A^n$,
according to the Taylor series expansion. The first-order contribution
coefficients are equal to the technical coefficients $a_{ij}$ of the
Leontief matrix $A$ and their values are directly stored in the relationships TO.
The higher-order contributions represent more complex relationships between
the sectors, and it is possible to demonstrate that any $n$-order contribution
coefficient $\alpha_{ij}$ can be calculated as the sum of the products of the technical
coefficients of the relationships traversed across all the paths of length $n$ from
sector $i$ to sector $j$. In Neo4j, these contributions can be easily calculated with
path traversal queries, making it possible to evaluate the yearly indirect
contributions of the Italian input-output network and adding value to the analysis
of the whole economic system.
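The equivalence between higher-order contributions and path products can be checked numerically on a small example. The following Python sketch is only an illustration under stated assumptions: the 3x3 matrix is a toy example rather than Italian data, the function names are hypothetical, and the traversal is written in plain Python rather than as a Neo4j query. It sums the products of the coefficients along every walk of length n (a path that may revisit sectors and relationships, as the matrix power requires) and compares the result with the corresponding entry of the n-th power of the matrix.

```python
def walk_contribution(edges, source, target, order):
    """Sum, over all walks of `order` steps from source to target,
    of the product of the coefficients traversed."""
    outgoing = {}
    for (i, j), value in edges.items():
        outgoing.setdefault(i, []).append((j, value))

    def walk(node, remaining):
        if remaining == 0:
            return 1.0 if node == target else 0.0
        return sum(value * walk(nxt, remaining - 1) for nxt, value in outgoing.get(node, []))

    return walk(source, order)


def matrix_power(matrix, order):
    """Return matrix**order for a square list-of-lists matrix (order >= 1)."""
    size = len(matrix)
    result = [row[:] for row in matrix]
    for _ in range(order - 1):
        result = [
            [sum(result[i][k] * matrix[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
    return result


# Toy technical-coefficient matrix (hypothetical values).
A = [
    [0.10, 0.15, 0.00],
    [0.05, 0.10, 0.10],
    [0.00, 0.05, 0.20],
]
# Only non-zero coefficients become edges, as in the graph model.
edges = {(i, j): A[i][j] for i in range(3) for j in range(3) if A[i][j] != 0}

second_order = walk_contribution(edges, 0, 2, 2)  # single walk 0 -> 1 -> 2
print(second_order, matrix_power(A, 2)[0][2])
```

In this toy example the only two-step walk from sector 0 to sector 2 passes through sector 1, so the second-order coefficient is 0.15 × 0.10 = 0.015, the same value found in the squared matrix.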
3.3 Topological Analysis
The study of input-output network topology provides crucial insights into the
structural inter-dependencies and flow dynamics within an economy, enabling a
deeper understanding of how sectors are interconnected and how shocks propa-
gate through the economic system.
In [5], the concept of Autonomous Sets is introduced: two sets of economic
sectors are autonomous if there is no connection between sectors of different
sets. In particular, if the Industry nodes of the Italian input-output network
can be divided in a way that for a given year there are sets of nodes that are not
connected by any relationship TO of that year, then these sets can be
considered autonomous. Since autonomous sets are independent, the network can be
decomposed into smaller networks (one for each autonomous set),
scaling down the input-output analysis to smaller independent problems. Dividing
the Industry nodes into autonomous sets is straightforward in Neo4j’s
framework: through its native Graph Data Science library [13], many
community algorithms, such as the Weakly Connected Components [17]
algorithm, can be applied to any graph.
In particular, this algorithm creates node communities based on whether a
path exists between them, which is the definition of autonomous sets. The
community information can be saved as a property of the Industry nodes so that it can be
exploited for further analysis. In the first (1) column of Fig. 2, the autonomous
sets of the Italian input-output network are reported for the years 2000, 2004,
2008, and 2012; this plot shows almost all the Italian economic sectors were
connected except for one (and an additional one in 2012), resulting in a large
autonomous set and an independent Industry. To further study the network
structure, we estimated the strength of the community. The Weakly Connected
Components algorithm can use a threshold to determine whether to consider a
relationship between Industry nodes based on the value of the technical coeffi-
cient. This threshold acts as a filter for the relationships: only those with techni-
cal coefficients above the specified threshold are included in the computation. By
adjusting this threshold, the strength of the connections considered in the anal-
ysis can be effectively controlled, thereby influencing the size and composition of
the detected communities. Higher thresholds result in considering only stronger
relationships, which can lead to identifying more restricted and potentially more
meaningful communities. Conversely, lower thresholds include weaker relation-
ships, leading to larger and more loosely connected components. The second (2)
and third (3) columns of Fig. 2 show how the Industry nodes can be segmented
into smaller subsets according to two different threshold values of the technical
coefficients: at 0.05, more than half of the economic sectors are still connected
into a single autonomous set while the other sectors are all divided into smaller sets,
mostly made of a single node; at 0.1, instead, the large autonomous set is itself
divided, resulting in small sets of different sizes, the biggest made of
6 Industry nodes. Lastly, we also evaluated the Louvain algorithm [11], which is
one of the most used community algorithms for graphs. It is designed to optimize
modularity, a measure of the strength of the division of a network into commu-
nities, by iteratively grouping nodes into communities based on the density of
connections within them compared to between them. The resulting communities,
reported in the fourth (4) column of Fig. 2, are more evenly distributed across
the whole network, but compared to the Weakly Connected communities they
show a higher variability across the years.
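The effect of the relationship threshold on the autonomous sets can be reproduced with a short union-find sketch in Python. This is an illustrative stand-in, not the Graph Data Science implementation: the function name, the node codes, and the coefficient values are hypothetical, and a relationship is kept only if its coefficient is strictly above the threshold, as described above.

```python
def weakly_connected_components(nodes, edges, threshold=0.0):
    """Group nodes into weakly connected components, ignoring direction and
    keeping only relationships whose coefficient is above `threshold`."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (source, target), value in edges.items():
        if value > threshold:
            parent[find(source)] = find(target)

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    # Largest components first, ties broken alphabetically for stable output.
    return sorted(components.values(), key=lambda c: (-len(c), sorted(c)))


# Toy network (hypothetical codes and coefficients); "U" has no relationships.
nodes = ["A", "B", "C", "D", "U"]
edges = {("A", "B"): 0.12, ("B", "C"): 0.07, ("C", "D"): 0.02, ("D", "D"): 0.30}

for t in (0.0, 0.05, 0.1):
    print(t, weakly_connected_components(nodes, edges, threshold=t))
```

With these toy values the single large set found at threshold 0 loses node D at 0.05 and node C at 0.1, mirroring the progressive fragmentation described for the Italian network.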
3.4 Industries Centrality
To enhance the application of economic input-output analysis, this study also
investigates the use of centrality algorithms, which quantify the importance of
individual nodes within a network and thus identify which industries
act as critical hubs and influential connectors in Italy’s economic network.
Among the centrality algorithms commonly applied to network analysis, PageRank
[6] stands out for its efficacy in identifying influential nodes within a graph.
Fig. 2. Italian Input-Output Network Communities. Each vertical array of rectangles
represents the whole ensemble of the economic sectors in a specific year (2000, 2004,
2008, 2012), and rectangles with the same colors belong to the same community. From
one year to the next, the community kept the same color if more than half of the nodes
of the community of the 4 years prior were inside the new community. The commu-
nities were calculated using Weakly Connected Components with increasing values of
relationship threshold (0, 0.05, 0.1) and with the Louvain algorithm. Each Industry is
reported according to its unique ISIC encoding.
Originally developed to rank web pages, PageRank assigns a significance score
to each node based on the quantity and quality of links, thereby reflecting its
relative importance within the network. This algorithm can be applied to the
Italian input-output network to extract the most central industries across the
years, reported in Fig. 3, exploiting the technical coefficients as weights for the
importance of each relationship.
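A weighted PageRank of this kind can be sketched as a simple power iteration in Python. The sketch below is an assumption-laden illustration rather than the configuration used for Fig. 3: the damping factor of 0.85 and the fixed number of iterations are common defaults chosen here, the uniform redistribution of the score of nodes without outgoing relationships is one of several possible conventions, and the node codes and weights are hypothetical.

```python
def weighted_pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank where each node splits its score among its
    outgoing relationships in proportion to their weights."""
    n = len(nodes)
    positive = {pair: w for pair, w in edges.items() if w > 0}
    out_weight = {node: 0.0 for node in nodes}
    for (source, _), w in positive.items():
        out_weight[source] += w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for (source, target), w in positive.items():
            new_rank[target] += damping * rank[source] * w / out_weight[source]
        rank = new_rank
    return rank


# Toy network (hypothetical codes and coefficients).
nodes = ["A", "B", "C"]
edges = {("A", "C"): 0.2, ("B", "C"): 0.3, ("C", "A"): 0.1}

rank = weighted_pagerank(nodes, edges)
for node, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy network node C collects the score of both A and B and therefore ranks first, while B, which receives no incoming relationship, keeps only the baseline score.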
Fig. 3. Top 15 most influential Industry nodes according to the PageRank Centrality
algorithm across the years 2000, 2004, 2008, and 2012. Each Industry is reported
according to its unique ISIC encoding.
4 Discussion
As an initial step in our analysis, we conduct a preliminary evaluation of the
statistical parameters of the distributions of technical coefficients across various
sectors of the Italian economy. In Fig. 4 the following parameters are reported
for each year: the count of relationships TO connecting the Industry nodes,
representing all the non-zero technical coefficients between different sectors; the
mean μ of the values of technical coefficients; their standard deviation σ; the
coefficient of variation (CV), which is the ratio between the standard deviation
σ and the mean μ; and the maximum (Max) and minimum (Min) values of
technical coefficients. First, the coefficient of variation (CV) is a good measure
of how diversified the strength of the connections across the network is: since the
CV is higher than 2 every year, the technical coefficients are spread out far from
the mean, resulting in significantly stronger or weaker connections between
sectors. This is further confirmed considering that the maximum values of the
technical coefficients are several orders of magnitude bigger than the minimum
values. Another interesting aspect resulting from Fig. 4 is that, across the years,
the most significant change happened between 2008 and 2009 when the financial
crisis hit the European economies. From 2000 to 2008 a steady number of links
(2959) connected the Italian economy but, after an initial slight increase in
2009, the count of connections settled at 2900 from 2012 to 2014. In addition,
the minimum value among the coefficients increases significantly by a factor of
40 from 2000 to 2014 suggesting that those links with minimal values of the
technical coefficients have been cut off.
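The yearly parameters discussed above can be computed with a few lines of Python. This is a minimal sketch with hypothetical coefficient values; it assumes the population standard deviation, since the choice between population and sample formulas is not stated here, and the function name is illustrative.

```python
from statistics import mean, pstdev


def coefficient_stats(values):
    """Summary statistics of the technical coefficients of one year."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    return {
        "count": len(values),
        "mean": mu,
        "std": sigma,
        "cv": sigma / mu,  # coefficient of variation
        "max": max(values),
        "min": min(values),
    }


# Toy coefficients for a single year (hypothetical values).
values = [0.30, 0.02, 0.01, 0.005, 0.001]
stats = coefficient_stats(values)
print(stats)
```

A CV above 1, as in this toy sample, already means that the standard deviation exceeds the mean; a few strong links alongside many weak ones are enough to produce it.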
Fig. 4. Statistical parameters of the Italian input-output network across the years
(2000–2014). a) Count, the count of relationships TO connecting the Industry
nodes. b) μ, the mean of technical coefficients. c) σ, the standard deviation. d) CV, the
coefficient of variation. e) Max, the maximum value of technical coefficients. f) Min,
the minimum value of technical coefficients.
In particular, the Forestry and logging sector (ISIC code A02), the Fishing
and aquaculture sector (A03), and the Activities of households as employers sector
(T) lost a total of 63 connections directed to other sectors in 2010 (4, 6, and 53,
respectively). Although the technical coefficients of these relationships
were all close to the minimum value, indicating weak connections within the
network, their loss means that, as a consequence of the financial crisis, these three
industries had lost some of their importance in the overall Italian network by 2010.
Overall, as a result of the crisis, the Italian network appears to be less connected, and although
the mean value of the technical coefficients increased, the coefficient of variation
also rose, highlighting the growing disparity between stronger and weaker
connections. To better analyze the variation in the strength of the network across
the years, we also evaluated how each single connection evolved through time.
Fig. 5 reports the absolute difference and percentage variation of the technical
coefficients linking two Industry nodes between two consecutive years.
In the left side plot, those links that have grown the most in terms of abso-
lute value are reported, and on the right side the ones that have declined the
most. The absolute difference is chosen as the ranking indicator to emphasize the
variations among the links with the highest technical coefficients, representing
the most influential links in the network. In both plots, many auto-connections,
or links where a sector is connected to itself, occupy top positions. Among the
top 100 most increasing and decreasing values over the years, 39 and 24 are
auto-connections, respectively. Given that these links account for only 810 out
of 44091 connections (1.8%) in the Italian network from 2000 to 2014, their high
concentration among the most variable links indicates that auto-connections
experience the most significant variations over the years. Correctly assessing
their behavior is therefore fundamental when making predictions about the effect
of future positive or negative shocks. Moreover, most top positions are occupied by changes occurring during the
financial crisis years, or more broadly, after 2008. Specifically, among the top
100 most increasing and decreasing values over the years, only 7 and 11, respec-
tively, refer to changes before 2008. This further indicates that, as a result of
the financial crisis, the structure of the Italian input-output network began to
change significantly, with many connections between sectors either strengthening
or weakening.
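The ranking of links by year-over-year change can be sketched as follows. The Python code is illustrative only: the data layout (one dictionary of yearly values per link), the function name, and all codes and values are hypothetical, and a change is computed only when a link exists in two consecutive years, since missing relationships correspond to zero coefficients.

```python
def rank_link_changes(coefficients, top=3):
    """Rank links by the absolute year-over-year difference of their coefficient.

    coefficients maps (source, target) to a {year: value} dictionary.
    Returns the `top` most growing and the `top` most declining changes.
    """
    changes = []
    for (source, target), by_year in coefficients.items():
        years = sorted(by_year)
        for previous, current in zip(years, years[1:]):
            if current - previous != 1:
                continue  # the link is missing in an intermediate year
            difference = by_year[current] - by_year[previous]
            percentage = 100.0 * difference / by_year[previous]
            changes.append((source, target, current, difference, percentage))

    growing = sorted(changes, key=lambda change: -change[3])[:top]
    declining = sorted(changes, key=lambda change: change[3])[:top]
    return growing, declining


# Toy coefficients (hypothetical codes and values); the first link is a self-loop.
coefficients = {
    ("C20", "C20"): {2008: 0.20, 2009: 0.26, 2010: 0.21},
    ("A01", "C10"): {2008: 0.10, 2009: 0.11},
    ("H49", "G46"): {2008: 0.05, 2009: 0.03},
}

growing, declining = rank_link_changes(coefficients)
print(growing[0])
print(declining[0])
```

Ranking by absolute rather than percentage difference favors links with large coefficients: in the toy data the self-loop tops both lists, although the link from H49 to G46 has the largest relative decline.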
Fig. 5. Top 10 links with the largest growth (left side) and decline (right side). The values are
calculated as the difference, between the year reported and the previous one, of the technical
coefficient from Industry 1 (on the left) to Industry 2 (on the right). Each Industry
is reported according to its unique ISIC encoding.
Next, we analyzed the communities reported in Fig. 2. Each vertical array
of rectangles represents a unique set of communities, each labeled by a spe-
cific color, according to a given algorithm and year of the Italian input-output
network. To study how the communities evolved throughout the years, we con-
sidered that a community is an evolution of a 4-years-prior community if more
than half of the nodes of that community are inside the new community, thus
also keeping the same color in different years. At a threshold of t = 0, the Weakly
Connected Components column represents the autonomous sets within the Ital-
ian network across different years. Most industries are connected into a single
large community; however, the “Activities of extraterritorial organizations and
bodies” (U) remains unconnected to any other sector. “Activities of households as
employers” (T), instead, separated from the main autonomous set of the network
in 2012. This separation was caused by a decrease in the number of connections,
a consequence of the financial crisis, as previously discussed. By increasing the
threshold to t = 0.05, the large autonomous set shrinks, with several Industry
nodes breaking away and becoming single-node communities. This allows the
identification of the most peripheral nodes, which are less influential within the
network, as they are excluded from the original autonomous set at a relatively
low threshold. Once again, by increasing the threshold to t = 0.1, the large
autonomous set disintegrates, resulting in single nodes and small communities
that contain at most 3 to 6 nodes. These small groups highlight the strongest
connections within the network, making them particularly interesting for fur-
ther analysis as groups. The communities extracted using the Louvain algorithm
are more evenly distributed across the network, resulting in fewer communities
of similar sizes. This algorithm optimizes modularity, a measure of the density
of links within communities compared to links between communities. Over the
years, the optimized modularity values of the network’s communities are sig-
nificantly lower than 1 (0.224 in 2000, 0.184 in 2004, 0.198 in 2008, and 0.221
in 2012). This indicates that the communities are not well-defined within the
network. Additionally, this could explain why the community structure changes
significantly over time, thus making it less reliable for a deeper analysis. Among
the community evolutions reported for different algorithms, one common feature
stands out: in 2012, there were significant changes compared to previous years.
This confirms that the financial crisis substantially reshaped the structure of
the entire network. Lastly, we evaluated the most influential economic sectors of
the network according to the PageRank algorithm, as reported in Fig. 3. Each
year, the majority of the top 15 positions are occupied by sectors in manufactur-
ing (those with codes starting with C). Until 2008, the top positions remained
relatively stable, with minor changes. In 2012, significant changes impacted the
influence of certain Industry nodes. For instance, the “Human health and social
work activities” sector (Q) dropped 14 positions, losing its status as the most cen-
tral node, and while most manufacturing sectors maintained their high influence
within the network, the “Manufacture of chemicals and chemical products” sector
(C20) fell by 16 positions. Cross-referencing the PageRank ranking with the com-
munities extracted by the Weakly Connected Components algorithm at a thresh-
old t = 0.1, we found that the most influential nodes are often in single-node
communities (e.g., Q, C16, C21, C31-C32, E37-E39). This is counter-intuitive, as
their centrality would suggest they should have strong connections with other
nodes. These results highlight an interesting characteristic of the Italian input-
output network: the most influential nodes do not form a few strong connections
with each other. Instead, they have numerous connections that, while not strong
enough to create small communities, are sufficient to make them central in the
network.
5 Conclusion
This study has successfully modeled the Italian economy from 2000 to 2014
using input-output tables within a Neo4j graph database. As an initial step, we
translated the Leontief equation into graph queries and leveraged path traver-
sal, providing an efficient and straightforward way to calculate and explain the
higher-order contributions. Applying community detection methods, specifically
Weakly Connected Components and the Louvain algorithms, has allowed for a
detailed analysis of the distribution of the technical coefficients across the net-
work, highlighting the economy’s community structure. Additionally, the use
of PageRank centrality has identified influential sectors and their shifting roles
over time. The analysis indicates that the financial crisis significantly reshaped
the network’s structure, resulting in both the strengthening and weakening of
various economic connections. These findings underscore the utility of graph-based
methods in economic modeling and provide greater explanatory power for
the links among economic sectors. Having modeled the graph, the next step
would be to apply more advanced and detailed algorithms to extract specific
insights, thereby enhancing our understanding of the complex dynamics within
the economy.
References
1. Batra, S., Tyagi, C.: Comparative analysis of relational and graph databases. Int.
J. Soft Comput. Eng. (IJSCE) 2(2), 509–512 (2012)
2. Cerina, F., Zhu, Z., Chessa, A., Riccaboni, M.: World input-output network. PLoS
ONE 10(7), e0134025 (2015)
3. Colombo, A., Liu, G.: Extending network tools to explore trends in temporal gran-
ular trade networks. In: Botta, F., Macedo, M., Barbosa, H., Menezes, R. (eds.)
Complex Networks XV: Proceedings of the 15th Conference on Complex Networks,
CompleNet 2024, pp. 71–83. Springer Nature Switzerland, Cham (2024).
https://doi.org/10.1007/978-3-031-57515-0_6
4. United Nations Statistics Division: Classifications on economic statistics.
https://unstats.un.org/unsd/classifications/Econ/
5. Fedriani, E.M., Tenorio, Á.F.: Simplifying the input-output analysis through
the use of topological graphs. Econ. Model. 29(5), 1931–1937 (2012)
6. Gleich, D.F.: PageRank beyond the web. SIAM Rev. 57(3), 321–363 (2015)
7. Guia, J., Soares, V.G., Bernardino, J.: Graph databases: Neo4j analysis. In: ICEIS
(1), pp. 351–356 (2017)
8. Kuijper, M., Schumacher, J.: Input-output structure of linear differential/algebraic
systems. IEEE Trans. Autom. Control 38(3), 404–414 (1993).
https://doi.org/10.1109/9.210139
9. Leontief, W.: Input-Output Economics. Oxford University Press (1986)
10. Leontief, W.W.: Quantitative input and output relations in the economic systems
of the United States. Rev. Econ. Stat. 18(3), 105–125 (1936).
http://www.jstor.org/stable/1927837
11. Lu, H., Halappanavar, M., Kalyanaraman, A.: Parallel heuristics for scalable com-
munity detection. Parallel Comput. 47, 19–37 (2015)
12. McNerney, J.: Network properties of economic input-output networks (2009)
13. Neo4J: Neo4j graph data science. https://neo4j.com/product/graph-data-science/
14. Robinson, I., Webber, J., Eifrem, E.: Graph Databases: New Opportunities for
Connected Data. O’Reilly Media, Inc. (2015)
15. Sakr, S., et al.: The future is big graphs! A community view on graph processing
systems. CoRR abs/2012.06171 (2020). https://arxiv.org/abs/2012.06171
16. Setayesh, A., Sourati Hassan Zadeh, Z., Bahrak, B.: Analysis of the global trade
network using exponential random graph models. Appl. Netw. Sci. 7(1), 38 (2022)
17. Sutton, M., Ben-Nun, T., Barak, A.: Optimizing parallel graph connectivity com-
putation via subgraph sampling. In: 2018 IEEE International Parallel and Dis-
tributed Processing Symposium (IPDPS), pp. 12–21. IEEE (2018)
18. Ten Raa, T.: The economics of input-output analysis. Cambridge University Press
(2006)
19. Xu, M., Liang, S.: Input–output networks offer new insights of economic structure.
Phys. A: Stat. Mech. Appl. 527, 121178 (2019)