IA mathematics
Title of exploration: What is the correlation between carbon dioxide
emissions and total energy consumption?
Page number: 20 pages
Table of Contents
Introduction and rationale
  Personal engagement
  Aim
Methodology
Data collection
Exploration
  Histograms
  Scatter plot
  Correlation analysis
    Pearson correlation
  Linear regression
Conclusion and evaluation
Bibliography
Appendix
Introduction and rationale
CO2 (carbon dioxide) emission is the release of carbon dioxide gas into the atmosphere
from a variety of sources. It is measured in billions of metric tons, or gigatons (Gt).
These emissions are a major contributor to global warming and climate change, as CO2 is a
greenhouse gas that traps heat in the Earth’s atmosphere. Energy production of all types
accounts for 72% of all CO2 emissions, so it can be regarded as the main source of carbon
dioxide. The demand for energy increases annually, which could significantly worsen
climate change.
Total energy consumption refers to the total amount of energy used within a given time frame in
a specific region. It is measured in trillions of watt-hours, or terawatt-hours (TWh).
The energy consumption of a region depends on its economic state, population size,
industrialization, and other factors. The largest source of energy in the world is oil, which made up
32% of world energy use in 2023. Fossil fuels together account for 81% of world energy use.
Figure 1: Percentage of world energy use by source, shown as a pie chart (legend: Oil, Coal, Natural gas, Nuclear, Hydro, Wind, Solar, Biomass and other renewables; slice values: 32%, 26%, 23%, 6%, 4%, 4%, 2%, 2%)
Climate change refers to long-term shifts in temperatures and weather patterns. In recent
years it has brought more frequent and intense droughts, storms, and heat waves, along with
rising sea levels, melting glaciers, and overall warming. This has a direct
effect on animals, habitats, communities, agriculture, and therefore on us. This is why climate
change is an important issue that needs to be addressed as quickly as possible to limit the extent of the
damage.
Personal engagement
As a citizen of this Earth, I am obligated to take action in any way I can. I have witnessed
firsthand the consequences of climate change, which is primarily caused by CO2 emissions, while
visiting Alexandria, where coastal erosion is becoming more severe and the sea is intruding further
inland. In addition, as an Egyptian citizen I am extremely worried about the state of the Nile
and its water flow, which affects water distribution and, therefore, agriculture. Moreover,
temperatures in Egypt, especially during the summer, have become unbearable, with intense heat
waves unlike anything in the past. Investigating total energy consumption, a major factor that
contributes to CO2 emissions, can help find a solution that contributes, even if just a
little, to mitigating climate change.
Aim
The aim of the research is to explore the correlation between CO2 emissions and total
energy consumption. Using statistical models, I will investigate the strength and form of the
relationship between the two variables. Understanding this relationship can provide helpful insights into how energy
consumption patterns affect the carbon footprint, which will shed light on a potential cause of
the issue. This can help guide efforts to reduce emissions by improving energy efficiency.
Methodology
The analysis explores the correlation between global carbon dioxide emissions and total
energy consumption using statistical methods. Data for carbon dioxide emissions (in gigatons)
and energy consumption (in terawatt-hours) were collected from reliable sources and aligned
chronologically. Histograms are used to summarize and understand
the distribution of the data. To visualize the data, I will create a scatter plot and assess its
trendline, from which I will calculate the equation of the line if the trend appears linear.
Pearson’s correlation coefficient will be used to measure the strength and direction of the linear
relationship between the two variables. If Pearson’s correlation coefficient is moderate or weaker,
Spearman’s rank correlation will also be used to assess monotonic trends. However, if
Pearson’s correlation coefficient is strong, a linear regression analysis will be carried out to
quantify the relationship between the two variables and predict future values.
Data collection
The data for this investigation were sourced from historical graphs and datasets, mainly
from “Our World in Data”1 and “Statista”2. The data points were extracted and organized from
the downloadable Excel files of emissions trends spanning 1940 to 2023, as well as
the corresponding energy consumption trends. For energy consumption, I will not categorize by
source, such as fossil fuels and renewables; instead I will use the total energy consumption to
focus on the overall trend. I will only use data points from 2006 to 2023, as the types of energy
sources and their ratios are most similar in these years. Moreover, using recent data
focuses the research on the impact of recent energy consumption trends on carbon dioxide
emissions, making the analysis relevant to current discussions on climate change and energy
policy.
1 Hannah Ritchie and Pablo Rosado (2020) - “Energy Mix” Published online at OurWorldinData.org. Retrieved from:
'https://ourworldindata.org/energy-mix' [Online Resource]
2 Andrew, R. (2024, November). Annual carbon dioxide (CO₂) emissions worldwide from 1940 to 2024. Global
Carbon Budget. https://robbieandrew.github.io/GCB2024/
Figure 2: Total global energy consumption by source in TWh against time in years
This graph shows that energy consumption was almost constant from 1800 to 1900, with a small
increase by the end of the century. Then, from 1900 to 1950, there were slight increases in energy
consumption. From 1950 onwards, energy consumption increased significantly but steadily,
with some fluctuations over the years. From 2006, “modern biofuels”, “wind”, “solar”, and
other renewables have made visibly stronger contributions to energy consumption than in earlier
years; that is why the data I will analyze are from 2006 to 2023.
Figure 3: Global Carbon dioxide emission in gigaton (Gt) against time in years
Similar to the energy consumption graph, carbon dioxide emissions increased steadily
over the years from 1940 to 2024, with slight fluctuations that do not significantly affect
the general trend.
Table 1 organizes the data from figure 2 and figure 3, matching the values by
year and listing them from the most recent year to the earliest.
Carbon dioxide emission (Gt)   Energy consumption (TWh)   Year
37.01 183230 2023
36.50 179819 2022
36.20 176840 2021
34.37 168779 2020
36.37 174458 2019
36.00 172629 2018
35.29 168363 2017
34.73 164719 2016
34.72 163146 2015
34.77 162111 2014
34.65 160629 2013
34.38 158186 2012
33.91 156261 2011
32.81 153125 2010
31.02 146699 2009
31.58 148960 2008
31.60 147434 2007
30.18 143482 2006
Table 1: Carbon dioxide emission and global energy consumption organized by year from 2006
to 2023
Exploration
Histograms
Histograms are graphical representations of data that display the frequency of data points
within specified intervals, also known as bins. They are a valuable statistical tool used to
summarize the distribution of a dataset, providing insights into its shape, central tendency,
variability, and potential outliers.
In the context of this investigation, histograms help to visualize the distributions of CO2
emissions and energy consumption over time. By analyzing these distributions, we can identify
patterns such as skewness, uniformity, or clustering of values. This information is crucial for
understanding the nature of the data before applying further statistical methods, such as
correlation or regression analysis. They also provide a visual way to detect irregularities or
anomalies in the data, ensuring its quality and reliability.
Figure 4: histogram of carbon dioxide emissions from 2006 to 2023
The histogram for CO2 emissions shows a concentration of data in the central bin
(32.88–35.58 Gt), indicating that emissions for most of the years in the dataset were
relatively stable within this range. The distribution appears symmetrical, with fewer occurrences
in the lower (30.18–32.88 Gt) and higher (35.58–38.28 Gt) ranges. This suggests that global CO2
emissions have hovered around the central range, with slight deviations. Comparing the bins with the
years in table 1 shows that emissions mostly rise with time, with the first bin
containing the first 5 years (2006–2010). The only exception is 2020, whose
emission of 34.37 Gt places it in the central bin rather than the highest one. This could be attributed to the
economic slowdown caused by COVID-19 and the quarantine measures.
Figure 5: histogram of global energy consumption from 2006 to 2023
The histogram for energy consumption exhibits a similar pattern, with most of the
data falling within the central range (159,482–175,482 TWh). The leftmost bin (143,482–159,482 TWh)
holds almost as many data points, reflecting lower energy usage during the earlier
years of the dataset. The rightmost bin (175,482–191,482 TWh) shows a significant drop in
frequency, indicating that only in recent years has energy consumption reached such high levels.
This could be due to industrial growth or increased energy access globally. When compared with
table 1, it is seen that energy consumption increases with time, with the first bin
containing the first 7 years, the central bin the next 8 years, and the last bin
the last 3 years. Although 2020 is not an exception here, its value is noticeably lower than
those of 2019 and 2021, which can be attributed to the same cause as the
decrease in carbon dioxide emissions in 2020.
These histograms help identify trends in the data, providing a visual summary of how
CO2 emissions and energy consumption have varied over time. The alignment between
their ranges, their increase over time, and even the shared drop in 2020 all support a positive correlation
between the two variables. The distribution patterns also hint at gradual increases in energy use
and emissions, which will be explored further through correlation and regression analyses.
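The bin counts described for the two histograms can be checked with a short script. This is a minimal Python sketch, assuming three equal-width bins that start at each variable's minimum (widths of 2.70 Gt and 16000 TWh, as read from the bin ranges quoted in the text). The helper name `bin_counts` is illustrative and is not part of the original analysis, which was organized in Excel.

```python
# Table 1 values, listed from 2023 back to 2006.
co2_gt = [37.01, 36.50, 36.20, 34.37, 36.37, 36.00, 35.29, 34.73, 34.72,
          34.77, 34.65, 34.38, 33.91, 32.81, 31.02, 31.58, 31.60, 30.18]
energy_twh = [183230, 179819, 176840, 168779, 174458, 172629, 168363, 164719,
              163146, 162111, 160629, 158186, 156261, 153125, 146699, 148960,
              147434, 143482]


def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        # Clamp so a value on the top edge lands in the last bin.
        idx = min(int((v - start) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Bin edges are assumptions taken from the ranges quoted in the text.
co2_counts = bin_counts(co2_gt, start=30.18, width=2.70, n_bins=3)
energy_counts = bin_counts(energy_twh, start=143482, width=16000, n_bins=3)

print("CO2 bins (lower, central, higher):", co2_counts)
print("Energy bins (lower, central, higher):", energy_counts)
```

With the Table 1 data this gives 5, 8 and 5 years for emissions and 7, 8 and 3 years for energy consumption, in line with the description of the two histograms.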
Scatter plot
Scatter plots are a valuable tool in statistical analysis for examining relationships between
two variables. By plotting individual data points, scatter plots provide a visual representation of
the distribution, patterns, and potential correlations between variables. In this investigation, the
scatter plot illustrates the relationship between carbon dioxide emissions in gigatons (y-axis) and
energy consumption in terawatt-hours (x-axis) from 2006 to 2023.
Figure 6: Scatter plot, Carbon dioxide emissions against energy consumption (y-axis: Carbon dioxide emission (Gt), 27 to 39; x-axis: Energy consumption (TWh), 140000 to 190000)
This scatter plot highlights how carbon dioxide emissions and energy consumption vary
together. The upward trend visible in the graph’s trendline suggests a positive association
between the two variables, where higher energy consumption is generally linked to higher
CO2 emissions. Observing the distribution of data points, the trend appears to be predominantly
linear. This linear pattern suggests
that as global energy demand grows, CO2 emissions rise at a roughly constant rate, reflecting
the consistent reliance on fossil fuels and other carbon-intensive energy sources. The plot does not
show any significant outliers from the trendline.
The plot does not visibly exhibit characteristics of a quadratic or exponential trend. A
quadratic relationship would show a parabolic shape, either curving upwards or downwards,
while an exponential trend would display a rapid and accelerating increase. In this case, the
relatively steady rate of increase and the uniform spacing of points align more with a linear
relationship.
However, this scatter plot serves only as a precursor to deeper statistical analysis. By applying
correlation coefficients and a linear regression model, the precise nature of this relationship can be
assessed, validating whether a linear model is indeed the best representation. If not, then other
models will be explored.
Correlation analysis
Correlation analysis is a statistical method used to evaluate the strength and direction of a
relationship between two variables. In the context of this investigation, it helps determine the
degree to which CO2 emissions and energy consumption are related, that is,
whether increases in energy consumption are consistently accompanied by increases in CO2
emissions. It provides a numerical
measure, the correlation coefficient r, which ranges between -1 and 1. A positive coefficient
indicates a direct relationship, where an increase in one variable corresponds to an increase in the
other, while a negative coefficient implies an inverse relationship.
Strength of relationship | Values of r
Perfect | r = ±1
Very strong | 0.95 ≤ r < 1 or −1 < r ≤ −0.95
Strong | 0.87 ≤ r < 0.95 or −0.95 < r ≤ −0.87
Moderate | 0.7 ≤ r < 0.87 or −0.87 < r ≤ −0.7
Weak | 0.5 ≤ r < 0.7 or −0.7 < r ≤ −0.5
Very weak | 0 < r < 0.5 or −0.5 < r < 0
No correlation | r = 0
Table 2: Strength of correlation and values of r (Haese, 2019)
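The thresholds in Table 2 can be written as a small lookup. This is a minimal Python sketch; the function name `classify_strength` is hypothetical, and it assumes that the table applies symmetrically to positive and negative values of r, as the paired ranges suggest.

```python
def classify_strength(r):
    """Return the Table 2 label for a correlation coefficient r in [-1, 1]."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # Table 2 uses the same cut-offs for both signs of r
    if size == 1:
        return "Perfect"
    if size >= 0.95:
        return "Very strong"
    if size >= 0.87:
        return "Strong"
    if size >= 0.7:
        return "Moderate"
    if size >= 0.5:
        return "Weak"
    if size > 0:
        return "Very weak"
    return "No correlation"


# Example values chosen only to illustrate the cut-offs.
for example in (-0.9, 0.6, 0.0):
    print(example, "->", classify_strength(example))
```

The sign of r gives the direction of the relationship separately, so a value such as −0.9 is a strong negative correlation.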
For this investigation, correlation analysis is crucial in exploring the research question:
"What is the correlation between carbon dioxide emissions and total energy consumption?"
Pearson’s correlation (for linear relationships), supplemented where necessary by Spearman’s rank
correlation (for monotonic relationships), allows a comprehensive assessment of the connection between the two
variables, setting the stage for deeper insights through regression analysis.
Pearson correlation:
I will use Pearson’s correlation coefficient to determine the strength of the relationship
between the 2 variables, using the following formula.
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Equation 1: Pearson's correlation coefficient formula (Patil, 2023)
Defining key terms:
r: The correlation coefficient between the two variables (carbon dioxide emissions and
global energy consumption)
x_i: Each year’s global energy consumption
x̄: Mean of all global energy consumption values
y_i: Each year’s carbon dioxide emission
ȳ: Mean of all carbon dioxide emission values
x_i and y_i are already stated in table 1. In order to calculate x̄ and ȳ, we will use the following
formula:
\[ \bar{x} = \frac{\sum x}{n} \]
Equation 2: mean formula
Where n is the number of years covered by the data. Replacing x with y in the equation gives ȳ. The
calculation of both means is shown in table 3:
Mean of the global energy consumption (x̄) | x̄ = (183230 + 179819 + 176840 + 168779 + 174458 + … + 143482) / 18 | x̄ = 162715 TWh
Mean of carbon dioxide emission (ȳ) | ȳ = (37.01 + 36.50 + 36.20 + 34.37 + 36.37 + … + 30.18) / 18 | ȳ = 34.2272 Gt
Table 3: Calculating the mean values for global energy consumption and carbon dioxide
emission
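The two means can be reproduced directly from the Table 1 data. This is a minimal Python sketch of the mean formula in Equation 2; the variable names are illustrative.

```python
# Table 1 values, listed from 2023 back to 2006.
co2_gt = [37.01, 36.50, 36.20, 34.37, 36.37, 36.00, 35.29, 34.73, 34.72,
          34.77, 34.65, 34.38, 33.91, 32.81, 31.02, 31.58, 31.60, 30.18]
energy_twh = [183230, 179819, 176840, 168779, 174458, 172629, 168363, 164719,
              163146, 162111, 160629, 158186, 156261, 153125, 146699, 148960,
              147434, 143482]

n = len(energy_twh)                # number of years in the dataset
mean_energy = sum(energy_twh) / n  # x-bar, in TWh
mean_co2 = sum(co2_gt) / n         # y-bar, in Gt

print(f"n = {n}")
print(f"mean energy consumption = {mean_energy:.0f} TWh")
print(f"mean CO2 emission = {mean_co2:.4f} Gt")
```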
To maintain accuracy, all intermediate results are rounded to 6 significant figures,
except the value of r, which is given to 2 s.f. Table 4 shows, step by step, the values being substituted
into Pearson’s correlation coefficient formula. For some calculations only a sample
is shown, to avoid repetition and save space; the rest of the values are given in the
appendix. [1]
Calculation | Explaining calculation | Result
x_i − x̄ | Sample for year 2023: 183230 − 162715 | 20515.0
y_i − ȳ | Sample for year 2023: 37.01 − 34.2272 | 2.78280
(x_i − x̄)(y_i − ȳ) | Sample for year 2023: 20515 × 2.7828 | 57089.1
∑(x_i − x̄)(y_i − ȳ) | Sum of all values obtained from the calculation above: 57089.1 + 38873.6 + 27865.5 + … + 77840.2 | 394157
(x_i − x̄)² | Squared energy consumption deviation, sample for year 2023: (20515)² | 420865000
∑(x_i − x̄)² | Sum of all the squared energy consumption deviations: 420865000 + 292547000 + … + 369908000 | 2429970000
(y_i − ȳ)² | Squared carbon dioxide emission deviation, sample for year 2023: (2.78280)² | 7.74398
∑(y_i − ȳ)² | Sum of all the squared carbon dioxide emission deviations: 7.74398 + 5.16634 + … + 16.3801 | 69.3630
r | Calculating the r value using the above calculations: 394157 / √(2429970000 × 69.3630) | r = 0.96
Table 4: Calculations of r, by applying Pearson's correlation coefficient formula stepwise
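The stepwise calculation in Table 4 can be sketched in Python as a cross-check. The code follows Equation 1 term by term using the Table 1 data; the names `s_xy`, `s_xx` and `s_yy` are shorthand introduced here for the three sums, and the means are kept unrounded, so the sums may differ from the hand-rounded table entries in the last digits.

```python
import math

# Table 1 values, listed from 2023 back to 2006.
y = [37.01, 36.50, 36.20, 34.37, 36.37, 36.00, 35.29, 34.73, 34.72,
     34.77, 34.65, 34.38, 33.91, 32.81, 31.02, 31.58, 31.60, 30.18]  # CO2, Gt
x = [183230, 179819, 176840, 168779, 174458, 172629, 168363, 164719,
     163146, 162111, 160629, 158186, 156261, 153125, 146699, 148960,
     147434, 143482]  # energy consumption, TWh

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviations of every data point from its mean.
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]

s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of (x_i - x̄)(y_i - ȳ)
s_xx = sum(a * a for a in dx)              # sum of (x_i - x̄)^2
s_yy = sum(b * b for b in dy)              # sum of (y_i - ȳ)^2

r = s_xy / math.sqrt(s_xx * s_yy)

print(f"sum of products  = {s_xy:.1f}")
print(f"sum of squares x = {s_xx:.0f}")
print(f"sum of squares y = {s_yy:.4f}")
print(f"r = {r:.2f}")
```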
As seen in table 4, r = 0.96. According to table 2, this indicates a very strong positive linear
relationship between the two variables: as energy consumption
increases, CO₂ emissions tend to increase as well, at a nearly consistent rate.
This high value implies that the data points closely follow
a straight line, so knowing the value of
one variable provides a good estimate of the other. However, while the relationship is strong, it is
important to remember that correlation does not imply causation; it simply indicates a strong
association.
Since Pearson’s correlation coefficient is very high, and the trendline in the scatter
diagram suggests a clear linear pattern, I will skip Spearman’s correlation, as it is primarily used
for relationships that are monotonic but not necessarily linear. There is no evidence so far
suggesting a relationship other than linear, so I will apply linear regression to quantify the
relationship between the two variables.
Linear regression
Linear regression is a statistical method used to model the relationship
between two variables by fitting a straight line, called the line of best fit, to the data. In this
investigation, carbon dioxide emissions are the dependent variable while global energy
consumption is the independent variable. Using linear regression, we can establish a predictive
relationship between them, which will help quantify how changes in energy consumption
impact carbon dioxide emissions.
The regression equation is as follows:
y=mx+b
Equation 3: linear regression equation
Defining key terms:
y : the dependent variable, carbon dioxide emissions
m : slope, which is the rate of change
x : the independent variable, global energy consumption
b : y -intercept, which represents carbon dioxide emissions when energy consumption is
zero.
To determine the line of best fit, we need to calculate the slope and the y-intercept. To calculate the
slope (m), use this equation:
\[ m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
Equation 4: formula to determine slope
Where:
x_i is each data point for global energy consumption
y_i is each data point for carbon dioxide emissions
x̄ is the mean of global energy consumption
ȳ is the mean of carbon dioxide emissions
∑(x_i − x̄)² was calculated for Pearson’s correlation coefficient, and is equal to 2429970000.
∑(x_i − x̄)(y_i − ȳ) was also calculated, and is equal to 394157. Therefore, m is equal to:
\[ m = \frac{394157}{2429970000} = 1.6 \times 10^{-4} \]
Equation 5: substituting values to get the slope
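The substitution in Equation 5 is a single division, sketched below in Python with the two sums taken from Table 4. The variable names are illustrative; the sketch also prints the unrounded quotient so the effect of rounding to 2 s.f. is visible.

```python
s_xy = 394157      # sum of (x_i - x̄)(y_i - ȳ), from Table 4
s_xx = 2429970000  # sum of (x_i - x̄)^2, from Table 4

m = s_xy / s_xx    # slope in Gt of CO2 per TWh of energy

print(f"m (unrounded) = {m:.6e} Gt per TWh")
print(f"m (2 s.f.)    = {m:.1e} Gt per TWh")
```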
To get the y-intercept, we use the following formula:
\[ b = \frac{\sum y - m \sum x}{n} \]
Equation 6: equation of b (y-intercept)
Where:
n is the number of data points
∑y is the sum of all y values (carbon dioxide emission)
∑x is the sum of all x values (global energy consumption)
The value of each variable is shown in the following table:
Variable | Value
n | 18, as there are 18 data points corresponding to the 18 years considered (from 2006 to 2023)
∑y | ∑y = 37.01 + 36.50 + 36.20 + 34.37 + … + 30.18 = 616.09
∑x | ∑x = 183230 + 179819 + 176840 + … + 143482 = 2937640
m | 1.6 × 10⁻⁴, as seen from the previous calculations
Table 5: values substituted into the y-intercept equation
Now, these values will be substituted into the equation:
\[ b = \frac{616.09 - (1.6 \times 10^{-4} \times 2937640)}{18} = 8.1 \]
Equation 7: substituting values into y-intercept equation
According to these calculations, the equation of the best-fit line is:
\[ y = 1.6 \times 10^{-4} x + 8.1 \]
Equation 8: equation for best-fit line
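To see how the line in Equation 8 behaves on the data, it can be evaluated at a few years from Table 1. This is a minimal Python sketch; the function name `predict_co2` and the choice of sample years are illustrative. Because m and b are rounded to 2 s.f., the fitted values are approximate.

```python
M = 1.6e-4  # slope from Equation 8, in Gt per TWh
B = 8.1     # y-intercept from Equation 8, in Gt


def predict_co2(energy_twh):
    """Estimate CO2 emissions (Gt) from energy consumption (TWh) using y = mx + b."""
    return M * energy_twh + B


# (year, energy consumption in TWh, observed CO2 in Gt), taken from Table 1.
samples = [(2006, 143482, 30.18), (2015, 163146, 34.72), (2023, 183230, 37.01)]

for year, energy, observed in samples:
    fitted = predict_co2(energy)
    residual = observed - fitted  # positive when the line underestimates
    print(f"{year}: fitted {fitted:.2f} Gt, observed {observed:.2f} Gt, "
          f"residual {residual:+.2f} Gt")
```

The residual column shows how far each observed value lies from the rounded line.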
The y-intercept (b) does not have a real-world meaning or application, since energy
consumption will never be zero. On the other hand, the slope (m) represents how much carbon
dioxide emissions increase per additional unit of energy consumption. With every additional 1 TWh, carbon
dioxide emissions increase by 1.6 × 10⁻⁴ Gt. Although this change looks very small, we must take into
account that one gigaton is 1 × 10⁹ metric tons, so an increase of 1 TWh raises carbon dioxide
emissions by 1.6 × 10⁵ metric tons. Moreover, changes in energy consumption are large: from 2006
to 2023 energy consumption increased by 39748 TWh, an increase of 27.7%.
Energy consumption is rising fast and is predicted to grow by close to 4% annually through 2027.
This would mean an energy consumption of 214353 TWh in 2027, and predicted carbon dioxide
emissions of 42.40 Gt. Such an increase in carbon dioxide emissions would have catastrophic effects
on the atmosphere and the climate. (Calculations are shown in the appendix) [2]
However, projections suggest that carbon dioxide emissions will peak in
2025 at 39 Gt, then steadily decline to 37.4 Gt by 2030. The predicted 42.40 Gt is therefore not
expected to be reached despite the increase in energy consumption. This is because a linear
relationship does not indicate causation, and the linear trend may not continue. The model assumes that
the same technology will keep being used, which makes its long-range predictions unreliable. The
adoption of green and renewable sources of energy is rising rapidly; this would decrease the
carbon dioxide emissions coming from energy consumption, making the model’s predicted
emissions an overestimate. The prediction can also be read as a warning: if we do not
change our methods of energy production to greener and more renewable ones, the consequences
for the Earth would be dire. Even though the model has its limitations, its strength lies in predicting
worst-case scenarios if no changes occur.
Conclusion and evaluation
In conclusion, this exploration demonstrated that there is a very strong positive linear
correlation between global energy consumption and carbon dioxide emissions. However, because
technology is changing, the linear model does not make accurate predictions, as it assumes that the technology
used will not shift to greener methods.
A key strength is the use of multiple statistical methods combined with visual
representations. The histograms used provided insight into the distribution of the variables over
time, showing similar distributions between the two variables. The scatter plot visually
represented the trend, and indicated a positive linear relationship. The Pearson correlation
coefficient confirmed a very strong linear relationship. Applying linear regression quantified the
relationship between the two variables. The dataset used reflects actual global trends, not
regional, making the findings applicable to discussions on energy policy and climate change
mitigation.
However, correlation does not imply causation. Other factors, such as energy policies,
economic activity, energy sources, and energy efficiency, also influence carbon dioxide
emissions. Different energy sources have different impacts on carbon emissions. If this
exploration were to be redone, I would explore the effect of fossil fuels as an energy source on
carbon dioxide emissions, focusing only on the most carbon-intensive energy sources.
Considering that the aim of the investigation is to understand how energy consumption
patterns affect carbon dioxide emissions, the exploration is mostly appropriate. It highlighted the
critical role of energy consumption in influencing carbon dioxide emissions. The strong positive
correlation indicates that energy consumption must be carefully managed to mitigate
environmental impact. As global demand continues to rise, a shift towards cleaner energy
sources would significantly reduce emissions. This was illustrated in the linear regression
analysis, where externally projected carbon emissions were lower than the value the model predicted for 2027,
because the model ignores changes in energy sources. This exploration demonstrated the importance
of addressing energy sources while aiming for sustainability. Addressing the energy-emission
link is not just a scientific concern but a public responsibility.
Bibliography
Tiseo, I. (2024, August 21). Global fossil fuel carbon dioxide emissions projections by
sector. Statista.
https://www.statista.com/statistics/1385434/fossil-carbon-dioxide-emissions-projections-by-sector/#:~:text=Global%20fossil%20fuel%20carbon%20dioxide,to%2037.4%20GtCO%E2%82%82%20by%202030
International Energy Agency. (2025, February 14). Growth in global electricity demand
is set to accelerate in the coming years as power-hungry sectors expand. IEA.
https://www.iea.org/news/growth-in-global-electricity-demand-is-set-to-accelerate-in-the-coming-years-as-power-hungry-sectors-expand
Varsity Tutors. (n.d.). Line of best fit. Retrieved January 15, 2025, from
https://www.varsitytutors.com/hotmath/hotmath_help/topics/line-of-best-fit
Center for Climate and Energy Solutions. (2021). International emissions. Retrieved
from https://www.c2es.org/content/international-emissions/
Ritchie, H., & Rosado, P. (2020). Energy mix. Our World in Data. Retrieved from
https://ourworldindata.org/energy-mix
International Energy Agency. (2023). Energy mix. Retrieved from
https://www.iea.org/world/energy-mix
Stanford University. (2024). Current energy landscape. Retrieved from
https://understand-energy.stanford.edu/current-energy-landscape
Patil, A. (2023). Pearson's Correlation Coefficient - A Beginners Guide. Analytics
Vidhya. Retrieved December 2, 2024, from
https://www.analyticsvidhya.com/blog/2021/01/beginners-guide-to-pearsons-correlation-coefficient/
Haese, M., Humphries, M., Sangwin, C., & Vo, N. (2019). Mathematics: Applications
and Interpretation SL (1st ed.). Haese & Harris Publications.
Appendix
[1]
I organized the data using Excel.
[2]
The rate of change (slope) is converted from gigatons to metric tons through the equation m × 10⁹;
substituting the value of the slope gives 1.6 × 10⁻⁴ × 10⁹ = 1.6 × 10⁵.
Change in energy consumption from 2006 to 2023, through the equation:
maximum (2023) − minimum (2006) = 183230 − 143482 = 39748 TWh
Percentage increase:
\[ \frac{\text{change}}{\text{initial value (for 2006)}} \times 100\% = \frac{39748}{143482} \times 100\% = 27.7\% \]
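The unit conversion and the percentage increase can be checked with a few lines of Python. This is a minimal sketch using the values quoted in this appendix; the constant and variable names are illustrative.

```python
TONNES_PER_GIGATON = 1e9    # 1 Gt = 1 × 10^9 metric tons

slope_gt_per_twh = 1.6e-4   # slope m from the regression, in Gt per TWh
slope_t_per_twh = slope_gt_per_twh * TONNES_PER_GIGATON

energy_2006 = 143482        # TWh, from Table 1
energy_2023 = 183230        # TWh, from Table 1
change = energy_2023 - energy_2006
percent_increase = change / energy_2006 * 100

print(f"slope = {slope_t_per_twh:.1e} metric tons per TWh")
print(f"change = {change} TWh")
print(f"percentage increase = {percent_increase:.1f}%")
```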
The predicted energy consumption for 2027, with a 4% annual increase until then, follows the
geometric sequence equation:
\[ u_n = u_1 r^{n-1} \]
n is the number of years, which is 5 (from 2023 to the end of 2027), r is the common ratio, which is
1.04 (1 + 0.04), and u_1 is the value for 2023 (183230). Substituting these values we get:
\[ u_5 = 183230 \times 1.04^{4} = 214353 \]
For the predicted carbon dioxide emissions for 2027, we use the equation of the line
produced in the linear regression analysis (y = 1.6 × 10⁻⁴x + 8.1) and substitute x with the value
from the previous point: y = 1.6 × 10⁻⁴ (214353) + 8.1 = 42.40
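The projection can be reproduced with a short Python sketch. It assumes, as in the calculation here, a constant 4% annual growth from the 2023 value and the rounded regression line y = 1.6 × 10⁻⁴x + 8.1; the function name `geometric_term` is illustrative.

```python
def geometric_term(u1, ratio, n):
    """Return the n-th term of a geometric sequence with first term u1."""
    return u1 * ratio ** (n - 1)


U1_2023 = 183230  # energy consumption in 2023 (TWh), first term of the sequence
RATIO = 1.04      # assumed 4% growth per year
N_TERMS = 5       # terms for 2023, 2024, 2025, 2026 and 2027

energy_2027 = geometric_term(U1_2023, RATIO, N_TERMS)

# Regression line with slope and intercept rounded to 2 s.f.
co2_2027 = 1.6e-4 * energy_2027 + 8.1

print(f"projected energy consumption in 2027 = {energy_2027:.0f} TWh")
print(f"predicted CO2 emissions in 2027 = {co2_2027:.2f} Gt")
```

Changing `RATIO` shows how sensitive the 2027 estimate is to the assumed growth rate.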