Data analysis
The analysis criterion is allocated six of the twenty-four
available marks and focuses on the following areas:
Communication of the recording and
processing of data.
Considering uncertainties.
Processing of data.
Each of these factors is discussed in more detail below.
Communication of the recording
and processing of data
There should be sufficient and relevant data collected in the
experiment to be able to come to a valid and detailed
conclusion. The expectation is that all data is recorded in the
International System of Units (SI) with correct precision for the
equipment used to measure the dependent and independent
variables and for any relevant controlled variables.
Recording raw data
Raw data contains both qualitative (descriptive) and
quantitative (numerical) data.
Qualitative data is data that is
observed and described such as colour
or emotional or physical reaction for
each trial. This can be done in a
separate data table or one that is
attached to your quantitative data
table.
Quantitative data is data that is
counted or measured. This could be the
length of a root measured by a metric
ruler, or the number of colonies of
bacteria counted using an app such as
ColonyCounter, or the number of cells
in various phases of mitosis when
looking under a microscope.
Quantitative data must be recorded
with the correct precision depending on
the measuring device with appropriate
units and uncertainty.
Description of conditions | Qualitative data | Trial 1 (units) | Trial 2 (units) | Trial 3 (units) | Trial 4 (units) | Trial 5 (units)
Condition 1 | | | | | |
Condition 2 | | | | | |
Condition 3 | | | | | |
Condition 4 | | | | | |
Condition 5 | | | | | |
Table 1. Example of data collection.
If you have a large amount of raw data, for example from a data
logger, you are not required to include all of it. You can simply
show a portion of the data and summarise the rest.
It is also appropriate to take data collected from a simulation or
database and to display it in the same manner as presented
above. Make sure you collect enough data from these databases
to analyse appropriately in the next section. Be sure to cite
your source.
Outliers
When recording your data, you may find an inconsistent
measurement or result that does not fit with the rest of the
data. This is an outlier and is usually treated as the result of
random error. If you have enough time and materials, you can
repeat the affected trial, but you may not have the opportunity
to do so. Outliers should be included in the recorded data, but
are typically removed from calculations with an explanation.
If you suspect an outlier, you can use a box-and-whisker plot
calculation to test it: a value is an outlier if it lies more than
1.5 times the interquartile range below the first quartile or
above the third quartile. If the sample size is too small for this
calculation, it is acceptable to show the results with and
without the suspected outlier. An explanation justifying any
omission must be included and related to a weakness in the
methodology.
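The 1.5 times interquartile range rule can be sketched in Python as follows. This is a minimal illustration, not a required method: the root lengths are hypothetical values invented for the example, `find_outliers` is a hypothetical helper name, and the quartiles come from the default method of the standard library's `statistics.quantiles`. Spreadsheets and other tools may use a slightly different quartile convention and so give slightly different cut-offs.

```python
import statistics


def find_outliers(values):
    """Return the values lying more than 1.5 x IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical root lengths in cm (invented for illustration only).
root_lengths = [4.0, 4.1, 4.2, 4.3, 4.4, 9.8]

outliers = find_outliers(root_lengths)
kept = [v for v in root_lengths if v not in outliers]

# Report the mean with and without the suspected outlier(s).
print("Suspected outliers:", outliers)
print("Mean with outliers:", round(statistics.mean(root_lengths), 2))
print("Mean without outliers:", round(statistics.mean(kept), 2))
```

Printing the mean both ways mirrors the advice to show the results with and without the suspected outlier; the written justification still has to come from the methodology.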
Mark band | Descriptor | Descriptor explained
1–2 | The recording and processing of the data is communicated but is neither clear nor precise. | The raw data contains missing information, missing or incorrect units, and inconsistent precision in the quantitative data.
3–4 | The communication of the recording and processing of the data is either clear or precise. | The raw data contains missing information, missing or incorrect units, or inconsistent precision in quantitative data.
5–6 | The communication of the recording and processing of the data is both clear and precise. | The raw data contains all required qualitative and quantitative information, correct units, and correct precision in the quantitative data.
Table 2. Mark allocation for the recording and processing of
data.
Considering uncertainties
Uncertainties
Figure 1. A graduated cylinder with an uncertainty of
± 0.5 mL.
All measuring devices introduce a small, unavoidable error,
which must be taken into account when recording quantitative
data. The precision of your measurement must match the
precision of the uncertainty. For example, 5.0 cm ± 0.5 cm or
2.00 g ± 0.01 g.
For analogue equipment, such as a
ruler or graduated cylinder, the
unavoidable error is your ability to
correctly estimate the last significant
digit in the measurement. In general,
the absolute uncertainty for analogue
equipment is one half of the smallest
increment.
For digital equipment, such as an
electronic balance or temperature
sensor, the unavoidable error is made
by the device in determining the last
significant digit in the measurement. In
general, the absolute uncertainty for
digital equipment is equal to the
smallest increment.
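The two rules of thumb above, and the requirement that a reading is written to the same decimal place as its uncertainty, can be sketched in Python. This is only an illustration under stated assumptions: `absolute_uncertainty` and `format_measurement` are hypothetical helper names, and the devices are invented examples (a ruler marked every 1 cm and a balance reading to 0.01 g).

```python
from decimal import Decimal


def absolute_uncertainty(smallest_increment, analogue):
    """Half the smallest increment for analogue devices, the full increment for digital."""
    increment = Decimal(smallest_increment)
    return increment / 2 if analogue else increment


def format_measurement(reading, uncertainty, unit):
    """Write a reading to the same decimal place as its uncertainty."""
    # quantize() rounds the reading to the exponent (decimal place) of the uncertainty.
    value = Decimal(reading).quantize(uncertainty)
    return f"{value} {unit} ± {uncertainty} {unit}"


# Hypothetical devices: a ruler marked every 1 cm and a balance reading to 0.01 g.
ruler = absolute_uncertainty("1", analogue=True)
balance = absolute_uncertainty("0.01", analogue=False)

print(format_measurement("5", ruler, "cm"))   # 5.0 cm ± 0.5 cm
print(format_measurement("2", balance, "g"))  # 2.00 g ± 0.01 g
```

Passing the increments as strings keeps the decimal places exact, which is why `Decimal` is used here instead of floating-point numbers.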
The propagation of uncertainty is not required in biology.
Mark band | Descriptor | Descriptor explained
1–2 | The recording and processing of the data shows limited evidence of the consideration of uncertainties. | There are no uncertainties recorded for the quantitative data or they are incorrect.
3–4 | The recording and processing of the data shows evidence of a consideration of uncertainties but with significant omissions or inaccuracies. | There are uncertainties recorded for the quantitative data, but there are some missing or incorrect values.
5–6 | The recording and processing of the data shows evidence of an appropriate consideration of uncertainties. | All quantitative data recorded contains correct uncertainties.
Table 3. Mark allocation for considering uncertainties.
Processing of data
In this section, the raw data that you collected must be
processed to enable a conclusion to be drawn that addresses
your research question. Processing involves the relevant
calculations required to draw a relationship between your
dependent and independent variables with confidence. You
may use spreadsheets, the GDC, graphing techniques
etc. Table 4 shows some common methods of data processing.
For all mathematical calculations, a sample calculation should
be included for one trial or data set, showing all steps with
correct units and significant figures. The final calculated values
for the rest of the data can be displayed in a table. Take care
that the calculations address the research question
(appropriate) and avoid omissions (missing steps) and
incorrect processing (inaccuracies). Do not show
repetitive calculations.
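As a hedged illustration of what one sample calculation involves, the sketch below works through a mean, a sample standard deviation, a percentage change and a rate of change in Python. All numbers are hypothetical values invented for the example, and `percent_change` and `rate_of_change` are hypothetical helper names; a spreadsheet or GDC would do the same job.

```python
import statistics

# Hypothetical masses in grams for five trials of one condition (invented values).
trial_masses = [2.1, 2.3, 2.2, 2.4, 2.0]

mean_mass = statistics.mean(trial_masses)
# stdev() is the sample standard deviation (it divides by n - 1).
sd_mass = statistics.stdev(trial_masses)


def percent_change(initial, final):
    """Percentage change from initial to final; a negative value is a decrease."""
    return (final - initial) / initial * 100


def rate_of_change(change, time_taken):
    """Change (or % change) per unit of time."""
    return change / time_taken


# Hypothetical example: mass falls from 5.00 g to 4.20 g over 10 minutes.
change = percent_change(5.00, 4.20)
rate = rate_of_change(change, 10)

print(f"mean = {mean_mass:.1f} g, SD = {sd_mass:.1f} g")
print(f"% change = {change:.1f} %, rate = {rate:.2f} % per min")
```

In a report, the same steps would be written out once by hand with units, and the remaining values listed in a table.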
Hypotheses
When doing a statistical analysis, it is important to include the
null and alternate hypotheses.
Null Hypothesis, $H_0$: The accepted
hypothesis. There is no difference
between the control and experimental
group and any difference is due to
chance.
Alternate Hypothesis, $H_A$: What you are
aiming to demonstrate or support with
evidence. There is a difference
between the control and experimental
group.
When doing statistical testing, you will either reject or fail to
reject the null hypothesis. Failing to reject it is sometimes
loosely described as accepting it.
When given a p-value as a result of statistical testing, it is most
commonly compared with a significance level of 0.05, which
corresponds to 95% confidence.
$p_{\text{experiment}} < 0.05$
o reject the null hypothesis and fail
to reject the alternate hypothesis.
o statistically significant difference.
$p_{\text{experiment}} > 0.05$
o fail to reject the null hypothesis
and reject the alternate
hypothesis.
o not a statistically significant
difference.
Statistical analysis
In this section, the raw data that you collected must be
processed to enable a conclusion to be drawn. You may use
spreadsheets, the GDC, graphing techniques etc. Table
4 shows some common methods of data processing.
Example methods | Description | Calculations
Finding average values from multiple trials and calculations. | Average of a data set. | You may use spreadsheets or a graphic display calculator.
Standard deviation. | Measures the average amount of variability in the data set. | Remember that this is a sample data set when choosing your calculations. This can be done in MS Excel.
Finding the % change between initial and final values of the dependent variable. | Relative difference between data sets. Positive values represent an increase over time; negative values represent a decrease over time. | This will help you get more reliable processed data than just finding the change by subtracting the initial from the final value, e.g., the % change of mass before and after the activity of an enzyme.
Finding the rate of change of the dependent variable. | Slope of the line of best fit; corresponds to an increase or decrease over time. | You will need to divide the change or % change by the time taken.
Statistical tests for hypothesis testing
Running a t-test. Note, you need a sample larger than 5 for the t-test. | This is a comparison of two samples. 1 sample t-test: difference between the sample mean and the null hypothesis. Paired t-test: paired observations, such as before and after treatment. 2 sample t-test: analyse the difference between means of independent samples. | You may use an online t-test calculator with proper citation or a spreadsheet such as MS Excel. Since these programs only give you the p-value, you do not need to draw a probability table because this is already built into the program. For the same reason degrees of freedom do not need to be presented.
Running an ANOVA. | This is a comparison of 3 or more groups. It is helpful to run a post hoc Tukey test, which compares the groups in pairs to identify which groups are different. | You may use an online ANOVA calculator with proper citation or a spreadsheet such as MS Excel. Since these programs only give you the p-value, you do not need to draw a probability table because this is already built into the program. For the same reason degrees of freedom do not need to be presented.
Calculation of the correlation coefficient ($r$). | Measures the closeness of association between two variables. | This will help you to evaluate if there is positive, negative or no correlation. The values are between –1 and 1. Also, it may be accompanied by a graph.
Calculation of the coefficient of determination ($R^2$). | Measures the strength of the relationship between the statistical model and the outcome. | When you add a trendline to a graph in MS Excel, the $R^2$ value can be displayed. How can this value help? It gives an indication of how close the data is to a linear relationship. $R^2$ takes values between 0 and 1. The value 1 shows a perfect fit between the trendline and the data, i.e., a strong linear relationship. The value 0 shows no statistical relationship between the line and the data.
Table 4. Common methods of data processing.
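To show what sits behind a 2 sample t-test and the p-value decision rule, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method any particular calculator uses: it computes Welch's form of the 2 sample t-test (which does not assume equal variances), the function names `welch_t` and `interpret_p_value` are hypothetical, and the germination counts are invented. The standard library cannot convert t into a p-value, so the p-value itself would still come from a spreadsheet or an online calculator.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a two-sample Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance of each mean: sample variance divided by sample size.
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


def interpret_p_value(p, alpha=0.05):
    """Apply the decision rule for a p-value at the 0.05 level."""
    # The source does not say how to treat p exactly equal to 0.05;
    # this sketch treats it as not significant.
    if p < alpha:
        return "reject the null hypothesis: statistically significant difference"
    return "fail to reject the null hypothesis: not a statistically significant difference"


# Hypothetical germination counts for a control and a treatment group.
control = [12, 14, 11, 13, 15, 12]
treatment = [16, 18, 15, 17, 19, 16]

t_value, dof = welch_t(control, treatment)
print(f"t = {t_value:.2f}, degrees of freedom = {dof:.1f}")
# The p-value below is a placeholder for one read from a calculator.
print(interpret_p_value(0.03))
```

Working through the statistic once by hand or in code makes it easier to justify the choice of test in the report, even when the final p-value is taken from software.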
Graphing
Ideally, with two quantifiable variables, your independent and
dependent variables should be expressed in graphical form to
show the relationship visually. Graphing processed data is more
appropriate than graphing raw data. For example, it is more
appropriate to graph the mean of a data set rather than
individual data points. The exception is using a scatter plot
to graph correlation and including a trendline.
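For a scatter plot with a straight trendline, the slope, the correlation coefficient and the coefficient of determination can be calculated as in the sketch below. This is a minimal illustration: `linear_fit` is a hypothetical helper name, the light intensity and growth values are invented, and taking the coefficient of determination as the square of r is only valid for a straight-line fit with one independent variable.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for a least-squares straight trendline."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r


# Hypothetical data: light intensity (arbitrary units) against growth (cm).
light_intensity = [10, 20, 30, 40, 50]
growth = [1.2, 2.1, 2.9, 4.2, 4.8]

slope, intercept, r = linear_fit(light_intensity, growth)
print(f"trendline: y = {slope:.3f}x + {intercept:.3f}")
print(f"r = {r:.3f}, R squared = {r ** 2:.3f}")
```

The sign of r gives the direction of the correlation, while the squared value indicates how closely the points follow the trendline.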
The uncertainties and any measures of variability (such as
standard deviation) should be shown on the graph, where
appropriate, by using error bars.
Be sure that all graphs contain:
Title
Labelled x- and y-axes with units (the
independent variable is typically
graphed on the x-axis and the
dependent variable on the y-axis).
Regular scaling on the axes.
Correct graph type
o Line graph: changes over time
o Bar graph: comparing groups
o Scatter plot: correlation.
Error bars
o Standard deviation: sample sizes
between 5 and 30
o Standard error: sample sizes
greater than 30
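The choice between standard deviation and standard error for error bars can be sketched in Python as follows. This is an illustration only: `error_bar` is a hypothetical helper name and the germination percentages are invented. The guidance above gives the rule for samples of 5 to 30 and for samples greater than 30, so using standard deviation for every sample of 30 or fewer is an assumption of this sketch.

```python
import math
import statistics


def error_bar(values):
    """Return (label, size) for the error bar of one group of repeated measurements."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if n > 30:
        # Standard error of the mean: SD divided by the square root of n.
        return "standard error", sd / math.sqrt(n)
    # Assumption: any sample of 30 or fewer uses the standard deviation.
    return "standard deviation", sd


# Hypothetical germination percentages for one light treatment.
germination = [62.0, 58.0, 65.0, 60.0, 61.0]

label, size = error_bar(germination)
print(f"mean = {statistics.mean(germination):.1f} %, {label} = {size:.1f} %")
```

The returned size is the length of the bar drawn above and below the plotted mean.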
Figure 2. This graph represents the germination (%) in Vigna
unguiculata in the presence of different light treatments. Are all
of the components of a good graph present?
Mark band | Descriptor | Descriptor explained
1–2 | Some processing of data relevant to addressing the research question is carried out but with major omissions, inaccuracies or inconsistencies. | Some calculations/graphs are included, but there are many mistakes, missing steps and/or they do not address the research question suitably.
3–4 | The processing of data relevant to addressing the research question is carried out but with some significant omissions, inaccuracies or inconsistencies. | The calculations/graphs included contain some mistakes, missing steps and/or do not completely address the research question.
5–6 | The processing of data relevant to addressing the research question is carried out appropriately and accurately. | The calculations/graphs included address the research question suitably with no mistakes or omissions.
Table 5. Mark allocation for the processing of data.
Conclusion
The conclusion criterion is allocated six of the twenty-four
available marks and focuses on these areas:
Answering the research question by
stating a relationship between the
dependent and independent variables,
supported by the processed data.
Comparing the experiment with an
accepted scientific context.
Interpretation of processed data
Conclusion
Describe the trends seen in the graph (or other visual display of
processed data used).
Does the dependent variable show an
increasing or decreasing trend in relation
to the independent variable? Is there no
observable relationship?
Is the relationship linear, exponential,
logarithmic or inconsistent?
Uncertainties
Identify the impact of uncertainties on your investigation.
Do the error bars overlap, or do they
not overlap? What does this mean for
the confidence of your data? What was
the impact?
Justify your choice in the statistical
analysis.
How do these factors provide
confidence in your conclusion?
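Whether two error bars overlap can be checked numerically as well as by eye. The sketch below is a minimal illustration with a hypothetical helper name, `error_bars_overlap`, and invented means and error sizes. Overlap is only a rough visual guide to how confident you can be in a difference and does not replace a statistical test.

```python
def error_bars_overlap(mean_a, error_a, mean_b, error_b):
    """True if the ranges mean ± error of two groups share any values."""
    low_a, high_a = mean_a - error_a, mean_a + error_a
    low_b, high_b = mean_b - error_b, mean_b + error_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical germination means (%) and error bar sizes for two treatments.
print(error_bars_overlap(50.0, 5.0, 58.0, 4.0))  # ranges 45-55 and 54-62
print(error_bars_overlap(50.0, 2.0, 58.0, 3.0))  # ranges 48-52 and 55-61
```

If the ranges overlap, the difference between the two means should be described more cautiously in the conclusion.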
Table 1. Mark allocation for interpretation of processed data.
Mark band | Descriptor | Descriptor explained
1–2 | A conclusion is stated that is relevant to the research question, but is not supported by the analysis presented. | The conclusion stated is inconsistent with the supporting data. For example, the processed data shows a positive correlation between the dependent and independent variables, but the conclusion states a negative correlation.
3–4 | A conclusion is described that is relevant to the research question, but is not fully consistent with the analysis presented. | The conclusion stated is partially consistent with the supporting data. For example, the relationship between the dependent and independent variables is correct, but the uncertainties were not taken into consideration or the incorrect mathematical relationship is applied (processed data is exponential, but conclusion states linear).
5–6 | A conclusion is justified that is relevant to the research question and is fully consistent with the analysis presented. | The conclusion stated is consistent with the supporting data, including the impact of uncertainties.
You might be concerned that inconclusive results mean the
investigation ‘failed’. They do not. Your job now is to
evaluate whether you designed an investigation that
appropriately tested the question, and how you would correct it.
Relevance of the conclusion
The conclusion and interpretation of data must be connected
back to accepted scientific context with reliable resources. This
step requires research to compare your results with a suitable
and reliable source. There are likely similar experiments that
you can cite or data available for comparison. If your
experiment is unique and a comparable experiment/data
cannot be found, do your best to find the most similar context
you can.
If your conclusion shows results that are not supported by your
background information, it is important to do more research on
accepted scientific context to determine if there was an aspect
that was missed in your research.
Describe how background information
supports the conclusion.
Justify using accepted research.
Describe how published information
supports the conclusion.
Justify using information and citations
in background or in using new
citations.
When trying to find accepted scientific context in relation to
your results, a good tip is to include the words ‘journal article’ in
your internet search query, for example ‘blue light's effect on
germination journal article’. This will pull up links to
appropriate research.
Table 2. Mark allocation for the relevance of the conclusion.
Mark band | Descriptor | Descriptor explained
1–2 | The conclusion makes superficial comparison to accepted scientific context. | There is no comparison to scientific context or the comparison is not suitable.
3–4 | A conclusion is described that makes some relevant comparison to the accepted scientific context. | The comparison to scientific context is only partially valid or the resource used for comparison is not reputable or contains incomplete referencing information.
5–6 | A conclusion is justified through relevant comparison to accepted scientific context. | The comparison to scientific context is with a reputable and complete source.
Evaluation
The evaluation criterion is allocated six of the twenty-four
available marks and focuses on these areas:
Methodological weaknesses and limitations
Suggesting improvements to the investigation
Methodological weaknesses and
limitations
Methodological weakness
You should include weaknesses in your methodology
and how these weaknesses affected the resultant data. These
weaknesses must be specific to your experiment, and you must
explain how each weakness/limitation affected
the data collected.
Weaknesses are unavoidable issues that presented themselves
in the methodology – the overall approach to the experiment
– not mistakes that resulted from poor planning or execution.
Things to consider:
Inability/difficulty to control variables.
Low precision of a specific piece of equipment used,
resulting in high uncertainties for some measurements.
The data does not show enough variation to identify a
trend with confidence.
An alternative method that was not selected that may
have been a better choice.
Limitations
Limitations address factors affecting the confidence in the data
and the ability to apply your conclusion to more scenarios.
Things to consider:
The availability of time and resources in the laboratory did
not allow for a greater scope of data to be investigated.
Limitations of available data in databases/simulations.
Generic weaknesses and limitations should not be included. All
experiments have factors that impact the results such as
uncertainty in measurements, not enough time or repeats.
Think about how your experiment was specifically affected.
Mark band | Descriptor | Descriptor explained
1–2 | The report states generic methodological weaknesses or limitations. | The weaknesses/limitations identified could be applied to any investigation without applicable details or reasons. For example, not enough repeats.
3–4 | The report describes specific methodological weaknesses or limitations. | The weaknesses/limitations identified are suitable to the investigation, but the impact on the experiment is missing.
5–6 | The report explains the relative impact of specific methodological weaknesses or limitations. | The weaknesses/limitations identified are suitable to the investigation and the impact on the experiment is valid.
Improvements to the
investigation
Improvements should be relevant to the investigation and
address the specific weaknesses and limitations identified. They
should also be realistic so that they can be carried out in the
allotted time that was given for the individual investigation
using materials and equipment that are commonly found in a
school laboratory.
Suggested improvements and extensions must be related to
the research question and be focused and precise. Also, you
should mention how your suggestion would bring the results of
your experiment closer to your expectations.
Generic improvements should not be included. All experiments
could benefit from more precise equipment, more time and
more repeats. Think about how your experiment could benefit
from a specific set of improvements.
To simplify your writing or to organise your thoughts, you can
make a table.
Weakness/Source of error | Impact on data | Improvement
Mark band | Descriptor | Descriptor explained
1–2 | Realistic improvements to the investigation are stated. | The improvements identified could be applied to any investigation without applicable details or reasons. For example, conduct more repeats.
3–4 | Realistic improvements to the investigation, that are relevant to the identified weaknesses or limitations, are described. | The improvements identified are suitable for the investigation, but the impact on the experiment is missing.
5–6 | Realistic improvements to the investigation, that are relevant to the identified weaknesses or limitations, are explained. | The improvements identified are suitable to the investigation and the impact on the experiment is valid.
The report
Throughout this process, you will document your experiment
and produce a written report for submission. The report will be
assessed by your teacher and counts as 20% of your final IB
grade (the remaining 80% is from your performance on the
external exam).
The maximum overall word count for the report is 3000 words.
The following are not included in the word count:
Charts and diagrams
Data tables
Equations, formulae and calculations
Citations/references (whether parenthetical, numbered,
footnotes or endnotes)
Bibliography
Headers.
The following details should be stated at the start of the report:
Title of the investigation
Candidate’s personal code (alphanumeric, for example,
xyz123)
Candidate’s personal code for all group members (if
applicable)
Number of words.
There is no requirement to include a cover page or a contents
page.
Use an appropriate font (such as Times New Roman or Arial)
no smaller than 11 point and no larger than 12 point.
Headings for each section make the report easier for the
reader to follow and help you to organise your information. Be
sure to have someone else look over your final report to make
sure that the language and formatting are clear.
Structure and clarity of the report
There is some flexibility in the presentation of your report, but
the overall sequence should follow a logical flow from the
beginning to the end of your experiment.
Make sure that your report contains:
The research question presented early on as the title of
the report or at the end of the introductory paragraph.
Appropriate scientific language, units, and symbols
throughout the paper.
Correctly labelled graphs and diagrams.
Consistent referencing for citations using endnotes,
footnotes, or a works cited page.
Referencing and bibliography
It is very important that you correctly reference all of the
sources that you use in your investigation. There is no official
referencing method required by the IB, but you should be
consistent with the method that you use in your report. Consult
your teacher on the referencing method used at your school.
Remember that:
Citations/references must be included even for diagrams
taken from online sources.
You should cross-reference the bibliography at the end of
your report with the footnotes if these are used.
An example report layout is below.
1. Research Design
Background information on the scientific context of
the research question and selection of methodology
Research question
Hypotheses (if applicable)
Variables defined
Methodological considerations
Description of the methodology
2. Data Analysis
Raw data
Processed data
Uncertainties
Graphs
3. Conclusion
Stated relationship between variables, supported by
processed data
Comparison to relevant scientific context
4. Evaluation
Weaknesses and limitations
Suggested improvements
5. Works cited
6. Appendix (if needed – human consent form, large quantity
of raw data).
Internal assessment checklist
Layout
The following details should be stated
at the start of the report.
o Title of the investigation
o Candidate’s personal code
(alphanumeric, for example,
xyz123)
o Candidate’s personal code for all
group members (if applicable)
o Number of words.
Maximum of 3000 words.
o The word count does not include:
charts and diagrams,
data tables,
sketches,
equations, formulas, and
calculations,
graphs,
headings,
references or bibliographies.
Properly formatted
o Times New Roman or Arial no
smaller than 11 or larger than 12
point font.
o Headings for new sections.
o Data tables on one page so they
do not cross over to 2 pages.
Research design
Relevant background information that
focuses on your methodology.
A focused research question that
includes both the independent and
dependent variable.
A detailed description of variables
including the independent, dependent,
control, and confounding variables.
A detailed explanation of the decisions
regarding scope and quantity of data
collected.
A methodology written with sufficient
detail that another person could repeat
the experiment.
An explanation of the safety, ethical
and environmental considerations of
your investigation.
Data analysis
All qualitative and quantitative data is
organised into tables for all variables,
expressed with correct precision and
includes units and uncertainties.
Consideration of the uncertainties of
each piece of apparatus used in your
investigation (if applicable).
Data processing relevant to research
question is complete.
Graphs (if applicable) showing the
relationship between the dependent
and independent variables are suitable
and correctly labelled.
Conclusion
The conclusion is fully supported by the
processed data.
The conclusion is compared to
accepted scientific context.
Evaluation
Suitable weaknesses and limitations in
procedure identified and their impact
on the data explained.
Improvements to procedure suggested
are realistic and relevant to the
weaknesses and limitations identified.