9 Data Analysis
Chapter 9
Data Analysis
Using Scenarios, Goal Seek, Solver, others
Copyright
This document is Copyright © 2010-2012 by its contributors as listed below. You may distribute it
and/or modify it under the terms of either the GNU General Public License
(http://www.gnu.org/licenses/gpl.html), version 3 or later, or the Creative Commons Attribution
License (http://creativecommons.org/licenses/by/3.0/), version 3.0 or later.
All trademarks within this guide belong to their legitimate owners.
Contributors
Barbara Duprey
Jean Hollis Weber
John A Smith
Feedback
Please direct any comments or suggestions about this document to:
[email protected]
Acknowledgments
This chapter is based on Chapter 9 of the OpenOffice.org 3.3 Calc Guide. The contributors to that
chapter are:
Jean Hollis Weber
Nikita Telang
James Andrew
Claire Wood
Introduction
Once you are familiar with functions and formulas, the next step is to learn how to use Calc's
automated processes to quickly perform useful analysis of your data.
Calc includes several tools to help you manipulate the information in your spreadsheets, ranging
from features for copying and reusing data, to creating subtotals automatically, to varying
information to help you find the answers you need. These tools are divided between the Tools and
Data menus.
If you are a newcomer to spreadsheets, these tools can be overwhelming at first. However, they
become simpler if you remember that they all depend on input from either a cell or a range of cells
that contain the data with which you are working.
You can always enter the cells or range manually, but in many cases it is easier to select the cells
with the mouse. Click the Shrink/Maximize icon beside a field to temporarily reduce the size of the
tool’s window, so you can see the spreadsheet underneath and select the cells required.
Sometimes, you may have to experiment to find out which data goes into which field, but then you
can set a selection of options, many of which can be ignored in any given case. Just keep the
basic purpose of each tool in mind, and you should have little trouble with Calc’s function tools.
You don’t need to learn them, especially if your spreadsheet use is simple, but as your
manipulation of data becomes more sophisticated, they can save time in making calculations,
especially as you start to deal with hypothetical situations. Just as importantly, they can allow you
to preserve your work and to share it with other people—or yourself at a later session.
One tool not covered here is the Pivot Table, which is complex enough to need its own chapter:
see Chapter 8 in this book.
Consolidating data
Data > Consolidate provides a way to combine data from two or more ranges of cells into a new
range while running one of several functions (such as Sum or Average) on the data. During
consolidation, the contents of cells from several sheets can be combined into one place. The effect
is that copies of the identified ranges are stacked with their top left corners at the specified result
position, and the selected operation is used in each cell to calculate the result value.
1) Open the document containing the cell ranges to be consolidated.
2) Choose Data > Consolidate to open the Consolidate dialog. Figure 1 shows this dialog
after making the changes described below.
3) The Source data range list contains any existing named ranges (created using Data >
Define Range) so you can quickly select one to consolidate with other areas.
If the source range is not named, click in the field to the right of the drop-down list and
either type a reference for the first source data range or use the mouse to select the range
on the sheet. (You may need to move the Consolidate dialog or click on the Shrink icon to
reach the required cells.)
4) Click Add. The selected range is added to the Consolidation ranges list.
5) Select additional ranges and click Add after each selection.
6) Specify where you want to display the result by selecting a target range from the Copy
results to drop-down list.
If the target range is not named, click in the field next to Copy results to and enter the
reference of the target range or select the range using the mouse or position the cursor in
the top left cell of the target range. Copy results to takes only the first cell of the target
range instead of the entire range as is the case for Source data range.
Figure 1: Defining the data to be consolidated
7) Select a function from the Function list. This specifies how the values of the consolidation
ranges will be calculated. The default setting is Sum, which adds the corresponding cell
values of the Source data range and gives the result in the target range.
Most of the available functions are statistical (such as Average, Min, Max, Stdev), and the
tool is most useful when you are working with the same data over and over.
8) At this point you can click More in the Consolidate dialog to access the following additional
settings:
• In the Options section, select Link to source data to insert the formulas that generate
the results into the target range, rather than the actual results. If you link the data, any
values modified in the source range are automatically updated in the target range.
Caution: The corresponding cell references in the target range are inserted in consecutive
rows, which are automatically ordered and then hidden from view. Only the final
result, based on the selected function, is displayed.
• In the Consolidate by section, select either Row labels or Column labels if the cells of
the source data range are not to be consolidated corresponding to the identical
position of the cell in the range, but instead according to a matching row label or
column label. To consolidate by row labels or column labels, the label must be
contained in the selected source ranges. The text in the labels must be identical, so
that rows or columns can be accurately matched. If the row or column label of one
source data range does not match any that exist in other source data ranges, it is
added to the target range as a new row or column.
9) Click OK to consolidate the ranges.
If you are continually working with the same range, then you probably want to use Data > Define
Range to give it a name.
The consolidation ranges and target range are saved as part of the document. If you later open a
document in which consolidation has been defined, this data is still available.
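The two matching modes described above (by cell position, or by row label) can be sketched in Python. This is only an illustration of the idea, not Calc's implementation; the function names and the sample ranges are hypothetical.

```python
def consolidate_by_position(ranges, func=sum):
    """Stack equally shaped ranges and apply func to each cell position."""
    rows = len(ranges[0])
    cols = len(ranges[0][0])
    return [
        [func([rng[i][j] for rng in ranges]) for j in range(cols)]
        for i in range(rows)
    ]


def consolidate_by_row_label(ranges, func=sum):
    """Match rows by label; a label found in only one range becomes a new row."""
    grouped = {}
    for rng in ranges:
        for label, values in rng:
            grouped.setdefault(label, []).append(values)
    # zip(*rows) pairs up the values that sit in the same column
    return {
        label: [func(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


# Hypothetical source ranges without labels
q1 = [[10, 20], [30, 40]]
q2 = [[1, 2], [3, 4]]
by_position = consolidate_by_position([q1, q2])

# Hypothetical source ranges with row labels
north = [("Apples", [10, 20]), ("Pears", [30, 40])]
south = [("Pears", [3, 4]), ("Plums", [5, 6])]
by_label = consolidate_by_row_label([north, south])
```

Passing a different function (for example `max` or `min`) in place of `sum` mirrors choosing another entry from the Function list.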
Creating subtotals
Subtotals are implemented in two ways:
• The SUBTOTAL function
• Data > Subtotals from the menu bar.
Select the location for the subtotal to be displayed by clicking in the chosen cell. Select Insert >
Function from the Menu bar, or press Ctrl+F2. Select SUBTOTAL from the function list in the
Function Wizard dialog. Enter the required information into the two input boxes as shown in Figure
5. The range is selected from the filtered data, and the function is selected from the list of available
possible functions as shown in the Help file extract of Figure 6. In our example we select the sales
figures (B5:B23) and we require the sum total (function index 9).
Click OK to return the summed values of Brigitte's sales (Figure 4).
Figure 4: SUBTOTAL result for Brigitte's sales
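As a rough illustration of what SUBTOTAL does with a function index, the following Python sketch applies the chosen function only to rows that a filter has left visible. The row data and the `visible` flag are hypothetical, and only a few of the function indexes are included.

```python
import statistics

# Subset of the SUBTOTAL function indexes
SUBTOTAL_FUNCTIONS = {
    1: statistics.mean,  # AVERAGE
    4: max,              # MAX
    5: min,              # MIN
    9: sum,              # SUM
}

# Hypothetical filtered list: the filter shows only Brigitte's rows
rows = [
    {"employee": "Brigitte", "sales": 100, "visible": True},
    {"employee": "Fritz", "sales": 250, "visible": False},
    {"employee": "Brigitte", "sales": 300, "visible": True},
    {"employee": "Hans", "sales": 50, "visible": False},
]


def subtotal(function_index, rows):
    """Apply the indexed function to the sales values of visible rows only."""
    values = [row["sales"] for row in rows if row["visible"]]
    return SUBTOTAL_FUNCTIONS[function_index](values)


brigitte_sum = subtotal(9, rows)
brigitte_average = subtotal(1, rows)
```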
As you can see, this is a tedious and time-consuming exercise for a sales report if you want
subtotals for more than a couple of categories.
A partial view of the results using our example data is shown in Figure 8. Subtotals for Sales by
Employee and Category were used.
Figure 8: Subtotals are calculated for each employee (partial view)
using the 1st Group and 2nd Group
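The nested result (a grand total, then a subtotal per employee, then a subtotal per category) can be sketched in Python as follows. The sales rows are hypothetical, and the sketch only mimics the grouping logic, not Calc's own routine.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (employee, category, sales)
sales = [
    ("Brigitte", "Golf", 100),
    ("Fritz", "Golf", 200),
    ("Brigitte", "Tennis", 50),
    ("Brigitte", "Golf", 25),
    ("Fritz", "Tennis", 75),
]


def subtotals(rows):
    """Return grand total, per-employee totals, and per-category totals."""
    # Sort by the grouping columns first, so that equal keys are adjacent
    ordered = sorted(rows, key=itemgetter(0, 1))
    report = {"grand_total": sum(r[2] for r in ordered), "employees": {}}
    for employee, emp_rows in groupby(ordered, key=itemgetter(0)):
        categories = {
            category: sum(r[2] for r in cat_rows)
            for category, cat_rows in groupby(list(emp_rows), key=itemgetter(1))
        }
        report["employees"][employee] = {
            "total": sum(categories.values()),
            "categories": categories,
        }
    return report


report = subtotals(sales)
```

The explicit sort plays the same role as the Pre-sort area according to groups option: grouping only works on rows whose keys are adjacent.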
Calc inserts, to the left of the row numbering labels, an outline area that graphically represents the
structure of the subtotals. Number 1 represents the highest level of grouping, the Grand Total.
Numbers 2 to 4 show reducing grouping levels, with number 4 showing individual entries. The
number of levels is dependent on the number of groupings in the subtotals.
Clicking a number at the top of a column shrinks the structure of that element of the subtotal.
For column 1, this changes the minus button in the column to one with a plus symbol, indicating
that it is expandable. For column 2 and others with content, each element of the column shrinks,
and each button changes to a plus. In the example subtotal displayed in Figure 8, column 1 is the
Grand Total, column 2 is the Employee subtotal, and column 3 is the Category subtotal.
For column 2, and for others if you have more groups, you can also click each individual minus
button to shrink only that subtotal. If you click on the numbered button at the top, you must then
click on the resultant plus buttons to expand the structure again (see Figure 9). Shrinking any
element temporarily hides any element contained in a column to its right. In Figure 9, individual
entries are hidden by shrinking the Category subtotals for Brigitte.
To turn off outlines, select Data > Group and Outline > Remove from the Menu bar. Select
AutoOutline to reinstate the outlines.
Figure 9: Click the plus buttons to expand the elements again
Further choices are available in the Options page of the Subtotals dialog as follows.
In the Groups section:
• Selecting Page break between groups inserts a new page after each group of subtotaled
data.
• Selecting Case sensitive recalculates subtotals when you change the case of a data label.
• Selecting the Pre-sort area according to groups option sorts the area that you selected in
the Group by box of the Group tabs according to the columns that you selected.
In the Sort section:
• Selecting Ascending or Descending sorts beginning with the lowest or the highest value.
You can define the sort rules on Data > Sort > Options.
• Selecting the Include formats option takes formatting attributes into account when
sorting.
• Selecting Custom sort order sorts according to one of the predefined custom sorts defined
in Tools > Options > LibreOffice Calc > Sort Lists.
Using “what if” scenarios
The Scenario is a tool to test “what-if” questions. Each scenario is named, and can be edited and
formatted separately. When you print the spreadsheet, only the contents of the currently active
scenario are printed.
A scenario is essentially a saved set of cell values for your calculations. You can easily switch
between these sets using the Navigator or a drop-down list which can be shown beside the
changing cells. For example, if you wanted to calculate the effect of different interest rates on an
investment, you could add a scenario for each interest rate, and quickly view the results. Formulas
that rely on the values changed by your scenario are updated when the scenario is opened. If all
your sources of income used scenarios, you could efficiently build a complex model of your
possible income.
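The idea of a scenario as a named, saved set of cell values can be sketched in Python. The cell names, the scenario names, and the simple interest formula are all hypothetical; the point is only that switching scenarios replaces the changing values and the dependent formula is then recalculated.

```python
# Hypothetical sheet: two input cells and one formula that depends on them
sheet = {"principal": 1000, "rate_percent": 5}

# Each scenario stores values only for the cells that change
scenarios = {
    "Low rate": {"rate_percent": 3},
    "High rate": {"rate_percent": 6},
}


def interest(cells):
    """Dependent formula: one year of simple interest."""
    return cells["principal"] * cells["rate_percent"] / 100


def apply_scenario(cells, scenarios, name):
    """Return a copy of the sheet with the named scenario's values applied."""
    updated = dict(cells)
    updated.update(scenarios[name])
    return updated


low = interest(apply_scenario(sheet, scenarios, "Low rate"))
high = interest(apply_scenario(sheet, scenarios, "High rate"))
```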
Creating scenarios
Tools > Scenarios opens a dialog with options for creating a scenario.
To create a new scenario:
1) Select the cells that contain the values that will change between scenarios. To select
multiple ranges, hold down the Ctrl key as you click. You must select at least two cells.
2) Choose Tools > Scenarios.
3) On the Create Scenario dialog (Figure 11), enter a name for the new scenario. It’s best to
use a name that clearly identifies the scenario, not the default name as shown in the
illustration. This name is displayed in the Navigator and in the title bar of the border around
the scenario on the sheet itself.
Settings
The lower portion of the Create Scenario dialog contains several options. The default settings (as
shown in Figure 11) are likely to be suitable in most situations.
Display border
Places a border around the range of cells that your scenario alters. To choose the color of the
border, use the field to the right of this option. The border has a title bar displaying the name of
the active scenario. Click the arrow button to the right of the scenario name to open a drop-
down list of all the scenarios that have been defined for the cells within the border. You can
choose any of the scenarios from this list at any time.
Copy back
Copies any changes you make to the values of scenario cells back into the active scenario. If
you do not select this option, the saved scenario values are never changed when you make
changes. The actual behavior of the Copy back setting depends on the cell protection, the
sheet protection, and the Prevent changes setting (see Table 1 on page 13).
Caution: If you are viewing a scenario which has Copy back enabled and then create a new
scenario by changing the values and selecting Tools > Scenarios, you also
inadvertently overwrite the values in the first scenario.
This is easily avoided if you leave the current values alone, create a new scenario
with Copy back enabled, and then change the values only when you are viewing the
new scenario.
Prevent changes
Prevents changes to a scenario that has Copy back enabled when the sheet is protected but the
cells are not. Also prevents changes to the settings described in this section while the sheet is
protected. A fuller explanation of the effect this option has in different situations is given below.
Changing scenarios
Scenarios have two aspects that can be altered independently:
• Scenario properties (the settings described above)
• Scenario cell values (the entries within the scenario border)
The extent to which either of these aspects can be changed is dependent upon both the existing
properties of the scenario and the current protection state of the sheet and cells.
If the sheet is protected, and Prevent changes is not selected, then all scenario properties can be
changed except Prevent changes and Copy entire sheet, which are disabled.
If the sheet is not protected, then Prevent changes does not have any effect, and all scenario
properties can be changed.
You can also make formula arrays easier to work with if you apply some simple design logic. Place
the original and the formula array close together on the same sheet, and use labels for the rows
and columns in both. These small exercises in organizational design make working with the
formula array much less painful, particularly when you are correcting mistakes or adjusting results.
Note: Before you choose the Data > Multiple Operations option, be sure to select not only
your list of alternative values but also the adjacent cells into which the results should
be placed.
In the Formulas field of the Multiple Operations dialog, enter the cell reference to the formula that
you wish to use.
The arrangement of your alternative values dictates how you should complete the rest of the
dialog. If you have listed them in a single column, you should complete the field for Column input
cell. If they are along a single row, complete the Row input cell field. You may also use both in
more advanced cases. Both single and double-variable versions are explained below.
The above can be explained best by examples. Cell references correspond to those in the
following figures.
Let’s say you produce toys that you sell for $10 each (cell B1). Each toy costs $2 to make (cell B2),
in addition to which you have fixed costs of $10,000 per year (cell B3). How much profit will you
make in a year if you sell a particular number of toys?
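To make the arithmetic concrete, here is a small Python sketch of the single-variable case. It assumes that profit is quantity times the margin per toy, minus the fixed costs; the list of quantities is hypothetical and stands in for the column of alternative values.

```python
PRICE = 10       # selling price per toy (cell B1)
COST = 2         # cost to make each toy (cell B2)
FIXED = 10_000   # fixed costs per year (cell B3)


def profit(quantity):
    """Assumed profit formula: quantity * (price - cost) - fixed costs."""
    return quantity * (PRICE - COST) - FIXED


# Hypothetical alternative values for the quantity sold
quantities = [500, 1000, 1500, 2000]

# One result per alternative value, as Multiple Operations would fill in
results = {quantity: profit(quantity) for quantity in quantities}
```

With these numbers the break-even point is 1250 toys, since 1250 * 8 equals the fixed costs of 10,000.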
Calculating with several formulas simultaneously
1) In the sheet from the previous example, delete the contents of column E.
2) Enter the following formula in C5: =B5/B4. You are now calculating the annual profit per
item sold.
3) Select the range D2:F11, thus three columns.
4) Choose Data > Multiple Operations.
5) With the cursor in the Formulas field of the Multiple operations dialog, select cells B5 and
C5.
6) Set the cursor in the Column input cell field and click cell B4. Figure 15 shows the
worksheet and the Multiple operations dialog.
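The steps above apply two formulas to the same column of alternative values. A minimal Python sketch of that idea follows; the profit formula and the quantities are assumptions carried over for illustration, and `multiple_operations` is a hypothetical helper, not a Calc function.

```python
PRICE, COST, FIXED = 10, 2, 10_000  # cells B1, B2, B3


def profit(quantity):
    """Assumed formula for B5: quantity * (price - cost) - fixed costs."""
    return quantity * (PRICE - COST) - FIXED


def profit_per_item(quantity):
    """Formula for C5: =B5/B4."""
    return profit(quantity) / quantity


def multiple_operations(formulas, inputs):
    """Build one row per input value with one result column per formula."""
    return [(value, *[formula(value) for formula in formulas]) for value in inputs]


# Hypothetical quantities; each row is (quantity, profit, profit per item)
table = multiple_operations([profit, profit_per_item], [500, 1000, 2000])
```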
Caution: Beware of entering the cell reference of a variable into the wrong field. The Row input
cell field should contain not the cell reference of the variable which changes down the
rows of your results table, but that of the variable whose alternative values have been
entered along a single row.
3) With the cursor in the Formulas field of the Multiple operations dialog, click cell B5 (profit).
4) Set the cursor in the Row input cell field and click cell B1. This means that B1, the selling
price, is the horizontally entered variable (with the values 8, 10, 15 and 20).
5) Set the cursor in the Column input cell field and click cell B4. This means that B4, the
quantity, is the vertically entered variable.
6) Click OK. The profits for the different selling prices are now shown in the range E2:H11
(See Figure 18).
Figure 18: Results of multiple operations calculations
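The two-variable case amounts to evaluating one formula over a grid. In the Python sketch below, the selling prices come from the example (8, 10, 15 and 20), while the quantities and the profit formula are assumptions made for illustration.

```python
def profit(price, quantity, cost=2, fixed=10_000):
    """Assumed profit formula: quantity * (price - cost) - fixed costs."""
    return quantity * (price - cost) - fixed


prices = [8, 10, 15, 20]        # row input values (cell B1)
quantities = [500, 1000, 2000]  # hypothetical column input values (cell B4)

# grid[quantity][price] holds the profit for that combination
grid = {
    quantity: {price: profit(price, quantity) for price in prices}
    for quantity in quantities
}
```

Each outer key corresponds to a row of the results table and each inner key to a column, which is the distinction the Row input cell and Column input cell fields capture.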
Solver example
Let’s say you have $10,000 that you want to invest in two mutual funds for one year. Fund X is a
low risk fund with 8% interest rate and Fund Y is a higher risk fund with 12% interest rate. How
much money should be invested in each fund to earn a total interest of $1000?
To find the answer using Solver:
1) Enter labels and data:
• Row labels: Fund X, Fund Y, and total, in cells A2 through A4.
• Column labels: interest earned, amount invested, interest rate, and
time period, in cells B1 through E1.
• Interest rates: 8 and 12, in cells D2 and D3.
• Time period: 1, in cells E2 and E3.
• Total amount invested: 10000, in cell C4.
• Enter an arbitrary value (0 or leave blank) in cell C2 as amount invested in Fund X.
2) Enter formulas:
• In cell C3, enter the formula =C4-C2 (total amount - amount invested in Fund X) as the
amount invested in Fund Y.
• In cells B2 and B3, enter the formula for calculating the interest earned (see Figure
21).
• In cell B4, enter the formula =B2+B3 as the total interest earned.
3) Choose Tools > Solver. The Solver dialog (Figure 22) opens.
4) Click in the Target cell field. In the sheet, click in the cell that contains the target value. In
this example it is cell B4 containing total interest value.
5) Select Value of and enter 1000 in the field next to it. In this example, the target cell value is
1000 because your target is a total interest earned of $1000. Select Maximum or Minimum
if the target cell value needs to be one of those extremes.
6) Click in the By changing cells field and click on cell C2 in the sheet. In this example, you
need to find the amount invested in Fund X (cell C2).
7) Enter limiting conditions for the variables by selecting the Cell reference, Operator and
Value fields. In this example, the amount invested in Fund X (cell C2) should not be greater
than the total amount available (cell C4) and should not be less than 0.
8) Click OK. A dialog appears informing you that solving finished successfully. Click Keep
Result to enter the result in the cell with the variable value. The result is shown in Figure
23.
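The same problem can be checked outside Calc with a short Python sketch. It assumes simple interest (amount * rate / 100 * time) for the formula in cells B2 and B3, and it uses a plain bisection search in place of Solver's own engine, so it illustrates the goal rather than Solver's method.

```python
def total_interest(amount_x, total=10_000.0, rate_x=8.0, rate_y=12.0, years=1.0):
    """Total interest (cell B4) when amount_x goes to Fund X and the rest to Fund Y."""
    amount_y = total - amount_x  # cell C3: =C4-C2
    return (amount_x * rate_x + amount_y * rate_y) * years / 100


def solve_for_target(target, low, high, tolerance=1e-9):
    """Bisection: find amount_x in [low, high] so that total_interest hits target."""
    f_low = total_interest(low) - target
    for _ in range(200):
        mid = (low + high) / 2
        f_mid = total_interest(mid) - target
        if abs(f_mid) < tolerance:
            return mid
        # Keep the half-interval in which the sign change occurs
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid
        else:
            high = mid
    return (low + high) / 2


# Limiting conditions from step 7: 0 <= amount in Fund X <= 10000
fund_x = solve_for_target(1000.0, 0.0, 10_000.0)
fund_y = 10_000.0 - fund_x
```

Under these assumptions the search settles on 5000 in each fund, which earns 400 + 600 = 1000 in interest.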