STATA User Guide
Method · January 2018
DOI: 10.13140/RG.2.2.31715.04645
STATA User Guide
Dalila Chenaf-Nicet - University of Bordeaux - LAREFI
January 2018
Abstract
The purpose of this guide is to help new users take their first steps in STATA. As a general rule, the
manipulations are illustrated using screenshots from STATA 14, but the illustrations remain valid for
other versions of STATA.
With the help of this document you will learn how to import a database from an Excel file or to
create variables and data directly in STATA, and how to modify them, label them, and so on.
Chapter 1 Getting Started with the Software
When you open the software you will see the interface below. It can differ slightly depending
on the version (the interface evolved somewhat from STATA 8 to STATA 14, via STATA 10 and STATA
12). However, the interface always consists of 5 windows and a menu bar at the top of the screenshot.
The menu bar contains general tabs (File, Edit, etc.) as in most software, a tab specific to data
management (Data), and tabs specific to statistical and econometric analysis (Graphics,
Statistics).
1.1 The «Command» Window
The first window at the very bottom of the image capture is the "Command" window: you can
write in this window the code lines that you want to execute.
In STATA there are always two possibilities:
- either you know the code that produces the result you want and you write it directly
in this window,
- or you do not know the code and you use the menu bar (File, Edit, Graphics, Statistics,
etc.), from which you can obtain the same result through a system of dropdown menus
and dialog boxes.
For example, if you want to create a variable "Y" equal to twice the variable "X", you can:
Either write the following line of code in the "Command" window:
generate Y = 2*X (then press Enter to execute the command)
“generate” is the command for creating new variables (you can write generate or simply gen).
A new variable Y is then created.
Or click in the menu on Data -> Create or change data -> Create new variable. A dialog box opens in
which, guided by the software, you indicate that you want to create a
variable "Y" equal to twice the variable "X". You get the new variable "Y" in the same
way as before. In this latter case, STATA shows in the "central window" (window 3) and in
the "Review - Command" window (window 2) the code that was actually used to run this
operation. In both windows the following line of code appears:
generate Y = 2*X
This is very handy: the first time you need a command you do not know, the menu
gives you both the command and its syntax, so that next time you can
type it directly in the "Command" window instead of going through
the menu.
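As a minimal sketch of the command-line route, the lines below can be typed one at a time in the "Command" window or pasted into a do-file. The three values given to X are invented for illustration only; any numeric variable X already in memory would do.

```stata
* Illustrative data: three made-up values for X (an assumption, not from the guide)
clear
set obs 3
generate X = _n
* Create Y as twice X, exactly as in the text
generate Y = 2*X
* Display both variables side by side
list X Y
```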
1.2 Window 2 "Review - Command".
This is the window on the left of the screenshot. It displays the commands that have
been executed (the text of each line of code you have run).
If the command line appears in black after execution, the code is
correctly written. If the line appears in red, there is an error in the
command line. In that case the central window displays a message explaining where the
error comes from: “variable not found”, “variable name badly written”, “inappropriate code”,
etc.
The command lines remain in this window throughout the working session. This is very
practical: if you need to execute an earlier command again,
just click on it in the "Review - Command" window and it
reappears in the "Command" window without having to be retyped (a double-click
executes it again directly).
1.3 Window 3 "Central" window
This is the largest window of the interface. It generally has a black background in the older
versions of STATA. This is the "output" window, in other words the window where the
"results" of your commands appear.
The results tables can be copied and pasted into Word files. There are several ways to
retrieve the results. You can select a table in the central window with the mouse and copy it
(by right-clicking) in one of the following forms: a table, an HTML table, or an image. You can then paste the
table or results into a Word file.
We will see later that it is easier to collect all the results in a single file thanks to what is
called the "Log". You can find the "Log" in the menu bar under the tab "File", and in the toolbar
with the symbol of a sheet of paper. We will come back to this point when describing the menu bar.
In window 3 you can also read the path of the data file you are loading.
We will return to this in the section on importing a data file in
Excel format.
1.4 Window 4 "Variables"
This window simply lists all the variables in the database, whether they were
created or imported. For each variable you can read its name, for example "x", and its label (e. g.
exports from France in euros). Two columns are displayed:
- Name: x
- Label: exports from France in euros
When you write a line of code, always use the name of the variable and not the label. For
example, to create a new variable "y" equal to the log of the variable "x",
the code to write in the command bar is:
gen y = log(x) and not gen y = log(exports from France in euros)
We thus create a new variable "y" which is the log of "x".
A practical feature: when variable names are long, you do not have to type them
in the command window every
time you need them. Just click on the variable name in the
"Variables" window and the name appears in the command line.
New variables appear in this window at the end of the list. If you delete a
variable, it disappears from the list.
Note that to create a new variable you can use:
generate y = log(x)
or simply,
gen y = log(x)
The two syntaxes are equivalent: for some keywords, STATA accepts abbreviations (we will
see this in the section below).
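The distinction between a variable's name and its label can be tried out with a short sketch. The values of x below are invented for illustration; only the name x and the label text come from the guide.

```stata
* Illustrative data: made-up values for x (an assumption, not from the guide)
clear
set obs 3
generate x = 100 * _n
* Attach the label used in the text to the variable named x
label variable x "exports from France in euros"
* Commands refer to the name x, never to the label; gen abbreviates generate
gen y = log(x)
* The Name and Label columns of the Variables window correspond to this output
describe x y
```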
To remove a variable "z" that is no longer needed in the database, the command is simple.
Just write in the command bar:
drop z
You can also change the order of variables in the window. If, for example, the variable "w" is
often used but is at the end of the list in the "Variables" window and you
want it nearer the top of the list, just type the following code in the command window:
order w, before(z)
The variable "w" is placed before the variable "z".
You can also write:
order w, after(a)
The variable "w" is placed after the variable "a".
order w, first
In this last case the variable "w" is placed first in the list.
Instead of writing these command lines, you can go to the menu and follow the
path Data -> Data Utilities -> Reorder variables, and let yourself be guided by the dialog box.
You get the same result. In this case, the corresponding commands appear in the "Central
window" and in the "Review - Command" window. The results of the different manipulations
on the variables always appear in the "Variables" window.
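The reordering and deletion commands above can be followed step by step in a small sketch. The dataset is invented: the variables a, b, z and w hold arbitrary constants and serve only to make the order visible.

```stata
* Illustrative dataset with four made-up variables, created in the order a b z w
clear
set obs 2
generate a = 1
generate b = 2
generate z = 3
generate w = 4
* Move w just before z: the order becomes a b w z
order w, before(z)
* Move w just after a: the order becomes a w b z
order w, after(a)
* Move w to the top of the list: the order becomes w a b z
order w, first
* Remove z from the dataset: the remaining variables are w a b
drop z
describe
```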
1.5 The menu bar
It is composed of two lines. In the first line you will find the following tabs:
“File” (to open files, save them, print them, etc.),
“Edit” (to copy, paste, search, etc.),
“Data” (to create data, modify it, reorder it, etc.),
“Graphics” (to make all kinds of graphs such as histograms, regression curves, time
evolutions of variables, etc.),
“Statistics” (to compute simple or elaborate statistics and run simple or very elaborate econometrics),
“User” (not really useful, but allows you to go back to the data, graphs and results),
“Window” (which allows you to manage the layout of the different windows),
“Help” (which gives access to the help on all STATA functionalities).
In the second line of the menu bar are icons that are shortcuts to the actions users
perform most frequently.
For example, to open a file you can follow the path "File" -> "Open" in the menu bar
using the tabs of the first line, but you can also click directly on the corresponding icon in the second line.
These shortcuts exist for the following actions:
- open a file, save it, print it,
- open a log file (this point is explained later),
- call the help,
- open a "do-file" (this point is explained later),
- open the data editor,
- open the data browser,
- open the variable manager,
- stop a running program (the cross),
- customize the toolbar (the downward arrow symbol, only in STATA 14).
We will come back to the log, the do-file, the help and the data editors in the following
paragraphs.
1.5.1. The “help”
The help is very useful when you are looking for a particular command. It is also very useful for syntax,
provided you know how to read it correctly. First of all, be careful to type everything in English.
For example, if you want to know how to run a regression with the variables y (dependent
variable) and x, z (independent variables), you have to type in the help the word
“regression” (be careful, without the accent).
A list then appears of all the places where the word “regression” appears as a command.
When you click on the blue keywords corresponding to the search, you obtain the help for that
keyword.
For example, if you click on “regress” in the help (picture above), you get the syntax of this
command and the different options of the regression program. For regression the
syntax is:
regress depvar [indepvars] [if] [in] [weight] [, options]
This syntax is followed by the list of possible options (for example, for a regression without a
constant you can use the "noconstant" option). A comma before the options
separates them from the main command.
How should this line be read?
The command word for the regression is "regress". It is followed by the name of the
dependent variable (depvar, for example "y"), then by the list of independent
variables (indepvars), separated only by blanks. Do not type the square brackets when writing a
command: in the help they simply indicate that an element is optional. The one exception is
weights, which are typed within square brackets, for example [aweight=w], where w stands for a weight variable.
The regression can also be restricted to certain observations, for example those for which the values
of x are positive ("if" is then used). This gives, for example, the following command:
regress y x z if x>0, noconstant
STATA therefore regresses the variable y on the variables x and z, only for
positive values of x, and without a constant. Note that “if” and “in” are typed
without brackets or commas: the only comma is the one before the options. The command should be written as simply as
possible, with just a space between the terms.
In the help, at the end of each topic (regression, post-estimation, panel regression...), you will find
examples of syntax that often shed light on how to write the command.
Words that are partly underlined in the help, such as “regress”, can be written in full or reduced to the
underlined part. For example, it is equivalent to write:
regress y x z if x>0, noconstant
or simply
reg y x z if x>0, noconstant
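To see the syntax at work, the sketch below runs the regression on simulated data. The guide supplies no dataset, so x, z and y are generated from random draws purely for illustration, and the coefficients used to build y are arbitrary.

```stata
* Simulated data for illustration only (the guide does not supply a dataset)
clear
set seed 12345
set obs 100
generate x = rnormal()
generate z = rnormal()
generate y = 2*x - z + rnormal()
* Regression of y on x and z, restricted to x > 0, without a constant
regress y x z if x>0, noconstant
* Same command with the abbreviated keyword
reg y x z if x>0, noconstant
```

Both commands produce the same table; the number of observations reported is the number of lines with x > 0, not the full sample.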
In the menu bar, if you click on "Help", you have several options: search for a command, a
word, a new command, etc. By clicking on “Contents” you get all the topics covered by
the help, listed by category. The "Graph" category is very useful because graphs in
STATA are of professional quality but not necessarily easy to handle.
New commands can regularly be downloaded into STATA (in R we would talk about
packages). The procedure is simple. Suppose, for example, that you want to download the
"Cusum" time series test, which is not included by default in all versions of
STATA. Just type in the help: cusum. The answer is that the test exists for binary
variables but not for time series. However, a version for time series can be downloaded; it is called
cusum6. Clicking on the proposed "cusum6" link opens a window describing the test
and asking whether you want to install the module (click here to install).
When you click on "click here to install", the new command is installed immediately, together with the
related help (notably the examples of syntax for this new command).
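The same search and installation can also be started from the Command window. The lines below are a sketch only: search, ssc install, which and help are standard commands, but the availability of cusum6 on the SSC archive is an assumption here, and an internet connection is required.

```stata
* Search the help system and the internet for the keyword cusum6
search cusum6, all
* Assumption: cusum6 is distributed through the SSC archive; if it is not,
* follow the install link shown in the search results instead
ssc install cusum6
* Check that the command is now installed, then display its help
which cusum6
help cusum6
```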
1.5.2. The data editors
These are the shortcuts Data Editor (Edit), Data Editor (Browse) and Variable
Manager.
The Data Editor (Browse), symbolized by the small magnifying glass, lets you
"navigate" through the database but offers few possibilities to modify the data. However,
you can sort or filter the data in this editor. It is less useful than the Data Editor (Edit), which
is represented by the small pencil.
In the Data Editor (Edit) you can type data directly or copy/paste a database from an
Excel file.
Let's take both cases:
Enter data manually.
Open the Data Editor (Edit) by clicking on its icon. The following window appears, with a
main window where the variables are created in columns and a smaller window where they
are listed:
In the main window you can type your data directly in a column. You thus create a new
variable. If variables are not given a name, STATA automatically names them var1,
var2, etc.
If no variables exist yet, the "Variables" window on the right of the
screenshot indicates: "There is no items to show". Once variables are created, the list of
variables appears.
For example, if you write the numbers 5 and 7 in the first column of this editor you obtain:
When the data editor is then closed, the created variables appear immediately in the list of
window 4 (Variables), without having to save the new data.
In this editor you can type data, delete them and create new variables, but you can also type
commands directly in the command window and get the same result.
For example if you type in the command bar:
set obs 1
generate var1 = 5 in 1
set obs 2
replace var1 = 7 in 2
A variable (var1) is thus created with the number 5 in the first line and the
number 7 in the second line. However, this procedure is suitable only for the occasional
manipulation of data. In general, we want to create a large number of variables in large
databases, for example by importing a large amount of data from an Excel
file.
You can paste an entire database from Excel into the Data Editor with a simple
copy/paste from Excel to STATA, but you must take precautions beforehand, especially for
numeric data. First of all, depending on its regional settings, Excel may write decimals with a comma,
whereas STATA expects a dot: the number 2,5 will be considered as text by STATA, while 2.5 will be read as a
number.
Open the Excel file containing the data and use "search" and "replace" to replace the commas
with dots. Once this is done, you can copy the data
from the Excel file into the editor, which asks whether the first line should be
treated as the line of variable names. Answer yes if that is the case.
When you close the editor, all the variables created appear in window 4.
This may not always work. In some cases, variables appear in red instead of black in the data
editor. This means that STATA does not recognize the variable as numeric
and treats it as a text variable. The problem is that in one of the lines of the
variable there is a character that STATA does not recognize as a number, so it treats
the whole variable (the entire column) as text. This can happen if, for example, "NA" is written in
one of the lines to indicate that the numeric value is not available.
In this case, if the problem is identified immediately, remove the "NA" in Excel,
leave the cell empty, and repeat the copy/paste. But it is sometimes
difficult to find the problem in a large database.
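When a column has already been imported as text, it can also be repaired inside STATA. The following is a minimal sketch, not part of the procedure described in this guide: the variable name x_txt and its values are invented, and it assumes that "NA" is the only non-numeric entry in the column.

```stata
* Illustrative text variable containing a comma decimal and an "NA" entry
clear
set obs 3
generate str5 x_txt = "2,5" in 1
replace x_txt = "NA" in 2
replace x_txt = "7" in 3
* Blank out the "NA" entries so that they become missing values
replace x_txt = "" if x_txt == "NA"
* Convert to numeric; dpcomma tells destring that the decimal separator is a comma
destring x_txt, generate(x) dpcomma
list x_txt x
```

The new variable x is numeric (shown in black in the data editor), with a missing value where "NA" used to be.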
There is therefore another method, which consists in converting the Excel file into a format that
STATA can read. This is the procedure called "INSHEET",
whose steps are as follows:
(1) Save the initial Excel file (in which commas have been replaced by dots) in the
format "Text (tab delimited)". Save this file on your computer's desktop (this makes it
easier to find the path of the file afterwards). For example, the file is saved with the name
workbook1.txt. Be very careful here: the name of the file must not
contain a blank, otherwise STATA will not recognize the file (unless the path is enclosed in double quotes).
(2) Open STATA and write in the command bar the code that tells
STATA where to find the file named workbook1.txt. Type
"insheet using" followed by the path, as in the following example:
insheet using C:\users\destok\workbook1.txt
(3) At the end of this manipulation your file is loaded and STATA confirms it
by indicating the number of variables and observations imported in the "central window";
the variables also appear in the "Variables" window.
If you do not know the path, you can use a little trick: go through the
menu File -> Open, choose "all files", and ask STATA to open the file in question, which is on
your desktop. STATA will give you an error message: r(610), file not Stata format.
This does not matter, because at the same time it displays the correct path
to the file on your computer's desktop. By pasting this path into
the command bar you can use it in the "insheet using..." command.
Once the data has been imported, check in the editor that it has been
imported correctly. Text data such as the names of countries, companies, etc.
always remain in red. All numeric data must be in black.
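In STATA 13 and later, insheet has been superseded by import delimited, which reads the same tab-delimited text files. The sketch below first writes a small made-up file to a temporary location so that it can be run as it stands; in practice the path after "using" would be that of your own file. Because it relies on a temporary file name, it should be run as a whole from a do-file.

```stata
* Write a small tab-delimited text file (made-up content) to a temporary location
clear
tempfile raw
file open fh using "`raw'", write text
file write fh "country" _tab "x" _n
file write fh "France" _tab "2.5" _n
file write fh "Spain" _tab "3.1" _n
file close fh
* Import it: tab as separator, first line read as variable names
import delimited using "`raw'", delimiters("\t") varnames(1) clear
* country should be a text variable and x a numeric one
describe
```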
The "Variable Manager" icon allows you to label variables or change their names. When
you click on it, the following dialog box opens; by following the instructions you can
easily change the name and label of the variables.
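The same changes can be made with two commands, rename and label variable. In the sketch below the data and the new name exports are invented for illustration; only the label text is taken from the earlier example in this guide.

```stata
* Illustrative variable with a default name (made-up values)
clear
set obs 2
generate var1 = 5 in 1
replace var1 = 7 in 2
* Give the variable a meaningful name, then a label
rename var1 exports
label variable exports "exports from France in euros"
describe exports
```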
NOTE: in recent versions of STATA, including version 14 (the import excel command was introduced
in version 12), you can import data in Excel format directly, without any special manipulation,
by using in the menu bar File -> Import -> Excel spreadsheet.
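The menu path corresponds to the import excel command. The sketch below is self-contained only because it first creates a small workbook with export excel; the file name workbook1.xlsx and its content are invented, and the file is written to the current working directory.

```stata
* Create a small made-up dataset and save it as an Excel workbook
clear
set obs 2
generate x = 2.5 * _n
export excel using "workbook1.xlsx", firstrow(variables) replace
* Import the workbook, reading the first row as variable names
import excel using "workbook1.xlsx", firstrow clear
describe
```

With your own file, only the import excel line is needed, with the path of the workbook after "using".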
1.5.3. The LOG
In the menu bar, this is the icon represented by a sheet of paper torn on the side.
The log is a text file that stores the commands and results of a
STATA work session so that they can be kept and printed.
When you click on the LOG shortcut, you create a text file that automatically records all
the statistical and econometric results obtained, including the results tables. The log must be
saved as soon as it is created; it can then be reopened during another working
session on the same data file, and it will continue to record the new results obtained. You
can suspend recording for a while and then resume it. Just be
guided by the dialog box, which always offers 3 possibilities: view the contents of the log file
(the file is in .smcl format, which is a text format); close it for later use; suspend it for later
re-use.
The log is very practical for storing all the results and keeping a record of your work.
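The three possibilities offered by the dialog box have command equivalents. The following sketch uses a hypothetical file name, session1, which is created in the current working directory.

```stata
* Close any log left open by an earlier session (no error if there is none)
capture log close
* Start a log in the default .smcl format; session1 is a hypothetical file name
log using "session1", replace
* Everything from here on is recorded
display "recorded in the log"
* Suspend recording, then resume it
log off
display "not recorded"
log on
* Close the log; it can be reopened later with: log using "session1", append
log close
```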
1.5.4 . The Do-File
A do-file is also a text file, but of a particular kind: if you write
commands in this file, they can be executed from the file without having
to type them in the command window (this corresponds to a "script" in R).
Thus, the do-file is a text file in which STATA commands are assembled as a program.
This file keeps track of STATA's commands.
When you click on the icon, a file "Untitled.do" opens (this file can then be given the
name you want). You can write all the code you want to execute. Each line of code
receives a line number. By clicking "Do", the selected lines (or the whole file if nothing is
selected) are executed in the order in which they are written and the results are displayed;
"Run" executes them in the same way but without displaying the output.
Generally speaking, when working on a new database, it is useful to always open a do-file in
which you write all the command lines you want to execute. You can also write text in this
file (if you want to remember some procedural elements); it will not be treated as a
command if you place a star (*) at the beginning of the line. The text then turns green and
STATA ignores it during execution.
For example, in a do-file it is possible to write:
* descriptive statistics (mean, standard deviation, min and max) on variables x and y
summarize x y
* two-dimensional graph: the scatter plot of y against x
twoway (scatter y x)
* regression of the variable y on x, for positive values of x, without a constant
regress y x if x>0, noconstant
By clicking on "Do", STATA executes the 3 commands and displays all
the results in the central window (and in the log if one is open). A do-file created in this
way can be reused in all future working sessions without the need to rewrite the commands.
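For a do-file that runs on its own, the three commands need data in memory. The sketch below adds a few lines that simulate x and y; these lines, the seed and the file name analysis.do are assumptions made for illustration and are not part of the guide's example.

```stata
* analysis.do : hypothetical file name for this sketch
* Simulated data so that the file runs on its own (not from the guide)
clear
set seed 2018
set obs 50
generate x = rnormal()
generate y = 1 + 2*x + rnormal()
* Descriptive statistics on x and y
summarize x y
* Scatter plot of y against x
twoway (scatter y x)
* Regression of y on x for positive values of x, without a constant
regress y x if x>0, noconstant
```

Saved under the name analysis.do, the file can also be executed from the Command window by typing do analysis.do, or run analysis.do to execute it without displaying the output.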