Stata Class Notes: Exploring Data http://www.ats.ucla.edu/stat/stata/notes/exploring12.htm
Stata Class Notes
Exploring Data
1.0 Stata commands in this unit
cd          Change directory
use         Load dataset into memory
describe    Describe a dataset
list        List the contents of a dataset
codebook    Detailed contents of a dataset
labelbook   Information on value labels
log         Create a log file
lookfor     Find variables in large dataset
summarize   Descriptive statistics
tabstat     Table of descriptive statistics
table       Create a table of statistics
stem        Stem-and-leaf plot
graph       High resolution graphs
kdensity    Kernel density plot
sort        Sort observations in a dataset
histogram   Histogram for continuous and categorical variables
tabulate    One- and two-way frequency tables
correlate   Correlations
pwcorr      Pairwise correlations
view        Display file in viewer window
2.0 Demonstration and explanation
We will begin by loading hs0.dta, a dataset saved in Stata's format that we encountered in the Entering Data unit. Stata data files have the .dta
extension and are loaded into memory with the use command. Only one dataset can be loaded at a time; the clear option discards any data already in memory.
use http://www.ats.ucla.edu/stat/data/hs0, clear
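The cd command appears in the command list but is not demonstrated in this unit. As a minimal sketch, assuming a copy of hs0.dta has been saved to a local folder (the path below is hypothetical and should be replaced with your own), you could change to that folder and load the file by name:

```stata
cd "C:\statadata"        /* hypothetical folder; substitute your own path */
pwd                      /* confirm the current working directory */
use hs0, clear           /* the .dta extension is assumed when none is given */
```

Working from a local copy also avoids depending on the web address being reachable.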
Before we start our statistical exploration we will look at the data using the describe, codebook, lookfor, labelbook
and list commands. Note that the variable prgtype is a string variable.
describe
codebook
lookfor s
labelbook
list
list gender-read in 1/20
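Each of these commands also accepts a variable list, and list accepts if and in qualifiers, which helps when a dataset is too large to read in full. The following sketch uses only variables that appear elsewhere in these notes; the cutoff of 40 is an arbitrary value chosen for illustration. It also shows sort, which is in the command list but not otherwise used in this unit.

```stata
describe read write                        /* storage type and labels for two variables */
codebook prgtype ses                       /* detail for selected variables only */
list gender prgtype write if write < 40    /* rows meeting a condition */

* preserve/restore puts the data back in its original order afterwards,
* so later commands that use "in" ranges see the same observations
preserve
sort write
list gender prgtype write in 1/10          /* ten observations with the lowest write */
restore
```

Because Stata treats missing values as larger than any number, the condition write < 40 never selects observations where write is missing, and sort places them last.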
Next, we will open a log file, which saves all of the commands and output (except graphs) to a text file. We use
the text option so that the log can be read in any text editor, such as Notepad or WordPad; the replace option overwrites any existing file of the same name.
log using unit2.txt, text replace
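If these commands are rerun while a log is still open (for example from a do-file), log using stops with an error even when replace is specified. A common idiom, offered here as a sketch rather than as part of the original notes, is to close any open log first and ignore the error if there is none:

```stata
capture log close                    /* close an open log; suppress the error if none is open */
log using unit2.txt, text replace
log query                            /* report the name and status of the open log */
```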
The basic descriptive statistics command in Stata is summarize. The tabstat and table commands display descriptive
statistics within groups; tabstat is demonstrated here.
summarize
summarize read math science write
display 9.48^2 /* note: variance is the sd (9.48) squared */
summarize write, detail
sum write if read>=60 /* note: sum is abbreviation of summarize */
sum write if prgtype=="academic"
sum write in 1/40
tabstat read write math, by(prgtype) stat(n mean sd)
tabstat write, by(prgtype) stat(n mean sd p25 p50 p75)
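The table command is listed for this unit but does not appear among the commands above. A minimal sketch of a comparable table follows. It assumes the Stata 12 syntax these notes were written for; table was redesigned in Stata 17, so the contents() option may need adjusting in newer releases. The sketch also shows that the standard deviation typed by hand in the display command can instead be taken from the results that summarize stores.

```stata
* Stata 12 syntax (assumption: not the redesigned table of Stata 17 and later)
table prgtype, contents(n write mean write sd write)

* stored results from summarize avoid retyping the standard deviation
summarize write
display r(sd)^2      /* variance computed from the stored sd */
display r(Var)       /* variance as stored by summarize */
```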
Next, let's use some graphics commands to look at our data. We will begin with stem, which generates an ASCII stem-and-leaf plot. We will also use the
histogram command and the graph box command, which draws box plots. We also show the kdensity command, which produces a smoothed (kernel) density
plot.
stem write
stem write, lines(2)
histogram write, normal
histogram write, normal start(30) width(5)
kdensity write, normal
kdensity write, normal bwidth(5) /* a wider bandwidth gives a smoother kdensity plot */
kdensity math, normal
graph box write
graph box write, over(prgtype)
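Because the log file does not capture graphs, each graph has to be saved separately if you want to keep it. One way to do this, sketched here with hypothetical file names, is to export the graph currently displayed right after drawing it:

```stata
histogram write, normal
graph export write_hist.png, replace     /* hypothetical file name */

graph box write, over(prgtype)
graph export write_box.png, replace      /* hypothetical file name */
```

graph export picks the output format from the file extension; PNG is assumed to be available on your platform.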
The tabulate command produces one-way or two-way frequency tables. The tab1 command is a convenience
command that produces several one-way frequency tables at once. With the discrete option, the histogram command can
also display histograms for categorical variables.
histogram ses
histogram ses, discrete
tabulate ses
tab write /* note: tab is abbreviation of tabulate */
tab1 gender schtyp prgtype
Two-way crosstabulation.
tab prgtype ses
Two-way crosstabulation with row and column percents.
tab prgtype ses, row col
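By default tabulate leaves missing values out of the table, and a two-way table with both frequencies and percents can be hard to read. As a brief sketch using two further tabulate options:

```stata
tab ses, missing                 /* show missing values as their own row */
tab prgtype ses, row nofreq      /* row percents only, frequencies suppressed */
```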
There are two commands for creating correlation matrices: correlate, which uses listwise deletion of missing data, and
pwcorr, which uses pairwise deletion. The scatter command (short for graph twoway scatter) produces scatter plots, and
graph matrix produces a scatterplot matrix. The jitter option adds random noise to spread apart identical observations.
correlate write read science
pwcorr write read science, obs
scatter write read
scatter write read, jitter(2)
graph matrix read science write, half
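The difference between listwise and pairwise deletion shows up in the number of observations behind each correlation. A sketch of how to check this, using the same three variables:

```stata
count if !missing(write, read, science)   /* complete cases: the N that correlate uses */
count if !missing(write, read)            /* N available to pwcorr for this one pair */
pwcorr write read science, obs sig        /* sig adds the significance level of each correlation */
```

If the two counts differ, correlate and pwcorr are working from different sets of observations for that pair, and their estimates can differ as well.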
We have completed all of the analyses in this unit, so it is time to close the log file.
log close
Now, let's see what is in our log file.
view unit2.txt
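The view command opens the file in the Viewer window. As an alternative sketch, the type command prints the same file directly in the Results window:

```stata
type unit2.txt      /* print the log in the Results window instead of the Viewer */
```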
3.0 For more information
Statistics with Stata 12
    Chapters 3 and 5
Gentle Introduction to Stata, Revised Third Edition
    Chapter 5
Data Analysis Using Stata, Third Edition
    Chapter 7
An Introduction to Stata for Health Researchers, Third Edition
    Chapter 11
Stata Learning Modules
    Descriptive information and statistics
    Using if with Stata commands