An Introduction to Stata Programming
Author: Christopher F. Baum
Publisher: Stata Press
Copyright: 2009
ISBN-13: 9781597180450
Pages: 362; paperback
Price: $54.00



Comment from the Stata technical group
Christopher F. Baum’s An Introduction to Stata Programming is
worthwhile for anyone wanting to learn about programming in Stata. For the
beginner, Baum assumes only that the user is familiar with Stata, so he
builds up accordingly. For the more advanced Stata programmer, the book
introduces Stata’s Mata programming language and provides optimization
tips for day-to-day work. All readers will find new, better ways to approach
old tasks.
Baum steps the reader through the three levels of Stata programming. First
up are do-files. Though often thought of as simple batch files, do-files
support both loops and conditional execution; hence, they can be used for
automation as well as reproducibility. While giving examples of do-file
programming, Baum introduces useful but often-overlooked Stata
constructions.
Next come ado-files, which are used to extend Stata by creating new commands
that share the syntax and behavior of official commands. Baum gives an
example of how to write a simple additional command for Stata, complete with
documentation and certification. After writing the simple command, users can
then learn how to write their own custom estimation commands by using both
Stata’s built-in numerical maximum-likelihood estimation routine,
ml, and its built-in nonlinear least-squares routines,
nl and nlsur.
Finishing up the book are two chapters on programming in Mata, which is
Stata’s matrix programming language. Mata programs are integrated into
ado-files to build a custom estimation routine that is optimized for speed
and numerical stability. While stepping through these structures, Baum
weaves in the details that are needed to become an expert at Stata
programming, so readers will also learn more about Stata itself while
learning the tools for programming.
Baum approaches each topic by first explaining the background and need for
the topic, then looking at the basic usage and examples, and finally
examining use within larger, more applied “cookbook” examples.
Many of his examples come from questions posed on the Statalist listserver,
so they address complexities of interest to a broad range of Stata users.
The programming examples cover an array of topics, illustrate some of
Stata’s built-in tools (such as the resampling techniques of
bootstrapping and jackknifing), and offer solutions to tricky data
management questions.
The breadth and depth of this book make it a necessity for anyone
interested in programming in Stata.
Table of contents
List of tables
List of figures
Acknowledgments
Notation and typography
1 Why should you become a Stata programmer?
Do-file programming
Ado-file programming
Mata programming for ado-files
1.1 Plan of the book
1.2 Installing the necessary software
2 Some elementary concepts and tools
2.1 Introduction
2.1.1 What you should learn from this chapter
2.2 Navigational and organizational issues
2.2.1 The current working directory and profile.do
2.2.2 Locating important directories: sysdir and adopath
2.2.3 Organization of do-files, ado-files, and data files
2.3 Editing Stata do- and ado-files
2.4 Data types
2.4.1 Storing data efficiently: The compress command
2.4.2 Date and time handling
2.4.3 Time-series operators
2.5 Handling errors: The capture command
2.6 Protecting the data in memory: The preserve and restore commands
2.7 Getting your data into Stata
2.7.1 Inputting data from ASCII text files and spreadsheets
Handling text files
Free format versus fixed format
The insheet command
Accessing data stored in spreadsheets
Fixed-format data files
2.7.2 Importing data from other package formats
2.8 Guidelines for Stata do-file programming style
2.8.1 Basic guidelines for do-file writers
2.8.2 Enhancing speed and efficiency
2.9 How to seek help for Stata programming
3 Do-file programming: Functions, macros, scalars, and matrices
3.1 Introduction
3.1.1 What you should learn from this chapter
3.2 Some general programming details
3.2.1 The varlist
3.2.2 The numlist
3.2.3 The if exp and in range qualifiers
3.2.4 Missing data handling
Recoding missing values: The mvdecode and mvencode commands
3.2.5 String-to-numeric conversion and vice versa
Numeric-to-string conversion
Working with quoted strings
3.3 Functions for the generate command
3.3.1 Using if exp with indicator variables
3.3.2 The cond() function
3.3.3 Recoding discrete and continuous variables
3.4 Functions for the egen command
Official egen functions
egen functions from the user community
3.5 Computation for by-groups
3.5.1 Observation numbering: _n and _N
3.6 Local macros
3.7 Global macros
3.8 Extended macro functions and macro list functions
3.8.1 System parameters, settings, and constants: creturn
3.9 Scalars
3.10 Matrices
4 Cookbook: Do-file programming I
4.1 Tabulating a logical condition across a set of variables
4.2 Computing summary statistics over groups
4.3 Computing the extreme values of a sequence
4.4 Computing the length of spells
4.5 Summarizing group characteristics over observations
4.6 Using global macros to set up your environment
4.7 List manipulation with extended macro functions
4.8 Using creturn values to document your work
5 Do-file programming: Validation, results, and data management
5.1 Introduction
5.1.1 What you should learn from this chapter
5.2 Data validation: The assert, count, and duplicates commands
5.3 Reusing computed results: The return and ereturn commands
5.3.1 The ereturn list command
5.4 Storing, saving, and using estimated results
5.4.1 Generating publication-quality tables from stored estimates
5.5 Reorganizing datasets with the reshape command
5.6 Combining datasets
5.7 Combining datasets with the append command
5.8 Combining datasets with the merge command
5.8.1 The dangers of many-to-many merges
5.9 Other data-management commands
5.9.1 The fillin command
5.9.2 The cross command
5.9.3 The stack command
5.9.4 The separate command
5.9.5 The joinby command
5.9.6 The xpose command
6 Cookbook: Do-file programming II
6.1 Efficiently defining group characteristics and subsets
6.1.1 Using a complicated criterion to select a subset of observations
6.2 Applying reshape repeatedly
6.3 Handling time-series data effectively
6.4 reshape to perform row-wise computation
6.5 Adding computed statistics to presentation-quality tables
6.5.1 Presenting marginal effects rather than coefficients
6.6 Generating time-series data at a lower frequency
7 Do-file programming: Prefixes, loops, and lists
7.1 Introduction
7.1.1 What you should learn from this chapter
7.2 Prefix commands
7.2.1 The by prefix
7.2.2 The xi prefix
7.2.3 The statsby prefix
7.2.4 The rolling prefix
7.2.5 The simulate and permute prefixes
7.2.6 The bootstrap and jackknife prefixes
7.2.7 Other prefix commands
7.3 The forvalues and foreach commands
8 Cookbook: Do-file programming III
8.1 Handling parallel lists
8.2 Calculating moving-window summary statistics
8.2.1 Producing summary statistics with rolling and merge
8.2.2 Calculating moving-window correlations
8.3 Computing monthly statistics from daily data
8.4 Requiring at least n observations per panel unit
8.5 Counting the number of distinct values per individual
9 Do-file programming: Other topics
9.1 Introduction
9.1.1 What you should learn from this chapter
9.2 Storing results in Stata matrices
9.3 The post and postfile commands
9.4 Output: The outsheet, outfile, and file commands
9.5 Automating estimation output
9.6 Automating graphics
9.7 Characteristics
10 Cookbook: Do-file programming IV
10.1 Computing firm-level correlations with multiple indices
10.2 Computing marginal effects for graphical presentation
10.3 Automating the production of LaTeX tables
10.4 Tabulating downloads from the Statistical Software Components archive
10.5 Extracting data from graph files’ sersets
10.6 Constructing continuous price and returns series
11 Ado-file programming
11.1 Introduction
11.1.1 What you should learn from this chapter
11.2 The structure of a Stata program
11.3 The program statement
11.4 The syntax and return statements
11.5 Implementing program options
11.6 Including a subset of observations
11.7 Generalizing the command to handle multiple variables
11.8 Making commands byable
11.9 Documenting your program
11.10 egen function programs
11.11 Writing an e-class program
11.11.1 Defining subprograms
11.12 Certifying your program
11.13 Programs for ml, nl, nlsur, simulate, bootstrap, and jackknife
Writing an ml-based command
11.13.1 Programs for the nl and nlsur commands
11.13.2 Programs for the simulate, bootstrap, and jackknife prefixes
11.14 Guidelines for Stata ado-file programming style
11.14.1 Presentation
11.14.2 Helpful Stata features
11.14.3 Respect for datasets
11.14.4 Speed and efficiency
11.14.5 Reminders
11.14.6 Style in the large
11.14.7 Use the best tools
12 Cookbook: Ado-file programming
12.1 Retrieving results from rolling:
12.2 Generalization of egen function pct9010() to support all pairs of quantiles
12.3 Constructing a certification script
12.4 Using the ml command to estimate means and variances
12.4.1 Applying equality constraints in ml estimation
12.5 Applying inequality constraints in ml estimation
12.6 Generating a dataset containing the single longest spell
13 Mata functions for ado-file programming
13.1 Mata: First principles
13.1.1 What you should learn from this chapter
13.2 Mata fundamentals
13.2.1 Operators
13.2.2 Relational and logical operators
13.2.3 Subscripts
13.2.4 Populating matrix elements
13.2.5 Mata loop commands
13.2.6 Conditional statements
13.3 Function components
13.3.1 Arguments
13.3.2 Variables
13.3.3 Saved results
13.4 Calling Mata functions
13.5 Mata’s st_ interface functions
13.5.1 Data access
13.5.2 Access to locals, globals, scalars, and matrices
13.5.3 Access to Stata variables’ attributes
13.6 Example: st_ interface function usage
13.7 Example: Matrix operations
13.7.1 Extending the command
13.8 Creating arrays of temporary objects with pointers
13.9 Structures
13.10 Additional Mata features
13.10.1 Macros in Mata functions
13.10.2 Compiling Mata functions
13.10.3 Building and maintaining an object library
13.10.4 A useful collection of Mata routines
14 Cookbook: Mata function programming
14.1 Reversing the rows or columns of a Stata matrix
14.2 Shuffling the elements of a string variable
14.3 Firm-level correlations with multiple indices with Mata
14.4 Passing a function to a Mata function
14.5 Using subviews in Mata
14.6 Storing and retrieving country-level data with Mata structures
14.7 Locating nearest neighbors with Mata
14.8 Computing the seemingly unrelated regression estimator
14.9 A GMM-CUE estimator using Mata’s optimize() functions
References