Data frames: multiple datasets in memory

Highlights

Multiple datasets in memory simultaneously
Each dataset is stored in a frame
Frames are easy to use interactively
Link frames
Frames are fully programmable, in both ado and Mata
Access data in frames from Java and Python

Datasets in memory are stored in frames, and frames are named. When Stata launches, it creates a frame named default, but there is nothing special about that frame, and its name has no special or secret meaning. You can rename it.

You can create, drop, and rename frames. The commands are

. frame create framename
. frame drop framename
. frame rename oldname newname

Stata will list the names of all the existing frames if you type

. frames dir

One of the frames that frames dir lists will be the current frame. That is the frame Stata commands use unless you tell them otherwise. To find out the name of the current frame, type

. frame
  (current frame is default)

We are in frame default. If we fit a regression now, it would be fit on the data in default. We could instead change to another frame by typing

. frame change myframe

Now if we fit a regression, it would be fit on the data in myframe.

So that is one way of working with frames: frame change to the frame you want, issue the Stata commands, and then frame change back.
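Here is a minimal do-file sketch that puts these commands together over the life of one frame. It assumes Stata 16 or later and a session whose current frame is default. The frame names scratch and work are chosen only for illustration. It reads the current frame's name from the c(frame) system value.

```stata
* Life cycle of one frame; assumes the current frame is default
frame create scratch            // a new, empty frame
frames dir                      // lists default and scratch
frame rename scratch work       // frames can be renamed
frame change work               // work is now the current frame
display "`c(frame)'"            // displays: work
frame change default            // the current frame cannot be dropped,
frame drop work                 // so change away from work before dropping it
```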

Another way of working with frames is

. frame framename {
        stata_command
        stata_command
        .
        .
  }

and

. frame framename: one_stata_command

These commands run the Stata commands on the specified frame, and switch back to the original frame once they are finished.
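Here is a concrete sketch of both forms. It loads the auto dataset that ships with Stata into a new frame named cars, then summarizes and regresses there without ever leaving the current frame. The name cars is illustrative. The sketch assumes Stata 16 or later and that no frame named cars exists yet.

```stata
frame create cars
frame cars {
    sysuse auto, clear       // loaded into cars, not into the current frame
    summarize mpg
}
frame cars: regress mpg weight
frame                        // the current frame is unchanged
```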

The final way to work with frames is to link them. A frame that is linked to another can access the other frame's data without changing them. We will demonstrate that below.

Let's see it work

Here are five ways you can use frames.

Example 1: Multitask

You are working to finish your project when the phone rings. Something has to be handled right now. Here is what you do:

. frame create interruption

. frame change interruption

. use another_dataset

. do what needs doing

. frame change default

. frame drop interruption

Example 2: Use frames to perform tasks integral to your work

You want to predict the income of men as if they were women and of women as if they were men. Frames provide yet another way to do this. We are about to

run a regression,
change the data so that men are recorded as women and women as men,
obtain predicted income on the changed data,
and all the while not change the data.

Frames are how we will avoid changing the data: the changes are made to a copy of the data stored in another frame.

. regress income i.sex##(i.ed c.age##c.age) i.occ

. frame copy default new

. frame new {
        replace sex = !sex       // reverse the sexes
        predict pincome
  }

. generate alt_income = _frval(new, pincome, _n)

. frame drop new

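generate copied the values of pincome from frame new by using the _frval() function. The argument _n specifies that observation 1 in new be copied to observation 1 in default, observation 2 in new to observation 2 in default, and so on. Note that replace sex = !sex swaps the sexes only if sex is coded 0 and 1.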
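The income data in this example do not ship with Stata, so here is a runnable analogue sketched on the auto dataset, where foreign is coded 0 and 1. It predicts each car's price as if its origin were reversed. The model and the names flipped, pprice, and alt_price are our own illustrative choices, not part of the original example. It relies, as the example above does, on predict in another frame using the regression that was just fit.

```stata
sysuse auto, clear
regress price i.foreign##c.weight
frame copy default flipped
frame flipped {
    replace foreign = !foreign     // domestic <-> foreign
    predict pprice                 // uses the regression just fit
}
generate alt_price = _frval(flipped, pprice, _n)   // obs i of flipped -> obs i here
frame drop flipped
```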

Example 3: Work with separate but related datasets simultaneously

You have two files, persons.dta and counties.dta, that are related. The persons live in the counties. You can load the datasets into separate frames and link them.

. use persons

. frame create counties

. frame counties: use counties

. frlink m:1 countyid, frame(counties)

frlink links observations in the current frame to the corresponding observations in the other frame. Variable countyid in persons.dta records the county in which each person lives. The variable of the same name in counties.dta identifies the county each observation describes. The data were linked on countyid, and m:1 declares that many persons may match one county. frlink records the link in a new variable in the current frame, named counties by default.
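To see how unmatched observations are treated, here is a sketch that uses tiny invented stand-ins for persons.dta and counties.dta, typed in with input. The values are made up for illustration. One person lives in a county (3) that is absent from counties. The sketch counts such persons using the link variable and then copies med_income across with frget.

```stata
* Toy stand-ins for persons.dta and counties.dta (values invented)
capture frame drop counties
clear
input countyid income
1 40
1 55
2 38
3 60
end
frame create counties
frame change counties
input countyid med_income
1 45
2 41
end
frame change default
frlink m:1 countyid, frame(counties)   // creates link variable counties
count if missing(counties)             // persons whose county is not in counties
frget med_income, from(counties)       // copies med_income into this frame
list
```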

Assume counties contains a variable med_income containing each county's median income. Then you could type

. fralias add med_income, from(counties)

. regress income med_income educ age

The first command adds an alias variable named med_income to the current frame. The alias variable gives access to the values of the underlying variable med_income in frame counties. Those values are accessed dynamically by the regress command that follows, so you do not have to copy them into the current frame! The alias behaves like a copy of the original variable, such as you would get from frget, but no data are actually copied.

There are many issues in referencing observations in a linked frame like this, but they are handled automatically. For example, some individuals might live in counties not recorded in counties. Others might live in the same county. And there may be counties in which no one in persons.dta lives. All of that is handled.

Alias variables can result in significant memory savings. In the above example, you do not have to merge repeated values of med_income from the counties frame onto your persons data. Instead, those values are accessed on the fly through the alias variable med_income.
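Here is a small sketch of an alias variable used inside an ordinary expression. It assumes Stata 18 or later, where fralias is available, and uses invented toy data. The variable rel_income is a hypothetical name for illustration. Two persons share county 1, yet med_income is stored only once, in frame counties.

```stata
* Stata 18 or later; toy data (values invented)
capture frame drop counties
clear
input countyid income
1 40
1 52
2 38
end
frame create counties
frame change counties
input countyid med_income
1 45
2 41
end
frame change default
frlink m:1 countyid, frame(counties)
fralias add med_income, from(counties)   // an alias, not a copy
generate rel_income = income / med_income
list
```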

Example 4: Record results in another frame

You can use one frame to record results from another. The frame create command, which we have used before, can also create new frames containing new variables. For instance,

. frame create newframename stat1 stat2

creates a new frame containing zero observations on variables named stat1 and stat2.

Another frame command,

. frame post framename (expression) (expression) ...

will add observations to an existing frame, filling in the variables with the values of the expressions.

Thus, we can use frame create to create a new frame ready to receive new observations, and we can use frame post to send the new observations we want to add. Here is an example of how we can put frame create and frame post to use.

How often will a sample of 100 draws from N(0,1) have a mean significantly different from 0 at the 5% level? We would expect about 5% of the time. Let's do 1,000 simulations.

. frame create results t p

. forvalues i=1(1)1000 {
          quietly set obs 100
          quietly generate x = rnormal()
          quietly ttest x=0
          frame post results (r(t)) (r(p))
          drop _all
  }

. frame results: count if p<=0.05
  43
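The simulation above sets no random-number seed, so its count will change from run to run. Here is a sketch of a reproducible variant that reports the rejection rate directly. The seed value and the names sims and reject are arbitrary illustrative choices, and a run will not reproduce the count of 43 shown above. The variables are posted as doubles to keep full precision.

```stata
clear
set seed 12345
frame create sims double t double p
forvalues i = 1/1000 {
    quietly set obs 100
    quietly generate x = rnormal()
    quietly ttest x == 0
    frame post sims (r(t)) (r(p))
    drop _all
}
frame sims: generate byte reject = p <= 0.05
frame sims: summarize reject     // mean of reject is the share of rejections
```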

How often will a regressor made of pure N(0,1) noise have a coefficient with |t|>2? Let's do 1,000 simulations, each adding such a regressor to a regression of mpg on weight and displacement in the auto data:

. sysuse auto
(1978 Automobile Data)

. frame create results b se

. forvalues i=1(1)1000 {
          quietly generate x = rnormal()
          quietly regress mpg x weight displ
          frame post results (_b[x]) (_se[x])
          drop x
  }

. frame results: count if abs(b/se) > 2
  54

Recording simulation results is one way you can use frame create and frame post. Here's another. We recently had a dataset with 2,000-plus variables in it, and we wanted to get its names organized and standardized. We started by creating a dataset of the variable names:

. frame create varnames str32 varname

. foreach name of varlist _all {
          frame post varnames ("`name'")
  }

Now we had a dataset in frame varnames with 2,000-plus observations of variable varname. We looked at the dataset, sorted it, performed other shrewd transformations on it, and finally knew what we wanted to do. We started like this:

. frame change varnames
. rename varname oldname
. generate str32 newname = ""

Then, we copied some old names over to newname. We filled others in by hand. We even filled some of them in with programs we wrote. Finally, we reached the point where we had a new name for each original name.
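Here is a sketch of what filling in newname "with programs we wrote" might look like. The three old names and both standardization rules (lowercase, no double underscores) are hypothetical, made up for illustration. The isid check guards against two old names being mapped to the same new name.

```stata
* Hypothetical old names; the real frame held 2,000-plus
frame create varnames str32 oldname
frame post varnames ("Income_2019")
frame post varnames ("AGE")
frame post varnames ("Educ__Years")
frame change varnames
generate str32 newname = strlower(oldname)          // rule 1: lowercase
replace newname = subinstr(newname, "__", "_", .)   // rule 2: no double underscores
isid newname              // errors if two old names map to one new name
list, noobs
frame change default
```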

Then, we used frames to change the names in the original data:

. frame change varnames

. local N = _N

. forvalues i=1(1)`N' {
          local old = oldname[`i']
          local new = newname[`i']
          frame default: rename `old' `new'
  }
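Renaming one variable at a time can fail if a new name is still in use as an old name further down the list. Here is an alternative sketch that collects both lists and issues a single group rename of the form rename (oldnames) (newnames), which also handles swaps such as rename (a b) (b a). The two renamings of auto variables and the local names oldlist and newlist are illustrative choices.

```stata
sysuse auto, clear
frame create varnames str32 oldname str32 newname
frame post varnames ("mpg") ("miles_per_gallon")
frame post varnames ("rep78") ("repair_record")
frame change varnames
local oldlist ""
local newlist ""
forvalues i = 1/`=_N' {
    local oldlist "`oldlist' `=oldname[`i']'"
    local newlist "`newlist' `=newname[`i']'"
}
frame change default
rename (`oldlist') (`newlist')     // one group rename
describe, short
```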

Then, we put the variables in the order in which their new names appear in frame varnames:

. local names = ""

. forvalues i=1(1)`N' {
          local names = "`names' " + newname[`i']
  }

. frame default: order `names'

Example 5: Use frames to make your work easier

Another frame feature is frame put for copying a subset of data from one frame to another.

. frame put varlist if expression, into(framename)
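For example, the following sketch copies three variables for the foreign cars in Stata's auto dataset into a new frame. The name foreigncars is illustrative. The data in the current frame are left untouched.

```stata
sysuse auto, clear
frame put make price mpg if foreign == 1, into(foreigncars)
frame foreigncars: describe, short
frame foreigncars: summarize price mpg
* the current frame still holds all 74 cars;
* drop the copy with frame drop foreigncars when finished
```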

Here is how you might use it.

You have hundreds of variables in your dataset. Right now, you want to look at only a few of them:

. frame put city population med_income, into(subset)
. frame change subset
. stata_command
. stata_command
. frame change default
. frame drop subset

You have these data for most cities in most countries of the world. You want to analyze the data for Germany:

. frame put city population med_income if country=="Germany", into(subset)
. frame change subset
. stata_command
. stata_command
. frame change default
. frame drop subset

We once had country data and wanted to perform country_analysis.do for each country separately, starting with Afghanistan and ending with Zimbabwe. We did the following and produced Afghanistan.log, Albania.log, Algeria.log, ... Zimbabwe.log.

. egen c = group(country)

. quietly summarize c

. local N_of_countries = r(max)

. forvalues i=1(1)`N_of_countries' {
          frame put if c==`i', into(subset)
          frame subset {
                  local cntryname = country[1]
                  log using "`cntryname'.log", replace
                  do country_analysis
                  log close
          }
          frame drop subset
  }
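Here is an alternative sketch that loops over the country names directly with levelsof, so the group variable c is not needed. It opens each log outside the frame prefix. The toy data are invented, and frame subset: summarize x stands in for the hypothetical frame subset: do country_analysis.

```stata
* Toy data standing in for the country dataset (values invented)
clear
input str12 country x
"Albania" 1
"Algeria" 2
"Albania" 3
end
levelsof country, local(countries)
foreach cntry of local countries {
    frame put if country == "`cntry'", into(subset)
    log using "`cntry'.log", replace text
    frame subset: summarize x      // stands in for: frame subset: do country_analysis
    log close
    frame drop subset
}
```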

Example 6: Make code run faster

We said we would show you five ways to use frames, and yet here we are on number 6. We do not count this one because you do not have to do anything to experience the benefit.

Do- and ado-files that you have written that use preserve and restore will run faster under Stata/MP, because Stata/MP secretly uses frames in place of temporary files to preserve the data. The speed-up is sometimes remarkable: we have do- and ado-files that run 20 percent faster.

Tell me more

Learn more about Stata's data manipulation features.

Read more about frames in the Data Management Reference Manual; see [D] frames intro.
