Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: How can I fill in missing values for the month or quarter in this data set?


From   Aaron Kirkman <ak1795mailserv@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: How can I fill in missing values for the month or quarter in this data set?
Date   Wed, 14 Nov 2012 13:47:46 -0600

Dear Statalist,

I have a dataset in this format:

#####
input year str8 state quarter expenditure
1998   Illinois         1       1000
1998   Illinois         3       3000
1998   Illinois         4       4000
1998   Illinois         2       2000

2000   Illinois         1       5000
2000   Illinois         4      11000
2000   Illinois         2       6000
2000   Illinois         3       8000

2001   Illinois         2       1575

2002   Illinois         4       5600

1998   Missouri         3       2100
1998   Missouri         4       2100
1998   Missouri         1       1900
1998   Missouri         2       1900

2000   Missouri         1       8500

2001   Missouri         1       8000
2001   Missouri         2       9800
2001   Missouri         3       7500
2001   Missouri         4       1800

2002   Missouri         3       1400
end
#####

I'd like to fill in the missing quarters for each state/year group, so
that each state/year group has four observations that include quarters
1 - 4. As long as the --expenditure-- value is preserved for the
quarter in which it occurs in the original dataset, it can be missing
for the rest.

I attempted a solution using --expand--, but I'm not sure how to
proceed from here.

#####
bysort state year: egen numquarters = count(expenditure)
gen quartermissing = numquarters < 4
drop numquarters
expand 4 if quartermissing, gen(new)
sort state year
#####

This successfully fills in the missing quarters with the value of the
only existing quarter, and it's trivial to set these values to missing
instead. The variables --quartermissing-- and --new-- indicate if
quarters were missing and if new observations were created, but I'm
stuck as to how I can proceed from there.

If there is a way to sort the observations by group, while ensuring
that the observation where quarter was not missing remained in its
correct position, setting quarter to the observation number by group
would work. Unfortunately, I'm not sure how to go about sorting in
this way, since logically, the sorting will be dealing with either
missing values or identical quarter numbers, which complicates the
situation.

Is there a simple way to accomplish this that I'm missing?

Thank you,
Aaron Kirkman
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index