Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: How can I fill in missing values for the month or quarter in this data set?


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How can I fill in missing values for the month or quarter in this data set?
Date   Wed, 14 Nov 2012 20:04:39 +0000

-fillin- is the command of choice here.

. fillin state year quarter

That said, the larger issue is why you want to do this. -fillin-
without any extras will just extend your dataset and add missings on
-expenditure-, but necessarily no extra information.

As elsewhere I've posted on interpolation, it may seem odd that I ask
this, but what will you gain thereby?

Nick

On Wed, Nov 14, 2012 at 7:47 PM, Aaron Kirkman <ak1795mailserv@gmail.com> wrote:

> I have a dataset in this format:
>
> #####
> input year str8 state quarter expenditure
> 1998   Illinois         1       1000
> 1998   Illinois         3       3000
> 1998   Illinois         4       4000
> 1998   Illinois         2       2000
>
> 2000   Illinois         1       5000
> 2000   Illinois         4      11000
> 2000   Illinois         2       6000
> 2000   Illinois         3       8000
>
> 2001   Illinois         2       1575
>
> 2002   Illinois         4       5600
>
> 1998   Missouri         3       2100
> 1998   Missouri         4       2100
> 1998   Missouri         1       1900
> 1998   Missouri         2       1900
>
> 2000   Missouri         1       8500
>
> 2001   Missouri         1       8000
> 2001   Missouri         2       9800
> 2001   Missouri         3       7500
> 2001   Missouri         4       1800
>
> 2002   Missouri         3       1400
> end
> #####
>
> I'd like to fill in the missing quarters for each state/year group, so
> that each state/year group has four observations that include quarters
> 1 - 4. As long as the --expenditure-- value is preserved for the
> quarter in which it occurs in the original dataset, it can be missing
> for the rest.
>
> I attempted a solution using --expand--, but I'm not sure how to
> proceed from here.
>
> #####
> bysort state year: egen numquarters = count(expenditure)
> gen quartermissing = numquarters < 4
> drop numquarters
> expand 4 if quartermissing, gen(new)
> sort state year
> #####
>
> This successfully fills in the missing quarters with the value of the
> only existing quarter, and it's trivial to set these values to missing
> instead. The variables --quartermissing-- and --new-- indicate if
> quarters were missing and if new observations were created, but I'm
> stuck as to how I can proceed from there.
>
> If there is a way to sort the observations by group, while ensuring
> that the observation where quarter was not missing remained in its
> correct position, setting quarter to the observation number by group
> would work. Unfortunately, I'm not sure how to go about sorting in
> this way, since logically, the sorting will be dealing with either
> missing values or identical quarter numbers, which complicates the
> situation.
>
> Is there a simple way to accomplish this that I'm missing?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index