Re: st: generating observations in data set


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: generating observations in data set
Date   Thu, 7 Mar 2013 12:15:39 +0000

"Missing" is naturally a treacherous word here: although you carefully
said "missing observations", that is all too likely to be read as
"observations with missing values".

If something might (should) be in the dataset, but is not, I prefer to
say "omitted" but my chances of convincing the world on this point are
tiny.

However, terminology is not the point here.

-fillin- is your friend, e.g.

fillin yydx dis age_grp
replace count = 0 if missing(count)
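
As a minimal sketch, assuming the variable names shown in the example
data quoted below (dis, yydx, age_grp, count) and typing that example
in by hand, the whole sequence might look like this. The _fillin
indicator is created by -fillin- itself (1 for added observations).

```stata
* Sketch only: the example data from the question, entered with -input-
clear
input byte dis int yydx str5 age_grp int count
1 2003 "0-4"   321
1 2003 "5-9"   266
1 2003 "10-14" 201
1 2003 "15-19" 167
1 2003 "20-24" 150
2 2003 "5-9"   266
2 2003 "15-19" 167
2 2003 "20-24" 100
end

* Add an observation for every dis x yydx x age_grp combination
* that is absent; -fillin- marks the additions with _fillin == 1
fillin dis yydx age_grp

* The added observations have count missing: set them to zero
replace count = 0 if _fillin

list dis yydx age_grp count _fillin, sepby(dis)
```

Here the two added observations are disease 2 in age groups 0-4 and
10-14. Two caveats: -fillin- can only add combinations of values that
occur somewhere in the dataset, so an age group absent for every
disease and year would still be absent; and with a string age_grp the
sort order is alphabetical ("10-14" before "5-9"), not by age.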

See -help fillin- as usual and if so desired

SJ-5-1  dm0011  . . . . . . . . . . . . . .  Stata tip 17: Filling in the gaps
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/05   SJ 5(1):135--136                                 (no commands)
        tips for using fillin to fill in gaps in a rectangular
        data structure

which is accessible via

http://www.stata-journal.com/sjpdf.html?articlenum=dm0011

Nick

On Thu, Mar 7, 2013 at 11:23 AM, Tim Evans <[email protected]> wrote:

> I am trying to calculate age-standardised incidence rates using -distrate-, a user-written package (accessible by -ssc install distrate-), in Stata 11.2, but I need help identifying where levels are missing from my dataset.
>
> I have 5-year age groups and am looking at type 1 and type 2 disease. For type 1 disease I have observations in every age group from 0-4 to 85+, but for type 2 disease there are no observations in the 0-4 and 10-14 age groups. What I would like to do is check whether there are any 'missing' observations, insert a row for each such age group, and set its count to 0. This may happen many times in my data, as I have multiple years of data. My data look like this:
>
> dis     yydx    age_grp count
> 1       2003    0-4     321
> 1       2003    5-9     266
> 1       2003    10-14   201
> 1       2003    15-19   167
> 1       2003    20-24   150
> 2       2003    5-9     266
> 2       2003    15-19   167
> 2       2003    20-24   100
>
> I would like to be able to change it to this:
>
> dis     yydx    age_grp count
> 1       2003    0-4     321
> 1       2003    5-9     266
> 1       2003    10-14   201
> 1       2003    15-19   167
> 1       2003    20-24   150
> 2       2003    0-4     0
> 2       2003    5-9     266
> 2       2003    10-14   0
> 2       2003    15-19   167
> 2       2003    20-24   100
>
