Statalist



Re: re: st: Imputation vs. Reducing Dataset?


From   John Simpson <john.simpson@ualberta.ca>
To   statalist@hsphsun2.harvard.edu
Subject   Re: re: st: Imputation vs. Reducing Dataset?
Date   Mon, 13 Jul 2009 17:53:28 -0600

Hi David,

That's a good point. I'm not sure if I can use a censored or truncated model though because I only know some of the Xs and maybe that's not enough.

There are two categories of Xs. The ones I know are the static environmental variables (things like how much it costs an agent to survive into the next round or to breed). The ones I don't know are the dynamic environmental variables, such as the distributions of the various behaviours that the agents can display. These change over time and so are also Ys when considered on their own.

Ideally I'd like to build a model (or set of models) that can account for both population size and the distribution of behaviours across the population, even though these are not observed after a population reaches 15000 members.

Is it possible to get away with using a censored or truncated model in this case without biasing the model towards the non-censored cases? My worry is that by censoring I'll lose as much as 30% of my panels/observations.

-John



From: David Airey <david.airey@Vanderbilt.Edu>
Subject: re: st: Imputation vs. Reducing Dataset?
Date: Mon, 13 Jul 2009 14:34:01 -0500



Should you also consider a censored model, because you know your Xs
but not the Y for those populations that got larger than your cutoff?

-Dave
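
To sketch what that could look like (an illustration only, with made-up variable names: panel id -popid-, time variable -gen-, outcome -popsize-, covariates -cost_live- and -cost_breed-, and a hypothetical indicator -pulled- marking runs removed at the cap), one option is to record the pulled runs at the limit and fit a random-effects tobit:

    . xtset popid gen
    . replace popsize = 15000 if pulled
    . xttobit popsize cost_live cost_breed, ul(15000)

-xttobit- then treats observations at or above the upper limit as right-censored rather than dropping them, so the capped runs still contribute to the likelihood instead of being discarded.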

> Hello Statarians,
> I have a very large set of data featuring population counts
> generated by a computer simulation. In order to speed processing,
> populations that grew beyond 15000 within the 100-generation limit
> were pulled from the simulation. As a result there are numerous
> populations that now have missing data, making my panels unbalanced.
> I am curious how to best fit a model to this data given what is
> missing. In particular, I have two worries:
> 1. That unless I do something, the missing values will cause any
> procedure to misrepresent the actual situation, as the smaller values
> that remain towards the end of the time period will skew the mean. I
> am curious if this is a problem for populations that have died off
> early as well (do I need to carry the 0 through all the
> remaining generations?).
> 2. I am unsure whether imputation (with -ice-?), chopping the
> dataset, or both is the best way to proceed. I know that -ice- needs
> variables that are missing at random, but is there some way to
> impute the missing values if I know how they are structured?
> Thank you.
>
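
(For reference, the -ice- route mentioned in the quoted post would look something like the sketch below, again with made-up variable names; note that -ice- assumes the values are missing at random, which censoring at a known cutoff violates, so the censored-model route may be the safer one here:

    . ice popsize cost_live cost_breed, m(5) saving(imputed, replace)
    . use imputed, clear

)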
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

