Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Dmitriy Krichevskiy <krichevskyd@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: rectangulizing data
Date: Thu, 26 May 2011 13:44:57 -0400

Thanks for the links, Austin; I am lucky to find an expert on both the subject and the data. I am finding results broadly similar to those in your paper, though my interest is in self-employment vs. wage work. I am still puzzled by the enormous volatility an average individual faces. Among the small sample of people I personally know, no one is subjected to such fluctuations, and certainly no one working for a wage. This makes me uncomfortable, because either I and the people I know are not representative of an average individual or (the more worrisome scenario) those participating in SIPP are unusual individuals who self-select into participation.

Back to the issue at hand: would you abandon attempts to calculate annual income, impute, or drop people missing several months? By abandoning, I mean switching to 4-month cumulative (or average) intervals.

On 5/26/11, Austin Nichols <austinnichols@gmail.com> wrote:
> Dmitriy Krichevskiy <krichevskyd@gmail.com>:
> Nor is dropping cases harmless; there is some discussion at
> http://www.urban.org/publications/411971.html
> and slides 12-14 of
> http://www-personal.umich.edu/~nicholsa/an_dds.pdf
>
> On Thu, May 26, 2011 at 12:52 PM, Dmitriy Krichevskiy
> <krichevskyd@gmail.com> wrote:
>> Thank you for your responses; I apologize for the confusion(s).
>>
>> A clarification, then:
>>
>> The data come from the Survey of Income and Program Participation (SIPP),
>> and my particular dataset combines 7 years of data. The data are
>> collected quarterly and recorded monthly (via phone interviews). Hence
>> time=14 is the second month of the second year. Many people in this
>> sample often miss interviews, and income also exhibits a lot of
>> volatility (I still do not know why). My goal is to analyze income
>> transitions from quintile to quintile (via -xttrans-), and for annual
>> income I need to aggregate monthly income while distinguishing zero
>> income from missing income.
>> Hence, I am trying to drop people who have only a few months of income
>> on record for those years where their information is incomplete, while
>> keeping the same people for other years in which they have all the
>> income information recorded. Given the very large volatility and the
>> many missed interviews, I am not sure imputing income is harmless.
>>
>> On 5/26/11, Nick Cox <njcoxstata@gmail.com> wrote:
>>> I think this might need to be
>>>
>>> bysort ID year: egen obs = count(month)
>>>
>>> -- perhaps after some work --
>>>
>>> but as is agreed, the example is unclear.
>>>
>>> On 26 May 2011, at 16:52, Oliver Jones <ojones@wiwi.uni-bielefeld.de>
>>> wrote:
>>>
>>>> Hi,
>>>> your example data structure is a bit confusing since you have months
>>>> greater than 12... I'll assume you have at most 12 months per person
>>>> per year.
>>>>
>>>> Maybe this can help to drop people who have fewer than 12 observations
>>>> for one particular year. Let's assume this year is 2006.
>>>>
>>>> bysort ID: egen obs = count(Month)
>>>> drop if year == 2006 & obs < 12
>>>>
>>>> Does it work?
>>>>
>>>> Best
>>>> Oliver
>>>>
>>>> On 26.05.2011 17:19, Dmitriy Krichevskiy wrote:
>>>>> Dear Listers,
>>>>> I am trying to figure out the simplest way to convert a large panel
>>>>> dataset from monthly to annual income. The income is only reported
>>>>> monthly, and I would like to clean the data of anyone missing a month
>>>>> in a particular year. I would like to drop observations for that
>>>>> person-year only and keep that person if they are fully present in
>>>>> some other year. Here is an equivalent data structure. As always,
>>>>> thanks a lot for your help.
>>>>> Dmitriy
>>>>>
>>>>> ID  Month  Income
>>>>>  1      1    1000
>>>>>  1      2     500
>>>>>  1      3    1000
>>>>>  1     13       0
>>>>>  1     14       0
>>>>>  1     15       0
>>>>>  1     16       0
>>>>>  1     17     600
>>>>>  1     18    1000
>>>>>  1     19    1000
>>>>>  1     20    1000
>>>>>  1     21    1000
>>>>>  1     22    1000
>>>>>  1     23     660
>>>>>  1     24     800
>>>>>  1     25    1200
>>>>>  2      1    2400
>>>>>  2      2    2400
>>>>>  2      5    2600
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Dmitriy Krichevskiy
Ph.D. Candidate
Economics Department
Florida International University
www.fiu.edu/~dkrichev

Research Associate, College of Education
Lumina Foundation Project
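The key point in the thread is Nick Cox's correction: count observed months within each person-year (`bysort ID year`), not within each person (`bysort ID`). A per-person count pools all of a person's months. An incomplete year therefore passes whenever the person has at least 12 observed months in total; person 1 in the example has 16.

Below is a minimal Stata sketch of the whole task on the example data. It rests on three assumptions:

* `Month` is a running index (1-12 is the first year, 13-24 the second, and so on), as the time=14 example implies.
* A missed interview appears either as an absent row or as missing `Income`.
* Genuine zero income is recorded as 0.

The names `year`, `nmonths` and `annual_income` are illustrative and not part of the original data. Counting `Income` rather than `Month` additionally treats a row whose `Income` is missing as a missed month.

```stata
* Sketch: keep only complete person-years, then sum to annual income.
* Assumes Month is a running index (1-12 = year 1, 13-24 = year 2, ...).
clear
input ID Month Income
1 1 1000
1 2 500
1 3 1000
1 13 0
1 14 0
1 15 0
1 16 0
1 17 600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23 660
1 24 800
1 25 1200
2 1 2400
2 2 2400
2 5 2600
end

* Each person-month should appear at most once (errors otherwise)
isid ID Month

* Survey year implied by the running month index (assumption above)
generate year = ceil(Month/12)

* count() skips missing Income but counts zeros, so zero income is
* treated as observed while missed months are not
bysort ID year: egen nmonths = count(Income)

* Drop only the incomplete person-years; other years of the same ID stay
drop if nmonths < 12

* One row per complete person-year
collapse (sum) annual_income = Income, by(ID year)
list
```

On the example data, only person 1's second year (months 13-24) survives, with annual income 7,060. Person 1's first and third years and all of person 2's records are dropped.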

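Dmitriy's fallback is to switch to 4-month cumulative (or average) intervals. That follows the same pattern, with a 4-month block in place of the year.

The sketch below assumes the blocks are months 1-4, 5-8, 9-12, and so on of the running index. If respondents' interview waves are staggered, each block would instead have to come from the respondent's own wave identifier, which is not in the example data. The names `period`, `nmonths`, `inc_total` and `inc_mean` are illustrative.

```stata
* Sketch: complete 4-month blocks instead of complete years.
* Assumes blocks are months 1-4, 5-8, ... of the running Month index.
clear
input ID Month Income
1 1 1000
1 2 500
1 3 1000
1 13 0
1 14 0
1 15 0
1 16 0
1 17 600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23 660
1 24 800
1 25 1200
2 1 2400
2 2 2400
2 5 2600
end

isid ID Month

* Hypothetical 4-month block implied by the running month index
generate period = ceil(Month/4)

* Keep only blocks with all 4 months of Income observed
bysort ID period: egen nmonths = count(Income)
drop if nmonths < 4

* Cumulative and average income per complete block
collapse (sum) inc_total = Income (mean) inc_mean = Income, by(ID period)
list
```

On the example data, three complete blocks remain, all for person 1:

* Months 13-16: total 0.
* Months 17-20: total 3,600, mean 900.
* Months 21-24: total 3,460, mean 865.

Under this alignment a complete year always consists of three complete blocks. The 4-month rule therefore never keeps fewer person-months than the annual one.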
**Follow-Ups**:
- **Re: st: rectangulizing data** *From:* Maarten Buis <maartenlbuis@gmail.com>
- **Re: st: rectangulizing data** *From:* Austin Nichols <austinnichols@gmail.com>

**References**:
- **st: rectangulizing data** *From:* Dmitriy Krichevskiy <krichevskyd@gmail.com>
- **Re: st: rectangulizing data** *From:* Oliver Jones <ojones@wiwi.uni-bielefeld.de>
- **Re: st: rectangulizing data** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: rectangulizing data** *From:* Dmitriy Krichevskiy <krichevskyd@gmail.com>
- **Re: st: rectangulizing data** *From:* Austin Nichols <austinnichols@gmail.com>
