From: Joerg Luedicke <[email protected]>
To: [email protected]
Subject: Re: st: rectangulizing data
Date: Thu, 26 May 2011 13:36:54 -0400
You first need to create a variable that indicates the year; then you
can count the observations within each year (as Nick and Oliver
suggested). For example:
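*---------
* Assign each monthly record to a survey year:
* Month 1-12 -> year 0, 13-24 -> year 1, ..., 73-84 -> year 6
gen year = 0
local m 0
forval i = 12(12)84 {
    local ++m
    replace year = `m' if Month > `i'
}
* Count the non-missing Month records in each person-year
bysort ID year: egen obs = count(Month)
* Keep only complete person-years (all 12 months present)
drop if obs < 12
*---------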
would probably be one way to produce the desired result. However, you
should be careful about throwing away information without a good reason
(see Austin's post).
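A more compact variant is sketched below, assuming Month runs from 1
upward with exactly one record per person-month and no missing values
of Month. It computes the year directly with ceil() and drops incomplete
person-years using _N, which under -by- is the number of observations in
each group. The example data from the original post are typed in with
-input- so that the snippet runs on its own. Note that year here runs
from 1 to 7 rather than from 0 to 6 as in the loop version.

*---------
* Example data from the original post
clear
input ID Month Income
1 1 1000
1 2 500
1 3 1000
1 13 0
1 14 0
1 15 0
1 16 0
1 17 600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23 660
1 24 800
1 25 1200
2 1 2400
2 2 2400
2 5 2600
end

* Months 1-12 -> year 1, 13-24 -> year 2, and so on
gen year = ceil(Month/12)

* Keep only person-years with all 12 months on record
* (assumes no duplicate or missing Month values)
bysort ID year: drop if _N < 12
*---------

With the example data, only ID 1 in year 2 (months 13 to 24) survives,
which is the same person-year the -egen count()- version keeps.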
High volatility in income across adjacent months looks rather strange.
I would definitely check whether it holds for the entire sample or only
for some subgroup(s). Or perhaps the sample is already highly selective,
which could explain this. If employment information is available, you
should also check it against the income data to see whether the pattern
makes sense.
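To check whether the volatility holds for the whole sample or only for
some subgroups, one rough sketch (assuming ID and Month identify the
panel, with Month as a regular monthly time index) is to -xtset- the
data and look at month-to-month changes with the D. operator, which
returns missing across gaps left by missed interviews. The example has
no subgroup variable, so -tabstat, by()- is shown with ID purely as a
stand-in; in the real data you would use whatever grouping variable you
have (for instance a hypothetical education or employment-status
variable).

*---------
* Example data from the original post
clear
input ID Month Income
1 1 1000
1 2 500
1 3 1000
1 13 0
1 14 0
1 15 0
1 16 0
1 17 600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23 660
1 24 800
1 25 1200
2 1 2400
2 2 2400
2 5 2600
end

* Declare the panel: ID is the person, Month the monthly time index
xtset ID Month

* Month-to-month change in income; D. gives missing wherever the
* previous month is absent (e.g., after a missed interview)
gen dinc = D.Income

* Distribution of changes for the whole sample
summarize dinc

* Same summary by group. ID is only a stand-in here; substitute a real
* grouping variable (hypothetical, e.g. education) in your own data.
tabstat dinc, by(ID) statistics(n mean sd min max)
*---------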
I would also check whether it is really necessary to collapse your
information into (often arbitrary) calendar years, which discards
potentially important detail in your data.
Finally, the quintile approach also seems problematic, as it further
reduces the information in your data for no real reason.
In sum, throwing away all incomplete years, collapsing monthly
information into years, and reducing continuous information to
categorical information looks like a huge waste of information to me.
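One way to act on this without discarding anything up front is sketched
below, assuming the same ID, Month and Income variables as in the
example. It aggregates to person-years with -collapse- but keeps the
number of observed months next to the annual total, so that incomplete
years can be flagged, dropped or handled later in the analysis. Note
that -collapse- replaces the data in memory, so you may want to wrap it
in -preserve- and -restore- or save the monthly file first.

*---------
* Example data from the original post
clear
input ID Month Income
1 1 1000
1 2 500
1 3 1000
1 13 0
1 14 0
1 15 0
1 16 0
1 17 600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23 660
1 24 800
1 25 1200
2 1 2400
2 2 2400
2 5 2600
end

gen year = ceil(Month/12)

* Annual total and number of months with non-missing income.
* (sum) treats missing values as 0, so nmonths is what separates a
* true zero-income year from a year with little or no information.
collapse (sum) annual = Income (count) nmonths = Income, by(ID year)

* Flag incomplete person-years instead of dropping them
gen complete = (nmonths == 12)
list
*---------

With the example data this gives four person-years, of which only ID 1
in year 2 is complete (annual total 7060).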
J.
On Thu, May 26, 2011 at 12:52 PM, Dmitriy Krichevskiy
<[email protected]> wrote:
> Thank you for your responses; I apologize for the confusion(s).
>
> To clarify, then:
>
> The data come from the Survey of Income and Program Participation
> (SIPP), and my particular dataset combines 7 years of data. The data
> are collected quarterly and recorded monthly (via phone interviews).
> Hence time=14 is the second month of the second year. Many people in
> this sample often miss interviews, and income also exhibits a lot of
> volatility (I still do not know why). My goal is to analyze income
> transitions from quintile to quintile (via -xttrans-), and for annual
> income I need to aggregate monthly income while distinguishing zero
> income from missing income. Hence, I am trying to drop people who have
> only a few months of income on record for the years in which their
> information is incomplete, while keeping the same people in other
> years for which all their income information is recorded. Given the
> very large volatility and the many missed interviews, I am not sure
> that imputing income is harmless.
>
> On 5/26/11, Nick Cox <[email protected]> wrote:
>> I think this might need to be
>>
>> bysort ID year: egen obs = count(Month)
>>
>> -- perhaps after some work --
>>
>> but as is agreed the example is unclear.
>>
>> On 26 May 2011, at 16:52, Oliver Jones <[email protected]>
>> wrote:
>>
>>> Hi,
>>> your example data structure is a bit confusing since you have months
>>> greater than 12... I'll assume you have at most 12 months per person
>>> per year.
>>>
>>> Maybe this can help you drop people who have fewer than 12 observations
>>> for one particular year. Let's assume this year is 2006.
>>>
>>> bysort ID: egen obs = count(Month)
>>> drop if year == 2006 & obs < 12
>>>
>>> Does it work?
>>>
>>> Best
>>> Oliver
>>>
>>> On 26.05.2011 17:19, Dmitriy Krichevskiy wrote:
>>>> Dear Listers,
>>>> I am trying to figure out the simplest way to convert a large panel
>>>> dataset from monthly to annual income. The income is only reported
>>>> monthly, and I want to clean the data of anyone missing a month
>>>> in a particular year. I would like to drop observations for that
>>>> person-year only and keep that person if they are fully present in
>>>> some other year. Here is an equivalent data structure. As always,
>>>> thanks a lot for your help.
>>>> Dmitriy
>>>>
>>>> ID Month Income
>>>> 1 1 1000
>>>> 1 2 500
>>>> 1 3 1000
>>>> 1 13 0
>>>> 1 14 0
>>>> 1 15 0
>>>> 1 16 0
>>>> 1 17 600
>>>> 1 18 1000
>>>> 1 19 1000
>>>> 1 20 1000
>>>> 1 21 1000
>>>> 1 22 1000
>>>> 1 23 660
>>>> 1 24 800
>>>> 1 25 1200
>>>> 2 1 2400
>>>> 2 2 2400
>>>> 2 5 2600