Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
SV: st: reshaping with multiple identifiers

From	Mintewab Bezabih <[email protected]>
To	"[email protected]" <[email protected]>
Subject	SV: st: reshaping with multiple identifiers
Date	Sun, 1 Apr 2012 15:16:50 +0200
Thanks, Nick. Does this mean that reshape looks for values by crop, hhid and kebele and, if it cannot find repetitions, discards my observations? What happens to the observations that do not show up? On your comment about the values, I will check the R code again. But I still hope to get the reshape to work.

Many thanks
mintewab
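
On whether -reshape wide- discards anything: a minimal sketch, using made-up toy rows rather than the real data, that counts the nonmissing values of mavg before and after reshaping. The variable names follow the thread; the values, the numeric coding of both, and the helper variable n_row are assumptions for illustration only.

```stata
* Toy data: hypothetical rows, not the poster's dataset.
clear
input hhid kebele crop both mavg
 1 1 42  1 18.11545
 2 1 12  2 18.08271
 9 1 12 10 18.11545
10 1 12 10 18.11545
11 1 29 10 12.95125
end

* Nonmissing values of mavg in the long layout.
count if !missing(mavg)
local n_long = r(N)

* One row per (kebele, hhid, crop); one mavg# variable per value of both.
reshape wide mavg, i(kebele hhid crop) j(both)

* Nonmissing values in the wide layout, summed over all mavg# variables.
egen n_row = rownonmiss(mavg*)
summarize n_row, meanonly
local n_wide = r(sum)

display "nonmissing in long: `n_long'   nonmissing in wide: `n_wide'"
assert `n_wide' == `n_long'
```

In this toy case the two counts agree: each long observation lands in exactly one cell of the wide layout, and the remaining cells are missing because no long observation exists for that combination.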
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: 1 April 2012 14:32
To: [email protected]
Subject: Re: st: reshaping with multiple identifiers

Thanks for that, but what I commented on still seems surprising to me.
Why the ties?

On the other hand, as you don't have every crop for every household
and kebele for every year and month, lots of missings don't sound
puzzling at all.
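
A rough sketch of that point, assuming a few rows copied from the listing in this thread with both recoded as a plain month number (that recoding, and the names igroup and jgroup, are assumptions for illustration): the wide layout has one cell for every i() group and every j() value, so a sparse long dataset fills only a small share of them.

```stata
* Toy data: values copied from the listing in this thread, with -both-
* recoded as a plain month number (an assumption made for brevity).
clear
input hhid kebele crop both mavg
 1 1 42  1 18.11545
 2 1 12  2 18.08271
 3 1 12  3 18.11545
 9 1 12 10 18.11545
11 1 29 10 12.95125
end

* -reshape wide- needs i() and j() together to identify each observation.
isid kebele hhid crop both

* Count the distinct i() groups and the distinct j() values.
egen long igroup = group(kebele hhid crop)
egen long jgroup = group(both)
summarize igroup, meanonly
local n_i = r(max)
summarize jgroup, meanonly
local n_j = r(max)

* Cells in the wide layout versus cells that actually hold a value.
local n_cells = `n_i' * `n_j'
count if !missing(mavg)
local n_filled = r(N)
display "i groups: `n_i'  j values: `n_j'  cells: `n_cells'  filled: `n_filled'"
```

With these five rows there are 5 groups and 4 distinct months, so 20 cells, of which only 5 hold a value; the other 15 would be missing after -reshape wide- without any observation having been lost.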

On Sun, Apr 1, 2012 at 1:20 PM, Mintewab Bezabih
<[email protected]> wrote:

> Thanks Nick.
> I am pleased that you could tell these data are from Ethiopia. You are right that the data seem strange, but this is what I did. First, I had meteorological station-level data. In order to get the figures at household level, I used GPS information to generate household-level data (I coded that in R). Then I brought the data into Stata and used that information to calculate growing degree days, which are given a value of zero if conditions are optimal for the crop's growth, the interpolated temperature value if it coincides with the range of optimal growth, and, if above that range, the value minus a constant. So what I basically have is a crop-based growing degree days measure (as you are aware, Ethiopia's agriculture is multicropping, hence the multiple crops within a household). What I want to do now is to have a long-term average for each crop as well as an average for my survey years, hence my need to reshape the data and then merge them with my survey data.
>
> Let me know if I can provide more details so that you can help me with my problem.
> many thanks
> Mintewab
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
> Sent: 1 April 2012 14:07
> To: [email protected]
> Subject: Re: st: reshaping with multiple identifiers
>
> Thanks for this, but despite the detail I am little closer to
> understanding your situation.
>
> Anyway, what you show us suggests major oddities about your data that
> need to be clarified before it seems sensible to proceed further.
>
> By the way, I have worked quite a lot with climatic data, and I have
> never seen monthly average temperatures quoted to 5 decimal places
> before!
>
> Do these come from meteorological observations, so that you have
> different thermometers for each household in each kebele (which seems
> extraordinary),  or are they predictions from some model?
>
> These data are from Ethiopia, I guess, and it can get cold in
> mountainous areas, but
>
> 1. Consider
>
> . sort crop mavg
>
> . l if both == "1990_10", sep(0)
>
>     +-------------------------------------------+
>     | hhid   kebele   crop      both       mavg |
>     |-------------------------------------------|
>  1. |   19        1      2   1990_10   .3813828 |
>  2. |   25        1      3   1990_10   .0508326 |
>  3. |   14        1      5   1990_10   12.61545 |
>  4. |   21        1     12   1990_10   17.74057 |
>  5. |   24        1     12   1990_10   17.88119 |
> 10. |   12        1     12   1990_10   18.11545 |
> 12. |    9        1     12   1990_10   18.11545 |
> 13. |   10        1     12   1990_10   18.11545 |
> 14. |   17        1     12   1990_10   18.36091 |
> 15. |   11        1     29   1990_10   12.95125 |
> 16. |   13        1     42   1990_10   18.11545 |
>     +-------------------------------------------+
>
> That is, for the same kebele, for the same month, you have monthly
> average temperatures that are variously near 0, near 12, and near 18.
> (I guess these are all Celsius temperatures.)
>
> Is that right?
>
> Also, in principle different households and different crops could be
> experiencing different temperatures, but how can the pattern here
> arise of (a) several ties to 5 d.p. and (b) quite different values in
> other cases?
>
> 2. Consider
>
> . tab mavg
>
>        mavg |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    .0508326 |          1        5.88        5.88
>    .3813828 |          1        5.88       11.76
>    12.61545 |          1        5.88       17.65
>    12.95125 |          1        5.88       23.53
>    17.74057 |          1        5.88       29.41
>    17.88119 |          1        5.88       35.29
>    18.01798 |          1        5.88       41.18
>    18.08271 |          1        5.88       47.06
>    18.10748 |          1        5.88       52.94
>    18.11545 |          7       41.18       94.12
>    18.36091 |          1        5.88      100.00
> ------------+-----------------------------------
>       Total |         17      100.00
>
> One particular average recurs repeatedly. Not impossible, but it seems strange.
>
> Nick
>
> On Sun, Apr 1, 2012 at 12:01 PM, Mintewab Bezabih
> <[email protected]> wrote:
>> Dear Eric and Nick,
>>
>> Thanks and apologies for not being clear about my problem.
>>
>> First, the variables I have are the following:
>> s25q3 (crop; I have now renamed it crop), kebele (village), hhid (household id), year, mo (month) and mavg (monthly average temperature).
>>
>> What I want is a monthly average temperature variable for each crop within a household and kebele for a particular year and month.
>> So that entailed specifying i(kebele hhid s25q3) j(both).
>>
>> My problem is that when I run the command I end up with too few observations in the cells, and many of them are just empty. To put this in perspective, I have a total of 28430 observations of mavg in the long form, but when I summarize mavg1-mavg325 in the wide form, I end up with 5312 data points in total. That is roughly 81 percent of my observations gone, and I just could not figure out what went wrong. In the example you did for me, Eric, all the values you created in the long form are in the wide form as well.
>> Here is an example of what my data looks like:
>>
>> hhid    kebele  crop    both    mavg
>> 1       1       42      1990_1  18.11545
>> 2       1       12      1990_2  18.08271
>> 3       1       12      1990_3  18.11545
>> 4       1       12      1990_4  18.10748
>> 6       1       12      1990_7  18.11545
>> 7       1       12      1990_8  18.01798
>> 9       1       12      1990_10 18.11545
>> 10      1       12      1990_10 18.11545
>> 11      1       29      1990_10 12.95125
>> 12      1       12      1990_10 18.11545
>> 13      1       42      1990_10 18.11545
>> 14      1       5       1990_10 12.61545
>> 17      1       12      1990_10 18.36091
>> 19      1       2       1990_10 .3813828
>> 21      1       12      1990_10 17.74057
>> 24      1       12      1990_10 17.88119
>> 25      1       3       1990_10 .0508326
>>
>>
>> Many thanks in advance
>> mintewab
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Eric Booth [[email protected]]
>> Sent: 31 March 2012 21:53
>> To: [email protected]
>> Subject: Re: st: reshaping with multiple identifiers
>>
>> <>
>>
>> Mintewab:
>>
>> Running the commands you gave yesterday (with some clean-up and adding vars you forgot to include (like 'mavg')) gives:
>>
>> **********************!
>> clear
>> inp str11(s25q3)        hhid    kebele  year    mo
>> "maize" 1       1       2006    1
>> "potatoes"      1       1       2005    1
>> "grass" 1       1       2004    1
>> "sinar/ge"      1       1       2003    1
>> "sinar/ge"      1       1       2002    1
>> end
>> g mavg = runiform()
>>
>> egen both=group(year mo ), label //labels won't help you here
>> drop year mo
>> **you don't need 'new' since i() takes a varlist
>> reshape wide mavg , i(kebele hhid s25q3)  j(both)
>> list, noobs
>>
>> /* which gives:
>>  +--------------------------------------------------------------------------------+
>>  |    s25q3   hhid   kebele      mavg1      mavg2      mavg3     mavg4      mavg5 |
>>  |--------------------------------------------------------------------------------|
>>  |    grass      1        1          .          .   .3713805         .          . |
>>  |    maize      1        1          .          .          .         .   .1650207 |
>>  | potatoes      1        1          .          .          .   .760604          . |
>>  | sinar/ge      1        1   .9678735   .3795409          .         .          . |
>>  +--------------------------------------------------------------------------------+
>> */
>> **********************!
>> I'm not sure what you expected here, but you got what you asked for. The reason 'mavg*' is missing in the wide version is that there was no data in the long version. That is, in the first observation only 'mavg3' is non-missing because for the i() combination s25q3=="grass", hhid==1, kebele==1 you have only one observation in the original long dataset (year 2004, month 1, which becomes the 3rd 'mavg' variable in your wide data since it is the 3rd "group" of your j() variable 'both').
>>
>> If you expected something else, please describe with more detail (or better give an example of what you are trying to get -reshape- to do for you) so that others can give advice.
>>
>> - Eric
>>
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> +979.845.6754
>>
>> On Mar 31, 2012, at 11:35 AM, Nick Cox wrote:
>>
>>> You asked the same question yesterday and there were no answers. When
>>> that happens it is best to assume that the original question was not
>>> clear enough. Actually this version is even less informative than
>>> yesterday's!
>>>
>>> In this case, I see only that you want to -reshape- and that your
>>> attempts to do that don't satisfy. But you don't explain what most of
>>> these variables are. I can guess at -hhid- and -year-. Perhaps -mo-
>>> means "month". Why should we have to guess?
>>>
>>> Nor do you show us the structure you want. What would a typical
>>> observation look like in the ideal structure?
>>>
>>> Nick
>>>
>>> On Sat, Mar 31, 2012 at 2:00 PM, Mintewab Bezabih
>>> <[email protected]> wrote:
>>>
>>>> I was trying to reshape wide my data using the command below:
>>>>
>>>>
>>>> egen both = group(year mo), label
>>>> egen new = group(kebele hhid s25q3), label
>>>>
>>>> reshape wide mavg, i(new) j(both)
>>>>
>>>> My variables are: s25q3 hhid kebele year mo
>>>>
>>>> While my code runs fine, I end up with almost no observations in the reshaped file. I would appreciate any suggestions on how to do this right.
>>>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/