Re: st: reshaping long panel into wide to get rowtotals


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: reshaping long panel into wide to get rowtotals
Date   Wed, 25 May 2011 18:52:52 +0100

I don't see that you need to -reshape-. It sounds as if you should be
using -collapse- to group related observations. But a deeper point is
that you shouldn't expect firm advice because you haven't explained
what you regard as a unit: is it household on a particular day? It is
not clear from this example whether you have repeated observations for
each household.
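
To make that concrete, here is a minimal sketch, assuming that the unit
is a household on a particular day. That is my guess, not something
stated in the question. The three observations typed in below are copied
from the listing in the question purely for illustration, and share150
is a made-up name for one of the expenditure shares.

```stata
* Sketch only: assumes the unit of analysis is household-day.
clear
input long hhnr str9 day double(valc200 valc150 valcrest)
18985 "10jul2006" . .        7358.771
18985 "10jul2006" . 2766.125 .
19139 "10jul2006" . .        .
end

* Convert the string date to a Stata daily date
generate mydate = daily(day, "DMY")
format mydate %td
drop day

* One observation per household-day; (sum) ignores missings,
* so a group with no nonmissing values gets a sum of 0
collapse (sum) valc200 valc150 valcrest, by(hhnr mydate)

* No missings remain after -collapse (sum)-, so plain addition is safe
generate tot_val = valc200 + valc150 + valcrest

* Hypothetical share variable; left missing when the total is 0
generate share150 = valc150 / tot_val if tot_val > 0

list
```

With a different definition of the unit, only the by() option would need
to change.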

A minor point is that -rsum()- is now undocumented in favour of
-rowtotal()-, although the two are identical in effect.
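
A small illustration of how -rowtotal()- treats missings, using three
made-up variables a, b and c: by default missings are treated as zero,
so a row with nothing but missings gets a total of 0. The -missing-
option returns missing for such rows instead.

```stata
* Toy data: a, b, c are hypothetical variables for illustration only
clear
input a b c
. 5 .
. . .
1 2 3
end

* Default: missings count as 0, so tot is 5, 0, 6
egen tot = rowtotal(a b c)

* With -missing-: an all-missing row yields missing, so tot_m is 5, ., 6
egen tot_m = rowtotal(a b c), missing

list
```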

FWIW, I continue my personal campaign against the expression "a data"
when you mean "a dataset".

Nick

On Wed, May 25, 2011 at 6:41 PM, ABDUL ADAM <bihiabdul@yahoo.com> wrote:

> I have a panel data with long format  that looks like this:
>
>     +--------------------------------------------------------------+
>     |  hhnr      mydate   valc200    valc150   valcrest    tot_val |
>     |--------------------------------------------------------------|
> 18. | 16414   10jul2006         .          .          .          0 |
> 19. | 16414   10jul2006         .   1120.958          .   1120.958 |
> 20. | 16531   10jul2006         .          .   1199.145   1199.145 |
> 21. | 16531   10jul2006         .          .          .          0 |
> 22. | 16545   10jul2006         .          .   1535.672   1535.672 |
>     |--------------------------------------------------------------|
> 23. | 16820   10jul2006         .          .   1557.154   1557.154 |
> 24. | 17222   10jul2006         .          .          .          0 |
> 25. | 17432   10jul2006         .          .   2796.389   2796.389 |
> 26. | 18116   10jul2006         .    3217.72          .    3217.72 |
> 27. | 18562   10jul2006         .          .    949.102    949.102 |
>     |--------------------------------------------------------------|
> 28. | 18605   10jul2006         .          .   7903.555   7903.555 |
> 29. | 18753   10jul2006         .    1622.18          .    1622.18 |
> 30. | 18914   10jul2006         .   7723.083          .   7723.083 |
> 31. | 18985   10jul2006         .          .   7358.771   7358.771 |
> 32. | 18985   10jul2006         .   2766.125          .   2766.125 |
>     |--------------------------------------------------------------|
> 33. | 19139   10jul2006         .          .          .          0 |
> 34. | 19435   10jul2006         .          .          .          0 |
> 35. | 19459   10jul2006         .   2181.597          .   2181.597 |
> 36. | 19467   10jul2006         .          .   1900.701   1900.701 |
> 37. | 19653   10jul2006         .          .   2373.175   2373.175 |
>     |--------------------------------------------------------------|
> 38. | 20048   10jul2006         .   946.1188          .   946.1188 |
>
>
> I want to generate a new variable (tot_val) that is the row sum of the three preceding variables (i.e. valc200 valc150 valcrest). When I use egen tot_val=rsum(valc200 valc150 valcrest), as expected I get a sum which is equal to one of the variables because the other two have missing values. For instance in row 31 I get a total of 7358.771, which is the same as valc150 in that row. I think my problem is that I need to get the same household (hhnr) into a single row (e.g. hhnr 18985 appears in rows 31 & 32 on the same day) to get their sum later. To do this I tried to reshape the data from long to wide, but I am getting: hhnr not unique within mydate; this is because some households report purchase of a given item twice on the same date.
>
> Apart from the reshape attempt I feel I could have generated the above variables in a better way instead of:
> gen  valc200 =  valuewSADT if  cc200==1
> gen  valc150 =  valuewSADT if  cc150==1
> gen  valcrest =  valuewSADT if  ccrest==1
>
> My final aim is to produce the totals and use them to derive expenditure shares
> I would really be GRATEFUL to any explanations/tips.
>
