Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Navid Asgari <navidstatalist@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: doing the comparison for pairs of years
Date: Sun, 13 May 2012 14:07:08 +0800

Thanks Nick,

Yes, I missed your posting... there was some problem with my
subscription to Statalist...

I am running the code...

Thanks a lot!

Navid

On Sat, May 12, 2012 at 11:48 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> You missed my correction at
>
> http://www.stata.com/statalist/archive/2012-05/msg00484.html
>
> from which the suggested code follows as
>
> contract company Year P , zero
> bysort Company P (Y) : gen new = _freq > 0 & (_n == 1 | _freq[_n-1]== 0)
> tab Company Y if new
>
> Do note that if any case "doesn't work" is difficult to respond to
> without seeing any details of what that means.
>
> Nick
>
> On Sat, May 12, 2012 at 1:04 PM, Navid Asgari <navidstatalist@gmail.com> wrote:
>> Hi Nick,
>>
>> Thanks,
>>
>> Yes, I made a mistake... after change it worked.
>>
>> Now, I am facing another problem. If I want to do the same thing
>> (comparing values of "P" across years) for each group of rows (grouped
>> by a variables called, say, "Company"), the following code doesn't
>> work:
>>
>> contract company Year P , zero
>> bysort Company P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
>> tab Company Y if new
>>
>>
>> Sorry for frequent question. I am an Stata newbie
>>
>>      +---------------------+
>>      |  company   Year   P |
>>      |---------------------|
>>   1. | Company1   1995   A |
>>   2. | Company1   1995   A |
>>   3. | Company1   1995   A |
>>   4. | Company1   1995   A |
>>   5. | Company1   1995   B |
>>      |---------------------|
>>   6. | Company1   1995   C |
>>   7. | Company1   1995   D |
>>   8. | Company1   1995   E |
>>   9. | Company1   1996   A |
>>  10. | Company1   1996   A |
>>      |---------------------|
>>  11. | Company1   1996   A |
>>  12. | Company1   1996   A |
>>  13. | Company1   1996   B |
>>  14. | Company1   1996   C |
>>  15. | Company1   1996   H |
>>      |---------------------|
>>  16. | Company1   1996   M |
>>  17. | Company2   1993   A |
>>  18. | Company2   1993   B |
>>  19. | Company2   1993   G |
>>  20. | Company2   1993   G |
>>      |---------------------|
>>  21. | Company2   1993   K |
>>  22. | Company2   1993   M |
>>  23. | Company2   1998   C |
>>  24. | Company2   1998   K |
>>  25. | Company2   1998   L |
>>      |---------------------|
>>  26. | Company2   1998   M |
>>      +---------------------+
>
>
> On Sat, May 12, 2012 at 4:53 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> My code compares each year with the previous, which is I think exactly what
>>> you ask, so I don't see any sense in which the logic fails.
>>>
>>> I think you need to substantiate your criticism.
>
>
> On 12 May 2012, at 09:27, Navid Asgari <navidstatalist@gmail.com> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Thanks for your quick and helpful response,
>>>>
>>>> The logic that you suggested works fine for comparison across only two
>>>> years. However, if I want to compare new "P" values in ,say, 1995 with
>>>> values of "P" in 1994 and then do the same but comparing only 1996
>>>> with 1995 and then 1997 with 1996, the logic fails.
>>>>
>>>> I was thinking of a "foreach" loop over "Year" can work. But, it does
>>>> not...
>>>>
>>>> What other ways are possible?
>>>>
>>>> Thanks,
>>>> Navid
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>>>> I can't make sense of your -reshape-. Your structure is already -long-
>>>> and there is just one variable that is -P*-. As it is, the -reshape-
>>>> command is illegal in the context you give. It seems quite unneeded,
>>>> so I start afresh.
>>>>
>>>> I first read in your dataset.
>>>>
>>>> . input Year str1 P
>>>>
>>>>           Year          P
>>>>   1. 1995 A
>>>>   2. 1995 B
>>>>   3. 1995 A
>>>>   4. 1995 C
>>>>   5. 1995 D
>>>>   6. 1995 A
>>>>   7. 1995 E
>>>>   8. 1995 A
>>>>   9. 1996 B
>>>>  10. 1996 A
>>>>  11. 1996 A
>>>>  12. 1996 M
>>>>  13. 1996 A
>>>>  14. 1996 H
>>>>  15. 1996 A
>>>>  16. 1996 C
>>>>  17. end
>>>>
>>>> Then we reduce the dataset to a set of counts.
>>>>
>>>> . contract Year P , zero
>>>>
>>>> . l
>>>>
>>>>      +------------------+
>>>>      | Year   P   _freq |
>>>>      |------------------|
>>>>   1. | 1995   A       4 |
>>>>   2. | 1995   B       1 |
>>>>   3. | 1995   C       1 |
>>>>   4. | 1995   D       1 |
>>>>   5. | 1995   E       1 |
>>>>      |------------------|
>>>>   6. | 1995   H       0 |
>>>>   7. | 1995   M       0 |
>>>>   8. | 1996   A       4 |
>>>>   9. | 1996   B       1 |
>>>>  10. | 1996   C       1 |
>>>>      |------------------|
>>>>  11. | 1996   D       0 |
>>>>  12. | 1996   E       0 |
>>>>  13. | 1996   H       1 |
>>>>  14. | 1996   M       1 |
>>>>      +------------------+
>>>>
>>>> Then a -P- is new if it wasn't observed the previous year. Notice that
>>>> I define "new" as including the first time any value of -P- is
>>>> observed.
>>>>
>>>> . bysort P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>>
>>>> . l
>>>>
>>>>      +------------------------+
>>>>      | Year   P   _freq   new |
>>>>      |------------------------|
>>>>   1. | 1995   A       4     1 |
>>>>   2. | 1996   A       4     0 |
>>>>   3. | 1995   B       1     1 |
>>>>   4. | 1996   B       1     0 |
>>>>   5. | 1995   C       1     1 |
>>>>      |------------------------|
>>>>   6. | 1996   C       1     0 |
>>>>   7. | 1995   D       1     1 |
>>>>   8. | 1996   D       0     0 |
>>>>   9. | 1995   E       1     1 |
>>>>  10. | 1996   E       0     0 |
>>>>      |------------------------|
>>>>  11. | 1995   H       0     1 |
>>>>  12. | 1996   H       1     1 |
>>>>  13. | 1995   M       0     1 |
>>>>  14. | 1996   M       1     1 |
>>>>      +------------------------+
>>>>
>>>> Then we count how many new categories there are each year.
>>>>
>>>> . tab Y if new
>>>>
>>>>        Year |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>        1995 |          7       77.78       77.78
>>>>        1996 |          2       22.22      100.00
>>>> ------------+-----------------------------------
>>>>       Total |          9      100.00
>>>>
>>>> The generalization to include -Company- should be something like this,
>>>> but I didn't test it.
>>>>
>>>> contract Company Year P , zero
>>>> bysort Company P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>> tab Company Y if new
>>>>
>>>> Nick
>>>> n.j.cox@durham.ac.uk
>>>>
>>>> Navid Asgari
>>>>
>>>> I have a dataset which looks like this:
>>>>
>>>>
>>>>      +----------+
>>>>      | Year   P |
>>>>      |----------|
>>>>   1. | 1995   A |
>>>>   2. | 1995   B |
>>>>   3. | 1995   A |
>>>>   4. | 1995   C |
>>>>   5. | 1995   D |
>>>>      |----------|
>>>>   6. | 1995   A |
>>>>   7. | 1995   E |
>>>>   8. | 1995   A |
>>>>   9. | 1996   B |
>>>>  10. | 1996   A |
>>>>      |----------|
>>>>  11. | 1996   A |
>>>>  12. | 1996   M |
>>>>  13. | 1996   A |
>>>>  14. | 1996   H |
>>>>  15. | 1996   A |
>>>>      |----------|
>>>>  16. | 1996   C |
>>>>      +----------+
>>>>
>>>> I use the following to count number of new values under variable "P"
>>>> that exists in the year 1996, but not 1995:
>>>>
>>>> gen id = _n
>>>>>
>>>>> reshape long P , i(id)
>>>>> bysort P (Year id) : gen seq = _n
>>>>
>>>>
>>>> Count if Year==1996 & seq==1
>>>>
>>>> Now I want to do the same thing for more than 2 successive years (e.g.
>>>> 1993,1994,1995,1996). So, values of variable "P" in every year will be
>>>> compared with the value of its previous year (1994 to 1993, then 1995
>>>> to 1994, and so forth....
>>>>
>>>> The complexity of this lies in the fact that this comparison has to be
>>>> done by each unique value of another variable and the starting year
>>>> and ending year varies in each group. In fact this is how the
>>>> structure of the real data looks like:
>>>>
>>>>
>>>>      +---------------------+
>>>>      | Year   P    company |
>>>>      |---------------------|
>>>>   1. | 1995   A   Company1 |
>>>>   2. | 1995   B   Company1 |
>>>>   3. | 1995   A   Company1 |
>>>>   4. | 1995   C   Company1 |
>>>>   5. | 1995   D   Company1 |
>>>>      |---------------------|
>>>>   6. | 1995   A   Company1 |
>>>>   7. | 1995   E   Company1 |
>>>>   8. | 1995   A   Company1 |
>>>>   9. | 1996   B   Company1 |
>>>>  10. | 1996   A   Company1 |
>>>>      |---------------------|
>>>>  11. | 1996   A   Company1 |
>>>>  12. | 1996   M   Company1 |
>>>>  13. | 1996   A   Company1 |
>>>>  14. | 1996   H   Company1 |
>>>>  15. | 1996   A   Company1 |
>>>>      |---------------------|
>>>>  16. | 1996   C   Company1 |
>>>>  17. | 1993   G   Company2 |
>>>>  18. | 1993   G   Company2 |
>>>>  19. | 1993   M   Company2 |
>>>>  20. | 1993   K   Company2 |
>>>>      |---------------------|
>>>>  21. | 1993   A   Company2 |
>>>>  22. | 1993   B   Company2 |
>>>>  23. | 1994   C   Company2 |
>>>>  24. | 1994   M   Company2 |
>>>>  25. | 1994   K   Company2 |
>>>>      |---------------------|
>>>>  26. | 1994   L   Company2 |
>>>>      +---------------------+
>>>>
>>>> So for every group under variable company the code will count number
>>>> of new values of variable "P" in every year that did not exist a year
>>>> before...
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
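
For anyone reading this thread later, the corrected per-company approach can be put together as a small self-contained do-file. This is only a sketch under stated assumptions: the toy data below are hypothetical (a cut-down version of the listings in the thread), and the grouping variable is spelled -company- in lowercase throughout, because Stata variable names are case-sensitive and the listings show the lowercase spelling. -Year- is written out in full rather than abbreviated to -Y-.

```stata
* Hypothetical toy data, loosely following the listings in this thread
clear
input str8 company int Year str1 P
"Company1" 1995 "A"
"Company1" 1995 "B"
"Company1" 1996 "A"
"Company1" 1996 "M"
"Company2" 1993 "G"
"Company2" 1993 "K"
"Company2" 1994 "K"
"Company2" 1994 "L"
end

* Reduce to one row per company-Year-P combination; -zero- also keeps
* combinations that never occur, with _freq == 0
contract company Year P, zero

* Within each company and value of P, in year order, flag P as new when
* it is observed in this row and either this is the first row or P was
* not observed in the row before
bysort company P (Year): gen byte new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

* Count the new values of P by company and year
tabulate company Year if new
```

One point to check against real data: as far as I can tell, -contract, zero- fills in every year that occurs anywhere in the dataset for every company, so "the row before" is the previous year present in the data as a whole, not necessarily that company's own previous year. If a company has no observations in a year that other companies do have, every value it shows in its next observed year would be flagged as new. Whether that is the wanted definition depends on the problem.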