Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: doing the comparison for pairs of years

 From Navid Asgari To statalist@hsphsun2.harvard.edu Subject Re: st: RE: doing the comparison for pairs of years Date Sun, 13 May 2012 14:07:08 +0800

```
Thanks Nick,

Yes, I missed your posting; there was a problem with my
subscription to Statalist.

I am running the code... Thanks a lot!

Navid

On Sat, May 12, 2012 at 11:48 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> You missed my correction at
>
> http://www.stata.com/statalist/archive/2012-05/msg00484.html
>
> from which the suggested code follows as
>
> contract company Year P , zero
> bysort company P (Year) : gen new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)
> tab company Year if new
>
> Do note that in any case "doesn't work" is difficult to respond to
> without seeing details of what that means.
>
> Nick
>
> On Sat, May 12, 2012 at 1:04 PM, Navid Asgari <navidstatalist@gmail.com> wrote:
>> Hi Nick,
>>
>> Thanks,
>>
>> Yes, I made a mistake; after changing it, it worked.
>>
>> Now I am facing another problem. If I want to do the same thing
>> (comparing values of "P" across years) for each group of rows (grouped
>> by a variable called, say, "Company"), the following code doesn't
>> work:
>>
>> contract company Year P , zero
>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
>> tab Company Y if new
>>
>>
>> Sorry for the frequent questions. I am a Stata newbie.
>>
>>     +---------------------+
>>     |  company   Year   P |
>>     |---------------------|
>>  1. | Company1   1995   A |
>>  2. | Company1   1995   A |
>>  3. | Company1   1995   A |
>>  4. | Company1   1995   A |
>>  5. | Company1   1995   B |
>>     |---------------------|
>>  6. | Company1   1995   C |
>>  7. | Company1   1995   D |
>>  8. | Company1   1995   E |
>>  9. | Company1   1996   A |
>> 10. | Company1   1996   A |
>>     |---------------------|
>> 11. | Company1   1996   A |
>> 12. | Company1   1996   A |
>> 13. | Company1   1996   B |
>> 14. | Company1   1996   C |
>> 15. | Company1   1996   H |
>>     |---------------------|
>> 16. | Company1   1996   M |
>> 17. | Company2   1993   A |
>> 18. | Company2   1993   B |
>> 19. | Company2   1993   G |
>> 20. | Company2   1993   G |
>>     |---------------------|
>> 21. | Company2   1993   K |
>> 22. | Company2   1993   M |
>> 23. | Company2   1998   C |
>> 24. | Company2   1998   K |
>> 25. | Company2   1998   L |
>>     |---------------------|
>> 26. | Company2   1998   M |
>>     +---------------------+
>
>
> On Sat, May 12, 2012 at 4:53 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> My code compares each year with the previous, which is I think exactly what
>>> you ask, so I don't see any sense in which the logic fails.
>>>
>>> I think you need to substantiate your criticism.
>
>
> On 12 May 2012, at 09:27, Navid Asgari <navidstatalist@gmail.com> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>>
>>>> The logic that you suggested works fine for comparisons across only two
>>>> years. However, if I want to compare new "P" values in, say, 1995 with
>>>> values of "P" in 1994, and then do the same comparing only 1996
>>>> with 1995, and then 1997 with 1996, the logic fails.
>>>>
>>>> I thought a "foreach" loop over "Year" could work, but it does
>>>> not...
>>>>
>>>> What other ways are possible?
>>>>
>>>> Thanks,
>>>> Navid
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> and there is just one variable that is -P*-. As it is, the -reshape-
>>>> command is illegal in the context you give. It seems quite unneeded,
>>>> so I start afresh.
>>>>
>>>>
>>>> . input      Year str1  P
>>>>
>>>>         Year          P
>>>>  1.  1995   A
>>>>  2.  1995   B
>>>>  3.  1995   A
>>>>  4.  1995   C
>>>>  5.  1995   D
>>>>  6.  1995   A
>>>>  7.  1995   E
>>>>  8.  1995   A
>>>>  9.  1996   B
>>>> 10.  1996   A
>>>> 11.  1996   A
>>>> 12.  1996   M
>>>> 13.  1996   A
>>>> 14.  1996   H
>>>> 15.  1996   A
>>>> 16.  1996   C
>>>> 17. end
>>>>
>>>> Then we reduce the dataset to a set of counts.
>>>>
>>>> . contract Year P , zero
>>>>
>>>> . l
>>>>
>>>>     +------------------+
>>>>     | Year   P   _freq |
>>>>     |------------------|
>>>>  1. | 1995   A       4 |
>>>>  2. | 1995   B       1 |
>>>>  3. | 1995   C       1 |
>>>>  4. | 1995   D       1 |
>>>>  5. | 1995   E       1 |
>>>>     |------------------|
>>>>  6. | 1995   H       0 |
>>>>  7. | 1995   M       0 |
>>>>  8. | 1996   A       4 |
>>>>  9. | 1996   B       1 |
>>>> 10. | 1996   C       1 |
>>>>     |------------------|
>>>> 11. | 1996   D       0 |
>>>> 12. | 1996   E       0 |
>>>> 13. | 1996   H       1 |
>>>> 14. | 1996   M       1 |
>>>>     +------------------+
>>>>
>>>> Then a -P- is new if it wasn't observed the previous year. Notice that
>>>> I define "new" as including the first time any value of -P- is
>>>> observed.
>>>>
>>>> . bysort P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>>
>>>> . l
>>>>
>>>>     +------------------------+
>>>>     | Year   P   _freq   new |
>>>>     |------------------------|
>>>>  1. | 1995   A       4     1 |
>>>>  2. | 1996   A       4     0 |
>>>>  3. | 1995   B       1     1 |
>>>>  4. | 1996   B       1     0 |
>>>>  5. | 1995   C       1     1 |
>>>>     |------------------------|
>>>>  6. | 1996   C       1     0 |
>>>>  7. | 1995   D       1     1 |
>>>>  8. | 1996   D       0     0 |
>>>>  9. | 1995   E       1     1 |
>>>> 10. | 1996   E       0     0 |
>>>>     |------------------------|
>>>> 11. | 1995   H       0     1 |
>>>> 12. | 1996   H       1     1 |
>>>> 13. | 1995   M       0     1 |
>>>> 14. | 1996   M       1     1 |
>>>>     +------------------------+
>>>>
>>>> Then we count how many new categories there are each year.
>>>>
>>>> . tab Y if new
>>>>
>>>>        Year |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>        1995 |          7       77.78       77.78
>>>>        1996 |          2       22.22      100.00
>>>> ------------+-----------------------------------
>>>>       Total |          9      100.00
>>>>
>>>> The generalization to include -Company- should be something like this,
>>>> but I didn't test it.
>>>>
>>>> contract Company Year P , zero
>>>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>> tab Company Y if new
>>>>
>>>> Nick
>>>> n.j.cox@durham.ac.uk
>>>>
>>>> Navid Asgari
>>>>
>>>> I have a dataset which looks like this:
>>>>
>>>>
>>>>     +----------+
>>>>     | Year   P |
>>>>     |----------|
>>>>  1. | 1995   A |
>>>>  2. | 1995   B |
>>>>  3. | 1995   A |
>>>>  4. | 1995   C |
>>>>  5. | 1995   D |
>>>>     |----------|
>>>>  6. | 1995   A |
>>>>  7. | 1995   E |
>>>>  8. | 1995   A |
>>>>  9. | 1996   B |
>>>> 10. | 1996   A |
>>>>     |----------|
>>>> 11. | 1996   A |
>>>> 12. | 1996   M |
>>>> 13. | 1996   A |
>>>> 14. | 1996   H |
>>>> 15. | 1996   A |
>>>>     |----------|
>>>> 16. | 1996   C |
>>>>     +----------+
>>>>
>>>> I use the following to count the number of new values of variable "P"
>>>> that exist in 1996 but not in 1995:
>>>>
>>>> gen id = _n
>>>> reshape long P , i(id)
>>>> bysort P (Year id) : gen seq = _n
>>>> count if Year==1996 & seq==1
>>>>
>>>> Now I want to do the same thing for more than two successive years (e.g.
>>>> 1993, 1994, 1995, 1996). So values of variable "P" in every year will be
>>>> compared with the values of the previous year (1994 with 1993, then 1995
>>>> with 1994, and so forth).
>>>>
>>>> The complication is that this comparison has to be done separately for
>>>> each unique value of another variable, and the starting and ending years
>>>> vary across groups. In fact, this is what the structure of the real data
>>>> looks like:
>>>>
>>>>
>>>>     +---------------------+
>>>>     | Year   P    company |
>>>>     |---------------------|
>>>>  1. | 1995   A   Company1 |
>>>>  2. | 1995   B   Company1 |
>>>>  3. | 1995   A   Company1 |
>>>>  4. | 1995   C   Company1 |
>>>>  5. | 1995   D   Company1 |
>>>>     |---------------------|
>>>>  6. | 1995   A   Company1 |
>>>>  7. | 1995   E   Company1 |
>>>>  8. | 1995   A   Company1 |
>>>>  9. | 1996   B   Company1 |
>>>> 10. | 1996   A   Company1 |
>>>>     |---------------------|
>>>> 11. | 1996   A   Company1 |
>>>> 12. | 1996   M   Company1 |
>>>> 13. | 1996   A   Company1 |
>>>> 14. | 1996   H   Company1 |
>>>> 15. | 1996   A   Company1 |
>>>>     |---------------------|
>>>> 16. | 1996   C   Company1 |
>>>> 17. | 1993   G   Company2 |
>>>> 18. | 1993   G   Company2 |
>>>> 19. | 1993   M   Company2 |
>>>> 20. | 1993   K   Company2 |
>>>>     |---------------------|
>>>> 21. | 1993   A   Company2 |
>>>> 22. | 1993   B   Company2 |
>>>> 23. | 1994   C   Company2 |
>>>> 24. | 1994   M   Company2 |
>>>> 25. | 1994   K   Company2 |
>>>>     |---------------------|
>>>> 26. | 1994   L   Company2 |
>>>>     +---------------------+
>>>>
>>>> So for every group of variable company, the code should count the number
>>>> of new values of variable "P" in every year that did not exist the year
>>>> before.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
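This is a minimal, self-contained sketch of the corrected approach from this thread. It runs on the 26 observations from Navid's listing of his real data (Company1 in 1995 and 1996, Company2 in 1993 and 1994). It assumes the variables are named `company`, `Year` and `P`, as in that listing. Stata variable names are case-sensitive, so `Company` and `company` would be different variables.

The flag `new2` is a hypothetical alternative added here for illustration; it was not proposed in the thread. It treats "the previous year" strictly as calendar year `Year - 1`.

```stata
* Sketch: for each company and year, count the values of P that the same
* company did not have in the previous year. Data copied from the 26-row
* listing above; the names company, Year and P are assumed from it.
clear
input str8 company int Year str1 P
Company1 1995 A
Company1 1995 B
Company1 1995 A
Company1 1995 C
Company1 1995 D
Company1 1995 A
Company1 1995 E
Company1 1995 A
Company1 1996 B
Company1 1996 A
Company1 1996 A
Company1 1996 M
Company1 1996 A
Company1 1996 H
Company1 1996 A
Company1 1996 C
Company2 1993 G
Company2 1993 G
Company2 1993 M
Company2 1993 K
Company2 1993 A
Company2 1993 B
Company2 1994 C
Company2 1994 M
Company2 1994 K
Company2 1994 L
end

* One row per company x Year x P; -zero- also creates the combinations
* that never occur, with _freq == 0
contract company Year P, zero

* Corrected rule from the thread: P is new in a year if it occurs there
* and either this is the first Year listed or its count was 0 in the
* previous listed Year
bysort company P (Year): gen new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)
tab company Year if new

* Hypothetical alternative: keep only combinations that occur, and call P
* new unless the same company had it in calendar year Year - 1.
* For the first row of each group Year[_n-1] is missing, so new2 is 1.
drop if _freq == 0
bysort company P (Year): gen new2 = Year != Year[_n-1] + 1
tab company Year if new2
```

On these data both flags should mark:

- 5 new values of `P` for Company1 in 1995 and 2 in 1996;
- 5 new values of `P` for Company2 in 1993 and 2 in 1994.

The two definitions can differ when years are missing. With `contract, zero`, each year is compared with the previous `Year` present anywhere in the dataset. `new2` compares only with `Year - 1`.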