Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


# Re: st: RE: doing the comparison for pairs of years

 From Nick Cox To statalist@hsphsun2.harvard.edu Subject Re: st: RE: doing the comparison for pairs of years Date Sat, 12 May 2012 16:48:21 +0100

```
You missed my correction at

http://www.stata.com/statalist/archive/2012-05/msg00484.html

from which the suggested code follows as

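* one row per company-Year-P combination; _freq = 0 where unobserved
contract company Year P , zero
* new: observed this year, and either the first row or absent the preceding year
bysort company P (Year) : gen new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)
tab company Year if new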

Do note that in any case "doesn't work" is difficult to respond to
without any details of what that means.

Nick

On Sat, May 12, 2012 at 1:04 PM, Navid Asgari <navidstatalist@gmail.com> wrote:
> Hi Nick,
>
> Thanks,
>
> Yes, I made a mistake... after changing it, it worked.
>
> Now, I am facing another problem. If I want to do the same thing
> (comparing values of "P" across years) for each group of rows (grouped
> by a variable called, say, "Company"), the following code doesn't
> work:
>
> contract company Year P , zero
> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
> tab Company Y if new
>
>
> Sorry for the frequent questions. I am a Stata newbie.
>
>      +---------------------+
>      |  company   Year   P |
>      |---------------------|
>   1. | Company1   1995   A |
>   2. | Company1   1995   A |
>   3. | Company1   1995   A |
>   4. | Company1   1995   A |
>   5. | Company1   1995   B |
>      |---------------------|
>   6. | Company1   1995   C |
>   7. | Company1   1995   D |
>   8. | Company1   1995   E |
>   9. | Company1   1996   A |
>  10. | Company1   1996   A |
>      |---------------------|
>  11. | Company1   1996   A |
>  12. | Company1   1996   A |
>  13. | Company1   1996   B |
>  14. | Company1   1996   C |
>  15. | Company1   1996   H |
>      |---------------------|
>  16. | Company1   1996   M |
>  17. | Company2   1993   A |
>  18. | Company2   1993   B |
>  19. | Company2   1993   G |
>  20. | Company2   1993   G |
>      |---------------------|
>  21. | Company2   1993   K |
>  22. | Company2   1993   M |
>  23. | Company2   1998   C |
>  24. | Company2   1998   K |
>  25. | Company2   1998   L |
>      |---------------------|
>  26. | Company2   1998   M |
>      +---------------------+

On Sat, May 12, 2012 at 4:53 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> My code compares each year with the previous, which is I think exactly what
>> you ask, so I don't see any sense in which the logic fails.
>>
>> I think you need to substantiate your criticism.

On 12 May 2012, at 09:27, Navid Asgari <navidstatalist@gmail.com> wrote:
>>
>>> Hi Nick,
>>>
>>> Thanks for your quick and helpful response,
>>>
>>> The logic that you suggested works fine for a comparison across only two
>>> years. However, if I want to compare new "P" values in, say, 1995 with
>>> values of "P" in 1994, and then do the same comparing only 1996
>>> with 1995, and then 1997 with 1996, the logic fails.
>>>
>>> I was thinking a "foreach" loop over "Year" could work, but it does
>>> not...
>>>
>>> What other ways are possible?
>>>
>>> Thanks,
>>> Navid
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> I can't make sense of your -reshape-. Your structure is already -long-
>>> and there is just one variable that is -P*-. As it is, the -reshape-
>>> command is illegal in the context you give. It seems quite unneeded,
>>> so I start afresh.
>>>
>>> I first read in your dataset.
>>>
>>> . input      Year str1  P
>>>
>>>         Year          P
>>>  1.  1995   A
>>>  2.  1995   B
>>>  3.  1995   A
>>>  4.  1995   C
>>>  5.  1995   D
>>>  6.  1995   A
>>>  7.  1995   E
>>>  8.  1995   A
>>>  9.  1996   B
>>> 10.  1996   A
>>> 11.  1996   A
>>> 12.  1996   M
>>> 13.  1996   A
>>> 14.  1996   H
>>> 15.  1996   A
>>> 16.  1996   C
>>> 17. end
>>>
>>> Then we reduce the dataset to a set of counts.
>>>
>>> . contract Year P , zero
>>>
>>> . l
>>>
>>>     +------------------+
>>>     | Year   P   _freq |
>>>     |------------------|
>>>  1. | 1995   A       4 |
>>>  2. | 1995   B       1 |
>>>  3. | 1995   C       1 |
>>>  4. | 1995   D       1 |
>>>  5. | 1995   E       1 |
>>>     |------------------|
>>>  6. | 1995   H       0 |
>>>  7. | 1995   M       0 |
>>>  8. | 1996   A       4 |
>>>  9. | 1996   B       1 |
>>> 10. | 1996   C       1 |
>>>     |------------------|
>>> 11. | 1996   D       0 |
>>> 12. | 1996   E       0 |
>>> 13. | 1996   H       1 |
>>> 14. | 1996   M       1 |
>>>     +------------------+
>>>
>>> Then a -P- is new if it wasn't observed the previous year. Notice that
>>> I define "new" as including the first time any value of -P- is
>>> observed.
>>>
>>> . bysort P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>
>>> . l
>>>
>>>     +------------------------+
>>>     | Year   P   _freq   new |
>>>     |------------------------|
>>>  1. | 1995   A       4     1 |
>>>  2. | 1996   A       4     0 |
>>>  3. | 1995   B       1     1 |
>>>  4. | 1996   B       1     0 |
>>>  5. | 1995   C       1     1 |
>>>     |------------------------|
>>>  6. | 1996   C       1     0 |
>>>  7. | 1995   D       1     1 |
>>>  8. | 1996   D       0     0 |
>>>  9. | 1995   E       1     1 |
>>> 10. | 1996   E       0     0 |
>>>     |------------------------|
>>> 11. | 1995   H       0     1 |
>>> 12. | 1996   H       1     1 |
>>> 13. | 1995   M       0     1 |
>>> 14. | 1996   M       1     1 |
>>>     +------------------------+
>>>
>>> Then we count how many new categories there are each year.
>>>
>>> . tab Y if new
>>>
>>>        Year |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>        1995 |          7       77.78       77.78
>>>        1996 |          2       22.22      100.00
>>> ------------+-----------------------------------
>>>       Total |          9      100.00
>>>
>>> The generalization to include -Company- should be something like this,
>>> but I didn't test it.
>>>
>>> contract Company Year P , zero
>>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>> tab Company Y if new
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Navid Asgari
>>>
>>> I have a dataset which looks like this:
>>>
>>>
>>>     +----------+
>>>     | Year   P |
>>>     |----------|
>>>  1. | 1995   A |
>>>  2. | 1995   B |
>>>  3. | 1995   A |
>>>  4. | 1995   C |
>>>  5. | 1995   D |
>>>     |----------|
>>>  6. | 1995   A |
>>>  7. | 1995   E |
>>>  8. | 1995   A |
>>>  9. | 1996   B |
>>> 10. | 1996   A |
>>>     |----------|
>>> 11. | 1996   A |
>>> 12. | 1996   M |
>>> 13. | 1996   A |
>>> 14. | 1996   H |
>>> 15. | 1996   A |
>>>     |----------|
>>> 16. | 1996   C |
>>>     +----------+
>>>
>>> I use the following to count the number of new values of variable "P"
>>> that exist in 1996 but not in 1995:
>>>
>>> gen id = _n
>>> reshape long P , i(id)
>>> bysort P (Year id) : gen seq = _n
>>> count if Year==1996 & seq==1
>>>
>>> Now I want to do the same thing for more than two successive years (e.g.
>>> 1993, 1994, 1995, 1996). So the values of variable "P" in every year will
>>> be compared with those of the previous year (1994 with 1993, then 1995
>>> with 1994, and so forth).
>>>
>>> The complication is that this comparison has to be done separately
>>> for each unique value of another variable, and the starting and ending
>>> years vary across groups. In fact, this is what the structure of the
>>> real data looks like:
>>>
>>>
>>>     +---------------------+
>>>     | Year   P    company |
>>>     |---------------------|
>>>  1. | 1995   A   Company1 |
>>>  2. | 1995   B   Company1 |
>>>  3. | 1995   A   Company1 |
>>>  4. | 1995   C   Company1 |
>>>  5. | 1995   D   Company1 |
>>>     |---------------------|
>>>  6. | 1995   A   Company1 |
>>>  7. | 1995   E   Company1 |
>>>  8. | 1995   A   Company1 |
>>>  9. | 1996   B   Company1 |
>>> 10. | 1996   A   Company1 |
>>>     |---------------------|
>>> 11. | 1996   A   Company1 |
>>> 12. | 1996   M   Company1 |
>>> 13. | 1996   A   Company1 |
>>> 14. | 1996   H   Company1 |
>>> 15. | 1996   A   Company1 |
>>>     |---------------------|
>>> 16. | 1996   C   Company1 |
>>> 17. | 1993   G   Company2 |
>>> 18. | 1993   G   Company2 |
>>> 19. | 1993   M   Company2 |
>>> 20. | 1993   K   Company2 |
>>>     |---------------------|
>>> 21. | 1993   A   Company2 |
>>> 22. | 1993   B   Company2 |
>>> 23. | 1994   C   Company2 |
>>> 24. | 1994   M   Company2 |
>>> 25. | 1994   K   Company2 |
>>>     |---------------------|
>>> 26. | 1994   L   Company2 |
>>>     +---------------------+
>>>
>>> So, for every group defined by the variable company, the code should
>>> count the number of new values of variable "P" in each year that did
>>> not exist the year before.

```
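The single-company walkthrough in the quoted message can be reproduced end to end as a do-file. This is a sketch that assumes a fresh Stata session (it calls `clear`) and reuses the thread's variable names. `new_orig` is a hypothetical name introduced here only to hold the originally posted condition for comparison.

With the corrected condition, the zero-count 1995 rows for H and M are no longer flagged. The count for 1995 should therefore drop from 7 to 5, while 1996 stays at 2.

```stata
* Sketch: single-company data from the thread, corrected definition of "new"
clear
input Year str1 P
1995 A
1995 B
1995 A
1995 C
1995 D
1995 A
1995 E
1995 A
1996 B
1996 A
1996 A
1996 M
1996 A
1996 H
1996 A
1996 C
end

* one row per Year-P combination; _freq is 0 where P was not observed that year
contract Year P, zero

* corrected: observed this year, and either the first year or absent the preceding year
bysort P (Year): generate byte new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

* originally posted condition (hypothetical name new_orig), kept for comparison:
* it also flags the first row of each P even when _freq is 0
bysort P (Year): generate byte new_orig = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)

list Year P _freq new new_orig, sepby(P) noobs
tabulate Year if new
```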
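The per-company version can be tried on the two-company listing Navid posted. This sketch uses the corrected condition and the lowercase variable name `company` that appears in that listing; Stata variable names are case-sensitive, so `Company` would not be found. It assumes a fresh session.

One consequence of `contract, zero` is worth checking. It creates rows for every year present anywhere in the data, so each year is compared with the preceding year in the whole dataset rather than with the company's own previous year. For Company2, 1998 is compared with a zero-filled 1996 row. K and M therefore count as new in 1998 even though Company2 had them in 1993.

Worked by hand, the expected counts are:
- Company1: 5 in 1995 and 2 in 1996.
- Company2: 5 in 1993 and 4 in 1998.

```stata
* Sketch: two-company listing from the thread, counting new P per company and year
clear
input str8 company Year str1 P
Company1 1995 A
Company1 1995 A
Company1 1995 A
Company1 1995 A
Company1 1995 B
Company1 1995 C
Company1 1995 D
Company1 1995 E
Company1 1996 A
Company1 1996 A
Company1 1996 A
Company1 1996 A
Company1 1996 B
Company1 1996 C
Company1 1996 H
Company1 1996 M
Company2 1993 A
Company2 1993 B
Company2 1993 G
Company2 1993 G
Company2 1993 K
Company2 1993 M
Company2 1998 C
Company2 1998 K
Company2 1998 L
Company2 1998 M
end

* all company x Year x P combinations, including those with _freq == 0
contract company Year P, zero

* within each company-P pair, ordered by Year, flag appearances that follow an absence
bysort company P (Year): generate byte new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

tabulate company Year if new
```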
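If "previous year" should mean the calendar year `Year - 1`, as in Navid's description (1995 compared with 1994), time-series operators offer another route. This is a sketch under that assumption:
- `pairid` is a hypothetical name for a panel identifier built from each company-P pair.
- `contract` is used without `zero`, so only observed combinations remain.
- `L._freq` is then missing whenever the same pair was not observed in `Year - 1`, gaps included.

The data are the two-company listing from Navid's first message. Its years are consecutive, so the result should agree with the `contract, zero` approach here: 5 and 2 for Company1 in 1995 and 1996, and 5 and 2 for Company2 in 1993 and 1994.

```stata
* Sketch: "new" means not observed for the same company in calendar year Year - 1
clear
input Year str1 P str8 company
1995 A Company1
1995 B Company1
1995 A Company1
1995 C Company1
1995 D Company1
1995 A Company1
1995 E Company1
1995 A Company1
1996 B Company1
1996 A Company1
1996 A Company1
1996 M Company1
1996 A Company1
1996 H Company1
1996 A Company1
1996 C Company1
1993 G Company2
1993 G Company2
1993 M Company2
1993 K Company2
1993 A Company2
1993 B Company2
1994 C Company2
1994 M Company2
1994 K Company2
1994 L Company2
end

* one row per observed company-Year-P combination (no zero option)
contract company Year P

* hypothetical panel identifier: one panel per company-P pair
egen long pairid = group(company P)
tsset pairid Year

* L._freq is missing when this pair has no row for Year - 1
generate byte new = missing(L._freq)

tabulate company Year if new
```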
