Re: st: RE: doing the comparison for pairs of years

From	Navid Asgari <[email protected]>
To	[email protected]
Subject	Re: st: RE: doing the comparison for pairs of years
Date	Sat, 12 May 2012 20:04:17 +0800

Hi Nick,

Thanks,

Yes, I made a mistake. After I corrected it, your code worked.

Now I am facing another problem. If I want to do the same thing
(comparing values of "P" across years) within each group of rows (grouped
by a variable called, say, "Company"), the following code doesn't
work:

contract company Year P , zero
bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
tab Company Y if new


Sorry for the frequent questions. I am a Stata newbie.

     +---------------------+
     |  company   Year   P |
     |---------------------|
  1. | Company1   1995   A |
  2. | Company1   1995   A |
  3. | Company1   1995   A |
  4. | Company1   1995   A |
  5. | Company1   1995   B |
     |---------------------|
  6. | Company1   1995   C |
  7. | Company1   1995   D |
  8. | Company1   1995   E |
  9. | Company1   1996   A |
 10. | Company1   1996   A |
     |---------------------|
 11. | Company1   1996   A |
 12. | Company1   1996   A |
 13. | Company1   1996   B |
 14. | Company1   1996   C |
 15. | Company1   1996   H |
     |---------------------|
 16. | Company1   1996   M |
 17. | Company2   1993   A |
 18. | Company2   1993   B |
 19. | Company2   1993   G |
 20. | Company2   1993   G |
     |---------------------|
 21. | Company2   1993   K |
 22. | Company2   1993   M |
 23. | Company2   1998   C |
 24. | Company2   1998   K |
 25. | Company2   1998   L |
     |---------------------|
 26. | Company2   1998   M |
     +---------------------+






Thanks,
Navid
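
A minimal sketch of one way the grouped comparison could be written, offered as an illustration rather than as the code from this thread. It rests on several assumptions: the variables are named exactly company, Year and P as in the listing above (Stata variable names are case-sensitive, so Company and company are different names); "previous year" is read as the previous year in which that company has any records; and the helper variables firstrow and yearno are hypothetical names introduced here. The -zero- option is left out because it fills in every company-Year-P combination, including years in which a company has no records at all.

```stata
* Sketch only: assumes variables named exactly company, Year and P.
clear
input str8 company int Year str1 P
"Company1" 1995 "A"
"Company1" 1995 "A"
"Company1" 1995 "A"
"Company1" 1995 "A"
"Company1" 1995 "B"
"Company1" 1995 "C"
"Company1" 1995 "D"
"Company1" 1995 "E"
"Company1" 1996 "A"
"Company1" 1996 "A"
"Company1" 1996 "A"
"Company1" 1996 "A"
"Company1" 1996 "B"
"Company1" 1996 "C"
"Company1" 1996 "H"
"Company1" 1996 "M"
"Company2" 1993 "A"
"Company2" 1993 "B"
"Company2" 1993 "G"
"Company2" 1993 "G"
"Company2" 1993 "K"
"Company2" 1993 "M"
"Company2" 1998 "C"
"Company2" 1998 "K"
"Company2" 1998 "L"
"Company2" 1998 "M"
end

* One row per observed company-Year-P combination. Without -zero-,
* every remaining row has _freq > 0.
contract company Year P

* Number each company's observed years 1, 2, ... in calendar order
* (firstrow and yearno are hypothetical helper names).
bysort company Year : gen byte firstrow = _n == 1
by company : gen yearno = sum(firstrow)

* A value of P is flagged as new if it is the company's first record of
* that P, or if its preceding record is not from the company's
* immediately preceding observed year.
bysort company P (Year) : gen byte new = _n == 1 | yearno - yearno[_n-1] > 1

* Count of new P values per company and year.
tab company Year if new
```

If the comparison should instead be against the previous calendar year, so that a gap such as 1993 to 1998 makes every 1998 value count as new, the last -gen- could test Year - Year[_n-1] > 1 instead of the difference in yearno.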


On Sat, May 12, 2012 at 4:53 PM, Nick Cox <[email protected]> wrote:
> My code compares each year with the previous, which is I think exactly what
> you ask, so I don't see any sense in which the logic fails.
>
> I think you need to substantiate your criticism.
>
> Nick
>
>
>
> On 12 May 2012, at 09:27, Navid Asgari <[email protected]> wrote:
>
>> Hi Nick,
>>
>> Thanks for your quick and helpful response,
>>
>> The logic that you suggested works fine for a comparison across only two
>> years. However, if I want to compare new "P" values in, say, 1995 with
>> values of "P" in 1994, and then do the same comparing only 1996
>> with 1995 and then 1997 with 1996, the logic fails.
>>
>> I was thinking that a "foreach" loop over "Year" could work, but it does
>> not...
>>
>> What other ways are possible?
>>
>> Thanks,
>> Navid
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------
>>
>>
>> I can't make sense of your -reshape-. Your structure is already -long-
>> and there is just one variable that is -P*-. As it is, the -reshape-
>> command is illegal in the context you give. It seems quite unneeded,
>> so I start afresh.
>>
>> I first read in your dataset.
>>
>> . input      Year str1  P
>>
>>         Year          P
>>  1.  1995   A
>>  2.  1995   B
>>  3.  1995   A
>>  4.  1995   C
>>  5.  1995   D
>>  6.  1995   A
>>  7.  1995   E
>>  8.  1995   A
>>  9.  1996   B
>> 10.  1996   A
>> 11.  1996   A
>> 12.  1996   M
>> 13.  1996   A
>> 14.  1996   H
>> 15.  1996   A
>> 16.  1996   C
>> 17. end
>>
>> Then we reduce the dataset to a set of counts.
>>
>> . contract Year P , zero
>>
>> . l
>>
>>    +------------------+
>>    | Year   P   _freq |
>>    |------------------|
>>  1. | 1995   A       4 |
>>  2. | 1995   B       1 |
>>  3. | 1995   C       1 |
>>  4. | 1995   D       1 |
>>  5. | 1995   E       1 |
>>    |------------------|
>>  6. | 1995   H       0 |
>>  7. | 1995   M       0 |
>>  8. | 1996   A       4 |
>>  9. | 1996   B       1 |
>> 10. | 1996   C       1 |
>>    |------------------|
>> 11. | 1996   D       0 |
>> 12. | 1996   E       0 |
>> 13. | 1996   H       1 |
>> 14. | 1996   M       1 |
>>    +------------------+
>>
>> Then a -P- is new if it wasn't observed the previous year. Notice that
>> I define "new" as including the first time any value of -P- is
>> observed.
>>
>> . bysort P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>
>> . l
>>
>>    +------------------------+
>>    | Year   P   _freq   new |
>>    |------------------------|
>>  1. | 1995   A       4     1 |
>>  2. | 1996   A       4     0 |
>>  3. | 1995   B       1     1 |
>>  4. | 1996   B       1     0 |
>>  5. | 1995   C       1     1 |
>>    |------------------------|
>>  6. | 1996   C       1     0 |
>>  7. | 1995   D       1     1 |
>>  8. | 1996   D       0     0 |
>>  9. | 1995   E       1     1 |
>> 10. | 1996   E       0     0 |
>>    |------------------------|
>> 11. | 1995   H       0     1 |
>> 12. | 1996   H       1     1 |
>> 13. | 1995   M       0     1 |
>> 14. | 1996   M       1     1 |
>>    +------------------------+
>>
>> Then we count how many new categories there are each year.
>>
>> . tab Y if new
>>
>>      Year |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>      1995 |          7       77.78       77.78
>>      1996 |          2       22.22      100.00
>> ------------+-----------------------------------
>>     Total |          9      100.00
>>
>> The generalization to include -Company- should be something like this,
>> but I didn't test it.
>>
>> contract Company Year P , zero
>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>> tab Company Y if new
>>
>> Nick
>> [email protected]
>>
>> Navid Asgari
>>
>> I have a dataset which looks like this:
>>
>>
>>    +----------+
>>    | Year   P |
>>    |----------|
>>  1. | 1995   A |
>>  2. | 1995   B |
>>  3. | 1995   A |
>>  4. | 1995   C |
>>  5. | 1995   D |
>>    |----------|
>>  6. | 1995   A |
>>  7. | 1995   E |
>>  8. | 1995   A |
>>  9. | 1996   B |
>> 10. | 1996   A |
>>    |----------|
>> 11. | 1996   A |
>> 12. | 1996   M |
>> 13. | 1996   A |
>> 14. | 1996   H |
>> 15. | 1996   A |
>>    |----------|
>> 16. | 1996   C |
>>    +----------+
>>
>> I use the following to count the number of new values of variable "P"
>> that exist in the year 1996 but not in 1995:
>>
>> gen id = _n
>> reshape long P , i(id)
>> bysort P (Year id) : gen seq = _n
>> count if Year==1996 & seq==1
>>
>> Now I want to do the same thing for more than 2 successive years (e.g.
>> 1993, 1994, 1995, 1996). So, the values of variable "P" in every year will
>> be compared with the values in the previous year (1994 to 1993, then 1995
>> to 1994, and so forth).
>>
>> The complexity lies in the fact that this comparison has to be
>> done for each unique value of another variable, and the starting year
>> and ending year vary across groups. In fact, this is what the
>> structure of the real data looks like:
>>
>>
>>    +---------------------+
>>    | Year   P    company |
>>    |---------------------|
>>  1. | 1995   A   Company1 |
>>  2. | 1995   B   Company1 |
>>  3. | 1995   A   Company1 |
>>  4. | 1995   C   Company1 |
>>  5. | 1995   D   Company1 |
>>    |---------------------|
>>  6. | 1995   A   Company1 |
>>  7. | 1995   E   Company1 |
>>  8. | 1995   A   Company1 |
>>  9. | 1996   B   Company1 |
>> 10. | 1996   A   Company1 |
>>    |---------------------|
>> 11. | 1996   A   Company1 |
>> 12. | 1996   M   Company1 |
>> 13. | 1996   A   Company1 |
>> 14. | 1996   H   Company1 |
>> 15. | 1996   A   Company1 |
>>    |---------------------|
>> 16. | 1996   C   Company1 |
>> 17. | 1993   G   Company2 |
>> 18. | 1993   G   Company2 |
>> 19. | 1993   M   Company2 |
>> 20. | 1993   K   Company2 |
>>    |---------------------|
>> 21. | 1993   A   Company2 |
>> 22. | 1993   B   Company2 |
>> 23. | 1994   C   Company2 |
>> 24. | 1994   M   Company2 |
>> 25. | 1994   K   Company2 |
>>    |---------------------|
>> 26. | 1994   L   Company2 |
>>    +---------------------+
>>
>> So for every group defined by the variable company, the code should count
>> the number of new values of variable "P" in every year that did not exist
>> the year before...
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

