Re: st: RE: doing the comparison for pairs of years

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: RE: doing the comparison for pairs of years
Date	Sat, 12 May 2012 09:53:26 +0100

My code compares each year with the previous one, which is, I think, exactly what you ask, so I don't see any sense in which the logic fails.


I think you need to substantiate your criticism.

Nick


On 12 May 2012, at 09:27, Navid Asgari <[email protected]> wrote:

Hi Nick,

Thanks for your quick and helpful response,

The logic that you suggested works fine for a comparison across only two
years. However, if I want to compare new "P" values in, say, 1995 with
the values of "P" in 1994, and then do the same for 1996 against 1995
and for 1997 against 1996, the logic fails.

I was thinking that a "foreach" loop over "Year" could work, but it does not...


What other ways are possible?

Thanks,
Navid

---------------------------------------------------------------------------------------------------------------------



I can't make sense of your -reshape-. Your structure is already -long-
and there is just one variable that is -P*-. As it is, the -reshape-
command is illegal in the context you give. It seems quite unneeded,
so I start afresh.

I first read in your dataset.

. input      Year str1  P

         Year          P
 1.  1995   A
 2.  1995   B
 3.  1995   A
 4.  1995   C
 5.  1995   D
 6.  1995   A
 7.  1995   E
 8.  1995   A
 9.  1996   B
10.  1996   A
11.  1996   A
12.  1996   M
13.  1996   A
14.  1996   H
15.  1996   A
16.  1996   C
17. end

Then we reduce the dataset to a set of counts.

. contract Year P , zero

. l

    +------------------+
    | Year   P   _freq |
    |------------------|
 1. | 1995   A       4 |
 2. | 1995   B       1 |
 3. | 1995   C       1 |
 4. | 1995   D       1 |
 5. | 1995   E       1 |
    |------------------|
 6. | 1995   H       0 |
 7. | 1995   M       0 |
 8. | 1996   A       4 |
 9. | 1996   B       1 |
10. | 1996   C       1 |
    |------------------|
11. | 1996   D       0 |
12. | 1996   E       0 |
13. | 1996   H       1 |
14. | 1996   M       1 |
    +------------------+

Then a -P- is new if it wasn't observed the previous year. Notice that
I define "new" as including the first time any value of -P- is
observed.

. bysort P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)

. l

    +------------------------+
    | Year   P   _freq   new |
    |------------------------|
 1. | 1995   A       4     1 |
 2. | 1996   A       4     0 |
 3. | 1995   B       1     1 |
 4. | 1996   B       1     0 |
 5. | 1995   C       1     1 |
    |------------------------|
 6. | 1996   C       1     0 |
 7. | 1995   D       1     1 |
 8. | 1996   D       0     0 |
 9. | 1995   E       1     1 |
10. | 1996   E       0     0 |
    |------------------------|
11. | 1995   H       0     1 |
12. | 1996   H       1     1 |
13. | 1995   M       0     1 |
14. | 1996   M       1     1 |
    +------------------------+

Then we count how many new categories there are each year.

. tab Y if new

      Year |      Freq.     Percent        Cum.
------------+-----------------------------------
      1995 |          7       77.78       77.78
      1996 |          2       22.22      100.00
------------+-----------------------------------
     Total |          9      100.00
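
A side note on the counts above: because -contract, zero- adds rows with
_freq == 0, H and M each get a 1995 row, and the _n == 1 condition flags
those rows as new. That is why 1995 shows 7 rather than the 5 categories
actually observed that year. If only observed categories should count,
one possible variant (a sketch, not part of the code as posted) also
requires _freq > 0 on the first row of each -P-:

```stata
* Sketch only: same example data as above, with a stricter definition of "new".
clear
input Year str1 P
1995 A
1995 B
1995 A
1995 C
1995 D
1995 A
1995 E
1995 A
1996 B
1996 A
1996 A
1996 M
1996 A
1996 H
1996 A
1996 C
end

* One row per Year-P combination, including combinations never observed.
contract Year P, zero

* New = observed this year AND (first row for this P OR not observed last year).
bysort P (Year) : gen new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

tab Year if new
```

With this definition the tabulation should show 5 new categories for 1995
(A to E) and 2 for 1996 (H and M).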

The generalization to include -Company- should be something like this,
but I didn't test it.

contract Company Year P , zero
bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
tab Company Y if new
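
For the by-company case, a possible alternative is sketched here that
skips the -zero- option and instead tests whether the same -P- has a row
for the immediately preceding year within the same company. This is an
assumption about what is wanted, not the code as posted: it treats the
first year a company appears as all new, and it treats a value as new
again after any gap of a year or more. The data are the two-company
listing from the original question, and the variable is named -company-
(lowercase) to match that listing.

```stata
* Sketch only: year-over-year comparison within company, without -zero-.
clear
input Year str1 P str8 company
1995 A Company1
1995 B Company1
1995 A Company1
1995 C Company1
1995 D Company1
1995 A Company1
1995 E Company1
1995 A Company1
1996 B Company1
1996 A Company1
1996 A Company1
1996 M Company1
1996 A Company1
1996 H Company1
1996 A Company1
1996 C Company1
1993 G Company2
1993 G Company2
1993 M Company2
1993 K Company2
1993 A Company2
1993 B Company2
1994 C Company2
1994 M Company2
1994 K Company2
1994 L Company2
end

* One row per observed company-Year-P combination.
contract company Year P

* New = first row for this company and P, or no row for the previous year.
bysort company P (Year) : gen new = _n == 1 | Year != Year[_n-1] + 1

tab company Year if new
```

On these data the table should show 5 new values for Company1 in 1995 and
2 in 1996 (H and M), and 5 for Company2 in 1993 and 2 in 1994 (C and L).
Because each company is handled separately, differing start and end years
across companies need no special treatment.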

Nick
[email protected]

Navid Asgari

I have a dataset which looks like this:


    +----------+
    | Year   P |
    |----------|
 1. | 1995   A |
 2. | 1995   B |
 3. | 1995   A |
 4. | 1995   C |
 5. | 1995   D |
    |----------|
 6. | 1995   A |
 7. | 1995   E |
 8. | 1995   A |
 9. | 1996   B |
10. | 1996   A |
    |----------|
11. | 1996   A |
12. | 1996   M |
13. | 1996   A |
14. | 1996   H |
15. | 1996   A |
    |----------|
16. | 1996   C |
    +----------+

I use the following to count the number of new values of variable "P"
that exist in 1996 but not in 1995:

gen id = _n

reshape long P , i(id)
bysort P (Year id) : gen seq = _n


count if Year==1996 & seq==1

Now I want to do the same thing for more than 2 successive years (e.g.
1993, 1994, 1995, 1996). So, the values of variable "P" in every year will
be compared with the values in the previous year (1994 with 1993, then
1995 with 1994, and so forth).

The complexity of this lies in the fact that the comparison has to be
done separately for each unique value of another variable, and the
starting year and ending year vary from group to group. In fact, this is
what the structure of the real data looks like:


    +---------------------+
    | Year   P    company |
    |---------------------|
 1. | 1995   A   Company1 |
 2. | 1995   B   Company1 |
 3. | 1995   A   Company1 |
 4. | 1995   C   Company1 |
 5. | 1995   D   Company1 |
    |---------------------|
 6. | 1995   A   Company1 |
 7. | 1995   E   Company1 |
 8. | 1995   A   Company1 |
 9. | 1996   B   Company1 |
10. | 1996   A   Company1 |
    |---------------------|
11. | 1996   A   Company1 |
12. | 1996   M   Company1 |
13. | 1996   A   Company1 |
14. | 1996   H   Company1 |
15. | 1996   A   Company1 |
    |---------------------|
16. | 1996   C   Company1 |
17. | 1993   G   Company2 |
18. | 1993   G   Company2 |
19. | 1993   M   Company2 |
20. | 1993   K   Company2 |
    |---------------------|
21. | 1993   A   Company2 |
22. | 1993   B   Company2 |
23. | 1994   C   Company2 |
24. | 1994   M   Company2 |
25. | 1994   K   Company2 |
    |---------------------|
26. | 1994   L   Company2 |
    +---------------------+

So, for every group under variable "company", the code should count the
number of new values of variable "P" in each year, that is, values that
did not exist the year before...
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

