Statalist



Re: st: Characteristics of median observation


From   Tim <lists@timbp.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Characteristics of median observation
Date   Tue, 18 Aug 2009 18:57:19 +1000

Maarten posted some example code to help the original poster.

I made a couple of additions to Maarten's code to try to aid my understanding, and I just became more confused. I have not changed any of his lines except the final -list- (and I commented out the -drop-). I have added a few lines to generate variables -- and I don't understand the values of these variables.


The modified code:
*---------- begin example (modified by timbp) --------------
clear
input ///
age     sales   ind    year id
  2     1.04   3339    1991 1
  3     1.75   3339    1991 2
  3     3.08   3339    1991 3
 31     .496   3339    1991 4
 42     .546   3339    1991 5
 42      1.5   3339    1991 6
  5        .   3411    1991 7
  8     .584   3411    1991 8
 30     .491   3411    1991 9
 19     .944   3411    1991 10
 20     .692   3411    1991 11
 28     1.81   3411    1991 12
 29     .601   3411    1991 13
 32     .509   3411    1991 14
 42     .938   3411    1991 15
 42     .886   3411    1991 16
end

gen byte miss = missing(age, sales, ind, year)
bysort miss ind year (sales): gen long n =  ///
    _N - 1 if miss == 0
bysort miss ind year (sales): gen medage =  ///
    (age[`= floor(n/2)'] + age[`= ceil(n/2)'])/2 ///
     if miss == 0
bysort miss ind year (sales): gen agef = age[`= floor(n/2)'] if miss == 0 // added by timbp
bysort miss ind year (sales): gen agec = age[`= ceil(n/2)'] if miss == 0 // added by timbp
bysort miss ind year (sales): gen fl = floor(n/2) if miss == 0 // added by timbp
bysort miss ind year (sales): gen ce = ceil(n/2) if miss == 0 // added by timbp

* drop miss n
list, sepby(miss ind) //modified by timbp
*--------------- end example ----------------------

The output:

     +----------------------------------------------------------------------------+
     | age   sales    ind   year   id   miss   n   medage   agef   agec   fl   ce |
     |----------------------------------------------------------------------------|
  1. |  31    .496   3339   1991    4      0   5       22     42      2    2    3 |
  2. |  42    .546   3339   1991    5      0   5       22     42      2    2    3 |
  3. |   2    1.04   3339   1991    1      0   5       22     42      2    2    3 |
  4. |  42     1.5   3339   1991    6      0   5       22     42      2    2    3 |
  5. |   3    1.75   3339   1991    2      0   5       22     42      2    2    3 |
  6. |   3    3.08   3339   1991    3      0   5       22     42      2    2    3 |
     |----------------------------------------------------------------------------|
  7. |  30    .491   3411   1991    9      0   8       20     32      8    4    4 |
  8. |  32    .509   3411   1991   14      0   8       20     32      8    4    4 |
  9. |   8    .584   3411   1991    8      0   8       20     32      8    4    4 |
 10. |  29    .601   3411   1991   13      0   8       20     32      8    4    4 |
 11. |  20    .692   3411   1991   11      0   8       20     32      8    4    4 |
 12. |  42    .886   3411   1991   16      0   8       20     32      8    4    4 |
 13. |  42    .938   3411   1991   15      0   8       20     32      8    4    4 |
 14. |  19    .944   3411   1991   10      0   8       20     32      8    4    4 |
 15. |  28    1.81   3411   1991   12      0   8       20     32      8    4    4 |
     |----------------------------------------------------------------------------|
 16. |   5       .   3411   1991    7      1   .        .      .      .    .    . |
     +----------------------------------------------------------------------------+

My questions:

1. For ind==3411, fl and ce are both 4, so why are agef and agec different?

2. For ind==3411, medage appears to be the average of age[2] and age[3] (the [ ] numbers are positions within the by-group). How does Stata get those index values when fl==4 and ce==4?

3. For ind==3339, fl==2 and ce==3, and medage appears to be the average of age[2] and age[3]; but for ind==3411, fl==4 and ce==4, and medage again appears to be the average of age[2] and age[3]. Why the difference?

Thanks,

tim
lists@timbp.com
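
A possible explanation for all three questions, offered as a sketch rather than a confirmed answer from the thread: Stata evaluates an inline expression macro such as `= floor(n/2)' once, while it expands the command line and before -bysort- runs, and a bare variable name in that context refers to the first observation of the dataset. With the data sorted as in the output above, n[1] is 5, so every by-group would get age[2] and age[3], whatever fl and ce are in that group. The toy data below (g, x, viamacro and direct are hypothetical names, not from the original example) contrasts a macro subscript with a plain expression subscript:

```stata
*---------- begin sketch (toy data, hypothetical names) ----------
clear
input g x
1 10
1 20
1 30
1 40
2 1
2 2
2 3
2 4
2 5
2 6
2 7
2 8
end

* n is 3 in group 1 (4 obs) and 7 in group 2 (8 obs)
bysort g (x): gen long n = _N - 1

* outside a loop over observations, a bare variable name means observation 1
display n
display "`= floor(n/2)' and `= ceil(n/2)'"

* the macro is replaced by a constant before the command runs
bysort g (x): gen viamacro = x[`= ceil(n/2)']
* a plain expression in the subscript is evaluated for each observation
bysort g (x): gen direct = x[ceil(n/2)]

list, sepby(g)

* viamacro uses position 2 in both groups; direct uses position 4 in group 2
assert viamacro == 20 if g == 1
assert viamacro == 2  if g == 2
assert direct   == 20 if g == 1
assert direct   == 4  if g == 2
*---------- end sketch ----------
```

Here the two -display- lines show 3 and then "1 and 2", which is why viamacro picks x[2] in both groups while direct follows each group's own n.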

Maarten Buis wrote:
--- On Tue, 18/8/09, John Hund wrote:
 As an example, I have data (sales and ages) on firms by
year in different industries. I would like to find the
age of the firm with the median value on sales for each
year and industry.

*---------- begin example --------------
clear
input ///
age     sales   ind    year id
   2     1.04   3339    1991 1
   3     1.75   3339    1991 2
   3     3.08   3339    1991 3
  31     .496   3339    1991 4
  42     .546   3339    1991 5
  42      1.5   3339    1991 6
   5        .   3411    1991 7
   8     .584   3411    1991 8
  30     .491   3411    1991 9
  19     .944   3411    1991 10
  20     .692   3411    1991 11
  28     1.81   3411    1991 12
  29     .601   3411    1991 13
  32     .509   3411    1991 14
  42     .938   3411    1991 15
  42     .886   3411    1991 16
end

gen byte miss = missing(age, sales, ind, year)
bysort miss ind year (sales): gen long n =  ///
     _N - 1 if miss == 0
bysort miss ind year (sales): gen medage =  ///
     (age[`= floor(n/2)'] + age[`= ceil(n/2)'])/2 ///
      if miss == 0
drop miss n
list
*--------------- end example ----------------------

Hope this helps,
Maarten

-----------------------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
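
If the goal is the age at the middle position of sales within each industry and year, one way to sidestep the inline macro `= ...' (which is expanded once per command, not once per by-group) is to put the expression directly in the subscript, where it is evaluated for each observation and _N is the size of the current by-group. The sketch below is a variant of the example above, not Maarten's code: it assumes the median position is (_N + 1)/2, so it uses _N + 1 where the original uses _N - 1, and it drops the helper variable n.

```stata
*---------- begin sketch (variant of the example above) ----------
clear
input age sales ind year id
  2     1.04   3339    1991 1
  3     1.75   3339    1991 2
  3     3.08   3339    1991 3
 31     .496   3339    1991 4
 42     .546   3339    1991 5
 42      1.5   3339    1991 6
  5        .   3411    1991 7
  8     .584   3411    1991 8
 30     .491   3411    1991 9
 19     .944   3411    1991 10
 20     .692   3411    1991 11
 28     1.81   3411    1991 12
 29     .601   3411    1991 13
 32     .509   3411    1991 14
 42     .938   3411    1991 15
 42     .886   3411    1991 16
end

* flag incomplete rows so they sort into their own by-groups
gen byte miss = missing(age, sales, ind, year)

* assumption: the median sits at position (_N + 1)/2 in sales order;
* floor and ceil give the two middle positions when _N is even
bysort miss ind year (sales): gen medage =  ///
    (age[floor((_N + 1)/2)] + age[ceil((_N + 1)/2)])/2 ///
    if miss == 0

list, sepby(miss ind)

* 3339: positions 3 and 4 hold ages 2 and 42; 3411: position 5 holds age 20
assert medage == 22 if ind == 3339
assert medage == 20 if ind == 3411 & miss == 0
*---------- end sketch ----------
```

With these data the result is 22 for ind 3339 and 20 for ind 3411, which happens to coincide with the medage values in the posted output, so the output by itself does not distinguish the two approaches.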


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



