
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



RE: st: creating a new variable


From   Amal Khanolkar <Amal.Khanolkar@ki.se>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: creating a new variable
Date   Wed, 18 Jul 2012 12:02:51 +0000

Thank you Nick, Maarten & Steve for your suggestions.

The tabstat command is the perfect way to get a descriptive take on what I wanted.

I tried the following and found a discrepancy in the number of subjects:

. egen mean_bw = mean(bw),  by(gestwk)

. tab mean_bw

    mean_bw |      Freq.     Percent        Cum.
------------+-----------------------------------
   559.5574 |        134        0.00        0.00
   616.5096 |        387        0.01        0.02
   699.3734 |        738        0.02        0.04
   790.9377 |      1,235        0.04        0.08
   902.7249 |      1,688        0.06        0.14
   1014.961 |      2,125        0.07        0.21
   1138.658 |      2,723        0.09        0.30
   1295.815 |      3,415        0.11        0.42
   1461.302 |      4,481        0.15        0.57
   1655.637 |      5,876        0.20        0.76
   1858.227 |      8,533        0.29        1.05
   2092.705 |     12,958        0.43        1.48
   2325.826 |     21,420        0.72        2.20
   2592.584 |     36,710        1.23        3.42
   2837.138 |     70,297        2.35        5.77
   3081.272 |    151,310        5.06       10.83
   3309.638 |      9,763        0.33       11.16
   3313.268 |    373,660       12.49       23.65
   3488.345 |    660,536       22.08       45.73
   3627.659 |      1,648        0.06       45.78
   3637.902 |    822,376       27.49       73.28
   3698.833 |      5,470        0.18       73.46
   3755.764 |    542,442       18.13       91.59
   3791.726 |     31,928        1.07       92.66
   3826.705 |    219,603        7.34      100.00
------------+-----------------------------------
      Total |  2,991,456      100.00

. tabstat bw, by(gestwk) stat(mean n sd)

Summary for variables: bw
     by categories of: gestwk 

  gestwk |      mean         N        sd
---------+------------------------------
      22 |  559.5574       122  209.6139
      23 |  616.5096       365  134.5845
      24 |  699.3734       691  135.2207
      25 |  790.9377      1171   147.066
      26 |  902.7248      1610  189.5523
      27 |  1014.961      2024   201.809
      28 |  1138.658      2613   238.724
      29 |  1295.815      3316  278.1803
      30 |  1461.302      4367  299.6202
      31 |  1655.637      5732  345.8412
      32 |  1858.227      8369  359.1699
      33 |  2092.704     12771   402.861
      34 |  2325.826     21149  416.8742
      35 |  2592.584     36451  458.3818
      36 |  2837.138     69940  464.2042
      37 |  3081.272    150767  465.5551
      38 |  3313.268    372601   453.221
      39 |  3488.345    658969  445.2462
      40 |  3637.902    820460  453.1178
      41 |  3755.764    541160  467.3571
      42 |  3826.705    219074  485.0738
      43 |  3791.726     31859  507.7569
      44 |  3698.833      5454  512.7899
      45 |  3627.659      1631  531.2405
---------+------------------------------
   Total |  3502.912   2972666  575.2709
----------------------------------------


As the output above shows, the N for each gestational week isn't the same in the two tables. I get the same problem when using:

bys gestwk : egen mean1 = mean(bw) 

The Ns are almost the same for most values of gestwk, giving the same mean BW, but in some cases the Ns differ quite a bit, giving larger differences in mean BW.
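
One possible reading of the discrepancy, offered as an assumption and not something confirmed in this thread: egen's mean() fills the group mean into every observation of the group, including observations where bw itself is missing, while the N reported by tabstat counts only non-missing bw. That would explain 134 versus 122 for week 22. Note also that tab mean_bw lists rows in order of the mean, not of gestwk, so the rows of the two tables do not line up wherever the mean is not increasing with gestwk (weeks 43 to 45 here). The extra row with 9,763 observations matches the difference between the two totals (2,991,456 - 2,981,693), which suggests observations with missing gestwk were treated as a group of their own. A minimal sketch of the same behaviour with the auto data, where rep78 has some missing values; the names mean_rep and mean_rep2 are illustrative only:

```stata
sysuse auto, clear

* egen copies the group mean to every observation in the group,
* including observations where rep78 itself is missing
egen mean_rep = mean(rep78), by(foreign)
tabulate mean_rep

* the N reported by tabstat counts only non-missing rep78
tabstat rep78, by(foreign) stat(mean n sd)

* restricting egen to non-missing rep78 makes the two counts comparable
egen mean_rep2 = mean(rep78) if !missing(rep78), by(foreign)
tabulate mean_rep2
```

If this reading is right, the means themselves agree between the two tables, and only the counts and the row order differ.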


Thanks,
/Amal

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Nick Cox [njcoxstata@gmail.com]
Sent: 18 July 2012 13:40
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: creating a new variable

Here are five solutions for a similar problem.

. sysuse auto

. tab rep78, su(mpg)

     Repair |      Summary of Mileage (mpg)
Record 1978 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |          21   4.2426407           2
          2 |      19.125   3.7583241           8
          3 |   19.433333   4.1413252          30
          4 |   21.666667   4.9348699          18
          5 |   27.363636   8.7323849          11
------------+------------------------------------
      Total |   21.289855   5.8664085          69

. tabstat mpg , by(rep78)

Summary for variables: mpg
     by categories of: rep78 (Repair Record 1978)

   rep78 |      mean
---------+----------
       1 |        21
       2 |    19.125
       3 |  19.43333
       4 |  21.66667
       5 |  27.36364
---------+----------
   Total |  21.28986
--------------------

. graph dot (mean) mpg, over(rep78) vertical

. egen mean_mpg = mean(mpg),  by(rep78)

. scatter mean_mpg rep78

. dotplot mpg, over(rep78) bar
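
A possible refinement of the egen approach, sketched with the same auto data: mean_mpg repeats the group mean on every observation, so scatter draws one marker per observation, all overplotted at the same position. Tagging one observation per group with egen's tag() function plots a single point per category, which may matter with millions of observations. The variable name tag_rep is an illustrative choice, not something from the thread.

```stata
sysuse auto, clear

* group mean, repeated on every observation of the group
egen mean_mpg = mean(mpg), by(rep78)

* tag exactly one observation in each rep78 group
egen tag_rep = tag(rep78)

* plot a single marker per repair-record category
scatter mean_mpg rep78 if tag_rep
```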


On Wed, Jul 18, 2012 at 11:34 AM, Amal Khanolkar <Amal.Khanolkar@ki.se> wrote:

> I have a very simple problem that I'm unable to find a simple solution for:
>
> Below is the data concerned:
>
> Gestational age in weeks:
>
>  tab gestwk
>
>      gestwk |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          22 |        134        0.00        0.00
>          23 |        387        0.01        0.02
>          24 |        738        0.02        0.04
>          25 |      1,235        0.04        0.08
>          26 |      1,688        0.06        0.14
>          27 |      2,125        0.07        0.21
>          28 |      2,723        0.09        0.30
>          29 |      3,415        0.11        0.42
>          30 |      4,481        0.15        0.57
>          31 |      5,876        0.20        0.76
>          32 |      8,533        0.29        1.05
>          33 |     12,958        0.43        1.49
>          34 |     21,420        0.72        2.20
>          35 |     36,710        1.23        3.44
>          36 |     70,297        2.36        5.79
>          37 |    151,310        5.07       10.87
>          38 |    373,660       12.53       23.40
>          39 |    660,536       22.15       45.55
>          40 |    822,376       27.58       73.13
>          41 |    542,442       18.19       91.33
>          42 |    219,603        7.37       98.69
>          43 |     31,928        1.07       99.76
>          44 |      5,470        0.18       99.94
>          45 |      1,648        0.06      100.00
> ------------+-----------------------------------
>       Total |  2,981,693      100.00
>
>
> Mean birth weight of my study sample:
>
> . sum bw
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           bw |   2980093    3502.431    575.7603        300       6780
>
> . sum bw if gestwk==26
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           bw |      1610    902.7248    189.5523        350       1970
>
>
> Below is how I currently look at the mean birth weight for a particular gestational week:
>
> . sum bw if gestwk==27
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           bw |      2024    1014.961     201.809        380       1920
>
> . sum bw if gestwk==28
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           bw |      2613    1138.658     238.724        370       2000
>
> . sum bw if gestwk==29
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           bw |      3316    1295.815    278.1803        370       2480
>
>
> What I would like to do is to create a single continuous variable that would give me the mean birth weight for each gestational week so that I don't have to look at it individually as above. I would like to ideally be able to use this variable in scatter plots.
>
> If I plot as follows:
>
> twoway scatter bw gestwk
>
> I of course don't get a single estimate for each gestational week; instead, the entire range of birth weights for a particular week is plotted.
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index