



Re: st: Bug? proportion not using value labels


From   Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Bug? proportion not using value labels
Date   Thu, 25 Feb 2010 13:14:51 +0530

It does say in the manual ([R] proportion):

"When this option is supplied with one variable name, such as
over(varname), the value labels of varname are used to identify the
subpopulations. If varname does not have labeled values (or there are
unlabeled values), the values themselves are used, provided that they
are nonnegative integers. Noninteger values, negative values, and[...]

_labels that are not valid Stata names are substituted with a default
identifier._"

So this is probably intended behaviour: labels such as "75-79" and "90+"
are not valid Stata names. See [U] 11.3 Naming conventions for what
constitutes a legal Stata name.
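
One possible workaround, offered as a sketch rather than anything the
manual prescribes: redefine the value labels so that each one is itself
a legal Stata name (letters, digits and underscores only, not starting
with a digit). The label name agegroup_names and the label texts below
are made up for illustration, and the toy data only stand in for the
real dataset.

```stata
* Toy data standing in for the real dataset (assumption, for illustration only)
clear
input agegroup had_sex
1 0
1 1
2 0
2 1
3 0
3 1
4 0
4 1
end

* Hypothetical labels chosen so that each one is a valid Stata name
label define agegroup_names 1 "age75_79" 2 "age80_84" 3 "age85_89" 4 "age90plus"
label values agegroup agegroup_names

* Per the manual text quoted above, these labels should now identify
* the subpopulations instead of _subpop_1 ... _subpop_4
proportion had_sex, over(agegroup)
```

The readable labels ("75-79", "90+") could then be reattached in the
results dataset just before graphing, if that is still needed.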

T


2010/2/25 Zoe Hyde <zhyde@meddent.uwa.edu.au>:
> Hello all,
>
> I am trying to get the proportion command to use the value labels associated with a variable to identify subpopulations, but may have run into a bug. It seems the value labels are used only if each label consists entirely of letters (A-Za-z) or entirely of digits (0-9), not a mixture of the two or any other characters.
>
> For example, this doesn't work:
>
> label define agegroup_cat 1 "75-79" 2 "80-84" 3 "85-89" 4 "90+"
> label values agegroup agegroup_cat
> tab agegroup
>
>    Age (w3) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       75-79 |      1,316       40.16       40.16
>       80-84 |      1,335       40.74       80.90
>       85-89 |        516       15.75       96.64
>         90+ |        110        3.36      100.00
> ------------+-----------------------------------
>       Total |      3,277      100.00
>
> proportion had_sex, over(agegroup)
> Proportion estimation               Number of obs    =    2783
>      _prop_1: had_sex = 0
>      _prop_2: had_sex = 1
>    _subpop_1: agegroup = 75-79
>    _subpop_2: agegroup = 80-84
>    _subpop_3: agegroup = 85-89
>    _subpop_4: agegroup = 90+
> --------------------------------------------------------------
>         Over | Proportion   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> _prop_1      |
>    _subpop_1 |   .6043668   .0144572      .5760189    .6327147
>    _subpop_2 |   .7196429   .0134276      .6933137     .745972
>    _subpop_3 |   .8142202   .0186477      .7776554    .8507849
>    _subpop_4 |   .8902439   .0347317      .8221413    .9583465
> -------------+------------------------------------------------
> _prop_2      |
>    _subpop_1 |   .3956332   .0144572      .3672853    .4239811
>    _subpop_2 |   .2803571   .0134276       .254028    .3066863
>    _subpop_3 |   .1857798   .0186477      .1492151    .2223446
>    _subpop_4 |   .1097561   .0347317      .0416535    .1778587
> --------------------------------------------------------------
>
>
> But if groups 1 and 4 only contain characters from a single set, then it does work:
>
>
> label define agegroup_cat2 1 "75" 2 "80to84" 3 "85 to 89" 4 "Ninetyplus"
> label values agegroup agegroup_cat2
>
> . tab agegroup
>    Age (w3) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          75 |      1,316       40.16       40.16
>      80to84 |      1,335       40.74       80.90
>    85 to 89 |        516       15.75       96.64
>  Ninetyplus |        110        3.36      100.00
> ------------+-----------------------------------
>       Total |      3,277      100.00
>
> proportion had_sex, over(agegroup)
> Proportion estimation               Number of obs    =    2783
>      _prop_1: had_sex = 0
>      _prop_2: had_sex = 1
>           75: agegroup = 75
>    _subpop_2: agegroup = 80to84
>    _subpop_3: agegroup = 85 to 89
>   Ninetyplus: agegroup = Ninetyplus
> --------------------------------------------------------------
>         Over | Proportion   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> _prop_1      |
>           75 |   .6043668   .0144572      .5760189    .6327147
>    _subpop_2 |   .7196429   .0134276      .6933137     .745972
>    _subpop_3 |   .8142202   .0186477      .7776554    .8507849
>   Ninetyplus |   .8902439   .0347317      .8221413    .9583465
> -------------+------------------------------------------------
> _prop_2      |
>           75 |   .3956332   .0144572      .3672853    .4239811
>    _subpop_2 |   .2803571   .0134276       .254028    .3066863
>    _subpop_3 |   .1857798   .0186477      .1492151    .2223446
>   Ninetyplus |   .1097561   .0347317      .0416535    .1778587
> --------------------------------------------------------------
>
>
> Although I could use the key to work out which groups are which, I am sending this output off to another dataset (with parmest) to produce some graphs.  It's a real pain if I have to manually edit the dataset/re-label variables for every graph I want to produce.
>
> Does anyone have any ideas on how I can get proportion to use the value labels I have defined, no matter what characters they contain?
>
>
> Zoe.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).

