Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: right-skewed proportion data

| From | To | Subject | Date |
|---|---|---|---|
| Nick Cox | statalist@hsphsun2.harvard.edu | Re: st: right-skewed proportion data | Sat, 7 Jul 2012 12:08:14 +0100 |

```
On getting the number of variables in the dataset:

I like -ds- too, but pushing the entire varlist through -ds- is over
the top. The number of variables is directly accessible as c(k).

. sysuse auto
(1978 Automobile Data)

. di c(k)
12

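On the family: No. With -glm-, -link(logit)- on its own leaves the
family at its default, gaussian; for proportions ask for
-family(binomial)- explicitly. (With -family(binomial)- the default
link is logit.)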

That may boggle the mind if you were raised on some dogma that
discrete is discrete and continuous is continuous and ne'er the twain
shall meet. (I can't do that in idiomatic German, or indeed any kind
of German.) But we often cross this boundary. Just as -poisson- is
often a good model for a continuous response that is non-negative, so
binomial is a good first approximation model for continuous
proportions. Consider the variance-mean relationship for _any_
variable on [0, 1]. If that's the support then if the mean is 0 the
variance is also 0, and if the mean is 1 the variance is also 0. (The
mean can only be 0 or 1 if _all_ values are 0 or 1, and in each case
the variance is thus 0.) It is not axiomatic that for _any_ continuous
proportion the variance is greatest for a mean of 0.5, so far as I
know, but this arm-waving shows that the binomial has qualitatively
much of the right kind of behaviour.

Note also that for p near 0, (1 - p) is near 1, so a variance of p(1 - p)
is close to a variance of p. So, for a mean proportion near 0,
variance can be Poisson-like even though the variable is most
definitely bounded at 0 and 1.

On Sat, Jul 7, 2012 at 11:02 AM, Jörg Eulenberger <j.eulenberger@web.de> wrote:
>
> Dear Francisco,
> Yes, I want to model it. I want to do a missing-data analysis. The
> dependent variable is the right-skewed one and the independent variables
> are different survey methods (online vs. paper and pencil), controlling
> for gender, etc.
>
> Dear Maarten,
> thanks a lot. So, if I understand you properly, the correct family is
> gaussian?
>
> glm av uv uv, link(logit) vce(robust)

> On 07.07.2012 11:40, Francisco Rowe wrote:
>> What do you want to do? Do you want to model it?
>>
>> Francisco.
>>
>> On 07/07/2012, at 4:50 PM, Jörg Eulenberger wrote:
>>
>>>
>>>
>>> Dear Statalisters,
>>>
>>> I have a problem with a right-skewed dependent variable. The range of
>>> this variable is 0-1 (proportion data, 0%-100%). The distribution looks
>>> like a Poisson, but the values are not discrete.
>>>
>>> I created the dependent variable by counting the missing values (item
>>> nonresponse) row-wise. Then I standardized this variable by the number
>>> of all possible variables (automatic filtering causes different counts
>>> of variables).
>>>
>>> *************
>>> ds
>>> local varnumber `: word count `r(varlist)''
>>> gen varnumber_without_filter_missing = `varnumber'-filter_missing
>>> gen item_non_response_percent = (100/varnumber_without_filter_missing)*number_item_non_response
>>> gen item_non_response_percent_r = item_non_response_percent /100    /* range 0-1 */
>>> ******************
>>>
>>> I found the article http://www.stata-journal.com/sjpdf.html?articlenum=st0147 for handling
>>> proportion data. But what is the right way to handle a right-skewed
>>> (proportion) dependent variable?

```
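A minimal sketch of the model discussed in the thread, assuming simulated data: the variable names `online`, `female`, and `inr_prop` are hypothetical stand-ins for the survey mode, gender, and item-nonresponse proportion, and the beta draw is only a convenient way to get a right-skewed variable on (0, 1). Both `family()` and `link()` are spelled out so that nothing depends on what -glm- chooses by default.

```stata
* Sketch with simulated data; all variable names are hypothetical.
clear
set seed 2012
set obs 500

* Survey mode and gender as 0/1 indicators
generate byte online = runiform() < 0.5
generate byte female = runiform() < 0.5

* A right-skewed proportion in (0, 1), drawn from a beta distribution
generate double inr_prop = rbeta(1, 8 - 2*online)

* Number of variables in memory, without going through -ds-
display c(k)

* Binomial family, logit link, robust standard errors
glm inr_prop i.online i.female, family(binomial) link(logit) vce(robust)

* Average difference between survey modes on the proportion scale
margins, dydx(online)
```

-glm- will note that the response has noninteger values and carry on; with a continuous proportion that note is expected, and the robust standard errors are what make the inference reasonable.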
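The variance-mean argument in the message can be made slightly more precise. The following is a short derivation, not part of the original post, using only the assumption that the variable X lies in [0, 1] and has mean μ:

```latex
\[
0 \le X \le 1 \;\Longrightarrow\; X^2 \le X \;\Longrightarrow\; E[X^2] \le E[X] = \mu ,
\]
\[
\operatorname{Var}(X) \;=\; E[X^2] - \mu^2 \;\le\; \mu - \mu^2 \;=\; \mu(1 - \mu) \;\le\; \tfrac{1}{4} .
\]
```

So μ(1 - μ) is an upper bound on the variance of any variable on [0, 1], reached only when all values are 0 or 1. The bound is 0 at μ = 0 and μ = 1 and largest at μ = 0.5. The actual variance of a particular continuous proportion can sit well below it, which is consistent with treating the binomial variance function as a first approximation rather than a law.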