
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: SE and CI by mrtab


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: SE and CI by mrtab
Date   Tue, 15 May 2012 19:54:42 +0100

My constructive suggestion for getting standard errors and confidence
intervals for percentage of mentions is to base them on bootstrap
samples of persons.

Nick
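
A minimal sketch of the bootstrap suggestion above, assuming one observation per person and 0/1 indicator variables inco1-inco7 as in the drugs.dta example quoted further down. The program name pctmention, the number of replications and the seed are illustrative choices, not part of -mrtab- or of anything posted in this thread. Because each observation is a person, the default resampling of observations by -bootstrap- amounts to resampling persons.

```stata
* Sketch only: percent of responses for inco1, bootstrapped over persons.
* Assumes one row per person and 0/1 indicators inco1-inco7.
use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear

capture program drop pctmention
program define pctmention, rclass
    // hypothetical helper, not part of -mrtab-
    tempvar total
    // number of mentions made by each person
    egen `total' = rowtotal(inco1-inco7)
    // mentions of inco1 across all persons
    quietly summarize inco1, meanonly
    local num = r(sum)
    // mentions of any category across all persons
    quietly summarize `total', meanonly
    return scalar pct = 100 * `num' / r(sum)
end

* Resample persons with replacement and recompute the percentage each time
bootstrap pct = r(pct), reps(1000) seed(12345): pctmention

* Percentile-based confidence interval from the bootstrap replicates
estat bootstrap, percentile
```

The same program could be adapted for "percent of cases" by dividing by the number of persons instead of the number of mentions. With survey weights or clustered sampling, the resampling scheme would need to respect the design, which this sketch does not attempt.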

On Tue, May 15, 2012 at 10:17 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> That sounds a plausible first approximation, but the data generation
> process varies. To make matters concrete, let's imagine a question
> asked of n persons:
>
> Which statistical software do you use routinely?
>
> However, the protocol can vary:
>
> 1. People can name as many distinct programs as they like.
>
> 2. People should name precisely k distinct programs. (Perhaps a bit
> unlikely with this particular question, but bear with me.)
>
> 3. People may name up to k distinct programs.
>
> 1', 2', 3'. As above, but the order of mention is important.
>
> In practice these protocols all lead to datasets in which responses
> are stored as several variables for analysis. (The exception in which
> (e.g.) "Stata R SAS" is packed into a single string variable is not
> much of an exception, as the contents need to be unpacked to do much
> with them.)
>
> Now, it seems key to me that the number of persons is an upper limit
> on the number of mentions of a particular program, so the percent of
> mentions of a particular answer is bounded above not by 100 but by
> something lower: under protocol 2, for example, n persons give nk
> mentions in total and at most n of any one program, so the bound is
> 100/k. (What to do about the enthusiast who says "Stata Stata Stata"
> is a practical point, just as what to do about the enthusiast who says
> "Manchester City".)
>
> Also, although this may well be a much smaller point, the
> interpretation of missing values differs between these protocols.
>
> Nick
>
> On Mon, May 14, 2012 at 10:43 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Each "percentage" has the form P = (mentions of category X)/(number of mentions).  Numerator and denominator are random for each person, so the percentages are actually
>> ratios:
>>
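>> ******************************
>>  * Load the example data and tabulate the multiple responses
>>  use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear
>>  mrtab inco1-inco7, include title(Sources of income) width(24)
>>  * Number of income sources mentioned by each person
>>  egen sumi = rowtotal(inco*)
>>  * Share of all mentions that are inco1, estimated as a ratio
>>  ratio inco1/sumi
>> *****************************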
>>
>> Since Abu is knowledgeable about SPSS, I'd appreciate a reference to the confidence interval formulas that SPSS uses when percentages add to more than 100%.  (I couldn't find one in the SPSS 16 algorithms manual.)  I'd appreciate it also if he would compare the calculation above to the one that SPSS reports.
>>
>> Steve
>> sjsamuels@gmail.com
>>
>>
>> On May 14, 2012, at 8:40 AM, Nick Cox wrote:
>>
>> If a program is counting mentions, they are not people. Either way, I stand by what I said. I don't think even "sample size" is well defined for such data, so I don't see how inference is well defined.
>>
>> I can't comment on what SPSS does, but I repeat my request. I would be grateful for literature references showing that SPSS, or anybody else, really has a solution for this problem. Just counting mentions regardless of where they come from sounds somewhere between dubious and fallacious to me.
>>
>> The author of -mrtab- is Ben Jann, who is not a member of Statalist. If you want his answers, you need to write to him directly.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Abu Camara
>>
>> Thanks Nick. Consider the table below, where you want the "se" and
>> "ci" for the responses, which are given as percentages. I was able
>> to do this for other survey questions that are not multiple
>> response. Perhaps the author might consider adding standard errors
>> and confidence intervals to his program. I will have to turn to
>> SPSS, which has the facility.
>>
>> --------------------------------------------------------------------------------------------------------.
>>
>> mrtab inco1-inco7, include title(Sources of income) width(24)
>>
>>                                |                Pct. of     Pct. of
>>              Sources of income |      Freq.   responses       cases
>> -------------------------------+-----------------------------------
>> inco1          private support |        226       12.83       23.25
>>              (partner, family, |
>>                       friends) |
>> inco2           public support |        607       34.47       62.45
>>       (unemployment insurance, |
>>               social benefits) |
>> inco3             drug dealing |        293       16.64       30.14
>> inco4    housebreaking, theft, |         50        2.84        5.14
>>                        robbery |
>> inco5             prostitution |         82        4.66        8.44
>> inco6       "mischeln"/begging |        151        8.57       15.53
>> inco7         legal occupation |        352       19.99       36.21
>> -------------------------------+-----------------------------------
>>                          Total |       1761      100.00      181.17
>>
>>
>> On 14 May 2012 14:35, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> I don't really have further comments. I was half-assuming that you know exactly what you seek, but if so you are not spelling it out.
>>>
>>> As I see it, you would need to specify what data generation process you expect to apply and e.g. how confidence intervals are to be defined and calculated.
>>>
>>> For example, if the question is mode of transport to work and the answers look like
>>>
>>> Car
>>> Car, train, walk
>>> Walk
>>> Yak
>>> Horse
>>> Camel
>>> Personal helicopter
>>> ...
>>>
>>> it is not clear to me what meaning there could be to a standard error around the percent of people who say "walk". If the principle is that people can specify a variety of answers, the associated data generation process seems elusive to me. You can always count "mentions" rather than "people", but I don't think the inference for that is obvious.
>>>
>>> So, I don't think you can blame Stata for neglecting this area unless you can point to literature in which the logic is explained.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Abu Camara
>>>
>>> Hi Nick,
>>>
>>> Thanks for the reply.
>>> I have no idea how to write my own program to compute "se" and
>>> "ci" for "mrtab". Further help or suggestions would be appreciated.
>>> Official Stata appears to be weak in complex tabulation.
>>> Abu.
>>>
>>> On 14 May 2012 12:18, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>>        (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
>>>>        Q1/05   SJ 5(1):92--122
>>>>        introduces new commands for the computation of one- and
>>>>        two-way tables of multiple responses
>>>>
>>>> You are correct, I think. -mrtab- doesn't provide these, so you may
>>>> need to write your own program.
>>>>
>>>> Nick
>>>>
>>>> On Mon, May 14, 2012 at 10:09 AM, Abu Camara <abucamara@gmail.com> wrote:
>>>>
>>>>> I am running one and two way tables of multiple response using the
>>>>> user-written command "mrtab" (Stata 11.2). I tried to generate both
>>>>> standard errors and
>>>>> confidence intervals for tables of percentages but I could not find
>>>>> this as an option.


