Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: SE and CI by mrtab
Date: Tue, 15 May 2012 19:54:42 +0100

My constructive suggestion for getting standard errors and confidence
intervals for percentage of mentions is to base them on bootstrap
samples of persons.

Nick

On Tue, May 15, 2012 at 10:17 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> That sounds a plausible first approximation, but the data generation
> process varies. To make matters concrete, let's imagine a question
> asked of n persons:
>
> Which statistical software do you use routinely?
>
> However, the protocol can vary:
>
> 1. People can name as many distinct programs as they like.
>
> 2. People should name precisely k distinct programs. (Perhaps a bit
> unlikely with this particular question, but bear with me.)
>
> 3. People may name up to k distinct programs.
>
> 1', 2', 3'. As above, but the order of mention is important.
>
> In practice these protocols all lead to datasets in which responses
> are stored as several variables for analysis. (The exception in which
> (e.g.) "Stata R SAS" is packed into a single string variable is not
> much of an exception, as the contents need to be unpacked to do much
> with them.)
>
> Now, it seems key to me that the number of persons is an upper limit
> on the number of mentions of a particular program, so the percent of
> mentions of a particular answer is not bounded by 100, but by a lower
> limit. (What to do about the enthusiast who says "Stata Stata Stata"
> is a practical point, just as what to do about the enthusiast who says
> "Manchester City".)
>
> Also, although this may well be a much smaller point, the
> interpretation of missing values differs between these protocols.
>
> Nick
>
> On Mon, May 14, 2012 at 10:43 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Each "percentage" has the form P = (mentions of category X)/(number of mentions).
>> Numerator and denominator are random for each person, so the percentages are actually
>> ratios:
>>
>> ******************************
>> use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear
>> mrtab inco1-inco7, include title(Sources of income) width(24)
>> egen sumi = rowtotal(inco*)
>> ratio inco1/sumi
>> *****************************
>>
>> Since Abu is knowledgeable about SPSS, I'd appreciate a reference to the confidence interval formulas that SPSS uses when percentages add to more than 100%. (I couldn't find one in the SPSS 16 algorithms manual.) I'd appreciate it also if he would compare the calculation above to the one that SPSS reports.
>>
>> Steve
>> sjsamuels@gmail.com
>>
>>
>> On May 14, 2012, at 8:40 AM, Nick Cox wrote:
>>
>> If a program is counting mentions, they are not people. Either way, I stand by what I said. I don't think even "sample size" is well defined for such data, so I don't see how inference is well defined.
>>
>> I can't comment on what SPSS does, but I repeat my request. I would be grateful for literature references showing that SPSS, or anybody else, really has a solution for this problem. Just counting mentions regardless of where they come from sounds somewhere between dubious and fallacious to me.
>>
>> The author of -mrtab- is Ben Jann, who is not a member of Statalist. If you want his answers, you need to write to him directly.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Abu Camara
>>
>> Thanks Nick. Consider the table below and you want to get the "se"
>> & "ci" for the responses variable which are in percentages. I was able
>> to do this for other survey questions which are not multiple responses.
>> Perhaps the author might consider including standard errors &
>> confidence interval generation in his program. I will have to turn to
>> SPSS which has the facility.
>>
>> --------------------------------------------------------------------------------------------------------.
>>
>> mrtab inco1-inco7, include title(Sources of income) width(24)
>>
>>                                              Pct. of    Pct. of
>> Sources of income                   Freq.  responses      cases
>> -------------------------------+-----------------------------------
>> inco1 private support                 226      12.83      23.25
>>       (partner, family,
>>       friends)
>> inco2 public support                  607      34.47      62.45
>>       (unemployment insurance,
>>       social benefits)
>> inco3 drug dealing                    293      16.64      30.14
>> inco4 housebreaking, theft,            50       2.84       5.14
>>       robbery
>> inco5 prostitution                     82       4.66       8.44
>> inco6 "mischeln"/begging              151       8.57      15.53
>> inco7 legal occupation                352      19.99      36.21
>> -------------------------------+-----------------------------------
>> Total                                1761     100.00     181.17
>>
>>
>> On 14 May 2012 14:35, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> I don't really have further comments. I was half-assuming that you know exactly what you seek, but if so you are not spelling it out.
>>>
>>> As I see it, you would need to specify what data generation process you expect to apply and e.g. how confidence intervals are to be defined and calculated.
>>>
>>> For example, if the question is mode of transport to work and the answers look like
>>>
>>> Car
>>> Car, train, walk
>>> Walk
>>> Yak
>>> Horse
>>> Camel
>>> Personal helicopter
>>> ...
>>>
>>> it is not clear to me what meaning there could be to a standard error around the percent of people who say "walk". If the principle is that people can specify a variety of answers, the associated data generation process seems elusive to me. You can always count "mentions" rather than "people" but the inference for that I don't think is obvious.
>>>
>>> So, I don't think you can blame Stata for neglecting this area unless you can point to literature in which the logic is explained.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Abu Camara
>>>
>>> Hi Nick,
>>>
>>> Thanks for the reply.
>>> I have no idea of writing my own program for "mrtab" to compute "se" &
>>> "ci". Further help/suggestion would be appreciated.
>>> Official Stata appears to be weak in complex tabulation.
>>> Abu.
>>>
>>> On 14 May 2012 12:18, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> SJ-5-1 st0082 . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>> (help _mrsvmat, mrgraph, mrtab if installed) . . . . . . . . B. Jann
>>>> Q1/05 SJ 5(1):92--122
>>>> introduces new commands for the computation of one- and
>>>> two-way tables of multiple responses
>>>>
>>>> You are correct, I think. -mrtab- doesn't provide these, so you may
>>>> need to write your own program.
>>>>
>>>> Nick
>>>>
>>>> On Mon, May 14, 2012 at 10:09 AM, Abu Camara <abucamara@gmail.com> wrote:
>>>>
>>>>> I am running one and two way tables of multiple response using the
>>>>> user-written command "mrtab" (Stata 11.2). I tried to generate both
>>>>> standard errors and confidence intervals for tables of percentages
>>>>> but I could not find this as an option.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
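
The suggestion at the top of this message, resampling persons rather than mentions, might be sketched in Stata roughly as follows. This is an illustration only, not something posted in the thread: it assumes that inco1-inco7 in drugs.dta are 0/1 indicators (as the -mrtab- call quoted above implies), and pctmention is a hypothetical helper program written for this sketch, not part of -mrtab-. The number of replications and the seed are arbitrary choices.

```stata
* Sketch only: bootstrap persons (rows), not mentions.
* Assumption: inco1-inco7 are 0/1 indicator variables.
use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear

* Hypothetical helper: percent of all mentions that fall in inco1.
capture program drop pctmention
program define pctmention, rclass
    tempvar total
    * mentions per person, summed across the seven items
    egen `total' = rowtotal(inco1-inco7)
    summarize inco1, meanonly
    local num = r(sum)
    summarize `total', meanonly
    return scalar pct = 100 * `num' / r(sum)
end

* Point estimate on the full sample, for comparison with the
* "Pct. of responses" column reported by -mrtab-.
pctmention
display r(pct)

* Resample persons with replacement and recompute the percentage.
bootstrap pct = r(pct), reps(1000) seed(2012): pctmention
estat bootstrap, percentile
```

Because whole persons are resampled, the numerator and the denominator both vary from one replicate to the next, which is the point Steve makes about the percentage being a ratio. The other items would need the same treatment, for example by passing the variable name to the helper as an argument.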

**Follow-Ups**:
- **Re: st: SE and CI by mrtab** *From:* Steve Samuels <sjsamuels@gmail.com>

**References**:
- **st: SE and CI by mrtab** *From:* Abu Camara <abucamara@gmail.com>
- **Re: st: SE and CI by mrtab** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: SE and CI by mrtab** *From:* Abu Camara <abucamara@gmail.com>
- **RE: st: SE and CI by mrtab** *From:* Nick Cox <n.j.cox@durham.ac.uk>
- **Re: st: SE and CI by mrtab** *From:* Abu Camara <abucamara@gmail.com>
- **RE: st: SE and CI by mrtab** *From:* Nick Cox <n.j.cox@durham.ac.uk>
- **Re: st: SE and CI by mrtab** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: SE and CI by mrtab** *From:* Nick Cox <njcoxstata@gmail.com>
