
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: SE and CI by mrtab


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: SE and CI by mrtab
Date   Tue, 15 May 2012 23:42:51 +0100

I am glad we agree in principle. But I don't think this could be done
by just invoking -ratio-. Getting the bootstrap samples requires
working on the original variables; getting the percent of mentions
requires working with a restructuring of the same data reduced to a
two-way table. Some programming would be required to connect the two.

Nick
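
The connecting program described above might look something like the following. This is a rough sketch, not code posted in the thread: -pctresp- is a hypothetical helper name, the items are assumed to be coded 0/1, and the number of replications and the seed are arbitrary. Because -bootstrap- resamples observations, and each observation here is a person, every replicate recomputes the percent of responses from a resampled set of persons.

```stata
* Hypothetical helper: percent of responses for each 0/1 item in varlist.
capture program drop pctresp
program define pctresp, rclass
    version 11
    syntax varlist(numeric)
    * total number of mentions over all items and persons
    local total = 0
    foreach v of local varlist {
        quietly summarize `v', meanonly
        local total = `total' + r(sum)
    }
    * share of all mentions going to each item, in percent
    local i = 0
    foreach v of local varlist {
        local ++i
        quietly summarize `v', meanonly
        return scalar p`i' = 100 * r(sum) / `total'
    }
end

use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear

* resample persons; reps() and seed() values are arbitrary choices
bootstrap p1=r(p1) p2=r(p2) p3=r(p3) p4=r(p4) p5=r(p5) p6=r(p6) p7=r(p7), ///
    reps(500) seed(2012): pctresp inco1-inco7

* percentile-based confidence intervals from the same replicates
estat bootstrap, percentile
```

How missing values and persons with no mentions should be handled is left open here; that choice depends on the protocol, as discussed further down the thread.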

On Tue, May 15, 2012 at 11:33 PM, Steve Samuels <[email protected]> wrote:
>
> That's a good suggestion and is one of the vce() options for -ratio-.
>
> In my experience, these kinds of questions often present problems,
> especially when there is a list of alternatives. The interpretation and
> treatment depend on the availability of "No" and "Don't Know" responses.
> Also, interview mode and presentation order can affect responses.
>
> Steve
> [email protected]
>
>
>
>
> On May 15, 2012, at 2:54 PM, Nick Cox wrote:
>
> My constructive suggestion for getting standard errors and confidence
> intervals for percentage of mentions is to base them on bootstrap
> samples of persons.
>
> Nick
>
> On Tue, May 15, 2012 at 10:17 AM, Nick Cox <[email protected]> wrote:
> That sounds a plausible first approximation, but the data generation
> process varies. To make matters concrete, let's imagine a question
> asked of n persons:
>
> Which statistical software do you use routinely?
>
> However, the protocol can vary:
>
> 1. People can name as many distinct programs as they like.
>
> 2. People should name precisely k distinct programs. (Perhaps a bit
> unlikely with this particular question, but bear with me.)
>
> 3. People may name up to k distinct programs.
>
> 1', 2', 3'. As above, but the order of mention is important.
>
> In practice these protocols all lead to datasets in which responses
> are stored as several variables for analysis. (The exception in which
> (e.g.) "Stata R SAS" is packed into a single string variable is not
> much of an exception, as the contents need to be unpacked to do much
> with them.)
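
As an illustration of the unpacking step, here is a minimal sketch on toy data. The variable name software, the stub prog, and the four responses are all made up for the example; -split- with its default of parsing on spaces is assumed to be adequate, which it would not be for program names containing blanks.

```stata
* Toy data: one packed string per person (values are hypothetical)
clear
input str20 software
"Stata R SAS"
"R"
"Stata Stata Stata"
"SPSS Stata"
end

* unpack on spaces into prog1, prog2, ...
split software, generate(prog)
local parts `r(varlist)'

* indicator: person mentions Stata at least once,
* so a repeated mention by the same person counts only once
generate byte stata = 0
foreach v of local parts {
    quietly replace stata = 1 if `v' == "Stata"
}

list
```

Coding each program as a per-person 0/1 indicator is one way of dealing with repeated mentions; counting every mention separately would be a different choice.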
>
> Now, it seems key to me that the number of persons is an upper limit
> on the number of mentions of a particular program, so the percent of
> mentions of a particular answer is bounded above not by 100 but
> generally by some smaller value. (What to do about the enthusiast who
> says "Stata Stata Stata" is a practical point, just as what to do
> about the enthusiast who says "Manchester City".)
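
To put a number on that bound: if each person can count at most once towards a given answer, mentions of that answer cannot exceed n, so its share of all mentions cannot exceed n divided by the total number of mentions, that is, 1 over the mean number of mentions per person. A small sketch, assuming that the "Pct. of cases" total in the -mrtab- table quoted later in this thread (181.17) is 100 times that mean:

```stata
* 181.17 = total "Pct. of cases" in the quoted -mrtab- table,
* read here as 100 * (mean mentions per person); this reading is an assumption
local mean_mentions = 181.17 / 100

* largest possible "Pct. of responses" for any single item
display 100 / `mean_mentions'
```

For that table the ceiling comes out at roughly 55 percent rather than 100.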
>
> Also, although this may well be a much smaller point, the
> interpretation of missing values differs between these protocols.
>
> Nick
>
> On Mon, May 14, 2012 at 10:43 PM, Steve Samuels <[email protected]> wrote:
> Each "percentage" has the form P = (mentions of category X)/(number of mentions). Numerator and denominator are random for each person, so the percentages are actually
> ratios:
>
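> ******************************
>  * load the example data
>  use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear
>  mrtab inco1-inco7, include title(Sources of income) width(24)
>  * number of mentions per person (denominator of the ratio)
>  egen sumi = rowtotal(inco*)
>  * share of all mentions that go to inco1, with its standard error
>  ratio inco1/sumi
> *****************************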
>
> Since Abu is knowledgeable about SPSS, I'd appreciate a reference to the confidence interval formulas that SPSS uses when percentages add to more than 100%. (I couldn't find one in the SPSS 16 algorithms manual.) I'd appreciate it also if he would compare the calculation above to the one that SPSS reports.
>
> Steve
> [email protected]
>
>
> On May 14, 2012, at 8:40 AM, Nick Cox wrote:
>
> If a program is counting mentions, they are not people. Either way, I stand by what I said. I don't think even "sample size" is well defined for such data, so I don't see how inference is well defined.
>
> I can't comment on what SPSS does, but I repeat my request. I would be grateful for literature references showing that SPSS, or anybody else, really has a solution for this problem. Just counting mentions regardless of where they come from sounds somewhere between dubious and fallacious to me.
>
> The author of -mrtab- is Ben Jann, who is not a member of Statalist. If you want his answers, you need to write to him directly.
>
> Nick
> [email protected]
>
> Abu Camara
>
> Thanks Nick. Consider the table below, for which you want the "se"
> & "ci" of the percentages of responses. I was able to do this for
> other survey questions that are not multiple response. Perhaps the
> author might consider including standard errors & confidence
> intervals in his program. I will have to turn to SPSS, which has
> the facility.
>
> --------------------------------------------------------------------------------------------------------.
>
> mrtab inco1-inco7, include title(Sources of income) width(24)
>
>                                |                Pct. of     Pct. of
>              Sources of income |      Freq.   responses       cases
> -------------------------------+-----------------------------------
> inco1          private support |        226       12.83       23.25
>              (partner, family, |
>                       friends) |
> inco2           public support |        607       34.47       62.45
>       (unemployment insurance, |
>               social benefits) |
> inco3             drug dealing |        293       16.64       30.14
> inco4    housebreaking, theft, |         50        2.84        5.14
>                        robbery |
> inco5             prostitution |         82        4.66        8.44
> inco6       "mischeln"/begging |        151        8.57       15.53
> inco7         legal occupation |        352       19.99       36.21
> -------------------------------+-----------------------------------
>                          Total |       1761      100.00      181.17
>
>
> On 14 May 2012 14:35, Nick Cox <[email protected]> wrote:
> I don't really have further comments. I was half-assuming that you know exactly what you seek, but if so you are not spelling it out.
>
> As I see it, you would need to specify what data generation process you expect to apply and e.g. how confidence intervals are to be defined and calculated.
>
> For example, if the question is mode of transport to work and the answers look like
>
> Car
> Car, train, walk
> Walk
> Yak
> Horse
> Camel
> Personal helicopter
> ...
>
> it is not clear to me what meaning there could be to a standard error around the percent of people who say "walk". If the principle is that people can specify a variety of answers, the associated data generation process seems elusive to me. You can always count "mentions" rather than "people", but I don't think the inference for that is obvious.
>
> So, I don't think you can blame Stata for neglecting this area unless you can point to literature in which the logic is explained.
>
> Nick
> [email protected]
>
> Abu Camara
>
> Hi Nick,
>
> Thanks for the reply.
> I have no idea how to write my own program for "mrtab" to compute "se" &
> "ci". Further help or suggestions would be appreciated.
> Official Stata appears to be weak in complex tabulation.
> Abu.
>
> On 14 May 2012 12:18, Nick Cox <[email protected]> wrote:
> SJ-5-1 st0082 . . . . . . . . . . . . . . . Tabulation of multiple responses
>  (help _mrsvmat, mrgraph, mrtab if installed) . . . . . . . . B. Jann
>  Q1/05 SJ 5(1):92--122
>  introduces new commands for the computation of one- and
>  two-way tables of multiple responses
>
> You are correct, I think. -mrtab- doesn't provide these, so you may
> need to write your own program.
>
> Nick
>
> On Mon, May 14, 2012 at 10:09 AM, Abu Camara <[email protected]> wrote:
>
> I am running one- and two-way tables of multiple responses using the
> user-written command "mrtab" (Stata 11.2). I tried to generate both
> standard errors and confidence intervals for tables of percentages,
> but I could not find this as an option.

