Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: SE and CI by mrtab

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: SE and CI by mrtab
Date	Tue, 15 May 2012 18:33:36 -0400

That's a good suggestion and is one of the vce() options for -ratio-.

In my experience, these kinds of questions often present problems,
especially when there is a list of alternatives. The interpretation and
treatment depend on the availability of "No" and "Don't Know" responses.
Also, interview mode and presentation order can affect responses.

Steve

On May 15, 2012, at 2:54 PM, Nick Cox wrote:

My constructive suggestion for getting standard errors and confidence
intervals for percentage of mentions is to base them on bootstrap
samples of persons.
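
A minimal sketch of that suggestion, assuming the drugs.dta example data used elsewhere in this thread, one observation per person, and a row total named sumi. The seed and the number of replications are arbitrary choices for illustration, not recommendations:

```stata
* Sketch only: bootstrap SE and CI for the share of mentions going to inco1.
* Assumes one observation per person, so resampling observations resamples persons.
use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear

* number of income sources mentioned by each person
egen sumi = rowtotal(inco1-inco7)

* fix the seed so the bootstrap results are reproducible
set seed 12345

* ratio of inco1 mentions to all mentions, with bootstrap standard errors
ratio inco1/sumi, vce(bootstrap, reps(500))

* percentile-based bootstrap confidence interval
estat bootstrap, percentile
```

Because whole persons are resampled, the dependence among the answers given by the same person is carried into each bootstrap sample.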

Nick

On Tue, May 15, 2012 at 10:17 AM, Nick Cox wrote:
That sounds a plausible first approximation, but the data generation
process varies. To make matters concrete, let's imagine a question
asked of n persons:

Which statistical software do you use routinely?

However, the protocol can vary:

1. People can name as many distinct programs as they like.

2. People should name precisely k distinct programs. (Perhaps a bit
unlikely with this particular question, but bear with me.)

3. People may name up to k distinct programs.

1', 2', 3'. As above, but the order of mention is important.

In practice these protocols all lead to datasets in which responses
are stored as several variables for analysis. (The exception in which
(e.g.) "Stata R SAS" is packed into a single string variable is not
much of an exception, as the contents need to be unpacked to do much
with them.)
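
As a rough illustration of that unpacking step, here is a sketch with made-up data. The variable names software, prog and uses_stata are hypothetical, and the four responses are invented for the example:

```stata
* Sketch only: unpack a packed multiple-response string into separate variables.
clear
input str20 software
"Stata R SAS"
"R"
"Stata SPSS"
"Stata Stata Stata"
end

* split parses on spaces by default and creates prog1, prog2, prog3
split software, generate(prog)

* person-level indicator: 1 if Stata is mentioned at least once
generate byte uses_stata = 0
foreach v of varlist prog* {
    replace uses_stata = 1 if `v' == "Stata"
}

list
```

Building a 0/1 indicator per person means that a program named several times by the same person is counted only once, which is one possible treatment of repeated mentions, not the only one.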

Now, it seems key to me that the number of persons is an upper limit
on the number of mentions of a particular program, so the percent of
mentions of a particular answer is bounded not by 100 but by some
lower limit (for example, 100/k under protocol 2, in which everyone
names exactly k distinct programs). (What to do about the enthusiast
who says "Stata Stata Stata" is a practical point, just as what to do
about the enthusiast who says "Manchester City".)

Also, although this may well be a much smaller point, the
interpretation of missing values differs between these protocols.

Nick

On Mon, May 14, 2012 at 10:43 PM, Steve Samuels wrote:
Each "percentage" has the form P = (mentions of category X)/(total number of mentions). Both the numerator and the denominator vary randomly from person to person, so the percentages are actually ratios:

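******************************
* -mrtab- is user-written (B. Jann); install it first if needed, e.g. ssc install mrtab
use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear
mrtab inco1-inco7, include title(Sources of income) width(24)
* total number of income sources mentioned by each person
egen sumi = rowtotal(inco1-inco7)
* ratio of inco1 mentions to all mentions, with linearized SE and CI
ratio inco1/sumi
******************************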
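
The same idea extends to several categories at once. The sketch below is an illustration, not part of the original calculation; the equation names private, public and dealing are labels chosen here for readability:

```stata
* Sketch only: several "percent of responses" ratios in one call.
use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear

* number of income sources mentioned by each person
egen sumi = rowtotal(inco1-inco7)

* each parenthesized term is one named ratio of the form numerator/denominator
ratio (private: inco1/sumi) (public: inco2/sumi) (dealing: inco3/sumi)
```

Provided missing values are handled the same way in both commands, the point estimates should correspond to the "Pct. of responses" column of -mrtab- divided by 100.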

Since Abu is knowledgeable about SPSS, I'd appreciate a reference to the confidence interval formulas that SPSS uses when percentages add to more than 100%. (I couldn't find one in the SPSS 16 algorithms manual.) I'd appreciate it also if he would compare the calculation above to the one that SPSS reports.

Steve

On May 14, 2012, at 8:40 AM, Nick Cox wrote:

If a program is counting mentions, they are not people. Either way, I stand by what I said. I don't think even "sample size" is well defined for such data, so I don't see how inference is well defined.

I can't comment on what SPSS does, but I repeat my request. I would be grateful for literature references showing that SPSS, or anybody else, really has a solution for this problem. Just counting mentions regardless of where they come from sounds somewhere between dubious and fallacious to me.

The author of -mrtab- is Ben Jann, who is not a member of Statalist. If you want his answers, you need to write to him directly.

Nick

Abu Camara

Thanks Nick. Consider the table below, and suppose you want to get the
"se" & "ci" for the responses, which are expressed as percentages. I was
able to do this for other survey questions that are not multiple
responses. Perhaps the author might consider adding standard errors and
confidence intervals to his program. I will have to turn to SPSS, which
has the facility.

mrtab inco1-inco7, include title(Sources of income) width(24)

                                                Pct. of     Pct. of
Sources of income                     Freq.   responses       cases
-------------------------------+-----------------------------------
inco1 private support                   226       12.83       23.25
      (partner, family,
      friends)
inco2 public support                    607       34.47       62.45
      (unemployment insurance,
      social benefits)
inco3 drug dealing                      293       16.64       30.14
inco4 housebreaking, theft,              50        2.84        5.14
      robbery
inco5 prostitution                       82        4.66        8.44
inco6 "mischeln"/begging                151        8.57       15.53
inco7 legal occupation                  352       19.99       36.21
-------------------------------+-----------------------------------
Total                                  1761      100.00      181.17
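
The two percentage columns differ only in their denominators, which can be checked by hand. In the sketch below the figure of 972 cases is an assumption: it is not stated anywhere in the thread, but it is what the table implies (1761/1.8117 is approximately 972):

```stata
* Sketch only: reproduce the inco1 row and the Total row of the table by hand.
* 972 cases is inferred from the table, not reported in the thread.
display %6.2f 100*226/1761    // percent of responses for inco1
display %6.2f 100*226/972     // percent of cases for inco1
display %6.2f 100*1761/972    // total percent of cases, above 100 by construction
```

The last figure exceeds 100 because each person can contribute more than one mention.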

On 14 May 2012 14:35, Nick Cox wrote:
I don't really have further comments. I was half-assuming that you know exactly what you seek, but if so you are not spelling it out.

As I see it, you would need to specify what data generation process you expect to apply and e.g. how confidence intervals are to be defined and calculated.

For example, if the question is mode of transport to work and the answers look like

Car
Car, train, walk
Walk
Yak
Horse
Camel
Personal helicopter
...

it is not clear to me what meaning there could be to a standard error around the percent of people who say "walk". If the principle is that people can specify a variety of answers, the associated data generation process seems elusive to me. You can always count "mentions" rather than "people", but I don't think the inference for that is obvious.

So, I don't think you can blame Stata for neglecting this area unless you can point to literature in which the logic is explained.

Nick

Abu Camara

Hi Nick,

Thanks for the reply.
I have no idea how to write my own program for "mrtab" to compute "se" &
"ci". Further help or suggestions would be appreciated.
Official Stata appears to be weak in complex tabulation.
Abu.

On 14 May 2012 12:18, Nick Cox wrote:
SJ-5-1 st0082 . . . . . . . . . . . . . . . Tabulation of multiple responses
(help _mrsvmat, mrgraph, mrtab if installed) . . . . . . . . B. Jann
Q1/05 SJ 5(1):92--122
introduces new commands for the computation of one- and
two-way tables of multiple responses

You are correct, I think. -mrtab- doesn't provide these, so you may
need to write your own program.

Nick

On Mon, May 14, 2012 at 10:09 AM, Abu Camara wrote:

I am running one- and two-way tables of multiple responses using the
user-written command "mrtab" (Stata 11.2). I tried to generate both
standard errors and confidence intervals for the tables of percentages,
but I could not find this as an option.
