
Re: st: Standard error of a ratio of two random variables

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Standard error of a ratio of two random variables
Date	Thu, 15 Jan 2009 12:08:12 -0500

Austin-


Sergiy defined the denominator population to be:

[Y] population at least 10 years old and less than 11 AND in the 5th grade

Therefore, for Y<., we know whether the child is in school and, if so, what the current grade is. This suggests to me that [X] could be predicted from [Y] and, possibly, from other factors in the restricted sample. If [Y] is false because the child is in the 5th grade but is <10, I would expect P(X=1) to be higher than its overall value in the restricted sample. If [Y] is false because the child is >=11 although in the 5th grade, I would expect P(X=1) to be lower than its overall value in the restricted sample. Lastly, if [Y] is true, I would expect P(X=1) to be very high. Thus multiple imputation of X in the restricted sample might improve on your upper and lower bounds.
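For reference: once X has been imputed M times, the ratio is estimated separately in each completed data set, and the M results are pooled with Rubin's rules. The sketch below, in Python, shows only that pooling step. The imputation model for X (for example, one based on [Y] and current grade) is not shown, and the function name is illustrative.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates (e.g. of X/Y) with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance, pooled SE).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (n-1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t, math.sqrt(t)


q, t, se = rubin_combine([0.9, 1.0, 1.1], [0.01, 0.01, 0.01])
print(q, se)
```

The between-imputation term is what carries the uncertainty about the missing X values into the final SE. A single imputation would leave that term out.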

Missing "Y" is more troublesome. I like the idea of weighting by 1/P(Y<.), if the data support it.
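A minimal sketch of weighting by 1/P(Y<.). It assumes a single hypothetical covariate Z that is never missing, and it estimates P(Y<.) with a plain, unweighted logit fitted by Newton-Raphson. Records with Y observed get weight w/phat, and the rest get 0. Whether the response model should itself use the survey weights, and which covariates belong in it, are judgment calls that this sketch does not settle.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(z, r, iters=25):
    """Fit P(r=1 | z) = logistic(a + b*z) by Newton-Raphson (unweighted ML)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0  # score vector and information matrix entries
        for zi, ri in zip(z, r):
            p = logistic(a + b * zi)
            ga += ri - p
            gb += (ri - p) * zi
            v = p * (1.0 - p)
            iaa += v
            iab += v * zi
            ibb += v * zi * zi
        det = iaa * ibb - iab * iab
        # Newton step: (a, b) += inverse(information) @ score
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b


def response_weights(w, z, y):
    """Return w/phat for records with Y observed (not None), 0 otherwise."""
    r = [0 if yi is None else 1 for yi in y]
    a, b = fit_logit(z, r)
    return [wi / logistic(a + b * zi) if ri else 0.0
            for wi, zi, ri in zip(w, z, r)]


# Hypothetical binary Z; None marks a missing Y
z = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 1, None, 1, None, 1, None]
print(response_weights([1] * 8, z, y))
```

With a binary Z the logit is saturated, so phat is simply the unweighted response rate within each Z group. This sketch will not converge if Z perfectly separates respondents from nonrespondents.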


-Steve

On Jan 15, 2009, at 10:51 AM, Austin Nichols wrote:

Sergiy and Steven--

I think more information is needed, but I would probably avoid imputation.

Let's say the basic setup is to estimate totals X, Y, and X/Y for:
[X] new entrant (of any age) to 5th grade
[Y] population at least 10 years old and less than 11
Being a new entrant is known if response to the "grade in previous
year" is known, which is subject to recall error, has more missings
than age, etc.

It seems likely that kids who were never in school are more likely to
have no response to "grade in previous year", i.e., to have missing X.
So those whose true X=0 will be more likely to have observed X=., which
means I think you have problems estimating this no matter how you do it.

I think a plausible lower bound might be constructed by estimating new
weights for those cases with Y<. and then estimating the totals of
X0=max(0,X) (Stata's max() ignores missing values, so this sets missing
X to 0), Y, and X0/Y in that sample. Then a fairly plausible upper
bound might be constructed by estimating new weights for those cases
with Y<. and X<. and then estimating the totals of X and Y, and X/Y, in
that much more restricted sample.

The new weights w1 for the first case can just inflate each weight by
the ratio of the sum of weights for the whole survey to the sum of
weights within the restricted sample. In the second case, inflate each
w1, separately for each single year of age, by the ratio of the sum of
w1 in the Y<. sample to the sum of w1 within the more restricted
sample. If you have variables that are never missing in the whole
survey, you can use those in a parametric model to inflate the weights,
e.g., estimate a logit of 1(Y<.) on Z to predict phat and then multiply
the weights by 1/phat.
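The two bounds described above can be sketched in Python on a toy record layout: weight, single year of age, X, and Y, with None for missing. Y is the 0/1 indicator for age 10, as in the setup above. This only illustrates the weight-inflation idea under those assumptions and is not a substitute for -svy-. It returns point estimates only and ignores the extra variability from estimating the weights.

```python
from collections import defaultdict


def inflate(weights, keep, cells=None):
    """Rescale the weights of kept records so that, within each cell, they sum
    to the total weight of all records in that cell (one cell if cells is None).
    Dropped records get weight 0; a cell with no kept records loses its weight."""
    if cells is None:
        cells = [0] * len(weights)
    all_w = defaultdict(float)
    kept_w = defaultdict(float)
    for wi, ki, ci in zip(weights, keep, cells):
        all_w[ci] += wi
        if ki:
            kept_w[ci] += wi
    return [wi * all_w[ci] / kept_w[ci] if ki else 0.0
            for wi, ki, ci in zip(weights, keep, cells)]


def ratio_bounds(w, age, x, y):
    """Lower and upper bounds for total(X)/total(Y) following the two-step idea."""
    # Lower bound: keep records with Y observed, inflate weights overall,
    # and count missing X as 0 (like Stata's max(0,X)).
    has_y = [yi is not None for yi in y]
    w1 = inflate(w, has_y)
    x0 = [0 if xi is None else xi for xi in x]
    tx_lo = sum(wi * xi for wi, xi, k in zip(w1, x0, has_y) if k)
    ty_lo = sum(wi * yi for wi, yi, k in zip(w1, y, has_y) if k)

    # Upper bound: within the Y-observed sample, keep complete cases (X observed)
    # and re-inflate w1 within each single year of age.
    idx = [i for i, k in enumerate(has_y) if k]
    complete = [x[i] is not None for i in idx]
    w2 = inflate([w1[i] for i in idx], complete, [age[i] for i in idx])
    tx_hi = sum(wi * x[i] for wi, i, c in zip(w2, idx, complete) if c)
    ty_hi = sum(wi * y[i] for wi, i, c in zip(w2, idx, complete) if c)

    return tx_lo / ty_lo, tx_hi / ty_hi


# Toy data: weight, age, X (new entrant), Y (age 10); None = missing
w = [1, 1, 1, 1, 1, 1, 2, 2]
age = [10, 10, 10, 10, 11, 11, None, None]
x = [1, None, 0, 1, 1, None, 1, None]
y = [1, 1, 1, 1, 0, 0, None, None]
print(ratio_bounds(w, age, x, y))
```

Y depends on age alone, and the second-stage inflation preserves the w1 total in each age cell. As a result, the Y total is the same in both samples (provided every age cell keeps at least one complete case), and only the X total moves between the bounds.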

Maybe Manski would have better ideas:
http://faculty.wcas.northwestern.edu/~cfm754/

On Thu, Jan 15, 2009 at 10:20 AM, Steven Samuels
<[email protected]> wrote:

Sergiy-
The -ratio- and -svy: ratio- commands in Stata ignore observations with missing values in X or Y. So do the Stata programs that you and Austin wrote. You risk bias if you estimate the numerator and denominator separately (ignoring missing values in each); and, if you do that, I don't know of a simple way of getting standard errors. Of course, you also risk bias if you use only complete cases. I think that the best approach would be multiple imputation of X and Y. Then use the standard commands. I'd guess that there
are good prospects for predicting your variable "A", at least.
By the way, I often compute SEs for log(X/Y) (use -nlcom- after -ratio- or -svy: ratio-) and transform them to CIs for X/Y. That way, a CI for X/Y is
consistent with that for Y/X.
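The log-scale interval can be sketched in Python, starting from a ratio estimate and its SE (for example, as reported by -ratio-). It uses the delta-method approximation se(log R) ≈ se(R)/R and then exponentiates the endpoints. The function name is illustrative.

```python
import math
from statistics import NormalDist


def log_ratio_ci(r, se_r, level=0.95):
    """CI for a positive ratio r built on the log scale.

    se(log r) is approximated by se(r)/r (delta method); the interval
    exp(log r +/- z * se(log r)) is not symmetric around r.
    """
    if r <= 0:
        raise ValueError("a log-scale CI needs a positive ratio")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = se_r / r
    return (math.exp(math.log(r) - z * se_log),
            math.exp(math.log(r) + z * se_log))


lo, hi = log_ratio_ci(1.2, 0.1)
# Same data expressed as Y/X: delta-method SE of 1/r is se_r / r**2
lo_inv, hi_inv = log_ratio_ci(1 / 1.2, 0.1 / 1.2 ** 2)
print(lo, hi, 1 / hi_inv, 1 / lo_inv)
```

Because log(Y/X) = -log(X/Y) and both have the same approximate SE, the interval for Y/X is exactly the reciprocal of the interval for X/Y, which gives the consistency described above.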

-Steve




On Jan 14, 2009, at 3:36 PM, Sergiy Radyakin wrote:
On Tue, Jan 13, 2009 at 12:01 PM, Austin Nichols
<[email protected]> wrote:
Sergiy Radyakin <[email protected]>:
With different N's you would divide by the product of the square roots of N1 and N2 instead. My point was just to show what you would need to multiply rho by if you wanted to keep rho for some reason. And yes, this
is all approximation--if X and Y were normal you might consult
http://www.jstor.org/stable/pdfplus/2334671.pdf

But the example does not seem to bear very well on your actual
application--are X and Y two variables from the same survey with
different degrees of missingness? If so, the approach given so far
does not seem to be the optimal solution...
Thank you, Austin. X and Y come from the same survey.

To make things more clear, and since Steven Samuels asked
specifically, here is what I am doing:

I need the mean and the SE for the following indicator: "Primary
Completion Rate is: new entrants in the last grade divided by
population being in the last grade and of the proper age (for the last
grade)". The definition of PCR (with slightly different wording) and
details are here:
[http://www.un.org/esa/sustdev/natlinfo/indicators/methodology_sheets/education/intake_education.pdf]
on the first page of the document (see "(b) brief definition").

The definition uses the following characteristics:
[A] being a new entrant (of any age)
[B] being of proper age, e.g. 12y.o., and

Being a new entrant is known if response to the "grade in previous
year" is known, which is subject to recall error, has more missings
than age, etc.

Not all As are Bs and not all Bs are As.

So far I have been estimating the numerator and denominator separately
(with -svy: total-). For each, Stata returns two numbers: the estimate
and its SE.
I then need the estimate and SE of the ratio of the two. Both are
expected to be well away from zero (if this constitutes a problem),
but note that the ratio is not a proportion; it can be more than 100%.
I then use the formula (see the program in the first post and the
quoted PDF file) to construct the SE for the ratio of the two.
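The program from the first post is not reproduced in this message. The following is therefore only a sketch, in Python, of the usual first-order (delta-method) approximation for R = X/Y with estimated totals:

Var(R) ≈ R^2 (Var(X)/X^2 + Var(Y)/Y^2 - 2 Cov(X,Y)/(XY))

X and Y come from the same survey, so the covariance term generally cannot be dropped. Writing it as rho*SE(X)*SE(Y) is where a correlation rho enters. When X and Y are estimated on different sets of observations, obtaining that covariance is the hard part, as noted earlier in the thread. The function name and numbers are illustrative.

```python
import math


def ratio_se(tx, se_x, ty, se_y, cov_xy=0.0):
    """Delta-method estimate and SE of R = tx / ty.

    cov_xy is the covariance between the two estimated totals
    (rho * se_x * se_y if expressed through a correlation rho).
    """
    r = tx / ty
    var_r = r * r * (se_x ** 2 / tx ** 2 + se_y ** 2 / ty ** 2
                     - 2 * cov_xy / (tx * ty))
    return r, math.sqrt(var_r)


# Illustrative totals: X = 120 (SE 10), Y = 100 (SE 8), Cov(X, Y) = 40
print(ratio_se(120, 10, 100, 8, cov_xy=40))
```

An equivalent form is Var(R) ≈ (Var(X) - 2R Cov(X,Y) + R^2 Var(Y)) / Y^2. A positive covariance between the totals shrinks the SE of the ratio relative to the zero-covariance formula.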

Austin mentioned that there might be a better solution. I hope this
information will help determine whether there is really something better
to do. I don't care about computing time, but correctness and
conceptual unambiguity are of the highest importance.

Thank you, Sergiy Radyakin

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


Follow-Ups:
- Re: st: Standard error of a ratio of two random variables
  - From: "Austin Nichols" <[email protected]>

References:
- st: Standard error of a ratio of two random variables
  - From: "Sergiy Radyakin" <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: Jeph Herrin <[email protected]>
- RE: st: Standard error of a ratio of two random variables
  - From: "Feiveson, Alan H. (JSC-SK311)" <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: Stas Kolenikov <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: Sergiy Radyakin <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: "Austin Nichols" <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: Sergiy Radyakin <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: Steven Samuels <[email protected]>
- Re: st: Standard error of a ratio of two random variables
  - From: "Austin Nichols" <[email protected]>