
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: best way to estimate overall mean of clustered, stratified data using xtreg


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: best way to estimate overall mean of clustered, stratified data using xtreg
Date   Thu, 13 Sep 2012 21:04:26 -0400

I don't know what "prospectively chosen" means, but from your choice of
words, I assume that clusters were not sampled randomly. Therefore, I'm
not sure what you intend for the target population of the mean.

1) If the target is just the 18 control clusters and nothing else, then you
have 100% of the target population. Considering the mean as a
descriptive statistic, the standard error will be zero, and -summarize-
will give you that mean.


2) If the 12 clusters in each district happened to be selected randomly, then the target
population is all births in the three districts during the study period.
In that case, you must compute the probability of selection for
each cluster and, as I suggested, use -svy: mean- with districts as
strata and clusters as PSUs. If you know the total number of births and
other vital statistics for each district, then you can do
post-stratification adjustments, including raking (-survwgt- from SSC)
and calibration (-calibrate- and -calibest- from SSC).

If clusters were not of similar size, but were chosen with
simple random sampling, then the post-stratification adjustments are a must.
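
A minimal sketch of the design-based approach in (2), with post-stratification to district birth totals, might look like the following. Everything in it is an assumption made for illustration: the data are simulated, and the variable names (cluster, district, countvar, pw, nclus_total, births_total), the 30 eligible clusters per district, and the district birth totals are hypothetical stand-ins for quantities you would have to supply from the actual study.

```stata
* Simulated stand-in data; every name and number here is hypothetical.
clear
set seed 2012
set obs 18                                      // 18 sampled clusters
generate cluster  = _n
generate district = ceil(_n / 6)                // 3 districts, 6 clusters each
generate nclus_total  = 30                      // assumed eligible clusters per district
generate births_total = 20000 + 5000*district   // assumed known births per district
generate u = rnormal(0, 0.5)                    // cluster-level effect
expand 400                                      // 400 births per cluster
generate countvar = rpoisson(exp(0.5 + 0.2*district + u))

* Probability weight = inverse of the cluster selection probability (6 of 30).
generate double pw = nclus_total / 6

* Districts are strata, clusters are PSUs; post-stratify to district birth totals.
svyset cluster [pweight = pw], strata(district) poststrata(district) postweight(births_total)
svy: mean countvar
```

The sketch uses the poststrata() and postweight() options of -svyset- rather than the SSC raking and calibration commands mentioned above, which would be needed if more than one set of control totals were available.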

3) If you did not randomly select clusters, you can still attempt to
estimate the total for all births in the same district, with the same
techniques as in (2), but you must state that the assumptions of random
sampling are not met.


Note that the intervention evaluation will require taking account of the
clustering within district, whether you base inference
on the randomization distribution or on some other data-generating mechanism.

Steve

On Sep 13, 2012, at 5:15 AM, Pagel, Christina wrote:

Dear Steve, Thanks for replying!

The data come from a cluster randomised controlled trial. Basically
three different districts were involved. In each district 12 clusters
were prospectively chosen and then (within districts) randomised to
control or intervention.

Within each cluster all births were recorded as well as various
protective birth practices associated with each birth. I want to
calculate the cluster adjusted mean of the count of birth practices for
the control arm only (ie 18 clusters, 6 in each district). The number of
births in each cluster ranges from about 350 to 650...

Does that make it clearer? Thanks, Christina

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
Sent: 12 September 2012 11:26 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: best way to estimate overall mean of clustered, stratified data using xtreg


You are describing what was apparently a sample survey of three
districts. I would recommend that you -svyset- your data and use -svy:
mean-. At the very minimum, you would:

svyset village [pweight = ??], strata(district)

You will have to supply the probability weight. This advice might change
if you describe the study design, sampling process, and purpose in more
detail.
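
As a sketch of what the probability weight would be in the simplest case, assume that n_h clusters were drawn by simple random sampling from N_h eligible clusters in district h, and that every birth in a sampled cluster was recorded (m_hi births in cluster i). The symbols and the sampling assumption are illustrative; they are not stated in the thread.

```latex
\[
\pi_{hi} = \frac{n_h}{N_h}, \qquad
w_{hi} = \frac{1}{\pi_{hi}} = \frac{N_h}{n_h}, \qquad
\bar{y}_w = \frac{\sum_{h}\sum_{i=1}^{n_h} w_{hi} \sum_{j=1}^{m_{hi}} y_{hij}}
                 {\sum_{h}\sum_{i=1}^{n_h} w_{hi}\, m_{hi}}
\]
```

For instance, with a hypothetical 6 clusters sampled out of 30 in a district, each birth in those clusters would carry a weight of 30/6 = 5. If clusters were selected with unequal probabilities, the weight would instead be the inverse of each cluster's own selection probability.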


Steve


On Sep 12, 2012, at 1:51 PM, Pagel, Christina wrote:

I've got data (9000 ish records) that was collected in 18 clusters
(villages) in 3 geographical districts (6 clusters in each district).

I've got a variable that is an integer count variable and I want to
estimate its mean across all the data, taking clustering into account
(since there is definitely intra cluster correlation).

If there were no districts I would simply do:

xtreg CountVar, i(TrialCluster) re

And then the returned constant would be the mean and I'd also get
confidence intervals.

To take districts into account (the variable is quite dependent on
district), I thought I would do:

xtreg CountVar i.District1 i.District2 i.District3, i(TrialCluster) re

Where the District variables are mutually exclusive binary variables
saying which district the record is in...

The question is how do I now get an overall estimate for the mean from
the results?

One way I thought of is to generate the estimated value for each record
and take the mean of that:

gen EstimatedCount = coeff1*District1 + coeff2*District2 + coeff3*District3 + const

And then do:

means EstimatedCount

To get the estimate of the mean - this works (as in generates a
plausible mean) but the confidence intervals are far too small to be
realistic for this data... which makes me think there must be a better
way of doing it!

Any suggestions would be gratefully received!

Christina
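
One possible way to get an overall mean with a standard error directly from the model in the question, rather than by averaging fitted values, is -margins- after -xtreg-. Averaging the fitted values and passing them to -means- treats them as independent observations and ignores the uncertainty in the estimated coefficients, which is why those intervals come out too narrow. The sketch below runs on simulated data; the single factor variable District (coded 1 to 3) and all numeric values are assumptions for illustration, and -xtset- is used here to declare the cluster variable.

```stata
* Simulated stand-in data; all names and values are hypothetical.
clear
set seed 2012
set obs 18                                   // 18 control clusters
generate TrialCluster = _n
generate District = ceil(_n / 6)             // 3 districts, 6 clusters each
generate u = rnormal(0, 0.5)                 // cluster-level effect
expand 500                                   // 500 births per cluster
generate CountVar = rpoisson(exp(0.5 + 0.2*District + u))

* Random-intercept model with district entered as one factor variable.
xtset TrialCluster
xtreg CountVar i.District, re

* Average of the linear predictions over the observed district mix,
* with a delta-method standard error and confidence interval.
margins
```

Note that -margins- with its default delta-method variance treats the observed mix of districts as fixed, and that this is a model-based estimate; it does not replace the design-based -svy: mean- approach recommended in the replies.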



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


