Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Sample Weights

From	"Michael I. Lichter" <[email protected]>
To	[email protected]
Subject	Re: st: Sample Weights
Date	Tue, 09 Mar 2010 15:07:41 -0500

Jason,

In general, probability weights are equal to 1/(probability of inclusion in the sample), so your 5% sample gets a weight of 20, and if you sampled the 4 urban areas at a 10% rate, the weight for those cases should be 10. This is a stratified design and should ideally be analyzed as such using -svyset [pw=your-pweight], strata(your-stratum-id)-, where your-pweight is the weight you construct and your-stratum-id is a variable with a category for each stratum. If the sampling rate differs between the cities (e.g., if you sampled 1000 people regardless of the city size), you would need a different weight for each city and a different stratum ID.
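As a rough sketch of that setup (the variable names are made up for illustration: city is assumed to be coded 0 for the rest of Canada and 1-4 for the four cities, with no missing values, and y and x stand in for your outcome and regressor):

```stata
* Hypothetical variables: city (0 = rest of Canada, 1-4 = the four cities),
* y (outcome), x (regressor). Sampling rates are the ones discussed above.

* pweight = 1 / (probability of inclusion): 20 at a 5% rate, 10 at a 10% rate
generate double pw = cond(inlist(city, 1, 2, 3, 4), 1/0.10, 1/0.05)

* Each city is its own stratum; the rest of the country is one more stratum
svyset [pweight = pw], strata(city)

* Design-based regression with city dummies
svy: regress y x i.city
```

If the rate differs by city, the same idea applies with pw set to (population size)/(sample size) within each city rather than a single 10.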

Now, I wonder what you mean about having dropped "duplicate observations". Do you mean that you dropped the observations of Toronto from your first sample and are substituting those from the second, or do you mean that you combined the two samples and literally dropped only those observations that appeared in both? (And I wonder what kind of data you have that you would know they were duplicates.) If the former, what I said above applies; if the latter ... you probably shouldn't.

The other alternative is simply to combine the samples without dropping observations. In that case, you would need to decide how much relative weight to give to the "regular" sample vs. the "oversample"; if you want each to be weighted equally, you just divide their "natural" weights by two; that is, your-pweight = 10 instead of 20 for the 5% sample, and your-pweight = 5 instead of 10 for the oversample. Somebody who knows more than me can comment on the advisability of this course; it means that a sampling without replacement design (which is what I assume you have in each of the two datasets) becomes sampling with (limited) replacement.
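A sketch of the halved weights, with hypothetical variables incity (1 if the case lives in one of the four cities, 0 otherwise) and source (1 = 5% national sample, 2 = 10% city oversample). It assumes the halving applies only inside the four cities, where the two samples cover the same population, and that cases elsewhere keep their weight of 20; that restriction is an assumption of the sketch, not something established above.

```stata
* Hypothetical variables: incity (0/1), source (1 = 5% sample, 2 = oversample)

* Start from the "natural" weight of the 5% national sample
generate double pw2 = 1/0.05

* Inside the four cities, give each sample half of its natural weight
replace pw2 = (1/0.05)/2 if incity == 1 & source == 1   // 10 instead of 20
replace pw2 = (1/0.10)/2 if incity == 1 & source == 2   //  5 instead of 10
```

Within a city of size N, the weights then sum to 0.05*N*10 + 0.10*N*5 = N, so the combined city cases still add up to the city's population.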

I agree with Guang Dai (I saw his message after writing this) that how your samples are designed is important; you haven't given us a lot of information to go on.


Michael

Jason Dean, Mr wrote:

I have a quick question. I currently have a 5% random sample of Canada. I also have 4 extra random samples of only the four largest urban cities (I have dropped duplicate observations between samples).

What is the best strategy to include these extra samples and keep the sample representative of the country? I intend to condition on these cities with dummy variables in my regression. However, I would prefer to use sample weights, but I am not sure of the best way to go about creating them. Any suggestions would be greatly appreciated.

Jason


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536

