st: Weighting on Sub-samples of Complex Survey Data and Specifying Correlation for PA Models


From   Ryan McCann <rmccann@keybridgeresearch.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Weighting on Sub-samples of Complex Survey Data and Specifying Correlation for PA Models
Date   Thu, 6 May 2010 11:37:43 -0400

Dear Statalist Community Members,

I'm working with a small business firm survey (a 4-year panel of about 5000
small business start-ups that includes financial, geographic, and owner
data).  I am trying to assess the impact of credit card use on revenues.
The regression at present looks like this:

xtreg lnRevenue  lnCreditCard lnAssests AssetTurnover NetMargin
HumanCapitalVars [pweight=final longitudinal weight], pa corr(exchangeable)

I am running into two significant problems:

Firstly, there are a large number of missing values, so that when the
regression is fully specified, I am left with about 1200 observations out of
a total of 24,000 when the data are in long form.  Since the data come from
a complex survey, we need to use weights.  Because the regression is run
on only a small subset of the full sample (and a t-test of means suggests
some selection bias in that subset), it seems unlikely that the original
weights will still yield unbiased estimates.  Is there any consensus on how
to handle this type of situation?  (Imputations of missing data have already
been done to the extent I am comfortable, and the resulting subsample is
still very small compared to the original.)
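
For concreteness, the complete-case comparison mentioned above might be
sketched roughly as follows.  This is only an illustration: firm_age is a
hypothetical name standing in for any variable observed for most firms, and
the HumanCapitalVars placeholder is left out of the variable list.

```stata
* Sketch only: firm_age is a hypothetical variable name.

* Count missing values among the model variables for each observation
egen nmiss = rowmiss(lnRevenue lnCreditCard lnAssests AssetTurnover NetMargin)

* Flag observations that would survive in the fully specified regression
generate byte complete = (nmiss == 0)
tabulate complete

* Compare a widely observed variable between complete and incomplete cases
ttest firm_age, by(complete)
```

Note that ttest does not accept weights, so this comparison is unweighted.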

Secondly, the random effects model would seem more appropriate than fixed
effects because most of the variation in the sample is between rather than
within (the panel is short to begin with: the average time series for an
individual is only 2.5 periods).  Stata does not allow for the use of
weights with RE, so we are using a population-averaged (PA) regression.  At
this point I'm trying to determine the type of autocorrelation that is
present.  The "pa" regression in Stata allows for independent, exchangeable,
unstructured, and AR error correlations over time.  I've run regressions by
year and predicted the error terms for each time period.  I then regressed
these errors on their first (t-1) and second (t-2) lags and got fairly
consistent coefficients on the lag term (around .57, though a couple of the
coefficients came out around .3).  I used this method because I do not know
of any Durbin-Watson type test that allows for weights.  The error
correlations I get by using the exchangeable option come out around .59.  It
appears that the independent option (i.e. no autocorrelation) is not
appropriate, but I'm wondering how to choose between exchangeable and
unstructured (I am not sure whether an AR process is present).
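
To make the procedure above concrete, here is a rough sketch of the residual
check, followed by one possible way to look at the estimated working
correlations.  The names id, year, and wt are hypothetical stand-ins for the
firm identifier, survey wave, and longitudinal weight, and the
HumanCapitalVars placeholder is left out.  The last step uses xtgee with
family(gaussian) and link(identity), which fits the same model as xtreg with
the pa option, so that estat wcorrelation can be used afterwards.

```stata
* Sketch only: id, year, and wt are hypothetical variable names.
xtset id year

* 1. Year-by-year weighted regressions, collecting the residuals
generate double ehat = .
levelsof year, local(years)
foreach y of local years {
    quietly regress lnRevenue lnCreditCard lnAssests AssetTurnover NetMargin ///
        [pweight=wt] if year == `y'
    predict double tmp if e(sample), residuals
    replace ehat = tmp if e(sample)
    drop tmp
}

* 2. Regress the residuals on their first and second lags
regress ehat L.ehat L2.ehat [pweight=wt], vce(cluster id)

* 3. Fit the PA model under two working correlation structures and
*    display the estimated within-firm correlation matrix for each
xtgee lnRevenue lnCreditCard lnAssests AssetTurnover NetMargin ///
    [pweight=wt], family(gaussian) link(identity) corr(exchangeable)
estat wcorrelation

xtgee lnRevenue lnCreditCard lnAssests AssetTurnover NetMargin ///
    [pweight=wt], family(gaussian) link(identity) corr(unstructured)
estat wcorrelation
```

If the unstructured matrix shows roughly equal off-diagonal entries, that
would be consistent with exchangeable; entries that shrink as the gap
between waves grows would point more toward an AR-type pattern.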

Any suggestions are greatly appreciated.

Best Regards,
Ryan


Ryan McCann
Senior Analyst
Keybridge Research LLC
Office: 202.965.9487 | Mobile: 774.521.8874



