
st: non integer fweights or alternatives to deflate N for significance tests in logistic regression


From   Teresio Poggio <terlist@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: non integer fweights or alternatives to deflate N for significance tests in logistic regression
Date   Wed, 30 Nov 2005 12:24:25 +0100

Dear statalisters,

A question that may have an easy answer (but that I don't know yet...):

I'm fitting a logistic regression on a pooled data set that combines
census and survey data for many years, in order to analyze change
over time in a few variables.
The survey data come from electronic datasets and are at the household
level (about 10 thousand records/households per year), while the census
data come from old books and are collapsed by the relevant variables
into frequency counts (limited, of course, to what is available in the
printed tables; about 10 million households per year). For instance:

SOURCE/YEAR    CLASS            REGION       FWEIGHT
------------------------------------------------------
survey_1999    middle class     South                1
survey_1999    middle class     North                1
...            ...              ...                ...
census_1951    service class    North              234
census_1951    blue collars     Center       1,145,434
...            ...              ...                ...

Using fweight in this way (it is 1 for the household-level survey
records) gives me N = 28,000,000.
As a result, even parameter estimates of magnitude 0.0004 are
reported as statistically different from 0. This clearly makes no
substantive sense, and I'd prefer not to see N-inflated
significance tests in the output.

One practical rule that has been suggested to me is to divide the census
records' fweight by a constant (say 10,000). That way I'll
have a smaller N while preserving the distribution of my variables within
each census year.
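
To make the effect of this rule concrete, here is a minimal sketch in Python (not Stata, and not tied to any Stata command). It fits a one-predictor logit by Newton-Raphson on collapsed cells, once with the full frequencies and once with the frequencies divided by a constant. The cell counts, the function name fit_weighted_logit and the deflator value are all hypothetical, chosen only for illustration. Under these assumptions the coefficients should stay the same while the standard errors grow by the square root of the deflator.

```python
import math


def fit_weighted_logit(cells, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson on weighted cells.

    cells: list of (x, y, weight) tuples with y in {0, 1}.
    Returns (b0, b1, se_b0, se_b1).
    """
    def score_and_information(b0, b1):
        # Weighted gradient (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            v = w * p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        return g0, g1, h00, h01, h11

    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    # Standard errors from the inverse information at the final estimates.
    _, _, h00, h01, h11 = score_and_information(b0, b1)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


# Hypothetical collapsed census cells: (x, y, frequency).
census_cells = [(0, 1, 300000), (0, 0, 700000), (1, 1, 320000), (1, 0, 680000)]
deflator = 10000  # the constant suggested in the text
deflated_cells = [(x, y, w / deflator) for x, y, w in census_cells]

full = fit_weighted_logit(census_cells)
deflated = fit_weighted_logit(deflated_cells)

print("slope, full weights:    ", full[1], "se:", full[3])
print("slope, deflated weights:", deflated[1], "se:", deflated[3])
print("ratio of standard errors:", deflated[3] / full[3])
```

The sketch only illustrates the arithmetic of rescaling weights in a weighted likelihood; it says nothing about which Stata weight type would accept fractional values.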

Unfortunately, following this rule gives me non-integer fweights,
which Stata does not accept. The other types of weights available in
Stata are not related to frequencies.

Do you have any idea how to handle this problem, or how to deflate N
for significance tests in logit & mlogit?
Any help would be greatly appreciated. Thanks in advance,

Teresio



