Statalist



st: RE: A methodological problem


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: A methodological problem
Date   Thu, 30 Oct 2008 10:33:18 -0000

I think it's fair to say that few statistically-minded people would say
exactly the same thing on this question and that such people can
disagree strongly on it. I will make a few points, but would not be
surprised at partial or complete dissent over any one. I also underline
that a complete discussion would include many more points that arise
directly or indirectly here, although several of them would be the same
points approached or phrased differently. 

1. This attitude makes more sense for simple descriptive measures than
it does for the kind of modelling you intend. Thus (naively or not) you
can say that with a complete population the observed mean is _the_ mean
and no inferential issues arise. But even for simple regression once you
postulate an error term then at least tacitly it has to have certain
properties for any estimation method to work well and that kind of
statement goes beyond the observable data. Even the calculation of means
is arguably based on a similar model, regardless of whether the
researcher knows that or makes it explicit. So, if you have an error
term, that makes what you do inferential and not just descriptive,
regardless of whether there are, or are not, more data out there that
you might have collected. The point can be generalised to whatever
probability distribution someone is working with. 
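
A minimal sketch of point 1, in Python rather than Stata, under assumed
values that are not from this thread (intercept 2.0, slope 0.5, error SD
3.0, a predictor fixed at 1 to 20). It uses plain least squares, not
2SLS. The same 20 units are kept throughout; only the error term is
redrawn, and the estimated slope still varies from draw to draw.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (model includes an intercept)."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical setup: 20 units (the whole "population"), predictor held fixed.
x = [float(i) for i in range(1, 21)]
TRUE_INTERCEPT, TRUE_SLOPE, ERROR_SD = 2.0, 0.5, 3.0  # assumed, illustrative

rng = random.Random(2008)  # fixed seed so the run is reproducible
slopes = []
for _ in range(5000):
    # Same 20 units every time; only the error term is drawn afresh.
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + rng.gauss(0.0, ERROR_SD) for xi in x]
    slopes.append(ols_slope(x, y))

# Textbook standard error of the slope when the error SD is known.
mean_x = statistics.mean(x)
sxx = sum((xi - mean_x) ** 2 for xi in x)
theoretical_se = ERROR_SD / math.sqrt(sxx)
simulated_se = statistics.pstdev(slopes)

print(f"theoretical SE of slope: {theoretical_se:.4f}")
print(f"simulated SD of slopes:  {simulated_se:.4f}")
```

The spread of the slopes comes entirely from the error term, so it does
not vanish just because no further units exist to be sampled.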

2. Alternatively, any model you specify is likely to be incomplete in
the sense that it does not capture all aspects of the process generating
your data, e.g. all possible predictors, cluster or time or space
structure, etc. So, there is an inferential aspect from that point of
view as well. 

3. What you are hoping is that results for a small sample [which happens
to be the population]  will behave like those from an arbitrarily large
sample. But what mechanism makes that happen? Say I toss a coin 20
times, and then I lose it, so that there is no scope for taking any
further measurements with that coin. Does that affect the variability of
the data? In what sense does the sample know that it is as large as
possible, and behave accordingly? I don't think it does. A sample of 20
is a sample of 20!
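
The coin example can be sketched as a small simulation, assuming a fair
coin (p = 0.5) and 10,000 repeated sets of 20 tosses; the function name
and the seed are arbitrary choices for illustration. The proportion of
heads in 20 tosses varies from set to set, whether or not the coin is
lost afterwards.

```python
import random
import statistics


def heads_proportions(n_tosses=20, n_repeats=10_000, p=0.5, seed=12345):
    """Proportion of heads in each of n_repeats sets of n_tosses tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_tosses)) / n_tosses
        for _ in range(n_repeats)
    ]


props = heads_proportions()

# Binomial standard deviation of a proportion: sqrt(p * (1 - p) / n).
theoretical_sd = (0.5 * 0.5 / 20) ** 0.5
empirical_sd = statistics.pstdev(props)

print(f"theoretical SD of proportion: {theoretical_sd:.4f}")
print(f"simulated SD of proportion:   {empirical_sd:.4f}")
```

Nothing in the calculation depends on whether more tosses could have
been made; the variability is fixed by n = 20 and p.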

4. Sampling error is not the only kind of variability. Most problems
also involve measurement error, although many kinds of sports data may
be an exception.

A more general point is that imagining what would be nice for your
problem doesn't make it come true for your data. 

Nick 
n.j.cox@durham.ac.uk 

Carlo Amenta

I am studying a sports league with 20 teams in a specific year. I
am using a 2SLS estimator because of a simultaneity problem with a
specific variable, which was confirmed using the -ivendog- procedure.
At this stage the study is cross-sectional and covers a single
season. Is it correct to say that I have no efficiency or
consistency problem with the estimator, given that I am
studying the entire population and not a sample? Although n=20 is
very small, it is not a sample of observations but all
the teams in the league, and so the entire population. I think I do
not have to worry about any inference problem. Can someone confirm
that or indicate any specific references?


