# st: RE: Re: RE: when your sample is the entire population

 From "McKenna, Timothy" <[email protected]> To <[email protected]> Subject st: RE: Re: RE: when your sample is the entire population Date Fri, 18 Jan 2008 18:00:06 -0500

```
The issue of having the entire population as your sample, and, along the
same line, whether to apply a finite population correction to your std
errors, depends on the goal of your analysis.

For example:
Say one wanted to know the ratio of girls to boys on sports teams in a
school.  Taking the entire population as your sample would give you the
exact answer.  There would be nothing more to examine; you would know
the ratio exactly rather than estimate it.  Similarly, you would apply
a finite population correction to your std errors if you sampled only a
fraction of the students in the school.  However, you get to do this
because you have a narrow goal: inferring what the ratio actually is.

If you wanted to look at the ratio of girls to boys as part of a
statistical model of whether there is gender discrimination, then you
need to leave the finite population world.  Essentially, the model would
suggest that in a world without gender discrimination you would expect a
certain ratio.  One would then look at your sample (i.e. the
population) as a realization of this school-specific process in which
girls and boys choose or don't choose to play sports, and then use some
sort of statistical test to update your opinion on whether there is
gender discrimination.  Among the statistical approaches I can think of,
it is the size of your sample, not the size of your sample relative to
the population, that drives the precision of your method.

-Tim

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blasnik
Sent: Friday, January 18, 2008 4:31 PM
To: [email protected]
Subject: st: Re: RE: when your sample is the entire population

I would just say -- "what Nick says" ;)

But I'd like to emphasize one aspect related to his points 3 (and/or 4)
-- measurement error.  In many real applications, the outcome (and,
unfortunately, the predictors) are measured with error.  Therefore, you
have uncertainty even with data for the full population.  Also, the
superpopulation concept (point 1) seems quite reasonable -- at least for
most program evaluation questions where you may collect data for all
program participants (or kids in a school) but they can be considered a
sample of some larger potential population.  Of course in program
evaluation you also still have uncertainty introduced by any
comparison/control group employed in the analysis.

Michael Blasnik

----- Original Message -----
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Friday, January 18, 2008 3:02 PM
Subject: st: RE: when your sample is the entire population

>I guess most people will have a short answer and a long answer
> to this one. You are going to get my short answer.
>
> Also, in statistical science, it seems that most people who think they
> have a reasonably smart, or at least sensible, answer think some of the
> other guys' reasonably smart answers are really fairly stupid, or at
> least difficult to understand. So it may be colourful if and when
> people start telling me that after a few decades of sweat and toil I
> _still_ don't understand statistics at all.
>
> If the question is what meaning is attached to a P-value, then there
> seem many possible partial answers.
>
> 1. I am looking only at a sample of size n and I think of this as only
> one of many possible samples of the same size from a larger population.
> That is most plausible if someone really did select that sample using
> random numbers, or something equivalent, and it's a greater or lesser
> stretch otherwise. In many cases the sample you have just fell into
> your lap somehow and the whole exercise is to treat the data _as if_
> it were a random sample, partly because that's a calculation you can
> do. There's usually some wishful thinking involved. Both texts and
> teachers vary enormously on how candidly they discuss what is going
> on. This seems to be what is most emphasised in most introductory
> courses and texts, but it may be the least applicable story in
> statistical practice!
>
> 2. I am looking at a sample of size n and I am willing to think of this
> as one possible outcome among many. I can get a reference population by
> resampling the data I have repeatedly. Permutation and bootstrap
> methods fit under this heading. I think it wry that in less than 30
> years bootstrap methods have gone from being widely regarded as a form
> of cheating to being widely considered as the best way to get a
> P-value in many problems.
>
> 3. I have a model, at its simplest response a function of predictors
> plus some error term, and the uncertainty comes from the fact that the
> model is always an approximation and stochastic by virtue of its error
> term. Whether your n is the whole N is immaterial, because the
> uncertainty is not about sampling at all.
>
> 4. What I have I regard as the realisation of a stochastic process
> (usually in time, or space, or both). The realisation is unique, but at
> least in principle there could have been other realisations.
>
> I won't quarrel with anyone who thinks #3 and #4 sound the same.
>
> 5. Bayesians have other stories.
>
> 6. I must have forgotten or be unaware of yet other stories. Bill Gould
> has tried to explain quantum mechanics to me several times. I am pretty
> clear that he understands it very well.
>
> In these terms you seem to be saying #1 does not apply in your case,
> but that still leaves other arguments, and there is a lot of scope for
> arguing what is central to #1 in any case.
>
> Nick
> [email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
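The contrast Tim draws between the finite-population goal and the model-based goal can be made concrete with a small Python sketch. Nothing here comes from the thread itself: the function names, the school of 400 athletes with 160 girls, and the "no discrimination" share of 0.5 are all hypothetical choices for illustration. The first function uses the textbook standard error of a proportion under simple random sampling without replacement, which shrinks to zero when n = N. The second uses an exact binomial test as one possible stand-in for "some sort of statistical test"; its precision depends on n alone.

```python
import math


def fpc_standard_error(p_hat: float, n: int, N: int) -> float:
    """Design-based SE of a proportion: simple random sample of n units
    drawn without replacement from a finite population of N units."""
    if N < 2 or not (1 <= n <= N):
        raise ValueError("need N >= 2 and 1 <= n <= N")
    srs_se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # SE ignoring the finite population
    fpc = math.sqrt((N - n) / (N - 1))             # finite population correction
    return srs_se * fpc                            # exactly 0 when n == N (a census)


def binomial_two_sided_p(k: int, n: int, p0: float) -> float:
    """Exact two-sided binomial P-value: total probability, under share p0,
    of every outcome no more likely than the observed count k."""
    if not (0 <= k <= n) or not (0.0 < p0 < 1.0):
        raise ValueError("need 0 <= k <= n and 0 < p0 < 1")
    pmf = [math.comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical school: N athletes in total, of whom `girls` are girls.
N = 400
girls = 160
share = girls / N

# Narrow goal (what is the share in this school?): the SE vanishes in a census.
print("SE if the same share came from a sample of 100:", fpc_standard_error(share, 100, N))
print("SE with the whole population:", fpc_standard_error(share, N, N))

# Model-based goal: treat the 400 athletes as one realisation of a process
# in which, absent discrimination, each athlete is a girl with probability 0.5
# (an assumed value). N does not enter; only the sample size does.
print("Exact binomial P-value:", binomial_two_sided_p(girls, N, 0.5))
```

The census standard error is zero because the correction factor is zero, which is the "nothing more to examine" case. The binomial P-value is still informative with the full population in hand, because the uncertainty it describes belongs to the assumed process and not to sampling from the school.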