Statalist



st: RE: drop 'em OR it depends


From   Sven-Oliver Spieß <mail@svenoliverspiess.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: drop 'em OR it depends
Date   Sun, 20 Jul 2008 20:29:23 +0200

Tim,
"It depends" is usually a safe answer. You might want to keep them, for
example, if you run several analyses with different variables and, for
whatever reason, you do not want the samples to be identical across those
analyses. Or, of course, if you want to impute the missing values. Or simply
because the data are easier to handle when every subject has the same number
of observations. After all, the missing values were in the data all along.
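If you do keep those rows, one option is to flag them rather than drop them. This is a minimal sketch, assuming the fevwide.dta file from the example further down is saved in C:\downloads as in that log; the indicator name fev_missing is hypothetical. It uses the j(month) option of reshape to name the time variable directly instead of renaming _j afterwards.

```stata
* Sketch: keep the rows with missing fev, but flag them.
* Assumes fevwide.dta (id, grp, fev0-fev48) in C:\downloads;
* fev_missing is a hypothetical variable name.
use "C:\downloads\fevwide.dta", clear
reshape long fev, i(id) j(month)

* 1 if no FEV value was recorded at this month, 0 otherwise
generate byte fev_missing = missing(fev)

* how many measurements are missing at each month
tabulate month fev_missing

* every subject still has one row per measurement time (17 each)
bysort id: assert _N == 17
```

Because each subject keeps one row per measurement time, the panel stays balanced, and any single analysis can still be restricted with an if condition.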

Other than that, I don't own the book you mention and can only assume the
same is true for at least some other members of Statalist. Also, the URL
in your post contains two typos. So unfortunately I can't give you a more
specific answer.
Generally speaking, in many cases Stata commands such as summarize simply
exclude observations with missing values, so those observations do not affect
the results (see below). To better understand your specific problem, it would
help if you could provide more details, such as which analysis exactly they
perform in section 9.6, and an excerpt of the relevant lines from your log file.
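One plausible reason for the differences in section 9.6: summary and estimation commands skip missing values, but the system variables _n and _N count every observation in a by-group, including rows where fev is missing. A minimal sketch under the same assumptions as above (fevwide.dta in C:\downloads); the names nrows, nvalid and visit are hypothetical.

```stata
* Sketch: _n and _N count all rows, egen count() counts non-missing values.
* Assumes fevwide.dta (id, grp, fev0-fev48) in C:\downloads;
* nrows, nvalid and visit are hypothetical variable names.
use "C:\downloads\fevwide.dta", clear
reshape long fev, i(id) j(month)

* number of rows per subject, rows with missing fev included
bysort id (month): generate nrows = _N

* number of non-missing fev values per subject
by id: egen nvalid = count(fev)

* position within subject: _n also advances over months with missing fev
by id: generate visit = _n

* one line per subject with at least one missing fev value
list id nrows nvalid if nrows != nvalid & visit == 1, noobs
```

After the reshape every subject has 17 rows, so _N is 17 throughout, whereas count() returns only the number of months with an actual measurement.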

Best,
Sven-Oliver




-------example: summary statistics, reshaped vs. original online data---------



. use "C:\downloads\fevwide.dta", clear
(Repeated measurements of FEV for three groups, coded wide)

. reshape long fev, i(id)
(note: j = 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       57   ->     969
Number of variables                  19   ->       4
j variable (17 values)                    ->   _j
xij variables:
                    fev0 fev3 ... fev48   ->   fev
-----------------------------------------------------------------------------

. rename _j month

. d, s

Contains data
  obs:           969                          Repeated measurements of FEV for three groups, coded wide
 vars:             4
 size:        20,349 (99.9% of memory free)
Sorted by:  id  month
     Note:  dataset has changed since last saved

. sum fev

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       663    42.59765    18.51655      10.12     110.81

. bysort grp: sum fev

--------------------------------------------------------------------------------
-> grp = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       459    47.64026    18.58769      14.28     110.81

--------------------------------------------------------------------------------
-> grp = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       146    28.86562    9.250675      10.12      65.02

--------------------------------------------------------------------------------
-> grp = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |        58    37.25828    16.47463      16.59       81.8


. use "C:\downloads\fevlong.dta", clear
(Repeated measurements of FEV for three groups, coded long)

. d, s

Contains data from C:\downloads\fevlong.dta
  obs:           663                          Repeated measurements of FEV for three groups, coded long
 vars:             4                          20 Apr 2002 21:43
 size:        14,586 (99.9% of memory free)
Sorted by:  id  month



*** # of obs in fevlong = # of non-missing fev values in the reshaped fevwide data!


. sum fev

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       663    42.59765    18.51655      10.12     110.81

. bysort grp: sum fev

--------------------------------------------------------------------------------
-> grp = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       459    47.64026    18.58769      14.28     110.81

--------------------------------------------------------------------------------
-> grp = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |       146    28.86562    9.250675      10.12      65.02

--------------------------------------------------------------------------------
-> grp = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         fev |        58    37.25828    16.47463      16.59       81.8



*** ==> statistics are identical regardless of whether missing values are dropped!


-------end example---------
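To reproduce the layout of fevlong.dta from the wide file, one can drop the rows that reshape created for missing measurements. A minimal sketch, again assuming the file location used in the log above:

```stata
* Sketch: turn fevwide into the fevlong layout.
* Assumes fevwide.dta (id, grp, fev0-fev48) in C:\downloads.
use "C:\downloads\fevwide.dta", clear
reshape long fev, i(id) j(month)

* 57 subjects x 17 measurement times = 969 rows
count

* remove the rows that only exist because a measurement was missing
drop if missing(fev)

* should now match the 663 observations in fevlong.dta
count
```

This matters only for steps that depend on the number or position of observations (such as _n and _N); as the summaries above show, summarize gives the same results either way.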





> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Tim
> Sent: Sunday, 20 July 2008 06:57
> To: statalist@hsphsun2.harvard.edu
> Subject: st: drop 'em OR it depends
> 
> New semester starts in about a week.
> One thing I had difficulty with last semester was getting the data
> provided into the form needed for the analysis. I could get reshape to
> work, but had to look it up every time, and it still took several
> attempts every time.
> So I've been looking again at Hills and De Stavola, "A short
> introduction to Stata for biostatistics", chapter 9. (files at net from
> http://ww.stata.com.data/hs/; net get book)
> In section 9.5 they cover reshape.
> In section 9.6 they cover _N and _n.
> The examples in section 9.6 use the fevlong dataset. When I tried using
> fevwide reshaped to long, I did not get the results in the book. Only
> after dropping missing observations did it work.
> 
> So my question is: should dropping missing obs be normal practice after
> reshaping from wide to long, or does it depend on what I want to do
> with the long dataset?
> And if I don't always drop 'em, when do I keep them?
> 
> Tim
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP