Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: weird behavior of append


From   Joerg Luedicke <[email protected]>
To   [email protected]
Subject   Re: st: weird behavior of append
Date   Wed, 12 Sep 2012 10:20:52 -0500

I cannot spot a problem here. You have 11165 observations in one file
and 259 observations in the other, and 11165 + 259 = 11424
observations is exactly what you end up with after appending.

Joerg
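
One way to confirm this kind of arithmetic directly is to have Stata check the observation count after the append. The following is only a minimal sketch on simulated data, not the original files: the tempfile name `big', the local `n_master', the marker variable `source', and the formulas that generate id are all assumptions made for illustration. It relies on the generate() option of -append-, which marks each observation as coming from the master data (0) or the using data (1).

```stata
* Simulated stand-in for temp1: 11165 observations of an int id
clear
set obs 11165
generate int id = 5000 + mod(_n, 390)
tempfile big
save `big'

* Simulated stand-in for temp2: 259 observations
clear
set obs 259
generate int id = 5000 + mod(_n, 389)
local n_master = _N

* Append and mark the origin of each observation (0 = master, 1 = using)
append using `big', generate(source)

* The total must equal the sum of the two files
assert _N == `n_master' + 11165

* Per-file counts, for a visual check
tabulate source
```

If the assert passes silently, no observations were lost, and -tabulate source- should show how many rows came from each file.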

On Wed, Sep 12, 2012 at 9:50 AM, Feiveson, Alan H. (JSC-SK311)
<[email protected]> wrote:
> Hello - In Stata 12 IC, I am trying to append a file of 259 observations to one of 11165 observations. Both files contain only one variable, named "id" (see below). After appending, rather than having 259 new observations, it appears that 259 observations have been lost; yet if I reduce the size of the first file to 10000, the append seems to work. Also, if the variables have different names, I get even weirder results (see below). Does anyone have an explanation?
>
> Thanks,
>
> Al Feiveson
>
> ========================================================================
> . use temp1,clear
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     11165    5205.994    98.91063       5000       5389
>
> . use temp2,clear
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |       259    5206.846    101.6719       5000       5388
>
> . append using temp1
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     11424    5206.013     98.9696       5000       5389
>
>
> ========================================================================
> Now cut out some observations
>
> . use temp1,clear
> . keep in 1/10000
> (1165 observations deleted)
>
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     10000    5189.761    91.46947       5000       5331
>
> . append using temp2
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     10259    5190.192    91.77468       5000       5388
>
> This appears to be correct.
>
> ========================================================================
> Now rename the variable in one of the files
> . use temp2,clear
> . des
>
> Contains data from temp2.dta
>   obs:           259
>  vars:             1                          12 Sep 2012 09:32
>  size:           518
> ----------------------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ----------------------------------------------------------------------------------------------
> id              int    %10.0g                 ID
> ----------------------------------------------------------------------------------------------
> Sorted by:  id
>
> . rename id id2
> . append using temp1
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          id2 |       259    5206.846    101.6719       5000       5388
>           id |     11165    5205.994    98.91063       5000       5389
>
> . count if id==. & id2<.
>   259
>
> . count if id2==. & id<.
> 11165
>
> So it appears that there should be 259 + 11165 observations, since both conditions are exclusive. Yet
>
>
> . des
>
> Contains data from temp2.dta
>   obs:        11,424
>  vars:             2                          12 Sep 2012 09:32
>  size:        45,696
> ----------------------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ----------------------------------------------------------------------------------------------
> id2             int    %10.0g                 ID
> id              int    %10.0g                 ID
> ----------------------------------------------------------------------------------------------
> Sorted by:
>      Note:  dataset has changed since last saved
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
