
st: RE: Appending files when variables differ in their types


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Appending files when variables differ in their types
Date   Wed, 9 Oct 2002 17:31:58 +0100

gcruces@worldbank.org
 
> This is a relatively simple question. I am appending data from the same
> survey for different years. The problem is that some variables appear as
> strings in some datasets and as bytes/numbers in others. Perhaps this is
> due to the translation from dbf to dta with Stat/Transfer in my specific
> case, and may be due to the presence of typos, e.g. ` instead of 1, etc.
> When appending, only the original data is used; the new data is lost.
> The following two mock datasets illustrate the situation:
> 1.dta:
> var1 [byte]
> 1
> 2
> 3
> and
> 2.dta:
> var1 [str1]
> `
> 1
> 2
> 
> When appending the two (for instance, append using 2.dta), Stata warns you
> about this:
> (note: var1 is str1 in using data but will be byte now)
> and the result is:
> var1
> 1
> 2
> 3
> .
> .
> .
> 
> 
> So far I've managed with a set of destring, replace force on each separate
> file. But when working with a large number of files and variables, it may
> become cumbersome. What I wanted to know is if there is a way to tell Stata
> to append every variable in the most general format, that is, str*, when
> there is a problem like this.

No, there isn't, as far as I know. 

One might ask "Why not?" and the best answer I can 
think of grows from what you say here: you had 
to use -destring, replace force- to coerce 
at least some of the string variables. As you say, 
there are problems in some observations in 
treating these strings as numeric. 
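
To make that concrete, here is a minimal sketch of how one might look at
the offending values before coercing anything. It assumes the mock file
2.dta from the question, with var1 held as a string; var1_num is a
hypothetical name chosen for illustration only.

```stata
* Assumes 2.dta from the question is in the current directory
use 2.dta, clear

* real() returns missing for any string that does not parse as a number,
* so this lists the nonempty values that -destring- could not convert cleanly
list var1 if missing(real(var1)) & var1 != ""

* Once the bad values are understood, convert into a new variable
* (var1_num is a hypothetical name) so the original strings stay available
destring var1, generate(var1_num) force
```

With -force-, the nonnumeric values end up as missing in var1_num, so
listing them first is the only record of what they were.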

Stata in essence is not in the business of making
strong assumptions about what your data really 
mean or really should be. It devolves all 
responsibility for such decisions to you. 

Also, while there is a good case for what you 
suggest, Stata is not in the business 
of making unilateral changes from numeric 
to string or from string to numeric. 
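
If you decide for yourself that string is the right common type, one
possible workaround is to make that choice explicit in each file before
appending. The following is only a sketch under stated assumptions: it
uses the mock files 1.dta and 2.dta and the variable var1 from the
question, it writes copies under the hypothetical names 1_str.dta and
2_str.dta, and it relies on a Stata release in which -tostring- is
available. It also assumes the numeric values are integers; -tostring-
may need a format() or force option otherwise.

```stata
* Step 1: save a copy of each file in which var1 is a string
foreach f in 1 2 {
    use "`f'.dta", clear
    * -confirm numeric variable- returns a nonzero code if var1 is a string
    capture confirm numeric variable var1
    if _rc == 0 {
        tostring var1, replace
    }
    * the _str suffix is a hypothetical naming choice
    save "`f'_str.dta", replace
}

* Step 2: append the copies; both now hold var1 as a string
use "1_str.dta", clear
append using "2_str.dta"

* Step 3: inspect values that do not parse as numbers before going further
* (numeric missings converted by -tostring- appear as "." and show up here too)
list var1 if missing(real(var1))
```

Nothing is lost on append this way, and the decision about what to do
with values such as ` is still yours to make afterwards.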

There is more discussion about numbers 
and strings, together with one person's 
attempt to explain the Stata philosophy,
in Stata Journal 2(3), 314-329 (2002). 
The paper might be of interest, although 
I don't think it offers any quick and easy 
alternatives to the problem here. 

Nick 
n.j.cox@durham.ac.uk 


