
From: Suzy <scott_788@wowway.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: destringing values led to Stata recoding them as missing
Date: Fri, 27 Aug 2004 20:54:12 -0400

Hi John, This is all very interesting. I think I did lose the values as I did indeed use the force option. My variables came into Stata just like this:

Patient  var1   var2    var3
1001     1235-  V2347   456
1002     1233   143135  E28950
1003     38568  05476-  89076
1004     126    333     V5678

All these datapoints are explicitly coded - meaning that a 0 value at the end is different from a - at the end. This is exactly how it came in and looked in Stata when I imported the file. I needed to be able to have Stata convert the string datapoints to numeric. I just didn't understand quite how to do it - so I read the help online for destring. I didn't think it would convert the string datapoints to missing, but it did. There was nothing in the instructions to lead me to believe that it would convert the values to missing. This is the result of my error:

Patient  var1   var2    var3
1001     .      .       456
1002     1233   143135  .
1003     38568  .       89076
1004     126    333     .

I guess I won't be able to restring...is there anything I can do...other than starting from scratch?

Wallace, John wrote:

Hi Suzy, Stata only recognizes variables as being either string or numeric - even if the data look like numbers, they could be stored as strings. Stata tries to decide on how to store the data as it is imported; from what I can tell it looks at the first record in each variable (column) as it comes in and stores it as numeric only if all the characters are numbers. If a later record in the variable has non-numeric characters in it, Stata will store it as . (missing) and it should alert you that variable x has non-numeric characters. If you lucked out and the first record has non-numeric characters in it, Stata will store the values in the entire variable as strings. It seems likely that this is what happened in your case - in other words what you see as 1234 is actually stored as "1234". When you destring that quantity you get 1234, as expected. However when Stata comes across "V234", that doesn't resolve to a number after destringing, so it puts a . (missing) value in that place, exactly as it would have done on importing.

From the way your question is phrased, it looks as though rather than generating new variables with destring, you -replace-d your originals (and used the -force- option?), in which case you're out of luck unfortunately (hopefully you still have the original source?)
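One quick way to see whether a variable is stored as a string or as a number is to look at its storage type. The lines below are only a sketch; var1, var2 and var3 are the placeholder names used in this thread, not names from any real file.

```stata
* Sketch only: var1-var3 are placeholder variable names from this thread.
* A storage type of the form str# (for example str6) means the variable
* holds strings, even when every value looks like a number.
describe var1 var2 var3

* List the names of all string variables currently in memory
ds, has(type string)
```

If a variable shows up as str#, its original text is still intact, and it is worth saving a copy of the dataset before any conversion.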

It might help if you submit a toy dataset to describe how your data look.

For example

Patient  var1  var2  var3
1001     1234  V234  med
1002     1233  1431  small
1003     65    14-1  small
1004     2.4   333   large

In this case, Stata would bring "Patient" in as a numeric variable, var2 and var3 as string variables, and var1 as numeric.

Now, had the data looked like

Patient  var1  var2  var3
1001     1234  1234  med
1002     1233  V234  small
1003     65    14-1  small
1004     2.4   333   large

Stata would have decided that var2 should come in as a numeric variable, and you would end up with

patient  var1  var2  var3
1001     1234  1234  med
1002     1233  .     small
1003     65    .     small
1004     2.4   333   large

(notice also that Stata will change Patient to patient, although it will store "Patient" as a variable label)

Likewise, if you took the first example and processed it with

. destring, replace force

(which is pretty reckless, data integrity-wise), you'll end up with

patient  var1  var2  var3
1001     1234  .     .
1002     1233  1431  .
1003     65    .     .
1004     2.4   333   .

Is this along the lines of your situation?
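A less destructive route than replacing the originals is to check which values will not convert, and then write the numeric versions into new variables. What follows is a minimal sketch, assuming var2 and var3 came in as strings (as in the first toy example) and that patient identifies the rows; num2 and num3 are hypothetical names for the new variables.

```stata
* Sketch only: assumes var2 and var3 are string variables and that
* patient identifies rows; num2 and num3 are made-up names.

* Step 1: list the non-empty values that cannot be read as numbers
foreach v of varlist var2 var3 {
    list patient `v' if missing(real(`v')) & `v' != ""
}

* Step 2: convert into new variables; the original strings stay in place
destring var2 var3, generate(num2 num3) force
```

Because the string originals are kept, values such as "V234" or "14-1" remain available alongside the numeric copies, where they appear as missing.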

-JW

-----Original Message-----

From: Suzy [mailto:scott_788@wowway.com]
Sent: Friday, August 27, 2004 4:06 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: destringing values led to Stata recoding them as missing

I meant to say - would the restring option restore my datapoints?

Suzy wrote:

Hi John,

I don't know if this matters but I'm not starting with purely string variables. I have variables that have datapoints of which some are string and some are numeric. Also John, would the destring option restore my original values as now the destringed values are missing....

Suzy

Wallace, John wrote:

Suzy

You might want to consider -encode- instead of destring. Presumably you're starting with string variables. Encode will create a new variable with incrementing value in (I believe) alphabetical order of the original variable, plus it will make a value label corresponding to the original string. This is useful if you need to be able to relate the new variable value back to the original string.

e.g.

. encode var1, gen(code1)
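To see how the new codes relate back to the original strings, the value label that -encode- creates can be listed, and -decode- can turn the codes back into text. This is only a sketch: it assumes var1 is a string variable, and var1_check is a hypothetical name chosen for illustration.

```stata
* Sketch only: var1 is assumed to be a string variable; code1 and
* var1_check are names chosen for illustration.
encode var1, gen(code1)

* Show the mapping from each integer code to its original string
label list code1

* Turn the codes back into strings in a new variable and compare by eye
decode code1, gen(var1_check)
list var1 code1 var1_check
```

Note that the integer codes are arbitrary category numbers, so a string such as "1234" will generally not be encoded as the number 1234.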

-JW

-----Original Message-----

From: Suzy [mailto:scott_788@wowway.com]
Sent: Friday, August 27, 2004 2:45 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: destringing values led to Stata recoding them as missing

Dear Statalisters;

I have seven variables of over 300,000 observations each. Within each variable, I have over 2000 different values. These datapoints represent specific codes (for example, 72200 = intervertebral disc disorder). Within each of these seven variables, there are datapoints (values) with dashes or letters (e.g. 4109- or V2389). The majority of the values, though, are purely numeric (23405). I used the destring option so that I could analyze the data, and Stata treated all those datapoints that included dashes and letters as missing. Now there is a period . where there used to be a value. I have two questions:

1. Will the restring option restore the datapoints?

2. How can I successfully "destring" these values so that I can include them in my analysis?

Any help and/or specific code would be very helpful as I am only marginally competent with Stata basics.

Thank you!

Suzy

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/
