
From: Babigumira Ronnie <rutaremwa_rb@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: another data cleaning question
Date: Mon, 24 Jun 2002 05:55:45 -0700 (PDT)

Hi Nick,

Thanks for the help. The code produces the same result as what Hakon
suggested. However, you made a comment, “More generally, whenever codes
are pseudo-numeric, there are several advantages to holding them as
strings”, which has caught my interest, so I would like to pursue it
further.

You suggest

list if string(cropcode) != substr(string(varcode),1,3)

Now that the code works, I would like to understand the underlying
principles. Please shed some more light, especially on the right-hand
side of the !=.

I look forward to hearing from you.

Roni

--- Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Babigumira Ronnie replied to Roger Newson's suggestion:
>
> > > Use the -int- function (see -help functions-). In this case, you
> > > might type
> > >
> > > gene cropcod2=int(varcode/100)
> > > list if cropcod2!=cropcode
> > > assert cropcod2==cropcode
> > > drop cropcod2
> > >
> > > and Stata will generate a new variable -cropcod2-, which should be
> > > equal to -cropcode- if the data are consistent. Stata will then list
> > > inconsistent cases, and halt execution if there are any inconsistent
> > > cases, and drop the variable -cropcod2- otherwise.
> > >
> > > I hope this helps.
> >
> > Many thanks. It works; however, I have a slight complication: some of
> > the variety codes have more than 5 digits. Here is an example (also
> > included is the new variable cropcod2):
> >
> >       villcode  cropcode  cropcod2  varcode
> >   1.   1531002       110       110    11001
> >   2.   1531002       110       110    11001
> > 377.   1360100       110      1101   110104
> > 378.   1362000       110      1101   110104
> >
> > The first 2 varcodes have 5 digits and hence
> >
> > . gen cropcod2=int(varcode/100)
> >
> > would work just fine. However, the last two have varcodes with 6
> > digits, which would mean that I would need to rewrite the syntax as
> >
> > . gen cropcod2=int(varcode/1000)
> >
> > This would, however, truncate the 5-digit varcodes, and the resulting
> > cropcod2 would have 2 digits.
> >
> > I would therefore like to put in a condition that would allow the
> > first truncation to be done only if the number of digits in varcode
> > is 5, and then replace cropcod2 with a new truncated figure if the
> > varcode has 6 digits. I have searched the manual and I still haven't
> > found it. Might you (or anyone else reading) have any idea how I can
> > do this?
>
> I suggest that any test be in terms of the first three
> characters of the string equivalent.
>
> list if string(cropcode) != substr(string(varcode),1,3)
>
> More generally, whenever codes are pseudo-numeric, there
> are several advantages to holding them as strings.
>
> Nick
> n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
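
As a rough illustration of the comparison discussed in this thread, the sketch below builds the right-hand side in a separate step. It assumes that cropcode always has exactly three digits and that varcode is held as an integer; the variable -prefix- and the third observation (with a deliberately mismatched cropcode of 120) are hypothetical and were added only for illustration. Note that the call to string() is closed before the ",1,3" arguments of substr().

```stata
* Small made-up dataset modelled on the listing in the thread.
* The third observation is hypothetical: its cropcode (120) does not
* match the first three digits of its varcode (110104).
clear
input long villcode int cropcode long varcode
1531002 110 11001
1360100 110 110104
1360100 120 110104
end

* string(varcode) converts the number to text, e.g. 110104 -> "110104".
* substr(s, 1, 3) then keeps 3 characters of s starting at position 1,
* so the result is "110" whether varcode has 5 or 6 digits.
generate str3 prefix = substr(string(varcode), 1, 3)

* Compare text with text: list observations where the crop code
* disagrees with the leading three characters of the variety code.
list if string(cropcode) != prefix
```

With these data only the third observation should be listed. Because the prefix is taken from the left of the string, no separate rule is needed for 5-digit and 6-digit variety codes, which is the difficulty the int(varcode/100) approach ran into.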

