RE: st: another data cleaning question
Babigumira Ronnie replied to Roger Newson's suggestion:
> > Use the -int- function (see -help functions-). In this case, you might
> > type
> > gene cropcod2=int(varcode/100)
> > list if cropcod2!=cropcode
> > assert cropcod2==cropcode
> > drop cropcod2
> > and Stata will generate a new variable -cropcod2-, which should be equal
> > to -cropcode- if the data are consistent. Stata will then list any
> > inconsistent cases, halt execution if there are any, and otherwise
> > drop the variable -cropcod2-.
> > I hope this helps.
> Many thanks. It works. However, I have a slight complication: some of the
> variety codes have more than 5 digits. Here is an example (the new
> variable cropcod2 is also included):
>        villcode   cropcode   cropcod2   varcode
>   1.    1531002        110        110     11001
>   2.    1531002        110        110     11001
> 377.    1360100        110       1101    110104
> 378.    1362000        110       1101    110104
> The first 2 varcodes have 5 digits, and hence
> . gen cropcod2=int(varcode/100)
> works just fine. However, the last two varcodes have 6 digits,
> which means that I would need to rewrite the syntax as
> . gen cropcod2=int(varcode/1000)
> This would, however, truncate the 5-digit varcodes, and the resulting
> cropcod2 would have only 2 digits.
> I would therefore like to add a condition so that the first truncation
> is done only if varcode has 5 digits, and cropcod2 is then replaced with
> a differently truncated value if varcode has 6 digits. I have searched
> the manual but still haven't found how to do this. Might you (or anyone
> else reading) have any idea how I can do it?
I suggest that any test be in terms of the first three
characters of the string equivalent:
list if string(cropcode) != substr(string(varcode),1,3)
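For the conditional truncation asked about above, a minimal numeric sketch might use -if- qualifiers on -generate- and -replace-. It assumes that varcode is always a positive integer with either 5 or 6 digits. The two data rows are taken from the listing in the question, and -inrange()- is a standard Stata function not used in the original posts.

```stata
* Sketch only: assumes varcode is a positive integer with 5 or 6 digits
clear
input int cropcode long varcode
110 11001
110 110104
end
* 5-digit variety codes: drop the last two digits
generate cropcod2 = int(varcode/100) if inrange(varcode, 10000, 99999)
* 6-digit variety codes: drop the last three digits
replace cropcod2 = int(varcode/1000) if inrange(varcode, 100000, 999999)
* codes of any other length leave cropcod2 missing, so they are listed too
list if cropcod2 != cropcode
assert cropcod2 == cropcode
```

If codes of other lengths turn up, the -list- and -assert- lines will flag them rather than silently produce a wrong crop code.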
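More generally, whenever codes are pseudo-numeric, there
are several advantages to holding them as strings: the number
of digits no longer matters, leading zeros are kept, and parts
of a code can be extracted directly with -substr()-.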
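A minimal sketch of holding the variety code as a string might look like the following. It assumes varcode has at most 7 digits, because -string()- with its default format switches to scientific notation for longer numbers. The variable names varcodestr, cropstr and ndigits are hypothetical, chosen here for illustration.

```stata
* Sketch only: assumes varcode is a positive integer with at most 7 digits
clear
input int cropcode long varcode
110 11001
110 110104
end
* hypothetical string copy of the variety code
generate varcodestr = string(varcode)
* the crop code is the first three characters, whatever the code length
generate cropstr = substr(varcodestr, 1, 3)
* the number of digits is simply the string length
generate ndigits = length(varcodestr)
list if cropstr != string(cropcode)
assert cropstr == string(cropcode)
```

Here the digit count never has to be tested explicitly, which is the point of working with the string form.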