st: RE: Data manipulation question
This example suggests various kinds of problems.
Whenever CarManufacturer is empty, you could
pull across the value from the second variable
replace CarManufacturer = CarModel if mi(CarManufacturer)
but that leaves e.g. "Ford Excursion" as both CarManufacturer
and CarModel, which replaces one problem by another.
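Here is a minimal sketch of that problem on two made-up rows modelled on the question's listing (the data are hypothetical). After the -replace-, the moved value sits in both variables.

```stata
* Hypothetical rows modelled on the question's listing
clear
input str12 CarManufacturer str12 CarModel
""    "Ford"
"BMW" "320 I"
end

* Pull CarModel across wherever CarManufacturer is empty
replace CarManufacturer = CarModel if missing(CarManufacturer)

* The moved value now appears as both manufacturer and model
list
count if CarManufacturer == CarModel
```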
I would try another way: concatenate all these into a single
variable, and then start again.
gen Car = CarManufacturer + " " + CarModel + " " + CarEngine
or
egen Car = concat(CarManufacturer CarModel CarEngine), p(" ")
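A minimal sketch of the concatenation step, using a few made-up rows modelled on the question's listing. The CarEngine values are invented, and all three variables are assumed to be string; the -gen- version with + needs that, whereas -egen, concat()- also accepts numeric variables. Empty pieces still get a separating space, which is why some tidying is needed afterwards.

```stata
* Hypothetical rows; CarEngine values are invented for illustration
clear
input str12 CarManufacturer str12 CarModel str8 CarEngine
"Ford"        "Mustang"    ""
""            "Ford"       ""
""            "BMW"        "520 I"
"Alfra Romeo" "Spider"     ""
""            "Alfa Romeo" ""
end

* Join the three pieces with one space between each, empty or not
egen Car = concat(CarManufacturer CarModel CarEngine), punct(" ")
list Car
```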
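Then two simple clean-ups are perhaps to remove
isolated periods (which is how -egen, concat()- shows
numeric missings)
replace Car = subinstr(Car, " .", " ",.)
and then to trim leading and trailing spaces, doing
this second so that no trailing space is left behind
replace Car = trim(Car)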
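A minimal sketch of these clean-ups on hypothetical strings of the kind the concatenation can leave behind. It also collapses internal runs of blanks with -stritrim()-, an extra step not in the suggestion above that is assumed useful when a middle piece is empty. -strtrim()- and -stritrim()- are the Stata 14+ names; older versions use -trim()- and -itrim()-.

```stata
* Hypothetical leftovers: stray blanks, doubled blanks, isolated "."
clear
input str24 Car
" Ford ."
"BMW  320 I"
" Alfa Romeo ."
end

* Remove isolated periods first, then tidy the blanks
replace Car = subinstr(Car, " .", " ", .)
replace Car = strtrim(Car)     // leading and trailing blanks
replace Car = stritrim(Car)    // collapse internal runs of blanks to one
list Car
```

Removing the periods before trimming matters: trimming first would turn "Ford ." into "Ford " with a trailing blank.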
Now it starts getting serious. Two tools that might come
in handy are the -word()- function and the -split- command.
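split Car
will -split- the variable into several new variables,
Car1, Car2, and so on, each containing one word.
Tabulating the first of these, for example with
tab Car1
will expose problems like "318" in obs 8
and the inconsistency between "Alfa" and "Alfra".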
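A minimal sketch of -split- and -word()- on hypothetical cleaned-up values (invented, loosely following the question's listing). -split- parses on blanks by default and names the new variables after the original. -word()- extracts one word without creating the whole set.

```stata
* Hypothetical cleaned-up values, as if produced by the steps above
clear
input str24 Car
"Ford Mustang"
"Ford"
"BMW 520 I"
"318"
"Alfra Romeo Spider"
"Alfa Romeo"
end

* One new string variable per word: Car1, Car2, Car3
split Car

* The first word is the best guess at the manufacturer;
* the tabulation shows "318" and both "Alfa" and "Alfra"
tab Car1

* word() picks out a single word directly
gen first = word(Car, 1)
```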
You are probably going to end up with a do-file
mixing all sorts of general and detailed fixes.
I have discovered errors in my dataset, and it seems some of my data
are recorded in the wrong variable. The variable in which the data should
have been recorded is left missing. A few examples (missing values are
marked as ""):
Record CarManufacturer CarModel
1 Ford Mustang
2 Chevrolet Starcraft
3 Ford Galaxy
4 Honda Civic 1.4 I S
5 Toyota Avensis
6 "" Ford
7 "" BMW 520 I Touring 520 I
8 "" 318
9 BMW 320 I
10 Alfra Romeo Spider
11 "" Alfa Romeo
What I wish to do is to search each record for an expression that
also occurs as a distinct value of CarManufacturer, and then to
move it into CarManufacturer. I have failed both in creating tests
across records and in an attempt to fetch the unique values of
CarManufacturer into an object against which I can then perform
checks. But then again, I'm no seasoned veteran in this game.
Is there any way of pulling this off in Stata?
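One way to get the distinct values of CarManufacturer into an object you can check against is -levelsof- with its local() option, followed by a -foreach- loop. The following is a minimal sketch on hypothetical rows modelled on the listing above. Its matching rule is an assumption: a known manufacturer counts only if it is the whole of CarModel or its first word(s) followed by a blank, and the matched text is then removed from CarModel. As the reply above warns, misspellings such as "Alfa" against "Alfra" and values such as "318" are left for hand-fixing.

```stata
* Hypothetical rows modelled on the question's listing
clear
input str12 CarManufacturer str12 CarModel
"Ford"        "Mustang"
"Chevrolet"   "Starcraft"
"BMW"         "320 I"
"Alfra Romeo" "Spider"
""            "Ford"
""            "BMW 520 I"
""            "318"
""            "Alfa Romeo"
end

* Distinct non-empty manufacturers, stored in local macro makes
levelsof CarManufacturer if !missing(CarManufacturer), local(makes)

foreach m of local makes {
    * Flag empty manufacturers whose model starts with this make
    gen byte hit = missing(CarManufacturer) & (CarModel == "`m'" | strpos(CarModel, "`m' ") == 1)
    * Move the make across and strip it from the model
    replace CarManufacturer = "`m'" if hit
    replace CarModel = strtrim(substr(CarModel, strlen("`m'") + 1, .)) if hit
    drop hit
}
list
```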