Re: st: RE: AW: RE: AW: recode 9, 99, 999,..., into missing


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: AW: RE: AW: recode 9, 99, 999,..., into missing
Date   Mon, 17 May 2010 10:01:56 -0400

-----

Martin--

Mandy's original Stata code doesn't work, because it assumes that the
maximum of the "real" data values can be identified; it cannot be.
Your code might fail if the highest value for a variable is a multiple
of 9, like 81.  Also, it doesn't easily generalize to the other
missing value types.

Tony and I are suggesting that Mandy determine the proper coding from
the study documentation, that she classify the variables accordingly
and apply -mvdecode- to each group. If she doesn't do this, the
following part of a -foreach- loop is an algorithmic solution.

sum `v', meanonly
local max = r(max)
mvdecode `v' if inlist(`max',7,8,9), mv(7=.a\8=.b\9=.c)
mvdecode `v' if inlist(`max',97,98,99), mv(97=.a\98=.b\99=.c)
mvdecode `v' if inlist(`max',997,998,999), mv(997=.a\998=.b\999=.c)
mvdecode `v' if inlist(`max',9997,9998,9999), mv(9997=.a\9998=.b\9999=.c)

(I would reserve "." for the truly missing (i.e. blank) entries.)
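
For illustration only, here is how the fragment above might sit inside a complete -foreach- loop. The toy data and the variable names q1-q3 are invented, and this sketch branches with the -if- command rather than the -if- qualifier; it is not necessarily the exact loop intended.

```stata
* Sketch only: toy data; q1-q3 are made-up variable names.
clear
input byte(q1) int(q2 q3)
1 12 150
3 97 998
5 45 320
9 99 999
end

foreach v of varlist q1-q3 {
    summarize `v', meanonly
    * save the maximum before any later command can change r()
    local max = r(max)
    if inlist(`max',7,8,9) {
        mvdecode `v', mv(7=.a\8=.b\9=.c)
    }
    else if inlist(`max',97,98,99) {
        mvdecode `v', mv(97=.a\98=.b\99=.c)
    }
    else if inlist(`max',997,998,999) {
        mvdecode `v', mv(997=.a\998=.b\999=.c)
    }
    else if inlist(`max',9997,9998,9999) {
        mvdecode `v', mv(9997=.a\9998=.b\9999=.c)
    }
}

list, noobs
```

A variable whose maximum is not one of the listed codes is left untouched, so it should still be checked against the documentation by hand.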

To depend on such a purely algorithmic solution, Mandy has to be
quite confident of the accuracy of the coding. What happens, for
example, if 997 is entered as 987?
I have never been that confident, unless the field supervisor
scanned each questionnaire and there was double-verified independent
entry. Even then, I know of a case where the data entry company
corrected coding to something more logical, without telling anybody.
Or, worse, our long-time reliable data entry service started cutting
corners and did not do the verified entry we had paid for.
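
One way to look for a mis-keyed code such as 987 is to flag every value that is neither in the documented range nor a documented non-data code. The sketch below assumes a hypothetical variable q3 whose code book says valid answers run from 1 to 500, with 997, 998 and 999 as the non-data codes; both the variable and the range are invented for the example.

```stata
* Hypothetical example: q3 is assumed to be documented as 1-500,
* with 997/998/999 as the non-data codes.
clear
input int q3
150
987
320
999
end

* flag values outside the documented range that are not documented codes
generate byte suspect = !inrange(q3, 1, 500) & !inlist(q3, 997, 998, 999) & !missing(q3)

list if suspect
count if suspect
```

Here the maximum is still 999, so the algorithmic loop would run without complaint and 987 would be kept as if it were real data; only a check against the code book catches it.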

In the past I've built up the variable lists needed for each set of
missing value assignments by hand, based on inspection of the
questionnaire and code book. To keep track, I've tried to make the
assignments to variables in questionnaire order, at the expense of
some duplicate statements. Luckily, similarly coded questions often
occur in blocks. Also, there are various tricks to streamline the
process, like capturing each -mvdecode- statement in a macro.
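
As a rough sketch of one such trick, the missing-value assignments for each coding scheme can be stored once in a local macro and reused for every block of variables. The variable names, groupings and macro names (codes9, codes99) below are hypothetical; in practice the lists would come from the questionnaire and code book. Run it as a do-file so the locals stay defined.

```stata
* Hypothetical variables and groupings, for illustration only.
clear
input byte(q1 q2) int(q3 q4)
1 9 45 99
7 3 97 12
end

* one macro per missing-value scheme
local codes9  "7=.a\8=.b\9=.c"
local codes99 "97=.a\98=.b\99=.c"

* assignments in questionnaire order
mvdecode q1 q2, mv(`codes9')
mvdecode q3 q4, mv(`codes99')

list, noobs
```

Keeping the codes in one place means a correction to a scheme is made once rather than in every -mvdecode- statement.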

Steve
-
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

On Mon, May 17, 2010 at 3:35 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> What does the -mvdecode- solution look like then? Like this?
>
>
> *************
> clear*
>
> inp byte(var1 var2) int(var3 var4)
> 1 1 1 1
> 2 2 2 2
> 3 3 3 3
> 4 8 99 999
> 5 9 100 1000
> 6 10 101 1001
> 7 11 150 5000
> 9 12 999 9999
> end
>
> foreach var of varlist *{
>        sum `var', mean
>        if r(max)<=9 mvdecode `var', mv(9)
>        else if inrange(r(max),10,99) mvdecode `var', mv(99)
>        else if inrange(r(max),100,999) mvdecode `var', mv(999)
>        else mvdecode `var', mv(9999)
> }
>
> li, noo
> *************
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
> Sent: Monday, 17 May 2010 03:00
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: AW: RE: AW: recode 9, 99, 999,..., into missing
>
> Mandy, if you know this much about each variable, I see no advantage
> or necessity to your approach.  -mvdecode- appears to be superior in
> every way.  It is not only more direct and clearer, but will also handle
> all the other "non-data" codes. Clarity is very important: other
> people (and you, perhaps, in the future) will be able to understand
> your Stata statements without any lengthy explanation.  None of the
> other solutions can claim that.
>
> Steve
>
>
> On Sun, May 16, 2010 at 8:33 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>> Dear Mr. Weiss and Lachenbruch,
>>
>> I am sorry; I should have been clearer when describing my question. In
>> my opinion, I need to be careful about this problem: for example, for
>> a variable with a 10-point scale, the value 9 is a real scale point and 99
>> in that case means "not answered".
>>
>> The pattern is like this:
>> (1) if the maximum value of a variable is smaller than 9, then the
>> "not answered" takes the value 9;
>> (2) if the maximum value of a variable is smaller than 99 but greater
>> than 10, then the "not answered" takes the value 99;
>> (3) if the maximum value of a variable is smaller than 999 but
>> greater than 100, then the "not answered" takes the value 999;
>> and so on.
>>
>> (And you are absolutely right to remind me that there are values
>> such as 7, 8, 97, or 98 to indicate "refused to answer" or "invalid
>> answer". Here I would like to focus on the one example of "not
>> answered", because the other values could be dealt with in the
>> same way.)
>>
>> Thanks for help from both of you!
>>
>> Best regards,
