

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: AW: RE: AW: recode 9, 99, 999,..., into missing
Date: Mon, 17 May 2010 11:55:24 -0400

I agree with everything you say about my code, Nick. Obviously I didn't
try it out!

Martin, the reference I made to your code was not to your most recent
version, but to an earlier one. I apologize.

I should have added that the last line of our defenses against
miscoding, before data editing, has been a data entry service that, in
addition to doing independent verified double entry, tests the
questionnaires, alerts us to issues, and flags problems during
production runs. To researchers who cannot afford double entry, I
suggest 100% read-back by one person to another, a process that is fast
and does not need high-level skills.

For a study that abstracted shift staffing numbers in hospital units,
there were hundreds of similar counts per record. To save time, we
employed a system that could recognize scanned numbers. I found it to
have the accuracy of an untrained typist!

Steve

On Mon, May 17, 2010 at 10:19 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Your code here looks problematic to me.
>
>     su <varname>, meanonly
>     mvdecode <varname> if inlist(`r(max)', 7, 8, 9) ...
>
> should work, because the -if- condition, if true, will be true for all
> observations and, if false, will be false for all observations.
>
> But it's not good style, and the better way to code this would be
>
>     if inlist(`r(max)', 7, 8, 9) mvdecode ...
>
> The difference between the first and the second forms is that Stata
> checks the -if- clause _N times in the first form and just once in the
> second.
>
> Beyond that, the coding for the second -mvdecode- onwards presumes
> that the r(max) from -summarize- remains accessible. Not so, I think,
> as what is done within -mvdecode- zaps those r-class results. (It is
> true that -mvdecode- bails out early if there's nothing to do, but the
> check on whether there's something to do hinges on a -count-.)
>
> Martin's code does look better to me. I haven't tested either.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Steve Samuels
>
> Martin--
>
> Mandy's original Stata code doesn't work, because it assumes that the
> maximum of the "real" data values can be identified; it cannot be.
> Your code might fail if the highest value for a variable is a multiple
> of 9, like 81. Also, it doesn't easily generalize to the other
> missing value types.
>
> Tony and I are suggesting that Mandy determine the proper coding from
> the study documentation, that she classify the variables accordingly,
> and that she apply -mvdecode- to each group. If she doesn't do this,
> the following part of a -foreach- loop is an algorithmic solution.
>
>     sum `v', meanonly
>     mvdecode `v' if inlist(`r(max)',7,8,9), mv(7=.a\8=.b\9=.c)
>     mvdecode `v' if inlist(`r(max)',97,98,99), mv(97=.a\98=.b\99=.c)
>     mvdecode `v' if inlist(`r(max)',997,998,999), mv(997=.a\998=.b\999=.c)
>     mvdecode `v' if inlist(`r(max)',9997,9998,9999), mv(9997=.a\9998=.b\9999=.c)
>
> (I would reserve "." for the truly missing (i.e. blank) entries.)
>
> To depend on such a purely algorithmic solution, Mandy has to be
> quite confident of the accuracy of the coding. What happens, for
> example, if 997 is entered as 987?
>
> I have never been that confident, unless the field supervisor
> scanned each questionnaire and there was double-verified independent
> entry. Even then, I know of a case where the data entry company
> corrected coding to something more logical, without telling anybody.
> Or, worse, our long-time reliable data entry service started cutting
> corners and did not do the verified entry we had paid for.
>
> In the past I've built up the variable lists needed for each set of
> missing value assignments by hand, based on inspection of the
> questionnaire and code book. To keep track, I've tried to make the
> assignments to variables in questionnaire order, at the expense of
> some duplicate statements. Luckily, similarly coded questions often
> occur in blocks.
> Also, there are various tricks to streamline the process, like
> capturing each -mvdecode- statement in a macro.
>
>
> On Mon, May 17, 2010 at 3:35 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>>
>> What does the -mvdecode- solution look like then? Like this?
>>
>>
>> *************
>> clear*
>>
>> inp byte(var1 var2) int(var3 var4)
>> 1 1 1 1
>> 2 2 2 2
>> 3 3 3 3
>> 4 8 99 999
>> 5 9 100 1000
>> 6 10 101 1001
>> 7 11 150 5000
>> 9 12 999 9999
>> end
>>
>> foreach var of varlist * {
>>     sum `var', mean
>>     if r(max)<=9 mvdecode `var', mv(9)
>>     else if inrange(r(max),10,99) mvdecode `var', mv(99)
>>     else if inrange(r(max),100,999) mvdecode `var', mv(999)
>>     else mvdecode `var', mv(9999)
>> }
>>
>> li, noo
>> *************
>>
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steve Samuels
>> Sent: Monday, May 17, 2010 03:00
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: RE: AW: RE: AW: recode 9, 99, 999,..., into missing
>>
>> Mandy, if you know this much about each variable, I see no advantage
>> or necessity to your approach. -mvdecode- appears to be superior in
>> every way. It is not only more direct and clearer, but will also
>> handle all the other "non-data" codes. Clarity is very important:
>> other people (and you, perhaps, in the future) will be able to
>> understand your Stata statements without any lengthy explanation.
>> None of the other solutions can claim that.
>>
>> Steve
>>
>>
>> On Sun, May 16, 2010 at 8:33 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>>> Dear Mr. Weiss and Lachenbruch,
>>>
>>> I am sorry; I should have been clearer when describing my question.
>>> In my opinion, I need to be careful about this problem: for example,
>>> for a variable that has 10 scales, the value 9 means a real scale
>>> and 99 in that case means "not answered".
>>>
>>> The pattern is like this:
>>> (1) if the maximum value of a variable is smaller than 9, then the
>>> "not answered" takes the value 9;
>>> (2) if the maximum value of a variable is smaller than 99 but greater
>>> than 10, then the "not answered" takes the value 99;
>>> (3) if the maximum value of a variable is smaller than 999 but
>>> greater than 100, then the "not answered" takes the value 999;
>>> and so on.
>>>
>>> (And you are absolutely right for the reminder that there are values
>>> such as 7, 8, 98, or 97 to indicate "refused to answer" or "invalid
>>> answer". Here I would like to keep the focus on one example of "not
>>> answered", because the other values could be dealt with in the
>>> same way.)
>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
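
Nick's two points in the thread (prefer the -if- command to the -if- qualifier, and do not rely on r(max) surviving a call to -mvdecode-) can be combined in a short sketch. This is only an illustration under stated assumptions: the variables v1-v3, the toy data, and the local macro vmax are hypothetical and do not come from anyone's posted code. It also inherits the weakness Steve describes, since it trusts that a variable's maximum really is a missing-value code.

```stata
* Hypothetical toy data: each variable ends with its 7/8/9-style codes
clear
input int(v1 v2 v3)
1 10 100
2 50 250
7 97 997
8 98 998
9 99 999
end

foreach v of varlist v1 v2 v3 {
    summarize `v', meanonly
    * Copy r(max) into a local at once, so that it no longer matters
    * whether a later command overwrites the r-class results
    local vmax = r(max)

    * -if- command: the condition is evaluated once per variable,
    * not once per observation as with the -if- qualifier
    if inlist(`vmax', 7, 8, 9) {
        mvdecode `v', mv(7=.a\8=.b\9=.c)
    }
    else if inlist(`vmax', 97, 98, 99) {
        mvdecode `v', mv(97=.a\98=.b\99=.c)
    }
    else if inlist(`vmax', 997, 998, 999) {
        mvdecode `v', mv(997=.a\998=.b\999=.c)
    }
}

* Spot checks on the toy data
assert v1 == 1  in 1
assert v1 == .c in 5
assert v2 == .a in 3
assert v3 == .b in 4

list, noobs
```

Because the maximum is saved before any -mvdecode- runs, at most one branch fires per variable, and the outcome does not depend on which r-class results -mvdecode- leaves behind.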

