Stata The Stata listserver

Re: st: repeated observations or multiple responses or bad data entry?

From   Joseph Coveney
To   Statalist
Subject   Re: st: repeated observations or multiple responses or bad data entry?
Date   Mon, 15 Sep 2003 14:36:07 +0900

Irina Campbell asked how to get a patient dataset with a variable for multiple 
questions into wide format, with one record per patient.  Some of the patients 
did not answer all questions; no observation exists for those instances.  
Other patients gave more than one answer to some questions, so there are 
multiple observations for those instances.
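
Before reshaping, it may help to see how many patient-question combinations 
are repeated, and whether the repeats carry different answers or are the same 
answer entered twice.  The following is only a sketch on a made-up 
four-observation dataset: the values are hypothetical, and the variable names 
(pid, var4, var5) are the ones assumed in the do-file later in this message.

```stata
* Hypothetical toy data: patient 1 answers question A twice and skips B
clear
input int pid str1 var4 str5 var5
1 "A" "Yes"
1 "A" "Maybe"
2 "A" "No"
2 "B" "Yes"
end

* How many patient-question combinations occur more than once?
duplicates report pid var4

* Flag them: dup counts the surplus copies (0 means the combination is unique)
duplicates tag pid var4, generate(dup)
list if dup > 0, noobs sepby(pid)

* Are any repeats identical on the answer as well (possible double entry)?
duplicates report pid var4 var5
```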

I suggest generating an identifier for the answers, so that Stata can 
distinguish multiple answers during the -reshape-.  If the question variable 
is a string variable, use the -string- option in the -reshape- command.  Also, 
-reshape- can take more than one variable in the -i()- argument in order to 
uniquely identify a patient-question combination, so a single unique 
identifier doesn't need to exist in the dataset.  In addition, Stata will fill 
in missing values in order to create a rectangular dataset when records for 
some patient-question combinations do not exist in the original long format.  
I've illustrated below; note that the suggested solution is only four commands 
long.  Most of the do-file generates a dataset that I believe is similar in 
format to what Irina has, and I've assumed that both the question (var4) and 
answer (var5) variables are string, although it doesn't really matter for the 
latter variable.

Joseph Coveney


* Generate an example dataset: 242 patients, 26 questions (A-Z)
clear
local obs = 242 * 26
set obs `obs'
set seed 20030915
generate byte que = mod(_n, 26)
generate str1 var4 = char(65 + que)
sort var4
generate int pid = mod(_n, 242) + 1
forvalues ans = 1/3 {
    generate byte ans`ans' = int(uniform() * 3) + 1
}
reshape long ans, i(pid que) j(a)
label define Answers 1 Yes 2 No 3 Maybe
label values ans Answers
decode ans, generate(var5)
local obs = `obs' * 3
drop if uniform() > 11700 / `obs'
drop que a ans
sort pid
* Add three patient-level variables that are constant within pid
forvalues i = 1/3 {
    generate float var`i' = .
    bys pid: replace var`i' = uniform() if _n == 1
    by pid: replace var`i' = var`i'[1]
}
* Begin suggested solution here
generate byte res = .
bysort pid var4: replace res = _n
reshape wide var5, i(pid var4) j(res)
reshape wide var51 var52 var53, i(pid) j(var4) string
* End suggested solution here
slist in 1/2, decimal(2)
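
The result of the two -reshape- steps is easier to inspect on a dataset small 
enough to read in full.  Here is a minimal sketch with made-up values; the 
variable names follow the do-file above.  One deliberate difference, which is 
my assumption rather than part of the suggested solution: I put var5 in 
parentheses in the -bysort- prefix so that repeated answers are numbered in 
alphabetical order.  Without it, the order among repeated answers is whatever 
the sort happens to leave.

```stata
* Hypothetical toy data: patient 1 answers A twice and skips B;
* patient 2 answers A and B once each.
clear
input int pid str1 var4 str5 var5
1 "A" "Yes"
1 "A" "Maybe"
2 "A" "No"
2 "B" "Yes"
end

* Number the answers within each patient-question combination
* ((var5) only fixes the order among repeated answers)
bysort pid var4 (var5): generate byte res = _n

* First pass: one record per patient-question, answers side by side
reshape wide var5, i(pid var4) j(res)

* Second pass: one record per patient, questions side by side
reshape wide var51 var52, i(pid) j(var4) string

* pid alone should now identify the observations
isid pid
list, noobs
```

Patient 1 never answered question B, so var51B and var52B are empty strings 
in that record; -reshape- created them without any extra step.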

