This is a nice teaching example, as it shows that understanding the structure and provenance of the data makes the analysis much clearer. The full dataset might well have multiple observations for each patient.
"Raphael Fraser" <raphael.fraser@gmail.com> posed:
Below is a snippet of my data. Each subject has 2 observations where vi
indicates whether an ulcer is present or not. I would like to count
the number of patients with ulcers on one leg only and those with
ulcers on both legs.
id leg vi
40 right .
40 left .
46 right 1
46 left 1
47 left 0
47 right 1
48 right .
48 left .
55 left 1
55 right 1
57 right 0
57 left 1
Raphael
-------
As replies pointed out, the exact question could be addressed in Stata using egen, collapse, reshape or tab. Reshape appears to be the politically correct answer, since the NHS is committed to treating patients as individuals. I haven't seen anyone ask what "." meant in these data. Presumably, patients can have ulcers on both legs only if they have two legs. The proportion of "." values in the sample suggests that they should be imputed as zeros. Otherwise one is irresistibly reminded of the Tarzan sketch. For those who need an explanation, Dudley Moore is auditioning for the part of Tarzan, but is one-legged.
Peter Cook: Your left leg I like. I have nothing against your left leg. The problem is, neither have you.
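
For readers without Stata to hand, the per-patient count can be sketched in Python. This is only an illustration of the reshape idea (one record per patient, then classify), not any of the solutions posted in the thread. It assumes exactly one row per leg per patient, and it treats "." as missing and reports those patients separately rather than imputing zeros; the function name classify and the category labels are made up for this sketch.

```python
from collections import Counter, defaultdict

# The twelve rows from the posted snippet: (id, leg, vi).
# "." is kept as a string here and treated as missing (an assumption).
rows = [
    (40, "right", "."), (40, "left", "."),
    (46, "right", "1"), (46, "left", "1"),
    (47, "left", "0"), (47, "right", "1"),
    (48, "right", "."), (48, "left", "."),
    (55, "left", "1"), (55, "right", "1"),
    (57, "right", "0"), (57, "left", "1"),
]

def classify(rows):
    """Count patients by ulcer status, one record per patient."""
    # "Reshape wide": collect each patient's legs into a single record.
    wide = defaultdict(dict)
    for pid, leg, vi in rows:
        wide[pid][leg] = None if vi == "." else int(vi)

    counts = Counter()
    for legs in wide.values():
        values = list(legs.values())
        if any(v is None for v in values):
            # Any missing leg makes the patient unclassifiable here.
            counts["missing"] += 1
        else:
            n = sum(values)  # number of legs with an ulcer
            label = "none" if n == 0 else "one leg" if n == 1 else "both legs"
            counts[label] += 1
    return dict(counts)

result = classify(rows)
print(result)
```

On the snippet this gives two patients with ulcers on one leg (47, 57), two with ulcers on both legs (46, 55), and two with missing values (40, 48). Recoding "." to "0" before classifying would move the last two into a "none" category instead, which is exactly why the meaning of "." matters.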
