
RE: st: Help with data cleaning

From   "Nick Cox" <>
To   <>
Subject   RE: st: Help with data cleaning
Date   Sun, 15 Jun 2008 17:35:04 +0100

Additionally, see the packages -disjoint- and -spellutil- on SSC. 


Austin Nichols

Melonie Sullivan <>:
You may want to reconsider what you want; creating a duration variable
is frowned on for most purposes, since spells may be of unknown
duration if beginning or end dates are unknown (and you are virtually
guaranteed to have at least some end dates unknown).  See -help st-,
Maarten Buis's intro, or Stephen Jenkins's course for more info on
analysis of spells.
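
To make the censoring point concrete, here is a minimal sketch in Python (not Stata) of recording a spell's observed length together with a flag for an unknown end date, rather than a bare duration. The function name and the `as_of` cutoff date are assumptions for illustration, and days are counted as release minus admission.

```python
from datetime import date
from typing import Optional, Tuple

def spell_duration(admit: date, release: Optional[date], as_of: date) -> Tuple[int, bool]:
    """Return (days observed, censored) for one spell.

    `as_of` is a hypothetical data cutoff date: a spell with no release
    date is only known to have lasted at least until then.
    """
    if release is None:
        # End date unknown: the true duration is at least this long.
        return (as_of - admit).days, True
    return (release - admit).days, False

# A completed spell and an open-ended one.
completed = spell_duration(date(2004, 8, 27), date(2004, 11, 4), date(2006, 12, 31))
open_ended = spell_duration(date(2004, 11, 4), None, date(2004, 11, 14))
```

Keeping the flag next to the day count is what lets survival-type methods treat the open-ended spell correctly instead of reading it as a short stay.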

I'm not sure you actually want to drop the duplicates, as opposed to
recoding that info in another variable--isn't it informative that a
planned release didn't happen? As for defining spells, you may want to
just make a spell within some person ID of each Pcode that encompasses
any overlapping spells; the outer envelope, in other words.  And then
create variables that define what happens during that enveloping
spell.  For Pcode 10, presumably you want to know the start and end
dates, even if it happens during a spell of another type. The thread
beginning at
might be of interest to you; there are many others in the Archives
that are relevant as well.
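
The outer-envelope idea can be sketched as follows, in Python rather than Stata, for one person's records. The `Spell` type, the `n_records` counter, and the rule that a spell starting on or before the running release date belongs to the same envelope are all assumptions for illustration. The counter keeps a trace of folded-in rows instead of silently dropping them.

```python
from datetime import date
from typing import List, NamedTuple

class Spell(NamedTuple):
    pcode: int
    admit: date
    release: date
    n_records: int = 1  # hypothetical: how many raw rows this spell covers

def envelope(spells: List[Spell]) -> List[Spell]:
    """Merge overlapping or touching spells of the same Pcode for one person."""
    merged: List[Spell] = []
    for s in sorted(spells, key=lambda s: (s.pcode, s.admit, s.release)):
        last = merged[-1] if merged else None
        if last is not None and last.pcode == s.pcode and s.admit <= last.release:
            # Same placement type and no gap: extend the enveloping spell.
            merged[-1] = Spell(last.pcode, last.admit,
                               max(last.release, s.release),
                               last.n_records + s.n_records)
        else:
            merged.append(s)
    return sorted(merged, key=lambda s: s.admit)

# The example records from the original post (one youth).
raw = [
    Spell(32, date(2003, 7, 9), date(2004, 4, 5)),
    Spell(32, date(2003, 7, 9), date(2004, 8, 19)),
    Spell(10, date(2004, 8, 1), date(2004, 8, 11)),
    Spell(25, date(2004, 8, 27), date(2004, 11, 4)),
    Spell(25, date(2004, 11, 4), date(2006, 2, 24)),
]
result = envelope(raw)
```

Under these assumptions the five rows reduce to three spells: type 32 from 7/9/2003 to 8/19/2004, type 10 from 8/1/2004 to 8/11/2004, and type 25 from 8/27/2004 to 2/24/2006. The type 10 spell stays separate even though it falls inside the type 32 spell.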

You may want to provide more specifics about the planned analysis, and
provide some example raw data with ID variables with a target data
structure, and some example analysis commands, to get better advice.

On Sun, Jun 15, 2008 at 10:06 AM, Marcello Pagano
<> wrote:
> For Melonie:
>> From: Melonie Sullivan <>
>> Subject: help with data cleaning
>> To: "stata listserve" <>
>> Date: Saturday, June 14, 2008, 5:26 PM
>> Pcode  Admit       Release
>> 32     7/9/2003    4/5/2004
>> 32     7/9/2003    8/19/2004
>> 10     8/1/2004    8/11/2004
>> 25     8/27/2004   11/4/2004
>> 25     11/4/2004   2/24/2006
>> Above is an example of juvenile placement history data;
>> placement type code, admission and release dates. In
>> attempting to generate a variable reflecting duration of
>> placement, I am running into several problems.
>> Unfortunately, I have over 16,000 records of this type, so
>> hand cleaning these particular problems seems rather
>> overwhelming. Can you suggest coding that will clean this
>> data of the following problems?
>> (1) A double entry as in placement type 32; two entries
>> with overlapping dates, i.e., this is a single placement
>> lasting from 7/09/2003 to 8/19/2004. (A planned release on
>> 4/5/2004 did not happen.) How can I collapse these kinds of
>> entries into a single entry with the correct release date?
>> (2) Youth are often in detention (Type 10) placements while
>> in other types of secure confinements (type 32), so that
>> days in lock-up are double counted when attempting to
>> calculate days in lockup during a given time frame. I need
>> to eliminate this double counting.
>> (3) In attempting to simultaneously identify a given
>> placement event and the service dates, I have some
>> contiguous or run-on entries that need to be collapsed into
>> one. As in placement type 25 above, I need to identify one
>> placement with service dates of 8/27/04 to 2/24/06.
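
For problem (2), one way to avoid double counting is to take the union of all lock-up spells, whatever their type, clipped to the time frame of interest. The Python sketch below (not Stata) assumes the caller has already selected the spells that count as lock-up, and that days are counted as release minus admission; the function name and both conventions are assumptions for illustration.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

def days_in_lockup(spells: Iterable[Tuple[date, date]],
                   window_start: date, window_end: date) -> int:
    """Count days covered by at least one spell inside the window, once each."""
    # Clip each spell to the window and drop spells entirely outside it.
    clipped = sorted(
        (max(admit, window_start), min(release, window_end))
        for admit, release in spells
        if admit < window_end and release > window_start
    )
    total = 0
    cur_start: Optional[date] = None
    cur_end: Optional[date] = None
    for admit, release in clipped:
        if cur_end is None or admit > cur_end:
            # Gap before this spell: close out the previous covered block.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = admit, release
        else:
            # Overlaps the current block: extend it, do not add days twice.
            cur_end = max(cur_end, release)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total

# Type 32 and type 10 spells from the example, counted within calendar 2004.
lockup = [
    (date(2003, 7, 9), date(2004, 8, 19)),   # type 32
    (date(2004, 8, 1), date(2004, 8, 11)),   # type 10, inside the type 32 spell
]
days_2004 = days_in_lockup(lockup, date(2004, 1, 1), date(2004, 12, 31))
```

Here the ten detention days fall entirely inside the type 32 spell, so they add nothing to the total, whereas summing the two spells separately would count them twice.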
