



Re: st: Creating long, filledin dataset from two, year variables


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Creating long, filledin dataset from two, year variables
Date   Mon, 7 Mar 2011 09:54:39 +0000

Thanks. This is much clearer to me.

You need to -reshape long-. To do that, you need an identifier.
Despite what you say, suppose that "Project" here might be repeated.
So, you then might guess at

egen id = concat(project doctor1 doctor2)

being sufficient to identify projects uniquely. You can test out any
potential identifiers using -isid-. Then it's a standard -reshape-:

reshape long doctor startdate enddate, i(id)
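
As a minimal sketch of these two steps, here is a toy version of the wide layout with only two doctor slots instead of 60. The values are taken from the example quoted below; the helper names s1, e1, s2, e2 and slot are assumptions made for illustration, and the fallback identifier is only built if -isid- rejects project.

```stata
* Toy data in the wide layout from the thread (two doctor slots only)
clear
input str10 project str10 doctor1 str9 s1 str9 e1 str10 doctor2 str9 s2 str9 e2
"Infection" "J.Smith" "10jan1995" "03dec2008" "R.Andrews" "23dec1976" "12feb2009"
"Cardiol."  "S.Devin" "07feb1977" "09mar2007" "J.Smith"   "04apr1985" "14jan2003"
end

* Turn the date strings into numeric daily dates
forvalues k = 1/2 {
    gen startdate`k' = date(s`k', "DMY")
    gen enddate`k'   = date(e`k', "DMY")
}
drop s1 e1 s2 e2
format startdate* enddate* %td

* Use project as the identifier if it is unique, otherwise fall back
* to a concatenated identifier
capture isid project
if _rc {
    egen id = concat(project doctor1 doctor2), punct("|")
}
else {
    gen id = project
}
isid id

* One observation per project and doctor slot
reshape long doctor startdate enddate, i(id) j(slot)

* Slots that were never filled carry no information
drop if missing(doctor)
sort doctor startdate
list doctor project startdate enddate, sepby(doctor)
```

With the real data the same -reshape- call should pick up all 60 slots, provided the variables really are named doctor1-doctor60, startdate1-startdate60 and enddate1-enddate60.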

You can then -sort- as you wish. You can coarsen to months, if you
like, but that's throwing away data.
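
If a monthly panel is what is wanted, one possible route from the long layout is sketched below. It starts from a small long dataset so that it can be run on its own. The blank end date for A.Calvin, the cutoff month used to close open spells, and the names s, e, spell, mstart, mend, nmonths, nprojects and slot are all assumptions for illustration, not part of the original data.

```stata
* Toy data already in long form: one observation per doctor-project spell
clear
input str10 doctor str10 project str9 s str9 e
"J.Smith"  "Cardiol."  "04apr1985" "14jan2003"
"J.Smith"  "Infection" "10jan1995" "03dec2008"
"A.Calvin" "Vaccine"   "15feb1990" ""
end
gen startdate = date(s, "DMY")
gen enddate   = date(e, "DMY")
drop s e
format startdate enddate %td

* Coarsen daily dates to monthly dates; a missing end date means the
* spell is still open, so close it at an assumed cutoff month
local cutoff = mofd(mdy(12, 31, 2009))
gen mstart = mofd(startdate)
gen mend   = cond(missing(enddate), `cutoff', mofd(enddate))

* One observation per spell and month
gen long spell = _n
gen nmonths = mend - mstart + 1
expand nmonths
bysort spell: gen month = mstart + _n - 1
format month %tm

* Number of projects each doctor has in each month
bysort doctor month: gen nprojects = _N

* Optional: projects side by side, one row per doctor and month
bysort doctor month (project): gen slot = _n
keep doctor month project slot nprojects
reshape wide project, i(doctor month) j(slot)
```

Note that a month counts as covered if the spell touches it at all, so a month in which one project ends and another begins shows both. That is one way in which coarsening to months throws away information. Also, the slot numbering follows alphabetical order within each month, so a project can move from project2 to project1 when another one ends.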

Nick

On Mon, Mar 7, 2011 at 9:35 AM, Adrian Stork <[email protected]> wrote:
> Hi Nick,
>
> First of all thanks for your answer! Here's a more detailed example of
> my dataset where
>
> project | startdate1 | enddate1 | doctor1 | startdate2 | enddate2 | doctor2 | startdate3 | enddate3 | doctor3 .... s60 | e60 | d60
> ---------------------------------------------------------------------------------------------------------------------------
>
> Infection 10Jan1995 03Dec2008 J.Smith 23Dec1976 12Feb2009 R.Andrews .......
> Vaccine 15Feb1990 05Jun2007 A.Calvin 12Aug1988 13Sept2004 H.Hollen .......
> Cancer 12Sept1987 12Dec2009 R.Jackson 14Sep1973 23Dec2006 V.Karren ........
> Diabetes 05Jan1992 13Nov2007 P.Stevens 03Jan1981 17Aug2001 A.Calvin ........
> Cardiol. 07Feb1977 09Mar2007 S.Devin 04Apr1985 14Jan2003  J.Smith ........
>
> "Project" and "doctor" are strings. start and end date are float.
> Sometimes the end-date is missing (".") meaning that the project is
> still advised by
> that doctor. Each project is mentioned only once in my dataset, so it
> should in fact be my identifier. However, I actually want to
> focus on the doctors in order to see which projects one doctor had in
> fact in each month and finally to count the number of projects
> he had in each month. As you can see, J.Smith had a project "Infection"
> from 10Jan1995 until 03Dec2008 but he also had a project "Cardiol."
> from 04Apr1985 until 14Jan2003 (similar case also for A.Calvin).
> This means J.Smith had from 04Apr1985 until 10Jan1995 exactly one
> project ("Cardiol."), from 10Jan1995 until 14Jan2003 he had two
> projects
> ("Cardiol." & "Infection") and from 14Jan2003 until 03Dec2008 again only
> one project ("Infection").
> This is also why I want the date to be in a panel on a monthly basis
> that should be like:
>
> Doctor |   Date   |  project1 | project2 | project3 |...
> J.Smith Apr1985  Cardiol.
> J.Smith May1985 Cardiol.
> ..         ..                 ..
> J.Smith Jan1995 Cardiol   Infection
> J.Smith Feb1995 Cardiol   Infection
> ..           ..             ..         ..
> J.Smith Jan2003 Cardiol  Infection
> J.Smith Feb2003             Infection
> ...             ..                     ..
> J.Smith Dec2008             Infection
> A.Calvin ...
> A.Calvin ..
> ..
>
> This is anything but easy. Somehow I need to bring the dates at
> least into one column.
