
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Creating long, filledin dataset from two, year variables


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Creating long, filledin dataset from two, year variables
Date   Mon, 7 Mar 2011 10:08:00 +0000

Plus something like

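* Stack the start and end dates into a single date variable
rename startdate date1
rename enddate date2
drop _j
* i() must identify observations uniquely, so drop unused doctor slots first
drop if missing(doctor)
reshape long date, i(project doctor)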
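
A minimal sketch of one way to get from the wide layout to a monthly doctor panel, offered as an alternative to stacking the dates with a second -reshape-: it expands each project-doctor spell into one observation per month and then counts observations per doctor and month. It assumes the variable names shown in the quoted message (project, startdate1, enddate1, doctor1, ...), that project is unique, and that end dates are not earlier than start dates. The names slot, mstart, mend, nmonths, month and nprojects, and the cutoff date used to close open-ended spells, are hypothetical choices for illustration only.

```stata
* Sketch only: variable names and the cutoff date are assumptions.
isid project
reshape long doctor startdate enddate, i(project) j(slot)
drop if missing(doctor) | missing(startdate)

* Assumption: spells with a missing end date are closed at a cutoff date.
local cutoff = td(31dec2010)
gen mstart = mofd(startdate)
gen mend   = mofd(cond(missing(enddate), `cutoff', enddate))

* One observation per project, doctor and month.
gen nmonths = mend - mstart + 1
expand nmonths
bysort project slot: gen month = mstart + _n - 1
format month %tm

* Number of projects each doctor has in each month.
bysort doctor month: gen nprojects = _N
```

If the same doctor could appear in two slots of one project in the same month, nprojects would count that project twice, so duplicates on doctor, project and month would need to be dropped before counting.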


On Mon, Mar 7, 2011 at 9:54 AM, Nick Cox <[email protected]> wrote:
> Thanks. This is much clearer to me.
>
> You need to -reshape long-. To do that, you need an identifier.
> Despite what you say, suppose that "Project" here might be repeated.
> So, you then might guess at
>
> egen id = concat(project doctor1 doctor2)
>
> being sufficient to identify projects uniquely. You can test out any
> potential identifiers using -isid-. Then it's a standard -reshape-:
>
> reshape long doctor startdate enddate, i(id)
>
> You can then -sort- as you wish. You can coarsen to months, if you
> like, but that's throwing away data.
>
> Nick
>
> On Mon, Mar 7, 2011 at 9:35 AM, Adrian Stork <[email protected]> wrote:
>> Hi Nick,
>>
>> First of all, thanks for your answer! Here's a more detailed example of
>> my dataset:
>>
>> project | startdate1 | enddate1 | doctor1 | startdate2 | enddate2 | doctor2 | startdate3 | enddate3 | doctor3 .... s60 | e60 | d60
>> _________________________________________________________________________________________________________
>>
>> Infection 10Jan1995 03Dec2008 J.Smith 23Dec1976 12Feb2009 R.Andrews .......
>> Vaccine 15Feb1990 05Jun2007 A.Calvin 12Aug1988 13Sept2004 H.Hollen .......
>> Cancer 12Sept1987 12Dec2009 R.Jackson 14Sep1973 23Dec2006 V.Karren ........
>> Diabetes 05Jan1992 13Nov2007 P.Stevens 03Jan1981 17Aug2001 A.Calvin ........
>> Cardiol. 07Feb1977 09Mar2007 S.Devin 04Apr1985 14Jan2003 J.Smith ........
>>
>> "Project" and "doctor" are strings; the start and end dates are floats.
>> Sometimes the end date is missing ("."), meaning that the project is
>> still advised by that doctor. Each project appears only once in my
>> dataset, so it should in fact be my identifier. However, I actually want
>> to focus on the doctors, to see which projects each doctor had in each
>> month and finally to count the number of projects he had in each month.
>> As you can see, J.Smith had the project "Infection" from 10Jan1995 until
>> 03Dec2008, but he also had the project "Cardiol." from 04Apr1985 until
>> 14Jan2003 (similarly for A.Calvin). This means that J.Smith had exactly
>> one project ("Cardiol.") from 04Apr1985 until 10Jan1995, two projects
>> ("Cardiol." and "Infection") from 10Jan1995 until 14Jan2003, and again
>> only one project ("Infection") from 14Jan2003 until 03Dec2008.
>> This is also why I want the date to be in a panel on a monthly basis
>> that should be like:
>>
>> Doctor |   Date   |  project1 | project2 | project3 |...
>> J.Smith Apr1985  Cardiol.
>> J.Smith May1985 Cardiol.
>> ..         ..                 ..
>> J.Smith Jan1995 Cardiol   Infection
>> J.Smith Feb1995 Cardiol   Infection
>> ..           ..             ..         ..
>> J.Smith Jan2003 Cardiol  Infection
>> J.Smith Feb2003             Infection
>> ...             ..                     ..
>> J.Smith Dec2008             Infection
>> A.Calvin ...
>> A.Calvin ..
>> ..
>>
>> This is anything but easy. Somehow I need to bring the dates into
>> at least one column.
>
