Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

RE: R: st: problem of data management

From	"Sarah Edgington" <[email protected]>
To	<[email protected]>
Subject	RE: R: st: problem of data management
Date	Wed, 17 Apr 2013 10:54:45 -0700

I don't think you need -todate-, since your string variable has slashes
rather than being a run-together date.  Even if it were run together, the
command you gave wouldn't work: you say your dates are in the form
mmddyyyy, but the pattern you specified in the -todate- statement is
yyyymmdd.
Try:
gen date3=date(date,"MDY")
That should give you what you want if all the values of your variable are
in the form mm/dd/yyyy.
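
In case it is useful, here is a minimal sketch of the whole conversion on a
toy dataset. It assumes the string variable is called date, as in your
command; the two example values and the check for unconverted values are
my own additions for illustration, not something from your data.

```stata
* Toy data: a string variable named date holding mm/dd/yyyy values (assumed)
clear
input str10 date
"12/06/2007"
"01/15/2008"
end

* Convert the string to a Stata daily date (stored as days since 01jan1960)
gen date3 = date(date, "MDY")

* Attach a display format so the values are shown as dates, not integers
format date3 %td

* Strings that do not match the "MDY" mask become missing; count them to check
count if missing(date3) & !missing(date)

list date date3
```

Without the -format- line the new variable is still a valid date, but it is
displayed as an integer. With %td, 12/06/2007 should be shown as 06dec2007.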
Hope that helps.
-Sarah

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Iodice Federico
Sent: Wednesday, April 17, 2013 10:44 AM
To: [email protected]
Subject: RE: R: st: problem of data management

Dear Statalisters, I have a new problem.

Following on from the previous email, we have a date stored as a string
in the following format:

12/06/2007, which means mm/dd/yyyy. I have tried the following -todate-
command (after downloading it):

todate date, generate(date3) pattern(yyyymmdd) 

I get the following error message:  

date: length does not match pattern

How can we solve this problem? We need to convert the string to a date variable.

----------------------------------------
> Date: Mon, 15 Apr 2013 23:52:26 +0100
> Subject: Re: R: st: problem of data management
> From: [email protected]
> To: [email protected]
>
> Thanks for the further detail. It is now evident that you do not have 
> duplicates (in the sense of -duplicates-) that can or should be 
> removed. You may be able to -xtset- your data using your identifier 
> variable and a date variable or you may just -xtset- your data using 
> the identifier variable.
> Nick
> [email protected]
>
>
> On 15 April 2013 21:56, Iodice Federico <[email protected]> wrote:
> > For brevity, we included only a few variables, which makes some
> > observations look like duplicates. However, an observation that looks
> > like a duplicate is not one: it refers to the same individual being
> > observed again, but with a different outcome variable. To show what
> > we mean, we have added additional columns:
> >
> > ID   GENDER  AGE    TIME  Outcome  Sector
> > 861  woman   50-54  1     861      Petrol & gas
> > 848  woman   25-29  1     848      consumer goods
> > 820  man     50-54  1     820      consumer goods
> > 861  woman   50-54  1     869      Petrol & gas
> > 820  man     50-54  1     870      services
> > 860  woman   50-54  1     890      services
> > 860  woman   50-54  2     895      consumer goods
> > 860  woman   50-54  1     900      consumer goods
> >
> > In other words, the same individual has been observed another time
> > because she has found a new job. I hope it is now clear what we mean.
> > In fact, other variables differ as well, such as the time when the new
> > contract was signed, the type of contract, and so on.
> >> Date: Mon, 15 Apr 2013 10:04:24 +0100
> >> Subject: Re: R: st: problem of data management
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> Same answer from me. From what you tell us, a wide structure would 
> >> not make most conceivable analyses easier.
> >>
> >> Your example included
> >>
> >> ID GENDER AGE TIME
> >> 861 woman 50-54 1
> >> 861 woman 50-54 1
> >>
> >> so that on that information duplicates could be inferred.
> >>
> >> I don't think your example included enough detail for us to advise 
> >> well on how to create a time (meaning date) variable. Presumably 
> >> that depends on other variables that you don't show us. Much 
> >> depends on whether you want the date variable to make sense across
> >> identifiers.
> >>
> >> Nick
> >> [email protected]
> >>
> >>
> >> On 15 April 2013 09:33, Iodice Federico <[email protected]> wrote:
> >> > Sorry, I was probably not clear in describing my data management
> >> > problem.
> >> >
> >> > The TIME column in the example below is not a time variable. It
> >> > identifies the type of employment contract an individual has:
> >> > 1 is assigned to part-time contracts and 2 to full-time contracts.
> >> >
> >> > The ID column doesn't have duplicates, but different observations
> >> > of the same ID at different moments.
> >> >
> >> > In the ID variable, individuals have several occurrences, but the
> >> > times at which an individual is observed again are irregular and
> >> > not defined a priori. We observe individuals whenever they find a
> >> > new job. Sometimes they have many short employment spells. Quite
> >> > often they find a permanent job and then we do not observe them
> >> > anymore.
> >> >
> >> > For those reasons, I thought it might be better to transform the
> >> > dataset to wide format. This would mean dealing with it not as a
> >> > panel data set, but rather as a cross-section with a longitudinal
> >> > dimension.
> >> >
> >> > The alternative would be to keep the long format that the data set
> >> > naturally has. But since the ID variable is repeated only for some
> >> > cases, I would need to create a time variable that assigns a
> >> > sequence to successive observations of the same individual. This
> >> > would be important for using the xtset command. In other words, to
> >> > keep the long format and use panel data analysis, I need a command
> >> > that assigns an increasing numerical value to the new time
> >> > variable, automatically, every time the same individual is observed
> >> > again. We have several thousand observations.
> >>
> >> >> > Date: Tue, 9 Apr 2013 11:59:24 +0100
> >> >> > From: [email protected]
> >> >> >
> >> >> > You can -xtset- your data using
> >> >> >
> >> >> > xtset id
> >> >> >
> >> >> > but your example indicates that you have duplicates on -id 
> >> >> > time-, so
> >> >> >
> >> >> > xtset id time
> >> >> >
> >> >> > would fail. I don't think you are telling us enough for it to
> >> >> > be clear whether you just have -duplicates- (see the command
> >> >> > of that name) that should be removed.
> >> >> >
> >> >> > I can't see any advantages to your wide data structure (Example 2).
> >> >> > (Many people do call this a format.)
> >> >> >
> >> >> > Nick
> >> >> > [email protected]
> >> >> >
> >> >> >
> >> >> > On 9 April 2013 11:51, Iodice Federico <[email protected]> wrote:
> >> >> >
> >> >> >> I have a data management problem that I would like to submit to
> >> >> >> your attention. I have an unbalanced panel data set. As you can
> >> >> >> see from the example below, the variable in column 1 (the ID
> >> >> >> variable) is repeated only for some cases. In other words, the
> >> >> >> same individual appears several times.
> >> >> >>
> >> >> >> Example 1
> >> >> >>
> >> >> >> ID GENDER AGE TIME
> >> >> >> 861 woman 50-54 1
> >> >> >> 848 woman 25-29 1
> >> >> >> 820 man 50-54 1
> >> >> >> 861 woman 50-54 1
> >> >> >> 820 man 50-54 1
> >> >> >> 860 woman 50-54 1
> >> >> >> 860 woman 50-54 2
> >> >> >> 860 woman 50-54 1
> >> >> >>
> >> >> >> This happens only for some, but not all, individuals in the
> >> >> >> sample. It probably means that the best way of dealing with this
> >> >> >> dataset is to use it not as a panel, but as a longitudinal data
> >> >> >> set with repeated observations. The observations that are not
> >> >> >> repeated I can treat as staying in the same status.
> >> >> >>
> >> >> >> To use this information, my impression is that I need either:
> >> >> >>
> >> >> >> a) to tell Stata that this is a panel and treat the data as being
> >> >> >> in "long format". If case a) is the best one, the data are
> >> >> >> already in long format and I only need to tell Stata that the
> >> >> >> same observation is repeated for different periods. However,
> >> >> >> these periods are not fixed: they can be of any length, from one
> >> >> >> week to several years. How do I tell Stata that the same
> >> >> >> observation is repeated several times? How do I define the time
> >> >> >> dimension?
> >> >> >>
> >> >> >> b) or to treat the data as a cross-section with repeated
> >> >> >> observations. In this case, I need to shift each repeated row
> >> >> >> automatically to the right of the first row in which the
> >> >> >> variable y appears. An example of case b) is below. My question
> >> >> >> is: how do I move an entire row beside the one where that
> >> >> >> observation is already defined?
> >> >> >>
> >> >> >> Example 2
> >> >> >>
> >> >> >> ID GENDER AGE TIME ID GENDER AGE TIME ID GENDER AGE TIME
> >> >> >> 861 woman 50-54 1 861 woman 50-54 1
> >> >> >> 848 woman 25-29 1
> >> >> >> 820 man 50-54 1 820 man 50-54 1
> >> >> >> 860 woman 50-54 1 860 woman 50-54 2 860 woman 50-54 1
> >>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >> * http://www.ats.ucla.edu/stat/stata/
