Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: R: st: problem of data management

From	Iodice Federico <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: R: st: problem of data management
Date	Wed, 17 Apr 2013 22:22:27 +0200

Thank you for all the answers and hints. 
I looked at the help file and found that the following command works:

  gen eventdate = date(datfinemov, "DMY")

To display it as a date:

  format eventdate %td
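date() stores a date as the number of days since 1 January 1960, and -format ... %td- only changes how that number is displayed. The mask passed to date() must match the order of the components in the string, so the same text can give different dates. This matters here because the earlier message describes the strings as mm/dd/yyyy, while the command above uses "DMY". A minimal sketch, with made-up strings and a hypothetical comparison variable eventdate_mdy:

```stata
* Hypothetical example data: two day-first date strings
clear
input str10 datfinemov
"12/06/2007"
"25/12/2008"
end

* "DMY" reads the day first: 12/06/2007 becomes 12 June 2007
gen eventdate = date(datfinemov, "DMY")
format eventdate %td

* "MDY" reads the month first: 12/06/2007 becomes 6 December 2007,
* and 25/12/2008 becomes missing because there is no month 25
gen eventdate_mdy = date(datfinemov, "MDY")
format eventdate_mdy %td

list
```

If the strings really are mm/dd/yyyy, "MDY" is the mask to use; a quick look at values with a day above 12 shows which order the data follow.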

----------------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: R: st: problem of data management
> Date: Wed, 17 Apr 2013 10:54:45 -0700
>
> I don't think you need -todate- since your string variable has slashes
> rather than being a run-together date. Even if it were run together, the
> command you gave wouldn't work since you say your dates are in the form
> mmddyyyy, which is not what you specified in the todate statement.
> Try:
>
>   gen date3 = date(date, "MDY")
>
> That should give you what you want if all your dates are in the form
> mm/dd/yyyy.
> Hope that helps.
> -Sarah
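A minimal sketch of Sarah's suggestion, using made-up values. It also suggests why -todate- complained: a mm/dd/yyyy string has 10 characters, while the pattern yyyymmdd describes 8. The last lines flag strings that did not convert under the "MDY" mask, a cheap check before relying on the new variable; the third value is a hypothetical day-first entry included only to show a failure, and len is a hypothetical helper variable.

```stata
* Made-up example values; the third is a day-first entry mixed in by mistake
clear
input str10 date
"12/06/2007"
"01/15/2010"
"31/12/2007"
end

* mm/dd/yyyy strings are 10 characters long; pattern(yyyymmdd) describes 8
gen len = strlen(date)

* date() copes with the slashes, so no pattern is needed
gen date3 = date(date, "MDY")
format date3 %td

* Count and list strings that failed to convert under the "MDY" mask
count if missing(date3) & !missing(date)
list date if missing(date3)
```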
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Iodice Federico
> Sent: Wednesday, April 17, 2013 10:44 AM
> To: [email protected]
> Subject: RE: R: st: problem of data management
>
> Dear Statalisters, I have a new problem.
>
>
> Following up on the previous email: we have dates stored as strings in
> the following format:
>
> 12/06/2007, which means mm/dd/yyyy. I have tried the following -todate-
> command (after downloading it):
>
> todate date, generate(date3) pattern(yyyymmdd)
>
> I get the following error message:
>
> date: length does not match pattern
>
> How can we solve the problem? We need to convert the string to a Stata date.
>
> ----------------------------------------
> > Date: Mon, 15 Apr 2013 23:52:26 +0100
> > Subject: Re: R: st: problem of data management
> > From: [email protected]
> > To: [email protected]
> >
> > Thanks for the further detail. It is now evident that you do not have
> > duplicates (in the sense of -duplicates-) that can or should be
> > removed. You may be able to -xtset- your data using your identifier
> > variable and a date variable or you may just -xtset- your data using
> > the identifier variable.
> > Nick
> > [email protected]
> >
> >
> > On 15 April 2013 21:56, Iodice Federico <[email protected]> wrote:
> > > For brevity, we included only a few variables, which makes some
> > > observations look like duplicates. However, an observation that looks
> > > like a duplicate is not one: it refers to the same individual observed
> > > again, but with a different outcome variable. To show what we mean, we
> > > have added another column:
> > >
> > > ID   GENDER  AGE    TIME  Outcome  Sector
> > > 861  woman   50-54  1     861      Petrol & gas
> > > 848  woman   25-29  1     848      consumer goods
> > > 820  man     50-54  1     820      consumer goods
> > > 861  woman   50-54  1     869      Petrol & gas
> > > 820  man     50-54  1     870      services
> > > 860  woman   50-54  1     890      services
> > > 860  woman   50-54  2     895      consumer goods
> > > 860  woman   50-54  1     900      consumer goods
> > >
> > > In other words, the same individual has been observed again because she
> > > found a new job. I hope it is now clear what we mean. Other variables
> > > also differ, such as the date the new contract was signed, the type of
> > > contract, and so on.
> > >> Date: Mon, 15 Apr 2013 10:04:24 +0100
> > >> Subject: Re: R: st: problem of data management
> > >> From: [email protected]
> > >> To: [email protected]
> > >>
> > >> Same answer from me. From what you tell us, a wide structure would
> > >> not make most conceivable analyses easier.
> > >>
> > >> Your example included
> > >>
> > >> ID GENDER AGE TIME
> > >> 861 woman 50-54 1
> > >> 861 woman 50-54 1
> > >>
> > >> so that on that information duplicates could be inferred.
> > >>
> > >> I don't think your example included enough detail for us to advise
> > >> well on how to create a time (meaning date) variable. Presumably
> > >> that depends on other variables that you don't show us. Much
> > >> depends on whether you want the date variable to make sense across
> > >> identifiers.
> > >>
> > >> Nick
> > >> [email protected]
> > >>
> > >>
> > >> On 15 April 2013 09:33, Iodice Federico <[email protected]> wrote:
> > >> > Sorry, I was probably not clear in describing my data management
> > >> > problem.
> > >> >
> > >> > The TIME column in the example below is not a time variable. Instead,
> > >> > it identifies the type of employment contract an individual has: 1 is
> > >> > assigned to part-time contracts and 2 to full-time contracts.
> > >> >
> > >> > The ID column doesn't contain duplicates, but rather different
> > >> > observations of the same ID at different moments.
> > >> >
> > >> > In the ID variable, individuals occur several times, but the times at
> > >> > which an individual is observed again have no regularity and are not
> > >> > defined a priori: we observe individuals whenever they find a new job.
> > >> > Sometimes they have many short employment spells. Quite often they
> > >> > find a permanent job, and then we do not observe them anymore.
> > >> >
> > >> > For that reason, I thought it might be better to transform the
> > >> > dataset to wide format. This would mean treating it not as a panel
> > >> > dataset but as a cross-section with a longitudinal dimension.
> > >> >
> > >> > The alternative would be to keep the long format that the dataset
> > >> > naturally has. But since the ID variable is repeated only for some
> > >> > cases, I would then need to create a time variable that assigns a
> > >> > sequence to successive observations of the same individual, which I
> > >> > need in order to use the -xtset- command. In other words, to keep the
> > >> > long format and use panel-data analysis, I need a command that
> > >> > automatically assigns an increasing number to the new time variable
> > >> > every time the same individual is observed again. We have several
> > >> > thousand observations.
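One common way to build the sequence variable described here is to number each individual's observations within id, ordered by the contract date. A minimal sketch, assuming a day-first date string like the datfinemov variable from the top of this thread; the data values and the name spell are hypothetical. Observations of the same id with the same date would be numbered in arbitrary order, so real data may need a tie-breaking variable.

```stata
* Hypothetical data: ID, contract type (TIME) and contract start date
clear
input id time str10 datfinemov
861 1 "01/03/2007"
848 1 "15/05/2007"
820 1 "10/02/2008"
861 1 "20/09/2008"
820 1 "05/01/2009"
860 1 "12/06/2007"
860 2 "03/11/2008"
860 1 "28/04/2010"
end
gen eventdate = date(datfinemov, "DMY")
format eventdate %td

* Within each id, sort by date and number the observations 1, 2, 3, ...
bysort id (eventdate): gen spell = _n

* id and spell now identify each observation uniquely ...
isid id spell

* ... so they can be declared as the panel and time variables
xtset id spell
```

With spell as the time variable, lag operators such as L.time refer to the previous job spell rather than to a fixed calendar interval, which fits the irregular timing described above.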
> > >>
> > >> >> > Date: Tue, 9 Apr 2013 11:59:24 +0100
> > >> >> > From: [email protected]
> > >> >> >
> > >> >> > You can -xtset- your data using
> > >> >> >
> > >> >> > xtset id
> > >> >> >
> > >> >> > but your example indicates that you have duplicates on -id
> > >> >> > time-, so
> > >> >> >
> > >> >> > xtset id time
> > >> >> >
> > >> >> > would fail. I don't think you are telling us enough for it to
> > >> >> > be clear whether you just have -duplicates- (see the command
> > >> >> > of that name) that should be removed.
> > >> >> >
> > >> >> > I can't see any advantages to your wide data structure (Example 2).
> > >> >> > (Many people do call this a format.)
> > >> >> >
> > >> >> > Nick
> > >> >> > [email protected]
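A minimal sketch of the check described above, using the Example 1 data from the original post. -duplicates report- and -duplicates tag- show which id-time combinations repeat, and -xtset id time- is run under -capture- so the do-file continues after the expected error. The names dup and rc are hypothetical.

```stata
* Example 1 data from the original post
clear
input id str5 gender str5 age time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Summarize how many observations share each id-time combination
duplicates report id time

* dup counts the other observations with the same id and time
duplicates tag id time, generate(dup)
list id time dup, sepby(id)

* xtset id time fails because time repeats within id;
* capture keeps the do-file running and stores the return code
capture xtset id time
local rc = _rc
display "xtset id time returned code `rc'"
```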
> > >> >> >
> > >> >> >
> > >> >> > On 9 April 2013 11:51, Iodice Federico <[email protected]> wrote:
> > >> >> >
> > >> >> >> I have a data management problem that I would like to bring to
> > >> >> >> your attention. I have an unbalanced panel dataset. As you can
> > >> >> >> see from the example below, the variable in column 1 (the ID
> > >> >> >> variable) is repeated only in some cases. In other words, the
> > >> >> >> same individual appears several times.
> > >> >> >>
> > >> >> >> Example 1
> > >> >> >>
> > >> >> >> ID GENDER AGE TIME
> > >> >> >> 861 woman 50-54 1
> > >> >> >> 848 woman 25-29 1
> > >> >> >> 820 man 50-54 1
> > >> >> >> 861 woman 50-54 1
> > >> >> >> 820 man 50-54 1
> > >> >> >> 860 woman 50-54 1
> > >> >> >> 860 woman 50-54 2
> > >> >> >> 860 woman 50-54 1
> > >> >> >>
> > >> >> >> This happens for some, but not all, individuals in the sample.
> > >> >> >> It probably means that the best way to deal with this dataset is
> > >> >> >> to use it not as a panel but as a longitudinal dataset with
> > >> >> >> repeated observations. The observations that do not repeat, I can
> > >> >> >> treat as staying in the same status.
> > >> >> >> To use this information, my impression is that I need either:
> > >> >> >>
> > >> >> >> a) to tell Stata that this is a panel and treat the data as being
> > >> >> >> in “long format”. If case a) is the best one, the data are already
> > >> >> >> in long format and I only need to tell Stata that the same
> > >> >> >> individual is observed in different periods. However, these periods
> > >> >> >> are not fixed: they can be of any length, from one week to several
> > >> >> >> years. How do I tell Stata that the same individual is observed
> > >> >> >> several times? How do I define the time dimension?
> > >> >> >>
> > >> >> >> b) or to treat the data as a cross-section with repeated
> > >> >> >> observations. In this case, I need to automatically shift each
> > >> >> >> entire repeated row to the right of the first row in which that ID
> > >> >> >> appears. An example of case b) is below. My question is: how do I
> > >> >> >> move an entire row beside the row where that individual is already
> > >> >> >> recorded?
> > >> >> >>
> > >> >> >> Example 2
> > >> >> >>
> > >> >> >> ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME
> > >> >> >> 861  woman   50-54  1     861  woman   50-54  1
> > >> >> >> 848  woman   25-29  1
> > >> >> >> 820  man     50-54  1     820  man     50-54  1
> > >> >> >> 860  woman   50-54  1     860  woman   50-54  2     860  woman   50-54  1
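If the wide layout of Example 2 is still wanted, -reshape wide- can produce it once each repeat of an id has a sequence number. A minimal sketch using the Example 1 data; it assumes the rows are already in chronological order within id (with real data, sort on a contract date instead), and the names order and spell are hypothetical. As Nick notes in his replies, the long layout is usually easier for analysis, and -reshape long- reverses this step.

```stata
* Example 1 data from this post
clear
input id str5 gender str5 age time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Assumption: the row order within id is chronological
gen long order = _n
bysort id (order): gen spell = _n
drop order

* gender and age are constant within id, so they stay single columns;
* time is spread into time1, time2, time3
reshape wide time, i(id) j(spell)
list
```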
> > >>
> > >> *
> > >> * For searches and help try:
> > >> * http://www.stata.com/help.cgi?search
> > >> * http://www.stata.com/support/faqs/resources/statalist-faq/
> > >> * http://www.ats.ucla.edu/stat/stata/

References:
- st: problem of data management
  - From: Iodice Federico <[email protected]>
- Re: st: problem of data management
  - From: Nick Cox <[email protected]>
- RE: R: st: problem of data management
  - From: Iodice Federico <[email protected]>
- Re: R: st: problem of data management
  - From: Nick Cox <[email protected]>
- RE: R: st: problem of data management
  - From: Iodice Federico <[email protected]>
- Re: R: st: problem of data management
  - From: Nick Cox <[email protected]>
- RE: R: st: problem of data management
  - From: Iodice Federico <[email protected]>
- RE: R: st: problem of data management
  - From: "Sarah Edgington" <[email protected]>
