Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: R: st: problem of data management

From	Iodice Federico <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: R: st: problem of data management
Date	Mon, 15 Apr 2013 10:33:04 +0200

Sorry, I was probably not clear in describing my data-management problem.

The TIME column in the example below is not a time variable; instead, it
identifies the type of employment contract an individual has: 1 is assigned
to part-time contracts and 2 to full-time contracts.

The ID column does not contain duplicates; the repeated IDs are different
observations of the same individual at different moments.

Individuals appear several times in the ID variable, but the times at which
an individual is observed again are irregular and not defined a priori: we
observe individuals whenever they find a new job. Some have many short
employment spells. Quite often they find a permanent job, and then we do not
observe them anymore.

For these reasons, I thought it might be better to transform the dataset
into wide format. This would mean treating it not as a panel dataset but as
a cross-section with a longitudinal dimension.
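For comparison with the wide layout of Example 2, this is one way it could be produced in Stata. It is a minimal sketch, assuming that the rows are already stored in the order in which the jobs were observed and using lower-case versions of the Example 1 variable names (`time` is still the contract type, not a date); `obsno` and `spell` are hypothetical helper variables introduced only for illustration.

```stata
* Example 1 data; names assumed, time = contract type (1 = part-time, 2 = full-time)
clear
input int id str5 gender str5 age byte time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Assumption: file order is chronological, so record it before any sorting
generate long obsno = _n

* Number each individual's records 1, 2, 3, ... in file order
bysort id (obsno): generate spell = _n

* obsno varies within id, so drop it before reshaping
drop obsno

* One row per individual, with one set of variables per spell
* (time1, time2, time3, and likewise for gender and age)
reshape wide gender age time, i(id) j(spell)
list, noobs
```

Typing -reshape long- straight afterwards returns the data to the long layout.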

The alternative would be to keep the long format that the dataset naturally
has. Since the ID variable is repeated only for some cases, I would need to
create a time variable that assigns a sequence number to successive
observations of the same individual; this is needed for the -xtset-
command. In other words, to keep the long format and use panel-data
methods, I need a command that automatically assigns an increasing value to
the new time variable every time the same individual is observed again. We
have several thousand observations.
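Keeping the long format, the sequence variable described above can be built with -bysort-. This is a minimal sketch, assuming the rows are stored in chronological order within each individual and using lower-case names from Example 1; `obsno`, `spell` and `prev_time` are hypothetical names. Sorting on `obsno` inside -bysort- matters: a plain `bysort id:` does not guarantee that the original order within each individual is preserved. Because spells are irregularly spaced, `spell` is only an ordinal index: after -xtset-, an operator such as L.time refers to the previous spell, not to a fixed calendar interval.

```stata
* Example 1 data; names assumed, time = contract type, not a date
clear
input int id str5 gender str5 age byte time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Assumption: file order is chronological; store it before any sorting
generate long obsno = _n

* 1 for an individual's first record, 2 for the second, and so on
bysort id (obsno): generate spell = _n

* id and spell now identify observations uniquely, so -xtset- can use them
isid id spell
xtset id spell

* Example: contract type in the previous spell (missing for first spells)
generate prev_time = L.time
```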

> ----------------------------------------
> > Date: Tue, 9 Apr 2013 11:59:24 +0100
> > Subject: Re: st: problem of data management
> > From: [email protected]
> > To: [email protected]
> >
> > You can -xtset- your data using
> >
> > xtset id
> >
> > but your example indicates that you have duplicates on -id time-, so
> >
> > xtset id time
> >
> > would fail. I don't think you are telling us enough for it to be clear
> > whether you just have -duplicates- (see the command of that name) that
> > should be removed.
> >
> > I can't see any advantages to your wide data structure (Example 2).
> > (Many people do call this a format.)
> >
> > Nick
> > [email protected]
> >
> >
> > On 9 April 2013 11:51, Iodice Federico <[email protected]> wrote:
> >
> >> I have a data-management problem that I would like to submit to your
> >> attention. I have an unbalanced panel dataset. As you can see from the
> >> example below, the variable in column 1 (the ID variable) is repeated
> >> only for some cases. In other words, the same individual appears
> >> several times.
> >>
> >> Example 1
> >>
> >> ID  GENDER AGE   TIME
> >> 861 woman  50-54 1
> >> 848 woman  25-29 1
> >> 820 man    50-54 1
> >> 861 woman  50-54 1
> >> 820 man    50-54 1
> >> 860 woman  50-54 1
> >> 860 woman  50-54 2
> >> 860 woman  50-54 1
> >>
> >> This happens only for some, but not all, individuals in the sample. It
> >> suggests that the best way of handling this dataset may be to use it
> >> not as a panel but as a longitudinal dataset with repeated
> >> observations. Individuals who are not repeated I can treat as staying
> >> in the same status.
> >> In order to use this information, my impression is that I need either:
> >>
> >> a) to tell Stata that this is a panel and treat the data as being in
> >> “long format”. If a) is the best option, the data are already in long
> >> format and I need only tell Stata that the same individual is observed
> >> in different periods. These periods are not fixed, however: they can be
> >> of any length, from one week to several years. How do I tell Stata that
> >> the same individual is observed several times? How do I define the time
> >> dimension?
> >>
> >> b) or to treat the data as a cross-section with repeated observations.
> >> In this case, I need to shift each repeated row automatically to the
> >> right of the first row in which that ID appears. An example of case b)
> >> is below. My question is: how do I move an entire row beside the row
> >> where that individual first appears?
> >>
> >> Example 2
> >>
> >> ID  GENDER AGE   TIME ID  GENDER AGE   TIME ID  GENDER AGE   TIME
> >> 861 woman  50-54 1    861 woman  50-54 1
> >> 848 woman  25-29 1
> >> 820 man    50-54 1    820 man    50-54 1
> >> 860 woman  50-54 1    860 woman  50-54 2    860 woman  50-54 1
> >>
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > * http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: R: st: problem of data management
  - From: Nick Cox <[email protected]>

References:
- st: problem of data management
  - From: Iodice Federico <[email protected]>
- Re: st: problem of data management
  - From: Nick Cox <[email protected]>
