st: problem of data management

From	Iodice Federico <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: problem of data management
Date	Tue, 9 Apr 2013 12:51:14 +0200

Dear Statalists,

I have a data management problem that I would like to submit to your attention. I have an unbalanced panel dataset. As the example below shows, the ID variable in column 1 takes the same value in more than one row, but only for some individuals. In other words, some individuals appear several times.

Example 1

ID   GENDER  AGE    TIME
861  woman   50-54  1
848  woman   25-29  1
820  man     50-54  1
861  woman   50-54  1
820  man     50-54  1
860  woman   50-54  1
860  woman   50-54  2
860  woman   50-54  1

This happens for some, but not all, individuals in the sample. So the best way of dealing with this dataset is probably to treat it not as a panel, but as a longitudinal dataset with repeated observations. Individuals who appear only once I can treat as staying in the same status.
To use this information, my impression is that I need either:

a) to tell Stata that this is a panel and treat the data as being in “long format”;
If case a) is the best one, the data are already in long format and I only need to tell Stata that the same individual is observed in different periods. However, these periods are not fixed: they can be of any length, from one week to several years. How do I tell Stata that the same individual is observed several times? How do I define the time dimension?
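
One possible way to approach case a), offered only as a sketch: in the example data TIME is not unique within ID (ID 860 has TIME 1, 2, 1), so it cannot serve directly as the time variable in -xtset-. The code below instead builds a within-ID sequence number. The names row and obs_order are hypothetical, and the sketch assumes that the current row order of the file reflects the order in which the observations were recorded.

```stata
* Sketch only: rebuild the rows of Example 1 so the code is self-contained
clear
input int id str5 gender str5 age byte time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Remember the original row order (assumed to be the order of observation)
gen long row = _n

* Hypothetical sequence number: 1, 2, ... within each ID, in file order
bysort id (row): gen int obs_order = _n

* Declare the panel structure using the sequence number as the time index
xtset id obs_order

* Show how many observations each individual contributes
xtdescribe
```

A sequence number of this kind only orders the observations; it carries no information about how long each period lasts. If the dataset contains an actual date variable, that variable could be used as the time variable instead, provided it is unique within ID.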

b) or to treat the data as a cross-section with repeated observations. In this case, I need to move each repeated row automatically to the right of the first row for the same individual, so that every individual occupies a single line. An example of case b) is below. My question is: how do I move an entire row so that it sits beside the row where that individual first appears?

Example 2

ID   GENDER  AGE    TIME   ID   GENDER  AGE    TIME   ID   GENDER  AGE    TIME
861  woman   50-54  1      861  woman   50-54  1
848  woman   25-29  1
820  man     50-54  1      820  man     50-54  1
860  woman   50-54  1      860  woman   50-54  2      860  woman   50-54  1
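
A layout like Example 2 corresponds to what Stata calls wide format, and one way to obtain it might be -reshape wide-. The following is a minimal sketch under stated assumptions: the helper variables row and j are hypothetical names, and the file order is assumed to define which observation of an individual comes first, second, and so on. Unlike Example 2, the result keeps ID only once per line, followed by gender1 age1 time1, gender2 age2 time2, and so on.

```stata
* Sketch only: rebuild the rows of Example 1 so the code is self-contained
clear
input int id str5 gender str5 age byte time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Number the occurrences of each ID in file order (hypothetical helper j)
gen long row = _n
bysort id (row): gen int j = _n

* row varies within ID and is not being reshaped, so drop it first
drop row

* One line per ID; repeated observations become extra sets of columns
reshape wide gender age time, i(id) j(j)

list, noobs
```

Individuals observed only once end up with missing values in the second and third sets of columns, which matches the blank cells in Example 2.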

Thank you in advance for your kindness and time.
Best regards,
Federico Iodice