



Re: st: problem of data management


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: problem of data management
Date   Tue, 9 Apr 2013 11:59:24 +0100

You can -xtset- your data using

xtset id

but your example indicates that you have duplicates on -id time-, so

xtset id time

would fail. I don't think you are telling us enough for it to be clear
whether you simply have -duplicates- (see the command of that name) that
should be removed.
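A minimal sketch of these checks, assuming the variables in Example 1 are named id, gender, age and time, with gender and age stored as strings. It shows that -xtset id- succeeds while -xtset id time- fails on these data, then uses -duplicates- to inspect the repeats. Dropping rows is appropriate only if the repeats really are redundant records; -duplicates drop- with no varlist removes only rows that are identical on every variable and keeps the first copy of each.

```stata
* Example 1 data (variable names and string storage are assumptions)
clear
input id str5 gender str5 age time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Declaring only the panel identifier works even with repeated rows
xtset id

* Declaring id and time fails: some (id, time) pairs occur more than once
capture xtset id time
display "return code from xtset id time: " _rc

* Inspect the (id, time) duplicates before deciding anything
duplicates report id time
duplicates list id time

* Only if the repeats are genuinely redundant: drop rows identical on
* every variable (first copy kept), confirm uniqueness, declare the panel
duplicates drop
isid id time
xtset id time
```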

I can't see any advantages to your wide data structure (Example 2).
(Many people do call this a format.)
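If the wide layout is still wanted, it can be produced with -reshape wide-. This is a minimal sketch, again assuming the Example 1 variable names and string storage for gender and age. The counter seq is a hypothetical helper name: it numbers each individual's rows in their original order and becomes the suffix of the wide variables (gender1, age1, time1, gender2, ...).

```stata
* Example 1 data (variable names and string storage are assumptions)
clear
input id str5 gender str5 age time
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Remember the original row order so each person's repeats stay in sequence
generate long obsno = _n
* Number each individual's rows 1, 2, 3, ... (seq is a hypothetical name)
bysort id (obsno): generate seq = _n
drop obsno

* One row per id; suffixes 1, 2, 3 index the repeats
reshape wide gender age time, i(id) j(seq)
list, noobs
```

Unlike Example 2, id appears only once per row; individuals with fewer repeats get missing values in the higher-numbered columns.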

Nick
njcoxstata@gmail.com


On 9 April 2013 11:51, Iodice Federico <federico.iodice@hotmail.it> wrote:

> I have a data-management problem that I would like to bring to your attention. I have an unbalanced panel dataset. As you can see from the example below, the value in column 1 (the ID variable) is repeated only in some cases. In other words, some individuals appear several times.
>
> Example 1
>
> ID     GENDER    AGE       TIME
> 861    woman     50-54     1
> 848    woman     25-29     1
> 820    man       50-54     1
> 861    woman     50-54     1
> 820    man       50-54     1
> 860    woman     50-54     1
> 860    woman     50-54     2
> 860    woman     50-54     1
>
> This happens for some, but not all, individuals in the sample. It probably means that the best way of dealing with this dataset is to use it not as a panel but as a longitudinal dataset with repeated observations. Individuals whose observations do not repeat I can treat as staying in the same status.
> To use this information, I think I need either:
>
> a) to tell Stata that this is a panel and treat the data as being in “long format”;
> If case a) is best, the data are already in long format and I need only tell Stata that the same individual is observed in different periods. However, these periods are not fixed: they can be of any length, from one week to several years. How do I tell Stata that the same individual is observed several times, and how should I define the time dimension?
>
> b) or to treat the data as a cross-section with repeated observations. In this case, I need to shift each repeated row automatically so that the entire row sits to the right of the first row in which that ID appears. An example of case b) is below. My question is: how can I move each entire repeated row beside the row where that observation first appears?
>
> Example 2
>
> ID     GENDER   AGE     TIME     ID     GENDER   AGE     TIME     ID     GENDER   AGE     TIME
> 861    woman    50-54   1        861    woman    50-54   1
> 848    woman    25-29   1
> 820    man      50-54   1        820    man      50-54   1
> 860    woman    50-54   1        860    woman    50-54   2        860    woman    50-54   1
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP