

From: Iodice Federico <federico.iodice@hotmail.it>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: R: st: problem of data management
Date: Wed, 17 Apr 2013 22:22:27 +0200

Thank you for all the answers and hints. I looked at the help file and found that the following command works:

    gen eventdate = date(datfinemov, "DMY")

To see the date:

    format eventdate %td

----------------------------------------
> From: sedging@ucla.edu
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: R: st: problem of data management
> Date: Wed, 17 Apr 2013 10:54:45 -0700
>
> I don't think you need -todate- since your string variable has slashes
> rather than being a run-together date. Even if it were run together, the
> command you gave wouldn't work since you say your dates are in the form
> mmddyyyy, which is not what you specified in the todate statement.
> Try:
>     gen date3=date(date,"MDY")
> That should give you what you want if all your variables are in the form
> mm/dd/yyyy.
> Hope that helps.
> -Sarah
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Iodice Federico
> Sent: Wednesday, April 17, 2013 10:44 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: R: st: problem of data management
>
> dear statalister i've a new problem,
>
> With respect to the previous email, we have a string as a data in
> the following format:
>
> 12/06/2007, which means mm/dd/yyyy
>
> i have tried to following todate command (after downloading it):
>
>     todate date, generate(date3) pattern(yyyymmdd)
>
> I get the following error message:
>
>     date: length does not match pattern
>
> how to solve the problem? We need to transform the string to a date format.
>
> ----------------------------------------
> > Date: Mon, 15 Apr 2013 23:52:26 +0100
> > Subject: Re: R: st: problem of data management
> > From: njcoxstata@gmail.com
> > To: statalist@hsphsun2.harvard.edu
> >
> > Thanks for the further detail. It is now evident that you do not have
> > duplicates (in the sense of -duplicates-) that can or should be
> > removed.
> > You may be able to -xtset- your data using your identifier
> > variable and a date variable or you may just -xtset- your data using
> > the identifier variable.
> > Nick
> > njcoxstata@gmail.com
> >
> > On 15 April 2013 21:56, Iodice Federico <federico.iodice@hotmail.it> wrote:
> > > we have included, for shortness sake only few variables, which makes
> > > some observations look like duplicate. However, the observation that
> > > looks like a duplicate is not a duplicate. It refers in fact to the same
> > > individual being observed again, but with a different outcome variable.
> > > To show this, we have added a different column to show what we mean:
> > >
> > > ID   GENDER  AGE    TIME  Outcome  Sector
> > > 861  woman   50-54  1     861      Petrol & gas
> > > 848  woman   25-29  1     848      consumer goods
> > > 820  man     50-54  1     820      consumer goods
> > > 861  woman   50-54  1     869      Petrol & gas
> > > 820  man     50-54  1     870      services
> > > 860  woman   50-54  1     890      services
> > > 860  woman   50-54  2     895      consumer goods
> > > 860  woman   50-54  1     900      consumer goods
> > >
> > > In other words, the same individual has been observed another time,
> > > because she has found a new job. I hope it is now clear what we mean. In
> > > fact, there are also other variables that are different, such as the time
> > > when the new contract has been signed, the type of contract and so on.
> > >
> > >> Date: Mon, 15 Apr 2013 10:04:24 +0100
> > >> Subject: Re: R: st: problem of data management
> > >> From: njcoxstata@gmail.com
> > >> To: statalist@hsphsun2.harvard.edu
> > >>
> > >> Same answer from me. From what you tell us, a wide structure would
> > >> not make most conceivable analyses easier.
> > >>
> > >> Your example included
> > >>
> > >> ID   GENDER  AGE    TIME
> > >> 861  woman   50-54  1
> > >> 861  woman   50-54  1
> > >>
> > >> so that on that information duplicates could be inferred.
> > >>
> > >> I don't think your example included enough detail for us to advise
> > >> well on how to create a time (meaning date) variable.
> > >> Presumably that depends on other variables that you don't show us.
> > >> Much depends on whether you want the date variable to make sense
> > >> across identifiers.
> > >>
> > >> Nick
> > >> njcoxstata@gmail.com
> > >>
> > >> On 15 April 2013 09:33, Iodice Federico <federico.iodice@hotmail.it> wrote:
> > >> > Sorry, probably I was not clear in the description of my data
> > >> > management problem,
> > >> >
> > >> > the TIME column in the example below is not a time variable, it
> > >> > identifies instead the type of working contract an individual has
> > >> > got: 1 is assigned to Part-time contracts and 2 is assigned to
> > >> > full-time contracts.
> > >> >
> > >> > The ID column doesn't have duplicates, but different observations
> > >> > of the same ID in different moments.
> > >> >
> > >> > In the ID variable, individuals have several occurrences, but the
> > >> > time when the individual is observed again and again has no
> > >> > regularity, are not defined a priori. We are observing individuals
> > >> > whenever they find a new job.
> > >> > Sometimes, they have many short employment spells. Quite often
> > >> > they find a permanent job and then we do not observe them anymore.
> > >> >
> > >> > For that reasons, I thought that it could have been a better
> > >> > solution to transform the dataset in wide format. This would mean
> > >> > dealing with it not as a panel data set, but rather as a
> > >> > cross-section with a longitudinal dimension.
> > >> >
> > >> > The alternative hypothesis would be to maintain the long format
> > >> > that the data set naturally has but since the ID variable is
> > >> > repeated only for some cases, for this reason I need to create a
> > >> > time variable to assign a sequence to successive observations of
> > >> > the same individual. This would be important to implement the
> > >> > xtset command.
> > >> > In other words, to maintain the long format and
> > >> > use panel data analysis, I need a command that assigns a growing
> > >> > numerical value to the new time variable. It should be done
> > >> > automatically every time the same individual is observed again. We
> > >> > have several thousands of observations.
> > >> >
> > >> >> > Date: Tue, 9 Apr 2013 11:59:24 +0100
> > >> >> > From: njcoxstata@gmail.com
> > >> >> >
> > >> >> > You can -xtset- your data using
> > >> >> >
> > >> >> >     xtset id
> > >> >> >
> > >> >> > but your example indicates that you have duplicates on -id
> > >> >> > time-, so
> > >> >> >
> > >> >> >     xtset id time
> > >> >> >
> > >> >> > would fail. I don't think you are telling us enough for it to
> > >> >> > be clear whether you just have -duplicates- (see the command
> > >> >> > of that name) that should be removed.
> > >> >> >
> > >> >> > I can't see any advantages to your wide data structure (Example 2).
> > >> >> > (Many people do call this a format.)
> > >> >> >
> > >> >> > Nick
> > >> >> > njcoxstata@gmail.com
> > >> >> >
> > >> >> > On 9 April 2013 11:51, Iodice Federico <federico.iodice@hotmail.it> wrote:
> > >> >> >
> > >> >> >> I have a problem of data management that I would like to
> > >> >> >> submit to your attention. I’ve an unbalanced panel databank. As
> > >> >> >> you can see from the example below, the variable in column 1
> > >> >> >> (the ID variable) is repeated only for some cases. In other
> > >> >> >> words, I have the same individual who is repeated several times.
> > >> >> >>
> > >> >> >> Example 1
> > >> >> >>
> > >> >> >> ID   GENDER  AGE    TIME
> > >> >> >> 861  woman   50-54  1
> > >> >> >> 848  woman   25-29  1
> > >> >> >> 820  man     50-54  1
> > >> >> >> 861  woman   50-54  1
> > >> >> >> 820  man     50-54  1
> > >> >> >> 860  woman   50-54  1
> > >> >> >> 860  woman   50-54  2
> > >> >> >> 860  woman   50-54  1
> > >> >> >>
> > >> >> >> This happens only for some, but not all individuals in the
> > >> >> >> sample.
> > >> >> >> It means that probably the best way of dealing with this dataset
> > >> >> >> is to use it not as a panel, but as a longitudinal data set with
> > >> >> >> repeated observations. The observations that do not repeat
> > >> >> >> themselves, I can treat as staying in the same status.
> > >> >> >>
> > >> >> >> In order to use this information, my impression is that I need
> > >> >> >> either:
> > >> >> >>
> > >> >> >> a) to tell Stata that this is a panel and treat the data as
> > >> >> >> a “long format”. If case a) is the best one, the data is already
> > >> >> >> in the long format and I need only to tell Stata that the same
> > >> >> >> observation is repeated for different periods. Nonetheless, these
> > >> >> >> periods are not fixed. They can be of any length, from one week
> > >> >> >> to several years. How to tell Stata that the same observation
> > >> >> >> is repeated several times? How to define the time dimension?
> > >> >> >>
> > >> >> >> b) or to treat the data as a cross-section with repeated
> > >> >> >> observations. In this case, I need to shift each repeated row
> > >> >> >> automatically to the right of the first line in which the
> > >> >> >> variable y appears. An example of case b) is below. My question
> > >> >> >> is: how to move the entire row beside the one where that
> > >> >> >> observation is already defined?
> > >> >> >>
> > >> >> >> Example 2
> > >> >> >>
> > >> >> >> ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME
> > >> >> >> 861  woman   50-54  1     861  woman   50-54  1
> > >> >> >> 848  woman   25-29  1
> > >> >> >> 820  man     50-54  1     820  man     50-54  1
> > >> >> >> 860  woman   50-54  1     860  woman   50-54  2     860  woman   50-54  1

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
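
The two steps discussed in this thread (converting the string to a daily date, then giving -xtset- a time variable that counts each person's successive observations) can be put together roughly as follows. This is only a minimal sketch: the toy data values and the variable name `spell` are hypothetical, and it assumes the strings are in day/month/year order, as in the command reported to work above. If they were month/day/year, the mask would be "MDY" instead.

```stata
* Hypothetical toy data: ids and dates are made up for illustration only
clear
input int id str10 datfinemov
861 "12/06/2007"
861 "03/09/2008"
860 "15/02/2007"
860 "01/10/2007"
860 "20/01/2009"
848 "05/05/2008"
end

* Convert the string to a Stata daily date (assumes day/month/year order)
gen eventdate = date(datfinemov, "DMY")
format eventdate %td

* Within each id, sort by date and number the observations 1, 2, 3, ...
bysort id (eventdate): gen spell = _n

* Declare the panel, using the sequence number as the time variable
xtset id spell

list id datfinemov eventdate spell, sepby(id)
```

Because `spell` is just the within-person order, it is unique within `id` by construction, so -xtset id spell- should not hit the repeated-time-values error. If two contracts of the same person share a date, their relative order is arbitrary unless a further sort variable is added inside the parentheses.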

