Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to Correctly Structure a CSV before Loading it into STATA
Date: Thu, 27 May 2010 10:08:32 -0400

Stephen R. Clark <stephenrclark@mchsi.com> :
No, you don't need to add any rows (AKA observations) to your file.
-tsset- your data after you load it, and use time-series operators
to define lags etc. (help tsvarlist).

On Thu, May 27, 2010 at 12:42 AM, Stephen R. Clark
<stephenrclark@mchsi.com> wrote:
> Dear Statalisters:
>
> Hello. I am a long-time member, but a first-time writer.
>
> I am using STATA/IC 10.1.
>
> I have primarily used STATA for cross-sectional analysis, but I now need to
> use it to engage in panel data analysis. Thankfully, from my reading of
> posts to this forum, I have learned that STATA has very powerful panel data
> analysis features.
>
> Now, let me get to my question. I have an unbalanced panel of data that
> consists of 20 cross-sectional units (markets). Each of these markets
> contains a different number of time-series (daily) observations. These range
> from 31 days for the shortest market to 48 days for the longest market.
>
> I currently have the data in stacked (long) form in a CSV file. I am
> dealing with "relative dates," so I am just using integer values (not actual
> dates) for the date variable. The data are, somewhat arbitrarily, organized
> in this stacked format according to alphabetical order of the cross-section
> name. To be as clear as possible, please let me specify in more detail how
> the data is arranged in the CSV file:
>
> Relative-Day   Market (# of observations)   Dependent Variable   Independent Variables
>
> Under the relevant headings, I have 43 observations for "Market A." I then
> have 41 observations for "Market B," and so on until "Market T" (the 20th
> and final market), which has 40 observations.
>
> The missing data values can arguably be considered as randomly missing, so I
> am not concerned about any potential inferential problems associated with
> having an unbalanced panel. What I am concerned with is how to structure the
> data in the CSV file before importing it into STATA.
>
> Since the longest market has 48 observations, do I need to have 48 rows for
> each cross-section with blank cells where the data is missing? In other
> words, do I need to "artificially balance" the data before importing it into
> STATA? If not, then will I be fine leaving the data in stacked (long)
> format, given an unequal number of days for each of the cross-sections?
>
> In considering my question, please be advised that my analysis will involve
> the use of lagged values of the dependent variable. In other words, I will
> be conducting dynamic panel data analysis. As such, I need STATA to
> recognize the panel structure of the data and not "lag into" the values for
> the preceding cross-section.
>
> Finally, if I need to "artificially balance" the data prior to importing it
> into STATA, then should I enter the NA values at the beginning or at the end
> of the respective markets? For instance, say that I am dealing with Market
> A, which has 43 observations. With the maximum number of observations at 48,
> I would need to enter 5 NA values. Should I do this as:
>
> NA
> NA
> NA
> NA
> NA
> 43 values
>
> or as
>
> 43 values
> NA
> NA
> NA
> NA
> NA
>
> Thanks in advance for your help.
>
> Stephen Clark

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
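
The advice in the reply (load the stacked file as it is, -tsset- it, then use time-series operators) might look roughly like the sketch below. The file name markets.csv and the variable names market, day, y and y_lag are assumptions made for illustration; they do not appear in the original post. The sketch uses -insheet-, the CSV reader available in Stata 10.1.

```stata
* Hypothetical names: markets.csv, market, day, y (not from the original post)

* Read the stacked (long) CSV, taking variable names from the first row
insheet using "markets.csv", comma names clear

* -tsset- needs a numeric panel identifier, so encode the market name
encode market, generate(mkt)

* Declare the panel variable (mkt) and the time variable (day);
* no padding rows are needed for an unbalanced panel
tsset mkt day

* L.y is the value of y one day earlier within the same market;
* it is missing on each market's first day, so it never reaches
* back into the preceding market
generate y_lag = L.y

* Inspect the first few rows to check the lag
list mkt day y y_lag in 1/5
```

Note that L. refers to the previous value of the time variable, not to the previous row, so L.y is also missing wherever a market has a gap in its relative-day sequence.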