



Re: st: How to Correctly Structure a CSV before Loading it into STATA


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to Correctly Structure a CSV before Loading it into STATA
Date   Thu, 27 May 2010 10:08:32 -0400

Stephen R. Clark <stephenrclark@mchsi.com> :

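No, you don't need to add any rows (AKA observations) to your file.
Leave it in long form, -tsset- your data after you load it (panel
variable first, then the time variable), and use time-series operators
to define lags etc. (help tsvarlist). The operators work within each
panel, so a lag never reaches into the preceding market.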
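
As a minimal sketch of that workflow, assuming the CSV is named
panel.csv and its header row uses the hypothetical variable names
relday, market, y, and x1 (none of these names come from the original
post), and that market is stored as text:

```stata
* Hypothetical file and variable names; adjust them to match your CSV.
* Read the long-form CSV exactly as it is, with no padding rows.
insheet using "panel.csv", comma clear

* tsset needs a numeric panel identifier, so convert the market names.
encode market, generate(mkt)

* Declare the panel variable first, then the time variable.
tsset mkt relday

* The lag operator works within each market only.
generate y_lag1 = L.y
list mkt relday y y_lag1 in 1/5
```

Here L.y is missing for the first observation of each market, and for
any day whose preceding relative day is absent, so nothing is carried
over from one market into the next. If a rectangular panel is ever
wanted, -tsfill, full- can add the missing rows after -tsset-, so
there is no need to pad the CSV by hand.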

On Thu, May 27, 2010 at 12:42 AM, Stephen R. Clark
<stephenrclark@mchsi.com> wrote:
> Dear Statalisters:
>
> Hello.  I am a long-time member, but a first-time writer.
>
> I am using Stata/IC 10.1.
>
> I have primarily used Stata for cross-sectional analysis, but I now need to
> use it for panel data analysis.  Thankfully, from my reading of
> posts to this forum, I have learned that Stata has very powerful panel data
> analysis features.
>
> Now, let me get to my question.  I have an unbalanced panel of data that
> consists of 20 cross-sectional units (markets). Each of these markets
> contains a different number of time-series (daily) observations. These range
> from 31 days for the shortest market to 48 days for the longest market.
>
> I currently have the data in stacked (long) form in a CSV file.  I am
> dealing with "relative dates," so I am just using integer values (not actual
> dates) for the date variable.  The data are, somewhat arbitrarily, organized
> in this stacked format according to alphabetical order of the cross-section
> name. To be as clear as possible, please let me specify in more detail how
> the data is arranged in the CSV file:
>
> Relative-Day   Market (# of observations)   Dependent Variable   Independent Variables
>
> Under the relevant headings, I have 43 observations for "Market A." I then
> have 41 observations for "Market B," and so on until "Market T" (the 20th
> and final market), which has 40 observations.
>
> The missing data values can arguably be considered as randomly missing, so I
> am not concerned about any potential inferential problems associated with
> having an unbalanced panel. What I am concerned with is how to structure the
> data in the CSV file before importing it into Stata.
>
> Since the longest market has 48 observations, do I need to have 48 rows for
> each cross-section with blank cells where the data is missing? In other
> words, do I need to "artificially balance" the data before importing it into
> Stata?  If not, then will I be fine leaving the data in stacked (long)
> format, given an unequal number of days for each of the cross-sections?
>
> In considering my question, please be advised that my analysis will involve
> the use of lagged values of the dependent variable. In other words, I will
> be conducting dynamic panel data analysis. As such, I need Stata to
> recognize the panel structure of the data and not "lag into" the values for
> the preceding cross-section.
>
> Finally, if I need to "artificially balance" the data prior to importing it
> into Stata, then should I enter the NA values at the beginning or at the end
> of the respective markets? For instance, say that I am dealing with Market
> A, which has 43 observations. With the maximum number of observations at 48,
> I would need to enter 5 NA values. Should I do this as:
>
> NA
> NA
> NA
> NA
> NA
> 43 values
>
> or as
>
> 43 values
> NA
> NA
> NA
> NA
> NA
>
> Thanks in advance for your help.
>
> Stephen Clark


