Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: Import csv file


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Import csv file
Date   Tue, 29 Aug 2006 14:53:25 +0100

This solution seems a bit more complicated than necessary. 

I started with this soares.csv

Name,Company1,Company2,Company3
Code,0000(MV),0001(MV),0002(MV)
Currency,GBP,GBP,USD
31/12/1986,1,2,3
30/01/1987,4,5,6
30/12/2005,7,8,9

and then ran this 

insheet using soares.csv, clear
foreach v of var * {
	rename `v' `=`v'[1]'
}
drop in 1
gen long id = _n
reshape long Company , i(id)
gen Code = Company if  Name == "Code"
gen Currency = Company if Name == "Currency"
bysort _j (Code) : replace Code = Code[_N]
bysort _j (Currency) : replace Currency = Currency[_N]
drop if inlist(Name, "Code", "Currency")
drop id
rename Company Value 
rename _j Company 
gen Date = date(Name, "dmy") 
format Date %dd/n/CY
drop Name 
destring Value 
sort Company Date

The results are

. l

     +----------------------------------------------------+
     | Company   Value       Code   Currency         Date |
     |----------------------------------------------------|
  1. |       1       1   0000(MV)        GBP   31/12/1986 |
  2. |       1       4   0000(MV)        GBP    30/1/1987 |
  3. |       1       7   0000(MV)        GBP   30/12/2005 |
  4. |       2       2   0001(MV)        GBP   31/12/1986 |
  5. |       2       5   0001(MV)        GBP    30/1/1987 |
     |----------------------------------------------------|
  6. |       2       8   0001(MV)        GBP   30/12/2005 |
  7. |       3       3   0002(MV)        USD   31/12/1986 |
  8. |       3       6   0002(MV)        USD    30/1/1987 |
  9. |       3       9   0002(MV)        USD   30/12/2005 |
     +----------------------------------------------------+

What are the key differences? 

1. No need to resort to a user-written program -sxpose-, 
no matter how much you trust it. 

2. Date manipulations are much simpler with -date()-. 
You _don't_ have to -destring- the date. Also dropping 
the "/" is dangerous! 

Nick 
n.j.cox@durham.ac.uk 

Nuno Soares
 
> Thanks a lot Radu! Just had to destring the date_ variables. 
> Here goes the code. Might be useful to other statalisters:
> 
> ***************** BEGIN CODE ***********************
> 
> clear
> *reading in the data
> insheet using "C:\mv1.csv", clear
> 
> *transposing
> sxpose, clear
> 
> *renaming the first three variables
> foreach v of varlist _var1 _var2 _var3 {
> local i = `v'[1] 
> rename `v' `i'
> }
> 
> *getting rid of "/" in the dates
> *note that after renaming, only the date variables start with "_var"
> destring _var*, replace ignore("/") force
> 
> *now renaming the remaining variables
> foreach v of varlist _var* {
> local i = `v'[1]
> rename `v' data`i'
> }
> 
> *dropping the first observation (because this is just the 
> name of the variable) drop in 1
> 
> *reshaping
> reshape long data, i(Name) j(date) string
> 
> 
> *now you're left with the date in a string format, which you 
> can convert to a proper date, in a couple of steps
> 
> gen str date_year = substr(date, -4,.)
> gen str date_month = substr(date, 3, 2)
> gen str date_day = substr(date, 1, 2)
> destring  date_year date_month date_day, replace
> 
> gen truedate = mdy(date_month, date_day, date_year)
> format truedate %d
> 
> ***************** END CODE ***********************

Radu Ban
 
> Here's a try. For this you would need to install the -sxpose- 
> command (do a -findit sxpose- and then follow the link to 
> install). This is needed because the built-in -xpose- command 
> doesn't work with strings.
> 
> ******************************
> 
> *reading in the data
> insheet using "C:\somedir\myfile.csv", clear
> 
> *transposing
> sxpose, clear
> 
> *renaming the first three variables
> foreach v of varlist _var1 _var2 _var3 { local i = `v'[1] 
> rename `v' `i'
> }
> 
> *getting rid of "/" in the dates
> *note that after renaming, only the date variables start with "_var"
> destring _var*, replace ignore("/") force
> 
> *now renaming the remaining variables
> foreach v of varlist _var* {
> local i = `v'[1]
> rename `v' data`i'
> }
> 
> *dropping the first observation (because this is just the 
> name of the variable) drop in 1
> 
> *reshaping
> reshape long data, i(Name) j(date) string
> 
> 
> *now you're left with the date in a string format, which you 
> can convert to a proper *date, in a couple of steps
> 
> gen str date_year = substr(date, -4,.)
> gen str date_month = substr(date, 3, 2)
> gen str date_day = substr(date, 1, 2)
> 
> gen truedate = mdy(date_month, date_day, date_year)
> 
> *now you can format the date using the preferred date format, 
> for example format truedate %d
> ********************
> 
> I hope this helps. Also if you want to run this for several 
> .csv files, check out the archives for last week, there's a 
> thread on how to loop over all files in a directory.
 

Nuno Soares 

> > I need to import some csv files that have an awkward 
> format. The format is always the same:
> >
> > Name Company1 Company2 Company3 ...
> > Code 0000(MV) 0001(MV) 0002(MV) ...
> > Currency £ £ € ...
> > 31/12/1986 data1 data2 data3 ...
> > 30/01/1987 dataX dataY dataZ ...
> > ...
> > 30/12/2005 dataW dataJ dataK ...
> >
> > The number of companies is variable (ie, columns), and its 
> width is also variable. Data1, data2, etc. are numeric.
> >
> > Can you please tell me if it is possible to import to Stata 
> the data described, in the following format:
> >
> > Company  Code   Currency  Date  Data
> > Company1 0000(MV) £   31/12/1986 data1
> > Company1 0000(MV) £   30/01/1987 dataX
> > ...
> > Company1 0000(MV) £   30/12/2005 dataW
> > Company2 0001(MV) £   31/12/1986 data2
> > Company2 0001(MV) £   30/01/1987 dataY
> > ...
> > Company2 0001(MV) £   30/12/2005 dataJ
> > Company3 0002(MV) £   31/12/1986 data3
> > Company3 0002(MV) £   30/01/1987 dataZ
> > ...
> > Company3 0002(MV) £   30/12/2005 dataK
> > and so on.
> >
> > Some treatment on the original data can be done in Excel, 
> but I wanted to avoid it (I have almost one hundred files 
> that I need to import...). If any one has an idea of how to 
> do it, I would be greatly appreciated!

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index