Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Jessie Grace <gracejessie@hotmail.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RE: why messy when importing a csv file? |
Date | Thu, 6 May 2010 18:34:00 +0000 |
Sarah Edgington, thank you for your help.

If I open the file in Notepad, the result is

"Stkcd,Accper,A001101000"
"000002,""1999-06-30"",468010960.13"
"000002,""2002-09-30"",1166858479.70"
"000002,""2000-01-01"",772831829.15"
"000002,""2000-06-30"",911966043.54"
"000002,""2000-12-31"",995745160.05"
"000002,""2009-03-31"",26921921879.80"
"000002,""1997-06-30"",0"
"000002,""1991-12-31"",88628783.34"
"000002,""1992-12-31"",204653478.04"
"000003,""1998-12-31"",120946052.36"

If I enter "type firms.csv" in Stata, the result is as follows, and it is full of strange characters.

. type firms.csv
..".S.t.k.c.d.,.A.c.c.p.e.r.,.A.0.0.1.1.0.1.0.0.0.".
.
.".0.0.0.0.0.2.,.".".1.9.9.9.-.0.6.-.3.0.".".,.4.6.8.0.1.0.9.6.0...1.3.".
.
.".0.0.0.0.0.2.,.".".2.0.0.2.-.0.9.-.3.0.".".,.1.1.6.6.8.5.8.4.7.9...7.0.".
.
.".0.0.0.0.0.2.,.".".2.0.0.0.-.0.1.-.0.1.".".,.7.7.2.8.3.1.8.2.9...1.5.".
.
.".0.0.0.0.0.2.,.".".2.0.0.0.-.0.6.-.3.0.".".,.9.1.1.9.6.6.0.4.3...5.4.".
.
.".0.0.0.0.0.2.,.".".2.0.0.0.-.1.2.-.3.1.".".,.9.9.5.7.4.5.1.6.0...0.5.".
.
.".0.0.0.0.0.2.,.".".2.0.0.9.-.0.3.-.3.1.".".,.2.6.9.2.1.9.2.1.8.7.9...8.0.".
.
.".0.0.0.0.0.2.,.".".1.9.9.7.-.0.6.-.3.0.".".,.0.".
.
.".0.0.0.0.0.2.,.".".1.9.9.1.-.1.2.-.3.1.".".,.8.8.6.2.8.7.8.3...3.4.".
.
.".0.0.0.0.0.2.,.".".1.9.9.2.-.1.2.-.3.1.".".,.2.0.4.6.5.3.4.7.8...0.4.".
.
.".0.0.0.0.0.3.,.".".1.9.9.8.-.1.2.-.3.1.".".,.1.2.0.9.4.6.0.5.2...3.6.".

The problem seems to be that the file is not plain text (ASCII), which -insheet- requires.

Thank you for all help.

Grace.

----------------------------------------
> From: sedging@ucla.edu
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: RE: why messy when importing a csv file?
> Date: Thu, 6 May 2010 11:04:37 -0700
>
> to save a new one. Depending on how big the data set is the solution of
> simply copying the contents of the file to the editor window and saving a
> stata dataset might be the easiest.
> Otherwise you need to make sure that
> you're saving a csv file that doesn't have extraneous information in it that
> Stata can't use.
>
> You say "The characteristic of the file is the contents of each row are in
> the same cell." What does this mean? Are you referring to the fact that
> the value of the first variable is repeated? If so, that isn't a problem.
> If you mean something else, particularly something having to do with the way
> the end of the line is treated in the file then you have a problem. Are you
> saying that if you open the csv file in a spreadsheet program you get all 25
> lines of data in a single row of the spreadsheet? If so, that's likely
> going to cause issues. What does the csv file look like in a really basic
> text editor (for example on a windows machine what does it look like if you
> open it in notepad, not wordpad or word, but notepad)? Or alternatively
> what do you get if you enter " type firms.csv " in Stata?
>
> -Sarah Edgington
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jessie Grace
> Sent: Thursday, May 06, 2010 10:19 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: RE: why messy when importing a csv file?
>
> Nick, thank you for reply.
> Additionally, the csv file is downloaded from a certain database. If I copy
> the contents of the file to Stata's editor window. Everything goes well.
>
> . list
>
>      +-------------------------------+
>      | stkcd       accper   a00110~0 |
>      |-------------------------------|
>   1. |     2   1999-06-30   4.68e+08 |
>   2. |     2   2002-09-30   1.17e+09 |
>   3. |     2   2000-01-01   7.73e+08 |
>   4. |     2   2000-06-30   9.12e+08 |
>   5. |     2   2000-12-31   9.96e+08 |
>      |-------------------------------|
>   6. |     2   2009-03-31   2.69e+10 |
>   7. |     2   1997-06-30          0 |
>   8. |     2   1991-12-31   8.86e+07 |
>   9. |     2   1992-12-31   2.05e+08 |
>  10. |     3   1998-12-31   1.21e+08 |
>      +-------------------------------+
>
> If I copy the contents to a new csv file and type "insheet using firms.csv",
> the results are as follows.
>
> . list
>
>      +-------------------------+
>      |                      v1 |
>      |-------------------------|
>   1. | Stkcd,Accper,A001101000 |
>   2. |           ,468010960.13 |
>   3. |          ,1166858479.70 |
>   4. |           ,772831829.15 |
>   5. |           ,911966043.54 |
>      |-------------------------|
>   6. |           ,995745160.05 |
>   7. |         ,26921921879.80 |
>   8. |                      ,0 |
>   9. |            ,88628783.34 |
>  10. |           ,204653478.04 |
>      |-------------------------|
>  11. |           ,120946052.36 |
>      +-------------------------+
>
> I think the points are "the contents of each row are in the same cell" and
> the double quotes of the second variable in my csv file.
>
> Thank you for any help.
>
> Grace

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
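
A note on the -type- output in this message: a dot after every character, plus two extra characters at the very start of the file, is what one would expect from a UTF-16 file with a byte-order mark, and the Notepad view suggests that each row is additionally wrapped in one pair of outer double quotes, so the whole row counts as a single cell. Both points are inferences from the output shown, not something confirmed in the thread. Under those assumptions, a minimal Python sketch for producing a plain-text file outside Stata might look like this. The function names clean_csv_text and convert_file are hypothetical, made up for this illustration.

```python
import csv
import io


def clean_csv_text(raw: bytes) -> str:
    """Decode UTF-16 bytes and unwrap rows stored as one quoted cell.

    Assumption: the bytes are UTF-16, ideally starting with a byte-order mark.
    """
    # The "utf-16" codec reads the byte-order mark, picks the byte order
    # from it, and drops the mark from the decoded text.
    text = raw.decode("utf-16")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue  # skip blank lines
        if len(row) == 1:
            # The whole row was quoted as a single cell, so parse the
            # contents of that cell a second time to recover the fields.
            row = next(csv.reader([row[0]]))
        writer.writerow(row)
    return out.getvalue()


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-16 CSV from src and write a plain-text CSV to dst."""
    with open(src, "rb") as f:
        cleaned = clean_csv_text(f.read())
    # ASCII is enough for the data shown here; writing raises an error
    # if the file turns out to contain any other characters.
    with open(dst, "w", encoding="ascii", newline="") as f:
        f.write(cleaned)
```

Calling convert_file("firms.csv", "firms_clean.csv") (the output name is only an example) should leave a file with rows such as 000002,1999-06-30,468010960.13, which is the plain comma-separated layout that -insheet- expects.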