Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: comas in numbrers

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: RE: comas in numbrers
Date	Sun, 2 May 2010 21:44:27 -0400

I assume that your commas are not separating the integer part from the
fractional part of the numbers, but are instead marking thousands; see
Jose's post today.

I suggest that you use -filefilter- instead of -sed-. Since this is
a Stata command, it will be platform independent, whereas the -sed-
command on your system might have a different syntax from the one on
mine (OS X 10.5.8, bash shell). (The "-e" might be an issue; you
could omit it.) Either of these two statements should work.

filefilter file1.txt file2.txt, from(",") to("")  //best
filefilter file1.txt file2.txt, from(\044d) to("")

Check what is in file2.txt, either by opening it in your text
editor (best) or by using -type-.
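
As a minimal sketch of the whole sequence, assuming the two-line
file1.txt from my earlier example: the file is written with -file write-
here only so that the sketch is self-contained, and the -replace- option
is an addition so that any existing file2.txt is overwritten.

```stata
* Sketch only: build a small file1.txt so the example is self-contained
clear
tempname fh
file open `fh' using file1.txt, write text replace
file write `fh' "3,000,000 4,000,000 ." _n
file write `fh' "40,000 300,000 100" _n
file close `fh'

* Strip every comma; -replace- overwrites file2.txt if it already exists
filefilter file1.txt file2.txt, from(",") to("") replace

* Look at the result before trying to read it
type file2.txt

* Read three numeric variables in free format
infile v1 v2 v3 using file2.txt, clear
describe
list
```

If -type- still shows commas, or shows nothing, the problem is in how
file1.txt was created and not in -infile-.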

I'm not sure what you mean by "highlighting" the data, and I'm not
sure that I consider the do-file editor a good text editor. (I
wouldn't want to use it with a thousand-line file.) Still, while you
are there (or in a good text editor), why not search for and replace
the commas and save the file, as Jose and Nick suggested?
Good luck,

Steve

On Sun, May 2, 2010 at 12:20 PM, [email protected] <[email protected]> wrote:
> Thanks Steve; however, I seem to be doing something wrong in putting
> the following data into a file called file1.txt. I am doing this by
> highlighting the data, copying it, pasting it into a do-file, and then
> saving it as file1.txt.       .....
>
>  31,666 15,127 46,793 31,259 15,409 46,668 30,215 17,182 7,397
>  28,323 13,798 42,121 28,267 14,024 42,291 27,049 15,886 2,935
>  6,615 2,307 8,922 6,668 2,602 9,270 6,130 3,380 9,510
>  7,829 4,196 12,025 7,556 3,855 11,411 7,020 4,336 11,356
>  5,583 2,809 8,392 6,098 2,856 8,954 6,177 3,145 9,322
>  4,399 2,421 6,820 4,311 2,520 6,831 4,119 2,609 6,728
>  3,897 2,065 5,962 3,634 2,191 5,825 3,603 2,416 6,019
>  3,343 1,329 4,672 2,992 1,385 4,377 3,166 1,296 4,462
>
>  do "C:\DOCUME~1\Victor\LOCALS~1\Temp\STD04000000.tmp"
>
>  capture erase file2.txt
>
>  ! sed -e 's/,//g' file1.txt>file2.txt
>
> .
>  infile v1 v2 v3 using file2.txt, clear
> (0 observations read)
>
>  des
>
> Contains data
>  obs:             0
>  vars:             0
>  size:             0 (100.0% of memory free)
> Sorted by:
>
>  list
>
> .
> end of do-file
>
>
>
> On 28/4/2010, "Steve Samuels" <[email protected]> wrote:
>
>>Victor, you can skip all the problems of adding quotes or destringing
>>by deleting the commas outside of Stata. Use the SED package, built
>>into all the Unix, Linux, and OS X distributions and also available
>>for Windows.  Write the data into a text file, say file1.txt.  Here is
>>an example:
>>
>>**************************CODE BEGINS**************************
>>/*
>> File file1.txt contains the following two lines with three variables
>>3,000,000 4,000,000    .
>>     40,000   300,000 100
>>*/
>>capture erase file2.txt
>>! sed  -e 's/,//g' file1.txt>file2.txt
>>
>>infile v1 v2 v3 using file2.txt, clear
>>des
>>list
>>***************************CODE ENDS**************************
>>
>>Steve
>>
>>On Tue, Apr 27, 2010 at 5:16 PM, Martin Weiss <[email protected]> wrote:
>>>
>>> <>
>>>
>>> To answer this conclusively, we need to know where the data reside, i.e.
>>> what kind of a file.
>>>
>>> I just created a file "Book1.txt" from spreadsheet software, containing
>>> numbers with commata in them, and used -insheet using Book1.txt, tab clear-
>>> to get them into Stata. "tab" is just a red herring here so Stata does not
>>> treat the commata as delimiters. Afterwards, I -destring-ed as described
>>> before and ended up with a numeric variable in Stata. This approach could be
>>> viable for you as well.
>>>
>>>
>>> HTH
>>> Martin
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of [email protected]
>>> Sent: Dienstag, 27. April 2010 19:54
>>> To: [email protected]
>>> Subject: st: RE: RE: RE: comas in numbrers
>>>
>>> I mean thousands of observations, represented by numbers with a comma in
>>> them. It is a problem because it would take too much time to put quotes
>>> around each one of them.
>>> Unfortunately, in Eurostat we come across commas in numbers.
>>>
>>>
>>>
>>> On 27/4/2010, "Martin Weiss" <[email protected]> wrote:
>>>
>>>>
>>>><>
>>>>
>>>>What do you mean? There is no substantive difference to the situation
>>>>with 5 obs, is there? Apart from the fact that you do not want to -input-
>>>>them all, but that problem is unrelated to the issue of commata...
>>>>
>>>>
>>>>HTH
>>>>Martin
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: [email protected]
>>>>[mailto:[email protected]] On Behalf Of [email protected]
>>>>Sent: Dienstag, 27. April 2010 19:16
>>>>To: [email protected]
>>>>Subject: st: RE: RE: comas in numbrers
>>>>
>>>>What do I do when there are thousands of observations?
>>>>
>>>>
>>>>
>>>>010, "Martin Weiss" <[email protected]> wrote:
>>>>
>>>>>
>>>>><>
>>>>>
>>>>>*************
>>>>>
>>>>> drop _all
>>>>>
>>>>>input str10 a
>>>>>"2,345"
>>>>>"213"
>>>>>"34,456"
>>>>>"458"
>>>>>end
>>>>>
>>>>>destring a, generate(numerica) ignore(,)
>>>>>l
>>>>>*************
>>>>>
>>>>>
>>>>>HTH
>>>>>Martin
>>>>>
>>>>>-----Original Message-----
>>>>>From: [email protected]
>>>>>[mailto:[email protected]] On Behalf Of [email protected]
>>>>>Sent: Dienstag, 27. April 2010 17:39
>>>>>To: [email protected]
>>>>>Subject: st: RE: comas in numbrers
>>>>>
>>>>>Dear Statalist,
>>>>>
>>>>>How do you handle numbers with a comma to indicate thousands?
>>>>>I thank you beforehand.
>>>>>Victor M. Zammit
>>>>>
>>>>> drop _all
>>>>>
>>>>> input a
>>>>>
>>>>>a
>>>>>1. 2,345
>>>>>2. 213
>>>>>3. 34,456
>>>>>4. 458
>>>>>5. end
>>>>>
>>>>>..
>>>>>end of do-file
>>>>>
>>>>> l
>>>>>
>>>>>     +-----+
>>>>>     |   a |
>>>>>     |-----|
>>>>>  1. |   2 |
>>>>>  2. | 213 |
>>>>>  3. |  34 |
>>>>>  4. | 458 |
>>>>>     +-----+
>>>>>
>>
>>
>>--
>>Steven Samuels
>>[email protected]
>>18 Cantine's Island
>>Saugerties NY 12477
>>USA
>>Voice: 845-246-0774
>>Fax:    206-202-4783
>>
>>*
>>*   For searches and help try:
>>*   http://www.stata.com/help.cgi?search
>>*   http://www.stata.com/support/statalist/faq
>>*   http://www.ats.ucla.edu/stat/stata/
>>
>>
>
>


