Ronnie Babigumira wrote:
Yes, I have used and continue to use -insheet- (mainly when I have
tab-delimited data from Excel), and specifying the string length is not a
problem with -insheet-. That said, there are situations where -infile- is the
appropriate command, and of course -input- is invaluable when I want to input
a few entries. In those cases I have to specify the length of string variables.
Ada Ma wrote:
Have you tried -insheet-?
<snip>
Is there something in newer versions of Stata that would save me from
guessing the length of strings when using -infile- and -input-?
</snip>
--------------------------------------------------------------------------------
For -infile-:
If your input file is space-delimited (that is, spaces aren't used to
represent missing values and there aren't internal spaces in strings), then
you can use -split- after -infix str v1 1-244 using <filename>- for record
lengths up to 244 bytes. You can then -destring- to restore numeric
variables.
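As a rough do-file sketch of that sequence (the file name mydata.txt and the stub x are made up for illustration, and I'm assuming every record fits in 244 bytes):

```stata
* Sketch only: "mydata.txt" is a hypothetical space-delimited text file.
clear
* Read each record into a single string variable.
infix str v1 1-244 using "mydata.txt"
* Break the record into pieces x1, x2, ... (split parses on spaces by default).
split v1, generate(x)
* Convert the pieces that hold only numbers; other pieces are left as strings.
destring x*, replace
drop v1
* Shrink the remaining string variables to the shortest type that fits.
compress
```

Pieces that contain any nonnumeric characters stay as strings after -destring-, so it is worth a -describe- afterward to check which variables were converted.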
For multispace-delimited files (typically used where strings contain
internal spaces), I believe that you can specify a multispace parsing
string with -split-. (See first example below.) Be aware that -infix-
strips leading spaces at the beginning of the record; -filefilter- can
help to remedy that beforehand if needed.
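To make the delimiter explicit, here is a small do-file sketch with made-up data in which fields are separated by two spaces and single spaces occur inside fields. The values and the stub b are purely illustrative.

```stata
* Sketch only: hypothetical records; two spaces separate fields,
* single spaces are part of a field.
clear
input str244 a
"a b  c  d"
"e  f g  h i"
end
* The parse string is exactly two spaces, so "a b" is kept in one piece.
split a, generate(b) parse("  ")
list b*, noobs
```

If the number of separating spaces varies from record to record, a fixed parse string like this won't be enough on its own.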
In cases where the input file is messy, you can use Stata's conventional
string functions and new regular expression functions after -infix str v1
1-244 using <filename>-. I've just finished such a project (the data were
embedded in prettily formatted .pdf files), and Stata's regular expression
functions were a godsend.
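For a flavor of the regular expression approach, here is a sketch under an invented layout: I'm assuming the file report.txt has lines such as "Weight: 72.5 kg" scattered among other text. The file name, the pattern, and the variable weight are all hypothetical.

```stata
* Sketch only: "report.txt" and the "Weight: ... kg" layout are assumptions.
clear
infix str v1 1-244 using "report.txt"
* regexm() flags records that match; regexs(1) returns the text captured
* by the first parenthesized subexpression of that match.
generate double weight = real(regexs(1)) if regexm(v1, "^Weight: ([0-9.]+) kg")
* Keep only the records from which a value was extracted.
keep if !missing(weight)
list weight, noobs
```

The same pattern of -generate ... if regexm()- can be repeated with a different expression for each field that needs to be pulled out of the messy records.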
If the record length is longer than 244, then I believe that you can -infix
str v1 1-244 str v2 245-488 . . . using <filename>-, and proceed as above.
For -input-:
You don't actually need to guess string length in order to use -input-.
Just specify the maximum and away you go. (See second example below.)
Joseph Coveney
. set obs 2
obs was 0, now 2
. input str244 a
a
1. "a b c d"
2. "e f g h i"
. split a, generate(b) parse(" ")
variables created as string:
b1 b2 b3
. list b*, noobs
+----------------+
| b1 b2 b3 |
|----------------|
| a b c d |
| e f g h i |
+----------------+
. clear
. input str244 a byte b str244 c int d str244 e float f
a b
c d
e f
1. abc 3 def 200 ghi 1001.1
2. lmn -1 opq .m "" 10000
3. end
. compress
a was str244 now str3
c was str244 now str3
e was str244 now str3
. list, noobs
+-------------------------------------+
|   a    b     c     d     e        f |
|-------------------------------------|
| abc    3   def   200   ghi   1001.1 |
| lmn   -1   opq    .m          10000 |
+-------------------------------------+