You can now use Stata’s string variables to hold exceedingly long strings, including the entire contents of text files or even binary files.
Say we have data on 500 patients stored in our Stata dataset patients.dta.
We have doctor notes stored in 500 other files with names like notes17213.xyz, notes18417.xyz, and so on. The number in the filename is the patient’s ID.
We have the variable patid containing the patient IDs.
We can read all 500 files into our dataset:
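A minimal sketch of how that read might look, assuming the note files sit in the current working directory. It relies on Stata's fileread() function; the variable name notes is our own choice for illustration and is not given in the text.

```stata
* Load the patient data (assumes patients.dta is in the working directory)
use patients, clear

* Build each filename from the patient ID and read that file's contents
* into a strL variable. "notes" is an illustrative name.
* string(patid) assumes the IDs are integers short enough to print in full.
generate strL notes = fileread("notes" + string(patid) + ".xyz")
```

Each observation then holds the full text of that patient's file.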
Just as easily, we could re-create all 500 files.
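Going the other way might look like the following sketch. It assumes the file contents are held in a strL variable named notes (a name not given in the text) and uses filewrite(), whose optional third argument 1 asks Stata to replace a file that already exists.

```stata
* Assumes strL variable notes holds each patient's file contents.
* filewrite() returns a byte count, stored here in nbytes (an illustrative name).
* The final argument 1 replaces any existing file of the same name.
generate long nbytes = filewrite("notes" + string(patid) + ".xyz", notes, 1)
```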
We want to know whether the phrase “Diabetes Mellitus Type 1”, which the doctor would have written as T1DM, appears in the doctor’s notes.
We could create a variable t1dm that flags whether T1DM appears in each patient’s notes.
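One way to build such a flag is sketched here, assuming the file contents are held in a strL variable named notes (our name, not the text's) and using strpos() to search each note.

```stata
* Assumes strL variable notes holds each patient's file contents.
* strpos() returns the position of the first match, or 0 if there is none,
* so the comparison yields 1 when "T1DM" appears and 0 otherwise.
generate byte t1dm = strpos(notes, "T1DM") > 0
```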
We could also list the variables sugar level, patient age, and patient weight wherever the doctor recorded Diabetes Mellitus Type 1.
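A sketch of such a listing follows. The variable names sugar_level, age, and weight are assumptions, as is the strL variable notes holding the file contents.

```stata
* sugar_level, age, weight, and notes are assumed variable names.
* The if qualifier keeps only observations whose notes mention T1DM.
list sugar_level age weight if strpos(notes, "T1DM") > 0
```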
We could even run a regression of sugar level on age and weight.
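The regression might be written as below. The variable names are assumptions, and restricting the sample to patients whose notes mention T1DM is our reading of the example rather than something the text states.

```stata
* sugar_level, age, weight, and notes are assumed variable names.
* The if qualifier (an assumption) limits the sample to notes mentioning T1DM.
regress sugar_level age weight if strpos(notes, "T1DM") > 0
```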
Now for some details ...
A string is a sequence of characters:
Samuel Smith California U.K.
Strings can be stored in Stata datasets as string variables.
|variable name|storage type|display format|variable label|
|make|str18|%-18s|Make and Model|
The variable make is a str18 variable. It can contain strings up to 18 characters long. Not every string it holds is 18 characters long.
All str18 means is that the variable cannot hold a string longer than 18 characters. Even that is unimportant because Stata automatically promotes str# variables to be longer when required:
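The promotion can be seen with a short sketch. It assumes the dataset in question is auto.dta, which ships with Stata and contains a str18 variable make; the replacement value is chosen for illustration.

```stata
* auto.dta ships with Stata; its variable make is str18
* (assumed here to be the dataset the text describes)
sysuse auto, clear
describe make

* Storing a 22-character string promotes make to a longer str# type
replace make = "Mercedes Benz Gullwing" in 1
describe make
```

The second describe should report a wider storage type than the first.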
The string-variable storage types are str1, str2, ..., str2045, and strL.
Think of it like this: after 2,045 comes L. The L stands for long. strL is pronounced sturl.
strL variables work just like str# variables:
|variable name|storage type|display format|
|mymake|strL|%9s|
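A variable like mymake could be created as in this sketch, which assumes auto.dta as the starting dataset and copies make into a strL variable.

```stata
sysuse auto, clear

* Create a strL copy of make; mymake is the name shown in the output above
generate strL mymake = make
describe mymake
```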
strL variables can be exceedingly long, but that is not required.
We can replace strL values just as we can replace str# values:
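For instance, the following sketch replaces the first value of mymake. It assumes auto.dta, and the replacement string is illustrative.

```stata
sysuse auto, clear
generate strL mymake = make

* replace works on a strL variable exactly as it does on a str# variable
replace mymake = "Mercedes Benz Gullwing" in 1
list mymake in 1/2
```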
We can use string functions with strL variables just as we can with str# variables:
||mymake|len|first5|
|1.|Mercedes Benz Gullwing|22|Merce|
|2.|AMC Pacer|9|AMC P|
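A listing like the one above could be produced along these lines. This is a sketch that assumes auto.dta with its first observation replaced as shown, and it uses strlen() and substr() to build len and first5.

```stata
sysuse auto, clear
generate strL mymake = make
replace mymake = "Mercedes Benz Gullwing" in 1

* strlen() gives the length in bytes (equal to characters for plain ASCII text)
generate len = strlen(mymake)

* substr() extracts 5 characters starting at position 1
generate first5 = substr(mymake, 1, 5)

list mymake len first5 in 1/2
```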
We can even make tabulations:
|brand|Freq.|Percent|Cum.|
|AMC|2|2.70|2.70|
|Audi|2|2.70|5.41|
|BMW|1|1.35|6.76|
|VW|4|5.41|98.65|
|Volvo|1|1.35|100.00|
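One way such a tabulation might be obtained is sketched below. It assumes auto.dta with the first observation replaced as before, and it takes the first word of mymake as the brand; how brand was actually constructed is not shown in the text.

```stata
sysuse auto, clear
generate strL mymake = make
replace mymake = "Mercedes Benz Gullwing" in 1

* word(s, 1) returns the first space-delimited word, used here as the brand
generate strL brand = word(mymake, 1)
tabulate brand
```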
strLs can hold binary strings. A binary string is, technically speaking, any string that contains binary 0. Here is a silly example:
list displays binary zeros as \0.
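A binary string can be built by hand, as in this sketch. It assumes auto.dta and a strL variable mymake; the string contents are arbitrary, and char(0) supplies the binary 0.

```stata
sysuse auto, clear
generate strL mymake = make

* char(0) produces the byte with value 0; embedding it makes the string binary
replace mymake = "abc" + char(0) + "def" in 1
list mymake in 1
```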
str# variables cannot contain binary 0. strL variables can.
Read all about long strings and BLOBs in the manual entry.
See New in Stata 17 to learn about what was added in Stata 17.