Statalist The Stata Listserver



Re: st: Re: Analysis of text


From   Taavi Lai <[email protected]>
To   [email protected]
Subject   Re: st: Re: Analysis of text
Date   Sat, 09 Dec 2006 12:06:39 +0200

Dear Michael,

Thank you for the suggestions and questions.

The problem is that I don't have much more detail at the moment, as the files haven't been handed over to me yet and background documentation is almost nonexistent. So I posted my original question to get a first impression of whether I need to turn to a programmer for a specialised program or could try to resolve it on my own.

That said, I can still answer some of your questions. The file structure has been described to me as "free text, like an essay". The files are computer-generated, so the structure should be uniform. There are multiple lines, at least one per question. The number of questions per person varies between 300 and 500 (about 700 variables overall). Missing values are left empty, but I don't know whether the question numbers still appear in that case or are omitted as well.

Best regards,
Taavi

Michael Blasnik wrote:

Yes, it can be done. You should really post more details if you want more help. How are the files structured? Are there multiple lines in each? How many variables? How consistent is the layout across files? How are missing values coded, etc.? Why don't you just show a section of a file?

Without more info, I'd guess that you will most likely use the -file- command to read in the data (though -infix- may work, perhaps with -split-). Looping across files and accumulating results is fairly easy.
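A rough sketch of the loop-and-accumulate step, written in Python only to show the logic (it is not the -file- approach suggested above). Every detail here is a hypothetical assumption: the files sit in one directory with a .txt extension, question markers are "M" followed by digits, the files are Latin-1 encoded, and the file name can serve as the person identifier. Each file becomes one row, and the columns are the union of all markers seen, so a question omitted from a file and a question left empty both end up as a blank cell.

```python
import csv
import re
from pathlib import Path

# Hypothetical marker format: "M" followed by digits, as a separate word.
MARKER = re.compile(r"\b(M\d+)\b")


def parse_answers(text):
    """Map each marker to the text up to the next marker (whitespace collapsed)."""
    parts = MARKER.split(text)  # ["preamble", "M1", answer1, "M2", answer2, ...]
    return {m: " ".join(v.split()) for m, v in zip(parts[1::2], parts[2::2])}


def combine(in_dir, out_csv, pattern="*.txt"):
    """Parse every matching file in in_dir and write one wide CSV:
    one row per file, one column per marker seen in any file."""
    records = []
    for path in sorted(Path(in_dir).glob(pattern)):
        # Read the whole file at once so answers spanning several lines stay together.
        # Assumption: Latin-1 encoding (it decodes any byte sequence without error).
        row = parse_answers(path.read_text(encoding="latin-1"))
        row["file"] = path.stem  # hypothetical person ID taken from the file name
        records.append(row)
    # Union of all markers, ordered numerically so M2 comes before M10.
    markers = sorted({k for r in records for k in r if k != "file"},
                     key=lambda m: int(m[1:]))
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        # restval="" writes a blank cell when a file has no such question at all.
        writer = csv.DictWriter(fh, fieldnames=["file"] + markers, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records), len(markers)
```

The resulting CSV could then be read into Stata, for example with -insheet- (or -import delimited- in later releases).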

M Blasnik


----- Original Message ----- From: "Taavi Lai" <[email protected]>
To: <[email protected]>
Sent: Friday, December 08, 2006 1:56 PM
Subject: st: Analysis of text



Dear statalisters,

I have a set (~10 000) of text files, each containing questionnaire answers for one person. These files are structured as plain text files without any tabular form.
I'd like to loop through all these files and generate one data file as a result. For example, "M1" represents a question number, and the value for this question is the text/number between "M1" and "M2".
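A minimal sketch of the "text between M1 and M2" rule, in Python purely to illustrate the logic. It assumes (hypothetically) that every question marker is the letter "M" followed by digits and stands as a separate word; an answer that itself contains such a word would be split incorrectly, and a marker that appears twice keeps only its last answer.

```python
import re

# Hypothetical marker format: "M" + digits as a whole word (e.g. "M1", "M27").
# The word boundaries keep "M1" from matching inside "M10".
MARKER = re.compile(r"\b(M\d+)\b")


def parse_answers(text):
    """Return {marker: answer}, where each answer is the text between one
    marker and the next (or the end of the text), whitespace collapsed."""
    # With a capturing group, re.split returns
    # [preamble, "M1", answer1, "M2", answer2, ...]
    parts = MARKER.split(text)
    answers = {}
    for marker, value in zip(parts[1::2], parts[2::2]):
        answers[marker] = " ".join(value.split())
    return answers
```

Splitting the whole file's text at once, rather than line by line, keeps answers that run over several lines together, and a marker followed directly by the next marker comes back as an empty string.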

Could such a thing be done using Stata? Any suggestions and comments are welcome.

Best regards,
Taavi
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*



© Copyright 1996–2024 StataCorp LLC