Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: neighbourhood size


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: neighbourhood size
Date   Wed, 24 Jul 2013 22:12:47 -0400

This does not sound like a big deal, except that Stata does not work
with Unicode. However, even in English you will need to decide how to
deal with ambiguities in the text. Suppose your dictionary is the
Greek letters: alpha, beta, ... and you encounter 'opsilon' in the
text. Do you increment the frequency of 'epsilon'? Of 'upsilon'? Of
both (according to your definition)? Or of neither, since it is not a
valid word but a typo?

Once you resolve that: for i=1 { for j=1 {...}}. A couple of loops
(one over your word list, one over the corpus) should suffice. That
can be slow, so you then investigate what is special about your word
list, what is special about your text, and what is acceptable in
terms of performance. A lot depends on the size of the corpus. If it
is a page of Google search results, we are OK. If it is the contents
of JSTOR for the last 20 years, we might be in trouble. What is the
size of the word list? Is it two, three, or ten keywords, or is it
the contents of a novel?
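
A minimal sketch of the two-loop idea, written in Python rather than
Stata because Python strings are Unicode and so handle Persian script
directly. Everything named here (is_neighbour, neighbourhood_sizes,
the count_tokens flag, the toy corpus) is hypothetical and only for
illustration. It assumes a "neighbour" is a word of the same length
that differs in exactly one position; if insertions or deletions
should also count, the definition has to be widened. It resolves the
'opsilon' question by counting it for both 'epsilon' and 'upsilon'.

```python
from collections import Counter


def is_neighbour(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def neighbourhood_sizes(word_list, corpus_tokens, count_tokens=False):
    """For each word in word_list, count its neighbours in the corpus.

    count_tokens=False counts distinct neighbour words (types);
    count_tokens=True counts every occurrence in the corpus (tokens).
    """
    # Collapse the corpus to distinct words first, so the inner loop
    # runs over the vocabulary rather than over every token.
    corpus_counts = Counter(corpus_tokens)
    sizes = {}
    for word in word_list:                         # outer loop: word list
        total = 0
        for other, freq in corpus_counts.items():  # inner loop: corpus vocabulary
            if is_neighbour(word, other):
                total += freq if count_tokens else 1
        sizes[word] = total
    return sizes


# Toy data only: a made-up corpus containing the typo 'opsilon' twice.
corpus = "alpha opsilon beta epsilon opsilon".split()
print(neighbourhood_sizes(["epsilon", "upsilon"], corpus))
# {'epsilon': 1, 'upsilon': 2}
```

The cost is roughly (size of word list) x (number of distinct corpus
words), which is fine for a page of text and may not be for a very
large corpus. For Persian text it is probably also worth normalizing
the script before comparing (for example, Arabic and Persian variants
of the same letter), since the comparison above is per code point.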

I wonder why Stata was picked as the tool for solving this problem.
http://stackoverflow.com/questions/4520876/counting-the-frequency-of-specific-words-in-text-file

Sergiy


On Wed, Jul 24, 2013 at 8:45 PM, Mehdi Bakhtiar <mbakhtiar@gmail.com> wrote:
>>> Dear Experts,
>>> I have a question about how to use Stata to calculate neighbourhood size for a list of words. I have my own word list and a corpus, and I need Stata to count, for each word in my word list, the number of its neighbours (words differing by one letter) in the corpus. I should also mention that my words are in Persian script.
>>> In advance many thanks for any attention and support,
>>> Kind regards,
>>> Mehdi Bakhtiar
>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


