Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: match string variables

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: match string variables
Date	Thu, 14 Apr 2011 08:01:57 +0100

I should have said binominal.

http://en.wikipedia.org/wiki/Binomial_nomenclature

Wikipedia is great (except when it isn't).

I glanced at the Wikipedia entry on Stata, which quotes my words from
the Statalist FAQ:

The correct English pronunciation of "Stata" "must remain a mystery;"

Setting aside the bizarre punctuation, it is nice to be taken as an
authority even when you are making a joke.

Nick

On Thu, Apr 14, 2011 at 7:34 AM, Nick Cox <[email protected]> wrote:
> I don't think these problems ever have clean, simple solutions. You
> just need to keep plugging away until -merge- works.
>
> One specific [pun intended] detail is to standardise binomials
> [taxonomic sense] of the form Genusname speciesname so that
> speciesname always begins with lower case, as in Homo sapiens, Homo
> sasuser and Homo statauser.
>
> Standardising to that in your main dataset is
>
> gen taxon2 = upper(substr(taxon, 1,1)) + lower(substr(taxon, 2, .))
>
> Another standardisation is to generic and specific names only in your
> species list as in
>
> gen taxon2 = word(taxon,1) + " " + word(taxon, 2)
>
> In other words, you can work at the corresponding variables in each
> dataset until you have a good chance of a successful -merge-.
>
> -merge- does not depend on files having the same number of observations!
>
> More general advice is contained in
>
> SJ-8-3  dm0039  Stata tip 64: Cleaning up user-entered string variables
>         J. Herrin and E. Poen, Stata Journal 8(3): 444-445 (Q3/08; no commands)
>         Tip on how to clean up user-entered string variables
>
> Nick
>
> P.S. 1 taxon, 2 taxa.
>
> On Wed, Apr 13, 2011 at 10:18 PM, "Lukas Bösch" <[email protected]> wrote:
>>
>> I am working with the CITES trade data, and my aim is to analyze the exports of 130 countries from 1990 to 2009 with a logistic model. CITES regulates the international trade in endangered species. The export data for Afghanistan, for example, look like this:
>>
>> year   taxon           term   unit   country   value
>>
>> 1990   Falco Cherrug   live   -      AF        0
>> 1991   Falco Cherrug   live   -      AF        0
>> 1992   Falco Cherrug   live   -      AF        0
>> 1993   Falco Cherrug   live   -      AF        0
>>
>> In the case of Afghanistan, the data contain 180 rows, with nine different taxon. In some cases they contain up to 8000 rows with 2000 taxon. I know that I could also have shown the data in wide form with far fewer rows...
>>
>> Now I want to create a variable “indigenous” that is 1 if the exported taxon exists in the country and 0 if not. To get this, I copied the species lists for all 130 countries from the CITES homepage; each looks like this (again for Afghanistan):
>>
>> Accipiter badius (Gmelin, 1788)
>> Accipiter gentilis (Linnaeus, 1758)
>> Accipiter nisus (Linnaeus, 1758)
>> Acinonyx jubatus (Schreber, 1775)
>> Acipenser nudiventris Lovetzky, 1828
>>
>> This list contains 131 taxon, and I cleaned it up to get rid of the years, the commas and so on.
>>
>> Accipiter badius
>> Accipiter gentilis
>> Accipiter nisus
>> Acinonyx jubatus
>> Acipenser nudiventris Lovetzky
>>
>> I have tried different variations of merge and joinby, and I looked at the ado-files _gsoundex, nearmrg, nmatch and reclink, but I haven’t been able to create the “indigenous” variable so far.
>> There are two major problems. The first one is that the taxon in the species list doesn’t always match the taxon in the export data exactly. For example, Falco Cherrug in the export data is listed as Falco cherrug Gray in the species list. The second problem is that the species list has a different number of observations than the export data, and they don’t fit logically together.
>> I need something like: If the taxon from the export data is on the species list, then “indigenous” = 1, if the taxon from the export data is not on the species list, then “indigenous” = 0.
>
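A minimal Stata sketch of the standardise-then-merge approach discussed above, run on toy data typed in with -input-. It assumes (an assumption, not something stated in the thread) that the 130 species lists have been stacked into one dataset with variables country and taxon, using the same country codes as the export data. taxon2 follows Nick's examples and indigenous is the variable Lukas asked for; the toy rows, including "Homo statauser", are illustrative only. Both datasets are reduced to "Genus species" with the same capitalisation and then merged many-to-one, so the different numbers of observations do not matter.

```stata
* Toy species list (hypothetical stacked version of the 130 country lists)
clear
input str2 country str40 taxon
"AF" "Accipiter badius"
"AF" "Falco cherrug Gray"
"AF" "Acipenser nudiventris Lovetzky"
end

* Keep genus and species only, then force "Genus species" capitalisation
gen taxon2 = trim(word(taxon, 1) + " " + word(taxon, 2))
replace taxon2 = upper(substr(taxon2, 1, 1)) + lower(substr(taxon2, 2, .))
keep country taxon2
* -merge m:1- needs the using keys to be unique
duplicates drop country taxon2, force
tempfile species
save `species'

* Toy export data (in practice, the real long-form export file)
clear
input int year str2 country str40 taxon
1990 "AF" "Falco Cherrug"
1991 "AF" "Falco Cherrug"
1990 "AF" "Homo statauser"
end

* Apply exactly the same standardisation as for the species list
gen taxon2 = trim(word(taxon, 1) + " " + word(taxon, 2))
replace taxon2 = upper(substr(taxon2, 1, 1)) + lower(substr(taxon2, 2, .))

* Many export rows may match one species-list row; drop using-only rows
merge m:1 country taxon2 using `species', keep(master match)
gen byte indigenous = (_merge == 3)
drop _merge
list year country taxon indigenous
```

Export taxa that still get indigenous == 0 are worth listing and checking by hand, since synonyms or spelling variants will not be caught by this standardisation.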


