
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: match string variables

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: match string variables
Date	Thu, 14 Apr 2011 22:13:52 +0100

I am not a biologist either, but high school science, surely?

But pleased you got there. Nick

On Thu, Apr 14, 2011 at 9:56 PM, "Lukas Bösch" <[email protected]> wrote:
> Thank you all
>
> Now that I have restricted the species list to the first two words, everything works fine. I really should have noticed that only the first two words are the taxonomic classification, but I am not a biologist...
>
>
> -------- Original Message --------
>> Date: Thu, 14 Apr 2011 09:20:06 -0400
>> From: "Dimitriy V. Masterov" <[email protected]>
>> To: [email protected]
>> Subject: Re: st: match string variables
>
>> Lukas,
>>
>> After standardizing the strings as Nick suggested, I think you might
>> try the user-written command -strgroup- from SSC. It does not
>> work on 64-bit Windows, and you will still have to do a lot of manual
>> checking, but it may make your project a bit easier. I would try
>> merging the two datasets (or perhaps using -nearmrg-) and then running
>> -strgroup- on the _merge!=3 group.
>>
>> DVM
>>
>> On Wed, Apr 13, 2011 at 5:18 PM, "Lukas Bösch" <[email protected]> wrote:
>> > Dear Stata Community.
>> >
>> > I am working with the CITES trade data, and my aim is to analyze the
>> > exports of 130 countries from 1990 to 2009 with a logistic model. CITES
>> > regulates the international trade in endangered species. The export
>> > data for Afghanistan, for example, look like this:
>> >
>> > year   taxon           term   unit   country   value
>> >
>> > 1990   Falco Cherrug   live   -      AF        0
>> > 1991   Falco Cherrug   live   -      AF        0
>> > 1992   Falco Cherrug   live   -      AF        0
>> > 1993   Falco Cherrug   live   -      AF        0
>> >
>> > In the case of Afghanistan, the data contain 180 rows with nine
>> > different taxa. In some cases they contain up to 8000 rows with 2000
>> > taxa. I know that I could also have shown the data in wide form with
>> > far fewer rows...
>> >
>> > Now I want to create a variable “indigenous” equal to 1 if the
>> > exported taxon occurs in the country and 0 if not. To get this, I
>> > copied the species lists for all 130 countries from the CITES website;
>> > they look like this (again for Afghanistan):
>> >
>> > Accipiter badius (Gmelin, 1788)
>> > Accipiter gentilis (Linnaeus, 1758)
>> > Accipiter nisus (Linnaeus, 1758)
>> > Acinonyx jubatus (Schreber, 1775)
>> > Acipenser nudiventris Lovetzky, 1828
>> >
>> > This list contains 131 taxa, and I cleaned it to get rid of the
>> > years, the commas and so on:
>> >
>> > Accipiter badius
>> > Accipiter gentilis
>> > Accipiter nisus
>> > Acinonyx jubatus
>> > Acipenser nudiventris Lovetzky
>> >
>> > I have tried different variations of -merge- and -joinby-, and I
>> > looked at the ado-files _gsoundex, nearmrg, nmatch and reclink, but I
>> > haven’t been able to create the “indigenous” variable so far.
>> > There are two major problems. The first is that the taxon in the
>> > species list doesn’t always match the taxon in the export data exactly.
>> > For example, Falco cherrug in the export data is listed as Falco
>> > Cherrug Gray in the species list. The second is that the species list
>> > has a different number of observations than the export data, and the
>> > two don’t fit together logically.
>> > I need something like: if the taxon from the export data is on the
>> > species list, then “indigenous” = 1; if the taxon from the export data
>> > is not on the species list, then “indigenous” = 0.
>> >
>> > Maybe someone has an idea and can give me a hint on how to do this.
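
The fix reported at the top of the thread (keep only the first two words of each name, then check whether the exported taxon is on the country's species list) can be sketched roughly as follows. This is a minimal illustration in Python rather than Stata, and it is not code posted by anyone in the thread. The function names binomial and flag_indigenous, the dictionary layout, and the "Panthera leo" row are hypothetical and chosen only for illustration. Lowercasing the names, so that "Falco Cherrug" and "Falco cherrug" compare equal, is an added assumption.

```python
def binomial(name):
    """Reduce a taxon string to a lowercase 'genus species' key."""
    # Keep only the first two whitespace-separated words and ignore case, so
    # 'Falco Cherrug' and 'Falco Cherrug Gray' both become 'falco cherrug'.
    return " ".join(name.split()[:2]).lower()


def flag_indigenous(export_rows, species_by_country):
    """Return copies of export_rows with an added 0/1 'indigenous' field.

    export_rows: list of dicts with at least 'taxon' and 'country' keys.
    species_by_country: dict mapping a country code to a list of raw
    species-list entries (authority and year may still be attached).
    """
    # Build one set of normalized names per country for fast lookup.
    keys = {
        country: {binomial(entry) for entry in entries}
        for country, entries in species_by_country.items()
    }
    flagged = []
    for row in export_rows:
        known = keys.get(row["country"], set())
        is_listed = binomial(row["taxon"]) in known
        flagged.append({**row, "indigenous": int(is_listed)})
    return flagged


if __name__ == "__main__":
    # Species-list entries as quoted in the thread.
    species = {"AF": ["Falco Cherrug Gray", "Accipiter badius (Gmelin, 1788)"]}
    exports = [
        {"year": 1990, "taxon": "Falco Cherrug", "country": "AF"},
        # Hypothetical row, added only to show a non-match.
        {"year": 1990, "taxon": "Panthera leo", "country": "AF"},
    ]
    for row in flag_indigenous(exports, species):
        print(row["taxon"], row["indigenous"])
```

Because the lookup is a set-membership test per country, the two lists do not need the same number of rows. Truncating to two words also drops any subspecies name, so a three-part name would be matched at species level; whether that is acceptable depends on the analysis.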
