Re: st: match string variables
"Dimitriy V. Masterov" <email@example.com>
Thu, 14 Apr 2011 09:20:06 -0400
After standardizing the strings as Nick suggested, I think you might
try the user-written command -strgroup- from SSC. It does not work on
64-bit Windows, and you will still have to do a lot of manual
checking, but it may make your project a bit easier. I would try
merging (or perhaps using -nearmrg- on) the two datasets and then
running -strgroup- on the _merge!=3 group.
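
Here is a rough sketch of what I have in mind, not tested on your data. The file names (species, exports, species_key) and the variables key and grp are made up for illustration, and I am assuming the species lists have been stacked into one file with the same country codes as the export data. The -strgroup- options are as I recall them from its help file, and the threshold value is arbitrary, so please check -help strgroup- after installing.

```stata
* Sketch only: file names, key, and grp are assumptions, not your actual names.
* species.dta : one row per country-taxon from the CITES species lists
* exports.dta : the export data (year taxon term unit country value)

* 1. Standardize the species list: lower case, keep genus + species only
use species, clear
replace taxon = lower(trim(itrim(taxon)))
generate key = trim(word(taxon, 1) + " " + word(taxon, 2))
keep country key
duplicates drop
save species_key, replace

* 2. Apply the same standardization to the export data
use exports, clear
generate key = lower(trim(itrim(taxon)))
replace key = trim(word(key, 1) + " " + word(key, 2))

* 3. Exact merge: many export rows to one species-list row
merge m:1 country key using species_key

* indigenous is 1 if the exported taxon is on the country's list, 0 if not;
* it is left missing for list-only rows (_merge == 2)
generate byte indigenous = (_merge == 3) if _merge != 2

* 4. Fuzzy grouping of the leftovers, to be checked by hand
* (run -ssc install strgroup- once beforehand)
strgroup key if _merge != 3, generate(grp) threshold(0.25)
```

Keeping only the first two words turns "Falco Cherrug Gray" and "Falco Cherrug" into the same key, so many of the near-misses should already match exactly in step 3. Note that -strgroup- as called here ignores country, so a group may mix names from different countries; a shared grp value only tells you which pairs to look at by hand.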
On Wed, Apr 13, 2011 at 5:18 PM, "Lukas Bösch" <L.Boesch@gmx.de> wrote:
> Dear Stata Community.
> I am working with the CITES trade data and my aim is to analyze the export of 130 countries from 1990 to 2009 with a logistic model. CITES regulates the international trade in endangered species. The export data for Afghanistan, for example, looks like this:
> year taxon term unit country value
> 1990 Falco Cherrug live - AF 0
> 1991 Falco Cherrug live - AF 0
> 1992 Falco Cherrug live - AF 0
> 1993 Falco Cherrug live - AF 0
> In the case of Afghanistan, the data contains 180 rows, with nine different taxa. In some cases it contains up to 8000 rows with 2000 taxa. I know that I could also have shown the data in a wide form with far fewer rows...
> Now I want to create a variable “indigenous” with 1 if the exported taxon exists in the country or 0 if not. In order to get this I copied the species lists for all 130 countries from the CITES homepage, which looks like this (again for Afghanistan):
> Accipiter badius (Gmelin, 1788)
> Accipiter gentilis (Linnaeus, 1758)
> Accipiter nisus (Linnaeus, 1758)
> Acinonyx jubatus (Schreber, 1775)
> Acipenser nudiventris Lovetzky, 1828
> This list contains 131 taxa, and I cleaned it up to get rid of the years, the commas and so on.
> Accipiter badius
> Accipiter gentilis
> Accipiter nisus
> Acinonyx jubatus
> Acipenser nudiventris Lovetzky
> I have tried different variations of merge and joinby, I looked at the ado files _gsoundex, nearmrg, nmatch and reclink but I haven’t been able to create the “indigenous” variable so far.
> There are two major problems. The first one is that the taxon in the species list doesn’t always match exactly with the taxon in the export data. For example, Falco Cherrug in the export data is listed as Falco Cherrug Gray in the species list. The second problem is that the species list has a different number of observations than the export data, and the two don’t line up row by row.
> I need something like: If the taxon from the export data is on the species list, then “indigenous” = 1, if the taxon from the export data is not on the species list, then “indigenous” = 0.
> Maybe someone has an idea and can give me a hint on how to do this.
> Thank you very much
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/