
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: Identify observations within a variable


From   Jeph Herrin <stata@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Identify observations within a variable
Date   Mon, 22 Apr 2013 10:40:30 -0400

Sounds like what you want is:

 bysort artist isrc_country : egen country_sales=total(sales)
 bysort artist (country_sales) : gen origin=isrc_country[_N]

This will assign the country with the most total sales, not the most observations, as the -origin-.

hth,
Jeph
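
The approach above can be tried on a toy dataset built from the Shakira example quoted below. This is only a sketch: the data are made up, -origin_mode- is a hypothetical name added for comparison, and the variable names -artist-, -isrc_country- and -sales- are taken from the thread.

```stata
clear
set obs 201
gen str10 artist = "Shakira"

* 200 observations with one sale each, recorded as Colombia
gen str12 isrc_country = "Colombia"
gen sales = 1

* one observation with 300 sales, recorded as USA
replace isrc_country = "USA" in 201
replace sales = 300 in 201

* total sales per artist-country pair
bysort artist isrc_country: egen country_sales = total(sales)

* sort countries by total sales within artist and take the last one
bysort artist (country_sales): gen origin = isrc_country[_N]

* unweighted mode, for comparison
bysort artist: egen origin_mode = mode(isrc_country)

tabulate origin origin_mode
```

Here Colombia totals 200 sales and USA totals 300, so -origin- is USA while the unweighted mode is Colombia. If two countries tie on total sales, the sort order among them is arbitrary; adding a tiebreaker, as in -bysort artist (country_sales isrc_country)-, would make the choice reproducible.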


On 4/22/2013 10:08 AM, Estrella Gomez wrote:
Hi, Nick

One additional question: I guess that

by artist: egen origin = mode (isrc_country)

where isrc_country is the country of origin, shows for each artist the
origin country that is most frequent. Is it possible to weight this
measure by total sales?

This is an example of what I mean: there are many observations in
which the number of sales is only one. Then if there are, say, 200
observations with one sale for Shakira originally recorded as from
Colombia, but one observation with 300 sales for Shakira originally
recorded as from USA, the egen command would conclude that Shakira is
from Colombia, when it is more reasonable to attribute a US origin in
this case.

Thanks a lot,
Estrella


2013/4/22 Nick Cox <njcoxstata@gmail.com>:
As I understand it you want to replace differing values by the most
commonly occurring value. This is just the mode and the -mode()-
function of the -egen- command should suffice.

It has supported string arguments too since birth.
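
A minimal sketch of -mode()- applied to a string variable, using made-up data and the variable names from this thread:

```stata
clear
input str10 artist str12 isrc_country
"Shakira" "Colombia"
"Shakira" "Colombia"
"Shakira" "USA"
"Adele"   "UK"
"Adele"   "UK"
end

* most frequent country within each artist
bysort artist: egen origin = mode(isrc_country)

list, sepby(artist)
```

As far as the -egen- help describes it, when several values tie for most frequent, -mode()- returns missing unless an option such as -minmode- or -maxmode- is specified, so ties may need separate handling.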

That aside, your question is an FAQ:

FAQ     . . . . . .  Listing observations in a group that differ on a variable
         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
         11/01   How do I list observations in a group that differ
                 on a variable?
http://www.stata.com/support/faqs/data-management/listing-observations-in-group/
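
One common way to flag groups that differ on a variable can be sketched as follows. The data are made up, and -differ-, -tag- and -ncountries- are hypothetical names chosen for illustration, not names from the original question.

```stata
clear
input str10 artist str12 isrc_country
"Shakira" "USA"
"Shakira" "Colombia"
"Shakira" "UK"
"Adele"   "UK"
"Adele"   "UK"
end

* After sorting countries within artist, the first and last values
* differ exactly when the artist has more than one distinct country.
bysort artist (isrc_country): gen byte differ = isrc_country[1] != isrc_country[_N]

* Count distinct countries per artist: tag one observation per
* artist-country pair, then add up the tags within artist.
egen byte tag = tag(artist isrc_country)
egen ncountries = total(tag), by(artist)

list artist isrc_country ncountries if differ, sepby(artist)
```

Only the artists with conflicting countries are listed, which avoids inspecting the data by hand.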

Although understanding the principles there will do no harm, my guess
is that you don't need it given -egen-'s -mode()-.

Nick
njcoxstata@gmail.com


On 22 April 2013 11:26, Estrella Gomez <estrellastata@gmail.com> wrote:

I am cleaning a music dataset and one of the problems I have is that
there are many cases in which there are two or more different origin
countries for the same artist. For instance, Shakira appears as from
USA, Colombia, Netherlands and UK.

I want to assign one unique origin country to each artist based on the
number of records. I have 94,330,173 observations, so I can't do it
manually.

My problem is that I don't know how to tell Stata that I want to see
those cases in which there are different countries for the same
artist. Both are string variables. Once I identify those
"wrong" observations, I would select one unique country for each
artist according to the number of total sales, which is a numerical
variable.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP