Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Identify observations within a variable

From	Estrella Gomez <[email protected]>
To	[email protected]
Subject	Re: st: Identify observations within a variable
Date	Mon, 22 Apr 2013 16:08:01 +0200

Hi, Nick

One additional question: I assume that

by artist: egen origin = mode(isrc_country)

where isrc_country is the country of origin, gives for each artist the
most frequent origin country. Is it possible to weight this measure by
total sales?

Here is an example of what I mean. There are many observations in
which the number of sales is only one. If there are, say, 200
observations with one sale each for Shakira originally recorded as
from Colombia, but one observation with 300 sales for Shakira
originally recorded as from USA, -egen- would conclude that Shakira is
from Colombia, whereas it is more reasonable to attribute a US origin
in this case.
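A minimal Stata sketch of the sales-weighted version follows. It assumes the numeric sales variable is called total_sales (a hypothetical name; the post does not give it) and builds invented toy data that mirrors the Shakira example above. Rather than a weighted -mode()-, it sums sales within each artist-country pair and then keeps, for each artist, the country with the largest sum.

```stata
* Hypothetical toy data mirroring the Shakira example:
* 200 records with 1 sale each coded Colombia, 1 record with 300 sales coded USA.
* total_sales is an assumed variable name, not one given in the thread.
clear
set obs 201
generate str20 artist = "Shakira"
generate str12 isrc_country = cond(_n <= 200, "Colombia", "USA")
generate total_sales = cond(_n <= 200, 1, 300)

* Unweighted mode, as in the post: counts records, so Colombia (200 records) wins
bysort artist: egen origin_mode = mode(isrc_country)

* Sales-weighted choice, step 1: total sales for each artist-country pair
* (egen's total() treats missing sales as zero)
bysort artist isrc_country: egen country_sales = total(total_sales)

* Step 2: within each artist, sort by that total and take the last (largest) one.
* Ties on country_sales are broken by isrc_country, so the result is reproducible.
bysort artist (country_sales isrc_country): generate origin = isrc_country[_N]
```

Here origin_mode is "Colombia" while origin is "USA". With 94 million records, an alternative worth considering is to -collapse (sum) total_sales, by(artist isrc_country)- into a much smaller dataset, pick the top country per artist there, and -merge- the result back.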

Thanks a lot,
Estrella


2013/4/22 Nick Cox <[email protected]>:
> As I understand it, you want to replace differing values by the most
> commonly occurring value. This is just the mode, and the -mode()-
> function of the -egen- command should suffice.
>
> It has supported string arguments too since birth.
>
> That aside, your question is an FAQ:
>
> FAQ: Listing observations in a group that differ on a variable
>      N. J. Cox, 11/01
>      How do I list observations in a group that differ on a variable?
>      http://www.stata.com/support/faqs/data-management/listing-observations-in-group/
>
> Although understanding the principles there will do no harm, my guess
> is that you don't need it, given -egen-'s -mode()-.
>
> Nick
> [email protected]
>
>
> On 22 April 2013 11:26, Estrella Gomez <[email protected]> wrote:
>
>> I am cleaning a music dataset, and one of the problems I have is that
>> there are many cases in which there are two or more different origin
>> countries for the same artist. For instance, Shakira appears as from
>> USA, Colombia, the Netherlands and the UK.
>>
>> I want to assign one unique origin country to each artist based on the
>> number of records. I have 94,330,173 observations, so I can't do it
>> manually.
>>
>> My problem is that I don't know how to tell Stata that I want to see
>> those cases in which there are different countries for the same
>> artist. Both the artist and the country are string variables. Once I
>> identify those "wrong" observations, I would select one unique country
>> for each artist according to total sales, which is a numeric
>> variable.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
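The original question, how to see the artists whose records carry more than one country, can be answered with a common technique in the spirit of the FAQ cited above: sort within each artist by country and compare the first and last values. Below is a minimal sketch on invented toy data (only the variable names artist and isrc_country come from the thread); the count of distinct countries at the end is an optional extra.

```stata
* Hypothetical toy data: Shakira with four countries, Adele with one
clear
input str20 artist str12 isrc_country
"Shakira" "USA"
"Shakira" "Colombia"
"Shakira" "Netherlands"
"Shakira" "UK"
"Adele"   "UK"
"Adele"   "UK"
end

* Within each artist, sort by country: if the first and last values differ,
* the artist has at least two distinct countries
bysort artist (isrc_country): generate byte differs = isrc_country[1] != isrc_country[_N]

* Show only the artists whose records disagree
list artist isrc_country if differs, sepby(artist)

* Optional: count distinct countries per artist
* (data are still sorted by artist and isrc_country from the bysort above)
by artist: generate byte new_country = _n == 1 | isrc_country != isrc_country[_n-1]
by artist: egen n_countries = total(new_country)
```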

Follow-Ups:
- Re: st: Identify observations within a variable
  - From: Jeph Herrin <[email protected]>
- Re: st: Identify observations within a variable
  - From: Nick Cox <[email protected]>

References:
- st: Identify observations within a variable
  - From: Estrella Gomez <[email protected]>
- Re: st: Identify observations within a variable
  - From: Nick Cox <[email protected]>
