Statalist The Stata Listserver


RE: st: data formatting question


From   Michael Horowitz <[email protected]>
To   [email protected]
Subject   RE: st: data formatting question
Date   Mon, 13 Feb 2006 16:42:51 -0500 (EST)

One more question.  Suppose you have a few duplicate observations that are
errors.  To take the example I used in my previous email, the data is set
up as such:

> > > Country Number  year    ethnicity       ethnicpop
> > > 10              1930    Caucasian       1,000,000
> > > 10              1930    Hispanic        50,000
> > > 10              1931    Caucasian       1,000,100
> > > 10              1931    Hispanic        51,000
> > > 11              1931    Asia            10,000

Now suppose there are multiple observations for Caucasians in a given
year, but with slightly different ethnic population totals.  I wish to
systematically keep the one with the higher number and drop the others.
However, since the observations are not technically "duplicates" (the
ethnic population values differ), I am having trouble using the
-duplicates- command.  Does anyone have any ideas?
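
One possible approach, sketched under assumptions: the variable names
(number, year, ethnicity, ethnicpop) follow the example above, the toy
rows are made up, and ethnicpop is assumed to be numeric.  If it is
stored as a string with commas, something like
-destring ethnicpop, replace ignore(",")- would be needed first.

```stata
* Toy data mirroring the example above (values are invented).
clear
input number year str10 ethnicity long ethnicpop
10 1930 "Caucasian" 1000000
10 1930 "Caucasian" 1000050
10 1930 "Hispanic"  50000
10 1931 "Caucasian" 1000100
10 1931 "Hispanic"  51000
11 1931 "Asia"      10000
end

* Optional: flag groups that share country, year, and ethnicity,
* so they can be inspected before anything is dropped.
duplicates tag number year ethnicity, generate(dup)
list if dup > 0

* Within each country-year-ethnicity group, sort by population
* and keep only the last (largest) observation.
bysort number year ethnicity (ethnicpop): keep if _n == _N
drop dup

list, sepby(number year)
```

Note that Stata sorts missing values last, so a group containing a
missing ethnicpop would keep the missing observation; dropping or
recoding missings beforehand may be advisable.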

Thank you again for your help.

Michael


On Mon, 13 Feb 2006, Nick Cox wrote:

> No such index is needed. In fact, as recommended here,
> it will almost always give you an incorrect answer.
> To see why, note that after
>
> . sort country year
>
> different values of -ethnicity- will
> be sorted arbitrarily. Thus the same
> ethnicity index, as defined here, will often be
> assigned to different values of ethnicity,
> and vice versa.
>
> It is not what you ask for, quite, but a
> reshape using
>
> reshape wide ethnicpop , i(number year) j(ethnicity) string
>
> is one possibility. My guess is that it will be
> more manageable than what you ask for.
>
> Nick
> [email protected]
>
> Radu Ban
>
> > see -help reshape-. you need first to generate an index at
> > country-year level
> >
> > bys country year: gen ethnic_index = _n
> >
> > reshape wide ethnicity ethnicpop, i(country year) j(ethnic_index)
> >
> > cheers,
> > -radu
> >
> > 2006/2/13, Michael Horowitz <[email protected]>:
> > > To whom it may concern:
> > >
> > > I have a dataset I was wondering if people might have a fix for.
> > >
> > > My data measures various information (ethnicity especially) of
> > > countries.  The way the data is currently set up it has multiple
> > > entries per country per year depending on the background of the
> > > country.  This means that if there are 2 ethnic groups in a country
> > > with significant populations, there are 2 entries per year as
> > > follows (these numbers are made up to illustrate the situation).
> > > There can also be more than 2, etc., and it can change depending on
> > > the population in a given year:
> > >
> > > Country Number  year    ethnicity       ethnicpop
> > > 10              1930    Caucasian       1,000,000
> > > 10              1930    Hispanic        50,000
> > > 10              1931    Caucasian       1,000,100
> > > 10              1931    Hispanic        51,000
> > > 11              1931    Asia            10,000
> > >
> > >
> > > I want to set up the data so there is only one entry per country
> > > per year, as follows:
> > >
> > > Country Number  year    ethnic1    ethnic2   ethpop1    ethpop2
> > > 10              1930    Caucasian  Hispanic  1,000,000  50,000
> > >
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

*******************************************************************************

Michael Horowitz
83 Beacon St., Apt. 3
Somerville, MA 02143