Statalist



RE: st: Looking up which observation has a particular value


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Looking up which observation has a particular value
Date   Fri, 8 Aug 2008 14:36:24 +0100

Sergiy has indicated the main way I know of to approach this. 

His second, and better, solution can be tweaked (and fixed for a typo): 

gen long obsn = _n 
su obsn if name == "Jones", meanonly 
local Jones_age = age[`r(min)'] 

In fact 

local Jones_age = age[r(min)]

will do fine as well. 

The tweaks are 

1. Using -long- is advisable for a large dataset. You want to know the
_exact_ observation number. 

2. Using -, meanonly- is more efficient. 
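
A small sketch of why -long- matters, assuming the default storage type has not been changed with -set type-: -generate- then creates a float, and a float cannot hold every integer above 16,777,216 exactly. The variable names f and l are made up for illustration.

```stata
* One observation is enough to show the rounding
clear
set obs 1

* 16,777,217 is the first integer a float cannot represent exactly
gen float f = 16777217
gen long  l = 16777217

* The float copy is stored as 16777216; the long copy is exact
display %12.0f f[1]
display %12.0f l[1]
```

So with more than about 16.7 million observations, a float copy of _n can point at the wrong row.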

In addition note that after -summarize- checking that r(min) and r(max)
are identical is a check for uniqueness. 
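
Putting the pieces together, here is a minimal sketch using the toy data from the original question. The locals n, first and last and the messages are my own choices for illustration, not part of the Tip.

```stata
* Toy data from the original question
clear
input str10 name byte age
"Smith"   42
"Kellog"  28
"McCairn" 19
"Jones"   22
end

gen long obsn = _n
summarize obsn if name == "Jones", meanonly

* Copy the results before any other r-class command can overwrite them
local n     = r(N)
local first = r(min)
local last  = r(max)

if `n' == 0 {
    display as error "no observation has name == Jones"
}
else if `first' != `last' {
    display as error "name == Jones is not unique"
}
else {
    * Exactly one match: first and last observation numbers coincide
    local Jones_age = age[`first']
    display "Jones_age is `Jones_age'"
}
```

If duplicates are acceptable, drop the middle branch and age[`first'] gives the value in the first matching observation.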

Note that the question has been written up as an SJ Tip:

SJ-6-4  dm0025  . . . . . . . . . .  Stata tip 36: Which observations? Erratum
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/06   SJ 6(4):596                                      (no commands)
        correction of example code for Stata tip 36

SJ-6-3  dm0025  . . . . . . . . . . . . . .  Stata tip 36: Which observations?
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/06   SJ 6(3):430--432                                 (no commands)
        tip for identifying which observations satisfy some
        specified condition

Nick
[email protected] 

Sergiy Radyakin

This is a rather destructive way to do it, but if you need to do it
only once:

keep if name=="Jones"
local Jones_age=age[1]
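
If the destructiveness is a concern, one possible variation is to wrap the same two lines in -preserve- and -restore-, so the full dataset comes back afterwards. This is only a sketch on the toy data from the question. Note that if nothing matches, age[1] is missing and the local will hold ".".

```stata
* Toy data from the original question
clear
input str10 name byte age
"Smith"   42
"Kellog"  28
"McCairn" 19
"Jones"   22
end

* Take a snapshot of the data in memory
preserve
keep if name == "Jones"
local Jones_age = age[1]
* Bring back all four observations
restore

display "Jones_age is `Jones_age'"
```

The local survives -restore-, because locals live in the do-file or program, not in the dataset.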

Another way is:
gen obsn=_n
sum obsn if name=="Jones"
local Jones_age=age[`=r(min)']

and there are lots of other ways to do the same.

If you want to be fast - write a loop in Mata or C - then you don't
need to create another variable.

Depending on the task, you might want to sort your data and then go
by(name) to do something.
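
As a rough illustration of the by-group route, here is a sketch on made-up data with a repeated name. The second Jones and the variables n_same_name and first_age are hypothetical, added only to show the mechanics.

```stata
* Made-up data: Jones appears twice
clear
input str10 name byte age
"Smith"   42
"Jones"   35
"Jones"   22
"Kellog"  28
end

* Sort by name, and by age within name; _N is then the group size
bysort name (age): gen n_same_name = _N

* Within each name, age[1] is the first (here: smallest) age
by name: gen first_age = age[1]

list, sepby(name)
```

Under -by-, subscripts such as age[1] refer to positions within the group, so no explicit observation number is needed.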

If this doesn't help, tell us more about what you are doing. There is
probably a good reason why Stata has no VLOOKUP.

Also see if -merge- can be useful (e.g., if you have a dataset with
names and another larger one with names and ages - you don't need to
look for each name individually, just merge the two datasets by name).
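
A sketch of the -merge- route on made-up data. It assumes the merge 1:1 syntax of Stata 11 and later (the syntax current when this thread was written was different), and the tempfile name ages and the two-name lookup list are hypothetical.

```stata
* Lookup table: one row per name, with ages
clear
input str10 name byte age
"Smith"   42
"Kellog"  28
"McCairn" 19
"Jones"   22
end
tempfile ages
save `ages'

* The names we want ages for
clear
input str10 name
"Jones"
"Smith"
end

* Attach age to each name; drop lookup rows that were not asked for
merge 1:1 name using `ages', keep(master match)
list
```

If a name can occur more than once in the list of names, merge m:1 would be the variant to try.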


Brian Karfunkel

> If I have a dataset that looks like this:
>
>     +---------------+
>     |    name   age |
>     |---------------|
>  1. |   Smith    42 |
>  2. |  Kellog    28 |
>  3. | McCairn    19 |
>  4. |   Jones    22 |
>     +---------------+
>
> I want to use a particular value of one variable to look up, and then
> store in a local (or scalar, if need be) the value of another variable
> IN THAT SAME ROW. In other words, I want to do something like this
> (this is of course pseudocode; I understand that locals don't work
> like this):
>
> - local jones_age = age if name == "Jones"
>
> or maybe:
>
> - local jones_obs = (some function that returns the row number of the
> first observation for which name == "Jones")
> - local jones_age = age in `jones_obs'
>
> One thought I had was to either create a variable, say age_jones, and:
> - gen age_jones = age if name == "Jones"
>
> And then maybe -sum- age_jones, using the return code to find the age
> (which if the obs. is unique should be min, max, and mean), but since
> my data is actually many more than four observations long, this seems
> rather cumbersome. For those of you familiar with Excel, I'm looking
> for something similar to the VLOOKUP and HLOOKUP commands.
>
> Any suggestions? Is there a way to create locals from variable values?



