"Davide Cantoni" <davide.cantoni@gmail.com>

statalist@hsphsun2.harvard.edu

Re: -by:- is sweet [was: Re: Re: st: Creating a new variable with information from other observations]

Mon, 19 May 2008 19:05:15 +0200

Thank you very much, Nick. This is elegant indeed, and congratulations
for retrieving that piece of high poetry by Sta Ta.

I'm wondering, though, about your statement:

> A more cautious approach slaps an extra condition on the second statement
>
> & is_capital[_N]

why would you need this? To make sure that the -bysort countryid
(is_capital)- command works fine, and puts capital cities at the end
of the block indeed?

But if I do

    bysort countryid (is_capital) : gen latitude_capital = latitude[_N] & is_capital[_N]

I obtain latitude_capital equal to 1 for all observations, instead of
the desired result (which I get if I do not add "& is_capital[_N]").

Davide

2008/5/19 n j cox <n.j.cox@durham.ac.uk>:
> .
>
> A more mundane solution uses -by:-.
>
> gen is_capital = capitalid == cityid
> bysort countryid (is_capital) : gen latitude_capital = latitude[_N]
>
> The indicator (dummy) is 1 when a city is the capital and 0 otherwise.
>
> If you sort each capital city to the end of the block of observations for a
> country, then you can just pick up its value for the new variable.
>
> A more cautious approach slaps an extra condition on the second statement
>
> & is_capital[_N]
>
> So, no loops necessary at all. Or, more precisely, Stata does the loop
> required automatically as a consequence of -by:-.
>
> The following poem [by one Sta Ta?] fell into my hands recently.
>
> Something to repeat?
> Seek a method neat.
> Loops are lovely,
> -by:- is sweet.
>
> The style leaves much to be desired, but the content is good.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Teresio Poggio
>
> from your dataset I'd build a just capitals dataset:
> - select just the capitals (drop if cityid != capitalid)
> - in the new dataset keep just capitalid and latitude
> - rename latitude into latitude_capital
> - sort the data by capitalid and save it
>
> then open your original data set and sort it by capitalid,
> merge it with the new "just capital dataset" using capitalid as a key
> and the option uniqmaster
> (help merge for details)
>
> Davide Cantoni
>
>> I am having a rather intricate problem in creating a new variable in a
>> panel dataset, and I appreciate any help you could offer. I hope the
>> problem can potentially be of general interest.
>>
>> I have a panel dataset of cities and their characteristics in
>> different countries. I know the latitude of each one of these cities,
>> but now I want to create an additional variable reflecting the
>> latitude of the capital city of the country a given city lies in. So
>> for example: for the cities of New York, Chicago, etc., I want this
>> new variable to contain the latitude of Washington, DC.
>>
>> Here is a description of the dataset's structure: it is a panel in
>> long form, with cities in different countries, observed over different
>> years. Each city has a unique numeric identifier, "cityid". Then there
>> is a country identifier, called "countryid". Finally, there is a
>> variable that repeats the capital city's cityid for each city in a
>> given country, "capitalid". For instance, if the cityid of London was
>> 135, all cities in the dataset that are in the UK would get a value of
>> 135 in the variable "capitalid". Finally, there is a variable called
>> "latitude" that reflects the latitude of each city.
>>
>> How would I now proceed to create this new variable, call it
>> "latitude_capital", by using the variables above?
>>
>> Basically, the problem I'm having is
>> - tell stata to look up for each city its capitalid
>> - browse the dataset until you find a city that has the cityid equal
>> to this capitalid
>> - find out the latitude of this capital city
>> - go back to the original city and replace "latitude_capital" with the
>> latitude you've just retrieved
>>
>> The additional problem I encounter while trying to construct something
>> with "foreach..." (that, at least, is what I was trying so far) is
>> that the values that the capitalid variable takes are of course not a
>> clean numlist (like "1(1)100"), but rather a sequence of numbers
>> without any regularity, such as 11 12 50 54 60 131... and so on.
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
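
A minimal sketch of the behaviour discussed in this thread, on made-up toy data. The variable names cityid, countryid, capitalid, latitude, is_capital and latitude_capital come from the messages above; the values and the variable name "wrong" are assumptions for illustration only. The sketch also assumes that the "extra condition" was meant as an -if- qualifier, which is one plausible reading and is not confirmed anywhere in the thread. In Stata, & is a logical operator, so latitude[_N] & is_capital[_N] evaluates to 1 whenever both values are nonzero, which would explain a result of 1 for every observation.

```stata
* Toy data: the values are invented for illustration
clear
input cityid countryid capitalid latitude
1 1 2 40.7
2 1 2 38.9
3 1 2 41.9
4 2 5 53.5
5 2 5 51.5
end

* 1 when the city is its country's capital, 0 otherwise
gen is_capital = capitalid == cityid

* "&" is logical AND: the result is 1 (true) or 0 (false), not a latitude
bysort countryid (is_capital) : gen wrong = latitude[_N] & is_capital[_N]

* Assumed reading of the cautious approach: apply the condition with -if-,
* so a country with no capital among its observations gets missing
bysort countryid (is_capital) : gen latitude_capital = latitude[_N] if is_capital[_N]

list, sepby(countryid)
```

On this toy data "wrong" is 1 in every row, while latitude_capital holds 38.9 for country 1 and 51.5 for country 2. The -if- version differs from the plain latitude[_N] version only when a country's block contains no capital: the plain version would then silently pick up the latitude of some other city, whereas the -if- version leaves the new variable missing.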

