Chris Parker <cparker.phd2007@london.edu>

statalist@hsphsun2.harvard.edu

Re: st: RE: RE: Creating a group variable based on values in observations

Sat, 21 May 2011 10:29:42 +0100

Thank you everyone for the responses. I've learned some interesting ways
to generate these types of variables!

Chris

On Sat, May 21, 2011 at 9:33 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> Going for the interpretation that 1 ... 7 are days of the week, then
> another way to do it would have been
>
> gen signature = ""
> bysort id (market) : replace signature = signature[_n-1] + string(market)
>
> which creates signatures such as "135", which as Bert says can be
> mapped to integers 1 up with -egen, group()-. Strictly, -group()- is
> an -egen- function, not an option.
>
> Use the -label- option and they remain intelligible.
>
> Use -tabulate- and you get indicator variables.
>
> Nick
>
> On Sat, May 21, 2011 at 2:32 AM, Bert Jung <bjung59@gmail.com> wrote:
> > Hi Chris,
> >
> > I am not sure if I understand your problem but maybe this helps:
> >
> > If you would like group IDs for all unique values within your
> > "openmarkets" variable, you could use the "group" option of -egen-. I
> > suspect that requires that the values in the openmarkets variable must
> > always have the same order since -egen- would consider "2-3-5" and
> > "5-3-2" as two different groups but to you and me they're the same.
> >
> > If "openmarkets" is a string variable you could also remove the "-"
> > with -subinstr- or with -destring openmarkets, ignore("-") gen(new)-.
> > Since the values 1 to 5 seem to uniquely identify the week days (?)
> > that would be similar to Sarah's suggestion.
> >
> > Cheers,
> > Bert
> >
> >
> > On Fri, May 20, 2011 at 7:49 PM, Sarah Edgington <sedging@ucla.edu> wrote:
> >> Chris,
> >> I think there are a number of different ways to solve this problem.
> >> How many markets are you dealing with? If it's fewer than 20 here's a
> >> solution that gets you around the reshaping issue.
> >> First, create a new market id where market 1=1, market 2=10, market 3=100,
> >> etc. Then sum this id within days. That will give you a group variable
> >> where each place represents a particular market (starting with market 1 on
> >> the right) and a 1 or 0 tells you if the market was open or not. Your day
> >> one group id would be 11111. Day two's would be 10110.
> >>
> >> gen double mid=10^(market-1)
> >> bysort day: egen double margroup=total(mid)
> >>
> >> This only works well up to 19 markets because of precision issues. In
> >> principle, though, you could do it in any base and have everything add up to
> >> create a unique group id. So if you used 2 as your base instead of 10 (that
> >> is, change the first line to gen double mid=2^(market-1) ) you'd be able to
> >> accommodate more markets. Doing that you lose the ability to easily look at
> >> it and read which markets are open straight from the group variable. That
> >> doesn't really matter for analytical purposes, though.
> >>
> >> -Sarah
> >>
> >>
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
> >> Sent: Friday, May 20, 2011 3:05 PM
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: st: RE: Creating a group variable based on values in observations
> >>
> >> Hi,
> >>
> >> I think I have a solution. My data is a bit too big to do this all at once
> >> (reshape gives a return code telling me productmarket takes on too many
> >> values) but here is what works in case anyone runs into a similar
> >> problem:
> >>
> >> . gen marketdup = market
> >> . reshape wide market, i(date) j(marketdup)
> >> . egen openmarkets = concat(market*), punc(_)
> >> . encode openmarkets, gen(groupid)
> >> . drop openmarkets
> >> . reshape long
> >> . drop marketdup
> >>
> >> Chris
> >>
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
> >> Sent: Friday, May 20, 2011 9:36 PM
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: st: Creating a group variable based on values in observations
> >>
> >> Hi Statalist,
> >>
> >> I have a problem that's been troubling me for a while now. I have daily
> >> prices for several products in several markets over time. I use the data to
> >> measure price dispersion as the coefficient of variation of prices on a day
> >> for a product. However, not every market is open on every day.
> >> Systematic differences between the markets that are open (such as average
> >> distance between markets, percent of markets of type A, etc.) could impact
> >> price dispersion, so I need to control for this. For each product I would
> >> like to create a variable that lists which group of markets was open on each
> >> day (openmarkets in the example below). I could then encode this variable
> >> and include i.groupid which controls for these differences.
> >>
> >> Example data for one of the products:
> >>
> >> day   market   openmarkets   groupid
> >>   1        1   1-2-3-4-5           1
> >>   1        2   1-2-3-4-5           1
> >>   1        3   1-2-3-4-5           1
> >>   1        4   1-2-3-4-5           1
> >>   1        5   1-2-3-4-5           1
> >>   2        2   2-3-5               2
> >>   2        3   2-3-5               2
> >>   2        5   2-3-5               2
> >>   3        1   1-3-4-5             3
> >>   3        3   1-3-4-5             3
> >>   3        4   1-3-4-5             3
> >>   3        5   1-3-4-5             3
> >>   4        2   2-3-5               2
> >>   4        3   2-3-5               2
> >>   4        5   2-3-5               2
> >>
> > *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
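
For anyone landing on this thread later, the two approaches suggested above (Nick's cumulative signature and Sarah's place-value sum) can be sketched on the example data roughly as follows. This is only a sketch under some assumptions, not code posted by anyone in the thread: it uses -day- where Nick's snippet has -id-, it adds a "-" separator with -cond()- so that the signature matches the openmarkets strings in the example, and it adds a step that copies the completed signature back to every observation of the day (after the cumulative -replace-, only the last observation within each day holds the full string). The variable names signature, groupid, mid and margroup come from the thread.

```stata
* Example data from the original question (day and market only)
clear
input day market
1 1
1 2
1 3
1 4
1 5
2 2
2 3
2 5
3 1
3 3
3 4
3 5
4 2
4 3
4 5
end

* Nick's approach: build the signature cumulatively within day,
* with markets sorted so that the order is always the same.
* The "-" separator is an addition here, not part of Nick's snippet.
gen signature = ""
bysort day (market): replace signature = signature[_n-1] + cond(_n == 1, "", "-") + string(market)

* Added step: only the last observation in each day has the full
* signature at this point, so copy it to all observations of that day.
by day: replace signature = signature[_N]

* Map each distinct signature to an integer; -label- keeps it readable.
egen groupid = group(signature), label

* Sarah's approach in base 2: one bit per market, summed within day.
gen double mid = 2^(market-1)
bysort day: egen double margroup = total(mid)

list day market signature groupid margroup, sepby(day)
```

The integers that -egen, group()- assigns follow the sort order of the signature strings, so they need not match the hand-numbered groupid column in the example; for use as i.groupid only the grouping matters. Days 2 and 4 end up in the same group under either approach.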

