Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: RE: Creating a group variable based on values in observations

From: Chris Parker
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: Creating a group variable based on values in observations
Date: Sat, 21 May 2011 10:29:42 +0100

```
Thank you everyone for the responses. I've learned some interesting
ways to generate these types of variables!

Chris

On Sat, May 21, 2011 at 9:33 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> Going for the interpretation that 1 ... 7 are days of the week, then
> another way to do it would have been
>
> gen signature = ""
> bysort id (market) : replace signature = signature[_n-1] + string(market)
>
> which creates signatures such as "135", which as Bert says can be
> mapped to integers 1 up with -egen, group()-. Strictly, -group()- is
> an -egen- function, not an option.
>
> Use the -label- option and they remain intelligible.
>
> Use -tabulate- and you get indicator variables.
>
> Nick
>
> On Sat, May 21, 2011 at 2:32 AM, Bert Jung <bjung59@gmail.com> wrote:
> > Hi Chris,
> >
> > I am not sure if I understand your problem but maybe this helps:
> >
> > If you would like group IDs for all unique values within your
> > "openmarkets" variable, you could use the "group" option of -egen-.  I
> > suspect that requires that the values in the openmarkets variable must
> > always have the same order since -egen- would consider "2-3-5" and
> > "5-3-2" as two different groups but to you and me they're the same.
> >
> > If "openmarkets" is a string variable you could also remove the "-"
> > with -subinstr- or with -destring openmarkets, ignore("-") gen(new)-.
> > Since the values 1 to 5 seem to uniquely identify the week days (?)
> > that would be similar to Sarah's suggestion.
> >
> > Cheers,
> > Bert
> >
> >
> > On Fri, May 20, 2011 at 7:49 PM, Sarah Edgington <sedging@ucla.edu> wrote:
> >> Chris,
> >> I think there are a number of different ways to solve this problem.
> >> How many markets are you dealing with?  If it's fewer than 20 here's a
> >> solution that gets you around the reshaping issue.
> >> First, create a new market id where market 1=1, market 2=10, market 3=100,
> >> etc.  Then sum this id within days.  That will give you a group variable
> >> where each place represents a particular market (starting with market 1 on
> >> the right) and a 1 or 0 tells you if the market was open or not.  Your day
> >> one group id would be 11111.  Day two's would be 10110.
> >>
> >>        gen double mid=10^(market-1)
> >>        bysort day: egen double margroup=total(mid)
> >>
> >> This only works well up to 19 markets because of precision issues.  In
> >> principle, though, you could do it in any base and have everything add up to
> >> create a unique group id.  So if you used 2 as your base instead of 10 (that
> >> is, change the first line to  gen double mid=2^(market-1) ) you'd be able to
> >> accommodate more markets.  Doing that you lose the ability to easily look at
> >> it and read which markets are open straight from the group variable.  That
> >> doesn't really matter for analytical purposes, though.
> >>
> >> -Sarah
> >>
> >>
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
> >> Sent: Friday, May 20, 2011 3:05 PM
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: st: RE: Creating a group variable based on values in observations
> >>
> >> Hi,
> >>
> >> I think I have a solution. My data is a bit too big to do this all at once
> >> (reshape gives a return code telling me productmarket takes on too many
> >> values) but here is what works in case anyone runs into a similar
> >> problem:
> >>
> >> . gen marketdup = market
> >> . reshape wide market, i(date) j(marketdup)
> >> . egen openmarkets = concat(market*), punc(_)
> >> . encode openmarkets, gen(groupid)
> >> . drop openmarkets
> >> . reshape long
> >> . drop marketdup
> >>
> >> Chris
> >>
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
> >> Sent: Friday, May 20, 2011 9:36 PM
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: st: Creating a group variable based on values in observations
> >>
> >> Hi Statalist,
> >>
> >> I have a problem that's been troubling me for a while now. I have daily
> >> prices for several products in several markets over time. I use the data to
> >> measure price dispersion as the coefficient of variation of prices on a day
> >> for a product. However, not every market is open on every day.
> >> Systematic differences between the markets that are open (such as average
> >> distance between markets, percent of markets of type A, etc.) could impact
> >> price dispersion, so I need to control for this. For each product I would
> >> like to create a variable that lists which group of markets was open on each
> >> day (openmarkets in the example below). I could then encode this variable
> >> and include i.groupid which controls for these differences.
> >>
> >> Example data for one of the products:
> >>
> >> day     market          openmarkets     groupid
> >> 1       1               1-2-3-4-5       1
> >> 1       2               1-2-3-4-5       1
> >> 1       3               1-2-3-4-5       1
> >> 1       4               1-2-3-4-5       1
> >> 1       5               1-2-3-4-5       1
> >> 2       2               2-3-5           2
> >> 2       3               2-3-5           2
> >> 2       5               2-3-5           2
> >> 3       1               1-3-4-5         3
> >> 3       3               1-3-4-5         3
> >> 3       4               1-3-4-5         3
> >> 3       5               1-3-4-5         3
> >> 4       2               2-3-5           2
> >> 4       3               2-3-5           2
> >> 4       5               2-3-5           2
> >>
>