Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: RE: Creating a group variable based on values in observations

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: Creating a group variable based on values in observations
Date: Sat, 21 May 2011 09:33:32 +0100

```Going for the interpretation that 1 ... 7 are days of the week, then
another way to do it would have been

gen signature = ""
bysort id (market) : replace signature = signature[_n-1] + string(market)
by id : replace signature = signature[_N]

which creates signatures such as "135", which as Bert says can be
mapped to integers 1 up with -egen, group()-. Strictly, -group()- is
an -egen- function, not an option.

Use the -label- option and they remain intelligible.

Use -tabulate- and you get indicator variables.

Nick

On Sat, May 21, 2011 at 2:32 AM, Bert Jung <bjung59@gmail.com> wrote:
> Hi Chris,
>
> I am not sure if I understand your problem but maybe this helps:
>
> If you would like group IDs for all unique values within your
> "openmarkets" variable, you could use the "group" option of -egen-.  I
> suspect that requires that the values in the openmarkets variable must
> always have the same order since -egen- would consider "2-3-5" and
> "5-3-2" as two different groups but to you and me they're the same.
>
> If "openmarkets" is a string variable you could also remove the "-"
> with -subinstr- or with -destring openmarkets, ignore("-") gen(new)-.
> Since the values 1 to 5 seem to uniquely identify the week days (?)
> that would be similar to Sarah's suggestion.
>
> Cheers,
> Bert
>
>
> On Fri, May 20, 2011 at 7:49 PM, Sarah Edgington <sedging@ucla.edu> wrote:
>> Chris,
>> I think there are a number of different ways to solve this problem.
>> How many markets are you dealing with?  If it's no more than 16, here's a
>> solution that gets you around the reshaping issue.
>> First, create a new market id where market 1=1, market 2=10, market 3=100,
>> etc.  Then sum this id within days.  That will give you a group variable
>> where each place represents a particular market (starting with market 1 on
>> the right) and a 1 or 0 tells you if the market was open or not.  Your day
>> one group id would be 11111.  Day two's would be 10110.
>>
>>        gen double mid=10^(market-1)
>>        bysort day: egen double margroup=total(mid)
>>
>> This is only exact up to 16 markets, because a double stores integers
>> exactly only up to 2^53 (about 9.0e15).  In
>> principle, though, you could do it in any base and have everything add up to
>> create a unique group id.  So if you used 2 as your base instead of 10 (that
>> is, change the first line to  gen double mid=2^(market-1) ) you'd be able to
>> accommodate more markets.  Doing that you lose the ability to easily look at
>> it and read which markets are open straight from the group variable.  That
>> doesn't really matter for analytical purposes, though.
>>
>> -Sarah
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 3:05 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: RE: Creating a group variable based on values in observations
>>
>> Hi,
>>
>> I think I have a solution. My data is a bit too big to do this all at once
>> (reshape gives a return code telling me productmarket takes on too many
>> values) but here is what works in case anyone runs into a similar
>> problem:
>>
>> . gen marketdup = market
>> . reshape wide market, i(date) j(marketdup)
>> . egen openmarkets = concat(market*), punc(_)
>> . encode openmarkets, gen(groupid)
>> . drop openmarkets
>> . reshape long
>> . drop marketdup
>>
>> Chris
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 9:36 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: Creating a group variable based on values in observations
>>
>> Hi Statalist,
>>
>> I have a problem that's been troubling me for a while now. I have daily
>> prices for several products in several markets over time. I use the data to
>> measure price dispersion as the coefficient of variation of prices on a day
>> for a product. However, not every market is open on every day.
>> Systematic differences between the markets that are open (such as average
>> distance between markets, percent of markets of type A, etc.) could impact
>> price dispersion, so I need to control for this. For each product I would
>> like to create a variable that lists which group of markets was open on each
>> day (openmarkets in the example below). I could then encode this variable
>> and include i.groupid which controls for these differences.
>>
>> Example data for one of the products:
>>
>> day     market          openmarkets     groupid
>> 1       1               1-2-3-4-5       1
>> 1       2               1-2-3-4-5       1
>> 1       3               1-2-3-4-5       1
>> 1       4               1-2-3-4-5       1
>> 1       5               1-2-3-4-5       1
>> 2       2               2-3-5           2
>> 2       3               2-3-5           2
>> 2       5               2-3-5           2
>> 3       1               1-3-4-5         3
>> 3       3               1-3-4-5         3
>> 3       4               1-3-4-5         3
>> 3       5               1-3-4-5         3
>> 4       2               2-3-5           2
>> 4       3               2-3-5           2
>> 4       5               2-3-5           2
>>

```
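For anyone wanting to try the two approaches from the thread side by side, here is a minimal sketch on the example data from the original question. It assumes the grouping variable is called day, as in the example table, rather than id or date. The variable names signature, mid and margroup come from the posts above. The last step of the signature approach, copying the final value to every observation of the day, is spelled out because only the last observation in each day holds the complete list.

```stata
clear
input day market
1 1
1 2
1 3
1 4
1 5
2 2
2 3
2 5
3 1
3 3
3 4
3 5
4 2
4 3
4 5
end

* Signature approach: accumulate the open markets within each day,
* in market order, so that the last observation holds the full list
gen signature = ""
bysort day (market) : replace signature = signature[_n-1] + string(market)

* Copy the complete signature to every observation of the day
by day : replace signature = signature[_N]

* Map distinct signatures to integers 1 up; -label- keeps them readable
egen groupid = group(signature), label

* Arithmetic approach: one decimal place per market, summed within day
gen double mid = 10^(market - 1)
bysort day : egen double margroup = total(mid)

list, sepby(day)
```

Note that -egen, group()- numbers the groups in the sort order of signature, so the integer codes need not match the groupid column in the example table, although the partition of days into groups is the same. With market codes of more than one digit, plain concatenation can be ambiguous ("1" followed by "12" looks like "11" followed by "2"), so inserting a separator between codes would probably be safer in that case.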