Re: st: tabstatmat question

From   Austin Nichols
Subject   Re: st: tabstatmat question
Date   Fri, 2 Sep 2011 13:32:10 -0400

Sergio:
Sounds like my earlier suggestion
egen mby=mean(num_typ3), by(zone wave)
gets you there in one line.
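
As a minimal sketch of what that line does, assuming toy data with the same variable names (the values below are made up for illustration): -egen, mean()- with -by()- writes each cell's mean onto every observation in that cell, so no matrix lookup is needed afterwards.

```stata
* Toy data: 2 zones x 2 waves, 2 trips per cell (illustrative values only)
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
1 2 3
2 1 0
2 1 6
2 2 5
2 2 5
end

* Mean catch within each zone-wave cell, repeated on every row of the cell
egen mby = mean(num_typ3), by(zone wave)

list, sepby(zone wave)
```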

If you have a dataset for "years past" then you can -merge zone wave-
onto this year's data.  Or -joinby zone wave-.
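
A rough sketch of the merge route, under assumptions of my own: the "years past" data and this year's alternatives are typed in here as toy datasets, and the names meancatch, id, and the tempfile means are hypothetical. It uses the -merge m:1- syntax of Stata 11 and later rather than the older -merge zone wave- form, which needs both files sorted on the key.

```stata
* Hypothetical "years past" data (illustrative values only)
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
2 1 6
2 2 5
end

* Reduce to one row per zone-wave cell and save it as a lookup file
collapse (mean) meancatch = num_typ3, by(zone wave)
tempfile means
save `means'

* Hypothetical current-year data: one row per person-alternative
clear
input id zone wave
1 1 1
1 2 1
2 1 2
2 2 2
end

* Many alternatives share each zone-wave cell, so this is an m:1 merge
merge m:1 zone wave using `means', keep(master match)
list
```

Rows with _merge==1 would be alternatives whose zone-wave cell has no past catch data, which is worth checking before running -clogit-.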

If you have panel data, see -by- and Nick Cox's tutorials on -by- e.g.

You can also use a matrix (and -tabstatmat-), if you can ensure that
the row numbers correspond to a variable on your data created by
-egen, group-, but that is substantially trickier.
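
For what it is worth, a sketch of the matrix route on toy data (values made up; -tabstatmat- is user-written and assumed to be installed from SSC). The key point is that TABLE has a single column, so the row index is the -egen, group- variable and the column index is always 1.

```stata
* Toy data (illustrative values only); first: ssc install tabstatmat
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
1 2 3
2 1 0
2 1 6
2 2 5
2 2 5
end

* Group index 1, 2, ... over the zone-wave cells, in sorted order
egen byv = group(zone wave), label

* One mean per group, stacked into a one-column matrix
tabstat num_typ3, stat(mean) by(byv) save
tabstatmat TABLE

* Row = group number, column = 1 (the matrix has only one column)
gen meancatch = TABLE[byv, 1]
```

Comparing meancatch with the -egen, mean()- result is a cheap check that rows and groups line up; they could drift apart if, say, some group has no nonmissing num_typ3 and so is absent from the matrix.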

On Fri, Sep 2, 2011 at 1:14 PM, Alvarez,Sergio wrote:
> Hi Austin,
> Thanks for your response.  So what I'm doing is a site choice model of
> recreational fishing.  From the dataset I can tell what city/town people
> come from and what city/town they went fishing in.  I can also tell how many
> fish people caught.  For the site choice model, I need to create a series of
> alternatives, or just other places where the person could have fished but
> decided not to.  But I need some indication of the quality of the site that
> was not visited.  Since alternative fishing trips did not take place, I have
> no indication of how many fish the person could have caught if they had gone
> to place B, rather than to place A, which is where they actually went.
> So as an indication of quality I was going to use the mean number of fish
> caught in the site (zone) at that particular time of the year (wave) in
> years past.  That is why I wanted to create a matrix that would have the
> mean catch by zone and wave, something like this:
>        WAVE
> ZONE      1          2       ...
> 1      mean(1,1)  mean(1,2)
> 2      mean(2,1)  mean(2,2)
> ...
> Which was my original question.  I would use that matrix to input the mean
> catch for the alternatives that did not happen after I created them.  Now if
> the matrix looked like the example above, I thought I could use:
> gen meancatch = matrix[zone,wave]
> I was hoping that this line of code would look up the wave and zone of each
> observation and input the value from the matrix that corresponded to each
> observation.  So I looked around and found -tabstatmat- from SSC, and tried
> it, using the code you gave yesterday:
> egen byv=group(zone wave), lab
> tabstat num_typ3, stat(mean) by(byv) save
> tabstatmat TABLE
> And this created the matrix with the values, and looks like this:
> TABLE[414,1]
>             num_typ3
>  1 1:mean  1.9822335
>  1 2:mean  2.6614173
>  1 3:mean  2.7150396
>  1 4:mean  3.3340782
>  1 5:mean  2.8161094
>  1 6:mean  1.1767857
>  2 1:mean  1.5857143
>  2 2:mean  2.1863208
>  2 3:mean   2.542777
>  2 4:mean  1.8849432
>  2 5:mean  1.7281553
>  2 6:mean  1.4927536
>  3 1:mean      1.875
>   .....
> There are 85 sites with 6 waves apiece.
> The original dataset has about 70,000 observations, so after creating 84
> alternatives for each I get about 6,000,000 observations.  I already know
> how to do this using -reshape- and the distance to the alternative sites,
> which I already put in the dataset.  And what I need is to have the
> indicator of quality, or mean catch for each alternative site during the
> time period that the person actually went fishing.  Then I will be able to
> run -clogit- or a similar procedure.
> I hope this makes sense.  I'm new both to Stata and to choice models, so
> this has been a pretty confusing and slow process for me.
> I really appreciate the help.
> Sergio
> On Fri, 2 Sep 2011 12:45:05 -0400, Austin Nichols wrote:
>> Sergio:
>> Did you read my response?
>> Look at the matrix; there is one column, so your references to row and
>> column make no sense.
>> You could make another matrix with values of byv corresponding to zone
>> and wave, noting that you must have these be integers counting from 1
>> up for row and column numbers to correspond to what you seem to want.
>> But why?  What would be the point of this?
>> On Fri, Sep 2, 2011 at 12:35 PM, Alvarez,Sergio wrote:
>>> Sorry about the ambiguity.
>>> So I used the mean by group code to create the matrix that would store the
>>> mean values for each group, using:
>>> egen byv=group(zone wave), lab
>>> tabstat num_typ3, stat(mean) by(byv) save
>>> tabstatmat TABLE
>>> which gives me a matrix, or rather a vector, with all the values I need.
>>>  The first few lines of the matrix in the output screen look like this:
>>> TABLE[414,1]
>>>             num_typ3
>>>  1 1:mean  1.9822335
>>>  1 2:mean  2.6614173
>>>  1 3:mean  2.7150396
>>>  1 4:mean  3.3340782
>>>  1 5:mean  2.8161094
>>>  1 6:mean  1.1767857
>>>  2 1:mean  1.5857143
>>>  2 2:mean  2.1863208
>>>  2 3:mean   2.542777
>>>  2 4:mean  1.8849432
>>> Now what I want to do is use -gen- or -egen- to create a variable that would
>>> look up the zone and wave of the corresponding observation from the matrix
>>> and insert the correct value in there.  So I tried:
>>> gen meancatch = TABLE[zone,wave]
>>> and this gives the correct values for all observations with wave = 1, but
>>> creates missing values on the rest of the observations.  I also tried:
>>> gen meancatch = TABLE[byv,num_typ3]
>>> and this gives me the correct value in some of the observations, but mostly
>>> missing values in the others.
>>> So I must be doing something wrong, but can't figure out what.  I guess the
>>> question is how to call the row and column numbers from the TABLE matrix?
>>> Thanks again,
>>> Sergio
>>> On Fri, 2 Sep 2011 12:08:31 -0400, Austin Nichols wrote:
>>>> Sergio:
>>>> Now I have no idea what you are trying to do. For the mean by group,
>>>> egen mby=mean(num_typ3), by(zone wave)
>>>> but you are referring to (probably) nonexistent row and column numbers
>>>> of a matrix in your example.
>>>> On Fri, Sep 2, 2011 at 10:42 AM, Alvarez,Sergio wrote:
>>>>> Thanks Austin and Nick for your help.  I used what Austin suggested (which
>>>>> is what Nick also suggested) and it worked. However, when I try to create
>>>>> the variable that contains the mean by group it works for some observations,
>>>>> but missing values are created for most of them.  I tried both:
>>>>> gen meancatch = TABLE[zone,wave]
>>>>> and
>>>>> gen meancatch = TABLE[byv,num_typ3]
>>>>> For the first line of code, it creates the correct value for all
>>>>> observations where wave = 1, but not for any others.  The second line
>>>>> creates missing values at random (as far as I can tell).
>>>>> I'd appreciate any tips.
