
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: tabstatmat question

From   "Alvarez,Sergio" <[email protected]>
To   <[email protected]>
Subject   Re: st: tabstatmat question
Date   Fri, 02 Sep 2011 13:14:40 -0400

Hi Austin,

Thanks for your response. What I'm working on is a site choice model of recreational fishing. From the dataset I can tell which city/town people come from and which city/town they went fishing in. I can also tell how many fish people caught. For the site choice model, I need to create a series of alternatives, that is, other places where the person could have fished but chose not to. I also need some indication of the quality of each site that was not visited. Since the alternative fishing trips did not take place, I have no indication of how many fish the person could have caught if they had gone to place B rather than to place A, where they actually went.

So as an indicator of quality I was going to use the mean number of fish caught at the site (zone) during that time of the year (wave) in past years. That is why I wanted to create a matrix with the mean catch by zone and wave, something like this:

ZONE \ WAVE   1          2          ...
1             mean(1,1)  mean(1,2)
2             mean(2,1)  mean(2,2)

That was my original question. After creating the alternatives (the trips that did not happen), I would use that matrix to fill in their mean catch. If the matrix looked like the example above, I thought I could use:

gen meancatch = matrix[zone,wave]

I was hoping this line would look up the zone and wave of each observation and fill in the corresponding value from the matrix. So I looked around, found -tabstatmat- on SSC, and tried it with the code you gave yesterday:

egen byv=group(zone wave), lab
tabstat num_typ3, stat(mean) by(byv) save
tabstatmat TABLE

This created a matrix of the mean values, which looks like this:

  1†1:mean  1.9822335
  1†2:mean  2.6614173
  1†3:mean  2.7150396
  1†4:mean  3.3340782
  1†5:mean  2.8161094
  1†6:mean  1.1767857
  2†1:mean  1.5857143
  2†2:mean  2.1863208
  2†3:mean   2.542777
  2†4:mean  1.8849432
  2†5:mean  1.7281553
  2†6:mean  1.4927536
  3†1:mean      1.875

There are 85 sites with 6 waves apiece.
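Because -egen, group()- numbers the zone/wave combinations 1, 2, ... in sorted order, and -tabstat, by()- lists the groups in that same order, each observation's byv value should also be its row number in TABLE. If, as an assumption, every zone is an integer from 1 to 85 and every zone has all 6 waves, that row can also be computed as (zone - 1) * 6 + wave; for zone 2, wave 1 this gives row 7, which matches the 2†1 entry in the listing above. A minimal sketch on made-up data (the toy data and the variable row are hypothetical):

```stata
* Hypothetical toy data: 3 zones x 6 waves, 5 trips per cell
clear
set obs 90
gen zone = ceil(_n / 30)                  // zones 1-3
gen wave = mod(ceil(_n / 5) - 1, 6) + 1   // waves 1-6 within each zone

* Group identifier, numbered 1, 2, ... in sorted (zone, wave) order
egen byv = group(zone wave)

* Row of each zone/wave cell in a stacked vector with 6 waves per zone;
* valid only when zones are consecutive integers from 1 and all waves occur
gen row = (zone - 1) * 6 + wave
assert row == byv
```

If some zone is missing a wave, the formula and byv drift apart, so byv itself is the safer row index.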

The original dataset has about 70,000 observations, so after adding 84 alternatives to each trip I get about 6,000,000 observations (70,000 × 85). I already know how to build these with -reshape-, and I have already added the distance to each alternative site to the dataset. What I still need is the quality indicator, that is, the mean catch at each alternative site during the wave in which the person actually went fishing. Then I will be able to run -clogit- or a similar procedure.
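For the alternatives, a common Stata pattern (swapped in here; it is not the -tabstatmat- route discussed in this thread) is to -collapse- the trips to one row per zone and wave and then -merge- that table onto the expanded data by the alternative's zone and the wave of the actual trip. This is a minimal sketch on made-up data: id, alt_zone, and chosen are hypothetical names, and -expand- stands in for the -reshape- step described above. Restricting the means to earlier years would mean adding an if condition on a year variable to the -collapse-, which is not shown because no such variable appears in the thread.

```stata
* Hypothetical toy trip data: 3 zones x 6 waves (85 zones in the real data)
clear
set seed 2011
set obs 300
gen id = _n                                  // one row per observed trip
gen zone = mod(_n - 1, 3) + 1                // zone actually visited
gen wave = mod(floor((_n - 1) / 3), 6) + 1   // wave of the trip
gen num_typ3 = floor(5 * runiform())         // made-up catch counts

* One row per zone x wave holding the mean catch, keyed as the alternative's zone
preserve
collapse (mean) meancatch = num_typ3, by(zone wave)
rename zone alt_zone
tempfile zonewave
save `zonewave'
restore

* One row per trip x candidate site; chosen = 1 for the site actually visited
expand 3
bysort id: gen alt_zone = _n
gen chosen = (alt_zone == zone)

* Mean catch at each candidate site in the wave when the trip took place
merge m:1 alt_zone wave using `zonewave', keep(master match) nogenerate
```

With the real data, the estimation step would then look something like -clogit chosen meancatch distance, group(id)-, where distance is a placeholder for the distance variable already in the dataset.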

I hope this makes sense. I'm new both to Stata and to choice models, so this has been a pretty confusing and slow process for me.

I really appreciate the help.


On Fri, 2 Sep 2011 12:45:05 -0400, Austin Nichols wrote:
Sergio <[email protected]>:
Did you read my response?
Look at the matrix; there is one column, so your references to row and
column make no sense.
You could make another matrix with values of byv corresponding to zone
and wave, noting that zone and wave must be integers counting up from 1
for the row and column numbers to correspond to what you seem to want.
But why?  What would be the point of this?
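A variant of this suggestion is to build the layout Sergio first sketched directly: a matrix with one row per zone and one column per wave, filled cell by cell with -summarize, meanonly-. It shows why the integer requirement matters, since zone and wave are used as the row and column numbers. This is a minimal sketch on made-up data, assuming zone runs 1, 2, 3, ... and wave runs 1 to 6; the matrix name M is hypothetical.

```stata
* Hypothetical toy data: 3 zones x 6 waves, 5 trips per cell
clear
set seed 1
set obs 90
gen zone = ceil(_n / 30)
gen wave = mod(ceil(_n / 5) - 1, 6) + 1
gen num_typ3 = floor(5 * runiform())

* M has one row per zone and one column per wave; empty cells stay missing
summarize zone, meanonly
local Z = r(max)
matrix M = J(`Z', 6, .)
forvalues z = 1/`Z' {
    forvalues w = 1/6 {
        summarize num_typ3 if zone == `z' & wave == `w', meanonly
        if r(N) > 0 {
            matrix M[`z', `w'] = r(mean)
        }
    }
}
matrix list M

* Because M really is zone x wave, the original lookup now works
gen meancatch = M[zone, wave]
```

With 85 zones this loops over 510 cells; the one-line -egen, mean()- call gives the same values without any matrix.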

On Fri, Sep 2, 2011 at 12:35 PM, Alvarez,Sergio <[email protected]> wrote:
Sorry about the ambiguity.

So I used the mean-by-group code to create a matrix that would store the
mean value for each group, using:

egen byv=group(zone wave), lab
tabstat num_typ3, stat(mean) by(byv) save
tabstatmat TABLE

which gives me a matrix, or rather a vector, with all the values I need.  The first few lines of the matrix in the output screen look like this:

 1†1:mean  1.9822335
 1†2:mean  2.6614173
 1†3:mean  2.7150396
 1†4:mean  3.3340782
 1†5:mean  2.8161094
 1†6:mean  1.1767857
 2†1:mean  1.5857143
 2†2:mean  2.1863208
 2†3:mean   2.542777
 2†4:mean  1.8849432

Now I want to use -gen- or -egen- to create a variable that takes the zone and wave of each observation, looks up the matching value in the matrix,
and inserts it. So I tried:

gen meancatch = TABLE[zone,wave]

and this gives the correct values for all observations with wave = 1, but missing values for the rest of the observations. I also tried:

gen meancatch = TABLE[byv,num_typ3]

and this gives me the correct value in some of the observations, but mostly
missing values in the others.

So I must be doing something wrong, but I can't figure out what. I guess the question is: how do I refer to the row and column numbers of the TABLE matrix?
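Since TABLE has a single column, and judging from the listing above its rows follow the byv numbering (1†1, 1†2, ..., 2†1, ...), the row subscript should be byv and the column subscript should always be 1. A minimal sketch, assuming -tabstatmat- is installed from SSC and that its rows appear in byv order as in the listing; the toy data are made up, and the cross-check uses the -egen, mean()- one-liner Austin gives in the thread.

```stata
* Hypothetical toy data: 3 zones x 6 waves, 5 trips per cell
clear
set seed 42
set obs 90
gen zone = ceil(_n / 30)
gen wave = mod(ceil(_n / 5) - 1, 6) + 1
gen num_typ3 = floor(5 * runiform())

* Same three lines as in the thread (requires: ssc install tabstatmat)
egen byv = group(zone wave), lab
tabstat num_typ3, stat(mean) by(byv) save
tabstatmat TABLE

* TABLE is a column vector: row = byv, column 1 = mean of num_typ3
assert colsof(TABLE) == 1
gen meancatch = TABLE[byv, 1]

* Cross-check against the -egen- one-liner
egen mby = mean(num_typ3), by(zone wave)
assert reldif(meancatch, mby) < 1e-6
```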

Thanks again,


On Fri, 2 Sep 2011 12:08:31 -0400, Austin Nichols wrote:

Sergio <[email protected]> :
Now I have no idea what you are trying to do. For the mean by group,
egen mby=mean(num_typ3), by(zone wave)
but you are referring to (probably) nonexistent row and column numbers
of a matrix in your example.

On Fri, Sep 2, 2011 at 10:42 AM, Alvarez,Sergio <[email protected]> wrote:

Thanks Austin and Nick for your help. I used what Austin suggested
(which is what Nick also suggested) and it worked. However, when I try to create
the variable that contains the mean by group, it works for some observations,
but most of them get missing values. I tried both:

gen meancatch = TABLE[zone,wave]
gen meancatch = TABLE[byv,num_typ3]

For the first line of code, it creates the correct value for all
observations where wave = 1, but not for any others.  The second line
creates missing values at random (as far as I can tell).
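The pattern of missing values follows from TABLE having a single column. Stata evaluates the matrix subscripts separately for each observation, and, as these results indicate, an out-of-range subscript yields missing rather than an error. TABLE[zone,wave] is therefore in range only when wave == 1, and even then it returns row zone rather than that zone's wave-1 row: by the listing, TABLE[2,1] is 2.6614173 (the 1†2 entry), whereas zone 2, wave 1 is row 7 (1.5857143). Likewise TABLE[byv,num_typ3] is in range only for trips with num_typ3 == 1, where it happens to return the right group mean, which likely explains the seemingly random pattern. The sketch below illustrates the behavior with a hypothetical 3 x 1 matrix T and made-up subscripts.

```stata
* Made-up subscripts to look up
clear
input row col
1 1
2 1
2 2
3 3
end

* Hypothetical 3 x 1 column vector standing in for TABLE
matrix T = (10 \ 20 \ 30)

* Column 1 is the only valid column, so col > 1 gives missing
gen val = T[row, col]
list
```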

I'd appreciate any tips.


*   For searches and help try:

Sergio Alvarez
Food and Resource Economics
University of Florida

© Copyright 1996–2018 StataCorp LLC