Austin Nichols <austinnichols@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: tabstatmat question

Fri, 2 Sep 2011 13:32:10 -0400

Sergio <sergioal@ufl.edu> :
Sounds like my earlier suggestion
  egen mby=mean(num_typ3), by(zone wave)
gets you there in one line. If you have a dataset for "years past"
then you can -merge zone wave- onto this year's data. Or -joinby zone wave-.
If you have panel data, see -by- and Nick Cox's tutorials on -by- e.g.
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
You can also use a matrix (and -tabstatmat-), if you can ensure that
the row numbers correspond to a variable on your data created by
-egen, group-, but that is substantially trickier.

On Fri, Sep 2, 2011 at 1:14 PM, Alvarez,Sergio <sergioal@ufl.edu> wrote:
> Hi Austin,
>
> Thanks for your response. So what I'm doing is a site choice model of
> recreational fishing. From the dataset I can tell what city/town people
> come from and what city/town they went fishing in. I can also tell how many
> fish people caught. For the site choice model, I need to create a series of
> alternatives or just other places where the person could have fished at but
> decided not to. But I need some indication of the quality of the site that
> was not visited. Since alternative fishing trips did not take place, I have
> no indication of how many fish the person could have caught if they had gone
> to place B, rather than to place A, which is where they actually went.
>
> So as an indication of quality I was going to use the mean number of fish
> caught in the site (zone) at that particular time of the year (wave) in
> years past. That is why I wanted to create a matrix that would have the
> mean catch by zone and wave, something like this:
>
>             WAVE
> ZONE   1           2          ...
> 1      mean(1,1)   mean(1,2)
> 2      mean(2,1)   mean(2,2)
> ...
>
> Which was my original question. I would use that matrix to input the mean
> catch for the alternatives that did not happen after I created them. Now if
> the matrix looked like the example above, I thought I could use:
>
>   gen meancatch = matrix[zone,wave]
>
> I was hoping that this line of code would look up the wave and zone of each
> observation and input the value from the matrix that corresponded to each
> observation. So I looked around and found -tabstatmat- from SSC, and tried
> it, using the code you gave yesterday:
>
>   egen byv=group(zone wave), lab
>   tabstat num_typ3, stat(mean) by(byv) save
>   tabstatmat TABLE
>
> And this created the matrix with the values, and looks like this:
>
> TABLE[414,1]
>            num_typ3
> 1†1:mean  1.9822335
> 1†2:mean  2.6614173
> 1†3:mean  2.7150396
> 1†4:mean  3.3340782
> 1†5:mean  2.8161094
> 1†6:mean  1.1767857
> 2†1:mean  1.5857143
> 2†2:mean  2.1863208
> 2†3:mean   2.542777
> 2†4:mean  1.8849432
> 2†5:mean  1.7281553
> 2†6:mean  1.4927536
> 3†1:mean      1.875
> .....
>
> There's 85 sites with 6 waves a piece.
>
> The original dataset has about 70,000 observations, so after creating 84
> alternatives for each I get about 6,000,000 observations. I already know
> how to do this using -reshape- and the distance to the alternative sites,
> which I already put in the dataset. And what I need is to have the
> indicator of quality, or mean catch for each alternative site during the
> time period that the person actually went fishing. Then I will be able to
> run -clogit- or a similar procedure.
>
> I hope this makes sense. I'm new both to stata and to choice models, so
> this has been a pretty confusing and slow process for me.
>
> I really appreciate the help.
>
> Sergio
>
>
>
> On Fri, 2 Sep 2011 12:45:05 -0400, Austin Nichols wrote:
>>
>> Sergio <sergioal@ufl.edu>:
>> Did you read my response?
>> Look at the matrix; there is one column, so your references to row and
>> column make no sense.
>> You could make another matrix with values of byv corresponding to zone
>> and wave, noting that you must have these be integers counting from 1
>> up for row and column numbers to correspond to what you seem to want.
>> But why? What would be the point of this?
>>
>> On Fri, Sep 2, 2011 at 12:35 PM, Alvarez,Sergio <sergioal@ufl.edu> wrote:
>>>
>>> Sorry about ambiguity.
>>>
>>> So I used the mean by group code to create the matrix that would store
>>> the mean values for each group, using:
>>>
>>>   egen byv=group(zone wave), lab
>>>   tabstat num_typ3, stat(mean) by(byv) save
>>>   tabstatmat TABLE
>>>
>>> which gives me a matrix, or rather a vector, with all the values I need.
>>> The first few lines of the matrix in the output screen look like this:
>>>
>>> TABLE[414,1]
>>>            num_typ3
>>> 1†1:mean  1.9822335
>>> 1†2:mean  2.6614173
>>> 1†3:mean  2.7150396
>>> 1†4:mean  3.3340782
>>> 1†5:mean  2.8161094
>>> 1†6:mean  1.1767857
>>> 2†1:mean  1.5857143
>>> 2†2:mean  2.1863208
>>> 2†3:mean   2.542777
>>> 2†4:mean  1.8849432
>>>
>>> Now what I want to do is use -gen- or -egen- to create a variable that
>>> would look up the zone and wave of the corresponding observation from
>>> the matrix and insert the correct value in there. So I tried:
>>>
>>>   gen meancatch = TABLE[zone,wave]
>>>
>>> and this gives the correct values for all observations with wave = 1, but
>>> creates missing values on the rest of the observations. I also tried:
>>>
>>>   gen meancatch = TABLE[byv,num_typ3]
>>>
>>> and this gives me the correct value in some of the observations, but
>>> mostly missing values in the others.
>>>
>>> So I must be doing something wrong, but can't figure out what. I guess
>>> the question is how to call the row and column numbers from the TABLE
>>> matrix?
>>>
>>> Thanks again,
>>>
>>> Sergio
>>>
>>>
>>> On Fri, 2 Sep 2011 12:08:31 -0400, Austin Nichols wrote:
>>>>
>>>> Sergio <sergioal@ufl.edu> :
>>>> Now I have no idea what you are trying to do. For the mean by group,
>>>>   egen mby=mean(num_typ3), by(zone wave)
>>>> but you are referring to (probably) nonexistent row and column numbers
>>>> of a matrix in your example.
>>>>
>>>> On Fri, Sep 2, 2011 at 10:42 AM, Alvarez,Sergio <sergioal@ufl.edu>
>>>> wrote:
>>>>>
>>>>> Thanks Austin and Nick for your help. I used what Austin suggested
>>>>> (which is what Nick also suggested) and it worked. However, when I
>>>>> try to create the variable that contains the mean by group it works
>>>>> for some observations, but missing values are created for most of
>>>>> them. I tried both:
>>>>>
>>>>>   gen meancatch = TABLE[zone,wave]
>>>>> and
>>>>>   gen meancatch = TABLE[byv,num_typ3]
>>>>>
>>>>> For the first line of code, it creates the correct value for all
>>>>> observations where wave = 1, but not for any others. The second line
>>>>> creates missing values at random (as far as I can tell).
>>>>>
>>>>> I'd appreciate any tips.
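For anyone reading this thread later, here is a minimal sketch of the -egen- and -merge- route suggested in the reply at the top. It is an illustration only, not code posted by anyone in the thread. The toy values, the variable names meancatch and id, the tempfile name cellmeans, the use of -collapse- to build the file of cell means, and the layout of the "alternatives" data are all assumptions. The -merge m:1- syntax assumes Stata 11 or later.

```stata
* Toy trip-level data standing in for the real file (values are made up).
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
2 1 3
2 2 5
2 2 7
end

* Route 1: attach the zone-wave mean to every observation in one line.
egen mby = mean(num_typ3), by(zone wave)

* Route 2: build a file with one row per zone-wave cell, to be merged
* onto another dataset (for example, the expanded set of alternatives).
preserve
collapse (mean) meancatch = num_typ3, by(zone wave)
tempfile cellmeans
save `cellmeans'
restore

* Hypothetical alternatives file: one row per person (id), candidate
* zone, and the wave in which that person actually fished.
clear
input id zone wave
1 1 1
1 2 1
2 1 2
2 2 2
end

* Many alternatives map to one cell mean, hence m:1.
merge m:1 zone wave using `cellmeans', keep(master match) nogenerate
sort id zone
list, sepby(id)
```

Zone-wave cells with no trips in the data used for the means come out of the merge with meancatch missing, which seems relevant here given that the matrix has 414 rows rather than 85 x 6 = 510. If the matrix route is preferred anyway, note that TABLE has a single column and its rows follow byv, so the lookup would presumably be TABLE[byv,1] rather than TABLE[zone,wave].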

