Re: st: tabstatmat question
Austin Nichols <firstname.lastname@example.org>
Fri, 2 Sep 2011 13:32:10 -0400
Sergio <email@example.com> :
Sounds like my earlier suggestion
egen mby=mean(num_typ3), by(zone wave)
gets you there in one line.
If you have a dataset for "years past" then you can -merge zone wave-
onto this year's data. Or -joinby zone wave-.
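A minimal sketch of the merge route, assuming Stata 11 or later (for the -merge m:1- syntax). The two small datasets are made-up stand-ins for the "years past" data and this year's data, and the names meancatch and pastmeans are chosen only for illustration.

```stata
* Toy stand-in for the "years past" data (values invented for illustration)
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
2 1 3
2 2 5
2 2 7
end

* One row per zone-wave cell holding the mean catch
collapse (mean) meancatch = num_typ3, by(zone wave)
tempfile pastmeans
save `pastmeans'

* Toy stand-in for this year's data: one row per person and alternative site
clear
input id zone wave
1 1 1
1 2 1
2 1 2
2 2 2
end

* Attach the past mean catch to every row with the same zone and wave
merge m:1 zone wave using `pastmeans', keep(master match) nogenerate
sort id zone
list
```

Because the lookup table has exactly one row per zone-wave pair, m:1 is the appropriate match type, and every alternative site picks up the mean for its own zone in the wave of the actual trip.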
If you have panel data, see -by- and Nick Cox's tutorials on -by- e.g.
You can also use a matrix (and -tabstatmat-), if you can ensure that
the row numbers correspond to a variable on your data created by
-egen, group-, but that is substantially trickier.
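A sketch of the matrix route, for comparison. It assumes -tabstatmat- is installed from SSC and that row i of the stacked matrix corresponds to group i of byv, which should hold when no group has all-missing num_typ3 (the check against -egen- at the end is there to catch a mismatch). The toy data are invented for illustration.

```stata
* Requires: ssc install tabstatmat
* Toy data (values invented for illustration)
clear
input zone wave num_typ3
1 1 2
1 1 4
1 2 1
2 1 3
2 2 5
2 2 7
end

* Number the zone-wave cells 1, 2, ... in sorted order
egen byv = group(zone wave), label
tabstat num_typ3, stat(mean) by(byv) save
tabstatmat TABLE

* TABLE has one column, so the row is byv and the column is always 1;
* TABLE[zone,wave] asks for a nonexistent column whenever wave > 1
gen meancatch = TABLE[byv, 1]

* Check against the one-line -egen- solution
egen mby = mean(num_typ3), by(zone wave)
assert reldif(meancatch, mby) < 1e-6 if !missing(mby)
```

The extra bookkeeping (byv must line up with the matrix rows) is why the -egen- one-liner or the merge is the simpler choice.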
On Fri, Sep 2, 2011 at 1:14 PM, Alvarez,Sergio <firstname.lastname@example.org> wrote:
> Hi Austin,
> Thanks for your response. So what I'm doing is a site choice model of
> recreational fishing. From the dataset I can tell what city/town people
> come from and what city/town they went fishing in. I can also tell how many
> fish people caught. For the site choice model, I need to create a series of
> alternatives, that is, other places where the person could have fished but
> decided not to. But I need some indication of the quality of the site that
> was not visited. Since alternative fishing trips did not take place, I have
> no indication of how many fish the person could have caught if they had gone
> to place B, rather than to place A, which is where they actually went.
> So as an indication of quality I was going to use the mean number of fish
> caught in the site (zone) at that particular time of the year (wave) in
> years past. That is why I wanted to create a matrix that would have the
> mean catch by zone and wave, something like this:
> ZONE   1           2           ...
> 1      mean(1,1)   mean(1,2)
> 2      mean(2,1)   mean(2,2)
> Which was my original question. I would use that matrix to input the mean
> catch for the alternatives that did not happen after I created them. Now if
> the matrix looked like the example above, I thought I could use:
> gen meancatch = matrix[zone,wave]
> I was hoping that this line of code would look up the wave and zone of each
> observation and input the value from the matrix that corresponded to each
> observation. So I looked around and found -tabstatmat- from SSC, and tried
> it, using the code you gave yesterday:
> egen byv=group(zone wave), lab
> tabstat num_typ3, stat(mean) by(byv) save
> tabstatmat TABLE
> And this created the matrix with the values, and looks like this:
> 1†1:mean 1.9822335
> 1†2:mean 2.6614173
> 1†3:mean 2.7150396
> 1†4:mean 3.3340782
> 1†5:mean 2.8161094
> 1†6:mean 1.1767857
> 2†1:mean 1.5857143
> 2†2:mean 2.1863208
> 2†3:mean 2.542777
> 2†4:mean 1.8849432
> 2†5:mean 1.7281553
> 2†6:mean 1.4927536
> 3†1:mean 1.875
> There are 85 sites with 6 waves apiece.
> The original dataset has about 70,000 observations, so after creating 84
> alternatives for each I get about 6,000,000 observations. I already know
> how to do this using -reshape- and the distance to the alternative sites,
> which I already put in the dataset. And what I need is to have the
> indicator of quality, or mean catch for each alternative site during the
> time period that the person actually went fishing. Then I will be able to
> run -clogit- or a similar procedure.
> I hope this makes sense. I'm new both to Stata and to choice models, so
> this has been a pretty confusing and slow process for me.
> I really appreciate the help.
> On Fri, 2 Sep 2011 12:45:05 -0400, Austin Nichols wrote:
>> Sergio <email@example.com>:
>> Did you read my response?
>> Look at the matrix; there is one column, so your references to row and
>> column make no sense.
>> You could make another matrix with values of byv corresponding to zone
>> and wave, noting that you must have these be integers counting from 1
>> up for row and column numbers to correspond to what you seem to want.
>> But why? What would be the point of this?
>> On Fri, Sep 2, 2011 at 12:35 PM, Alvarez,Sergio <firstname.lastname@example.org> wrote:
>>> Sorry about the ambiguity.
>>> So I used the mean by group code to create the matrix that would store
>>> mean values for each group, using:
>>> egen byv=group(zone wave), lab
>>> tabstat num_typ3, stat(mean) by(byv) save
>>> tabstatmat TABLE
>>> which gives me a matrix, or rather a vector, with all the values I need.
>>> The first few lines of the matrix in the output screen look like this:
>>> 1†1:mean 1.9822335
>>> 1†2:mean 2.6614173
>>> 1†3:mean 2.7150396
>>> 1†4:mean 3.3340782
>>> 1†5:mean 2.8161094
>>> 1†6:mean 1.1767857
>>> 2†1:mean 1.5857143
>>> 2†2:mean 2.1863208
>>> 2†3:mean 2.542777
>>> 2†4:mean 1.8849432
>>> Now what I want to do is use -gen- or -egen- to create a variable that
>>> looks up the zone and wave of the corresponding observation in the
>>> matrix and inserts the correct value there. So I tried:
>>> gen meancatch = TABLE[zone,wave]
>>> and this gives the correct values for all observations with wave = 1, but
>>> creates missing values on the rest of the observations. I also tried:
>>> gen meancatch = TABLE[byv,num_typ3]
>>> and this gives me the correct value in some of the observations, but
>>> missing values in the others.
>>> So I must be doing something wrong, but can't figure out what. I guess my
>>> question is how to call the row and column numbers from the TABLE matrix?
>>> Thanks again,
>>> On Fri, 2 Sep 2011 12:08:31 -0400, Austin Nichols wrote:
>>>> Sergio <email@example.com> :
>>>> Now I have no idea what you are trying to do. For the mean by group,
>>>> egen mby=mean(num_typ3), by(zone wave)
>>>> but you are referring to (probably) nonexistent row and column numbers
>>>> of a matrix in your example.
>>>> On Fri, Sep 2, 2011 at 10:42 AM, Alvarez,Sergio <firstname.lastname@example.org> wrote:
>>>>> Thanks Austin and Nick for your help. I used what Austin suggested (which
>>>>> is what Nick also suggested) and it worked. However, when I try to create
>>>>> the variable that contains the mean by group, it works for some
>>>>> observations, but missing values are created for most of them. I tried both:
>>>>> gen meancatch = TABLE[zone,wave]
>>>>> gen meancatch = TABLE[byv,num_typ3]
>>>>> For the first line of code, it creates the correct value for all
>>>>> observations where wave = 1, but not for any others. The second line
>>>>> creates missing values at random (as far as I can tell).
>>>>> I'd appreciate any tips.