

From: "Marnix Zoutenbier" <Marnix.Zoutenbier@cqm.nl>

To: statalist@hsphsun2.harvard.edu

Subject: RE: st: Predict in version 11

Date: Wed, 8 Dec 2010 16:52:35 +0100

Dear Nick,

Thank you for your response. However, your solution is not what I mean. I want to predict for both testset==1 and testset==2, but I want Stata to predict a missing value when x1=4 in testset==2, because x1=4 does not appear in testset==1. In version 11, however, Stata also predicts in testset==2 for values of x1 that do not appear in testset==1 (the training set). Stata uses the constant to predict, which I think is very confusing in large datasets. In version 10, Stata predicts a missing value in those cases, which is, in my opinion, the proper way to proceed.

Thank you in advance for your consideration.

Best regards,
Marnix

______________________
Drs. Marnix Zoutenbier MTD CIRM
Senior Consultant
T: +31 (0)40 750 23 25
F: +31 (0)40 750 16 99
E: zoutenbier@cqm.nl
CQM B.V.
PO Box 414, 5600 AK Eindhoven, The Netherlands
Vonderweg 16, 5616 RM Eindhoven, The Netherlands
KvK 17076484
I: www.cqm.nl

From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Date: 08-12-2010 13:38
Subject: RE: st: Predict in version 11
Sent by: owner-statalist@hsphsun2.harvard.edu

The solution appears to be just a twist away from that already given.

    ... if testset == 1

Otherwise put, -predict- allows -if- (and -in-), so just specify whatever restrictions you want.

Nick
n.j.cox@durham.ac.uk

Marnix Zoutenbier

Neil's reaction is correct. However, it shows that I did not formulate my problem accurately, because it is not the solution that works for me. Let me extend the example with one extra observation to make myself clearer:

    x1  testset  y
    1   1        12
    2   1        13
    3   1        14
    4   2        .
    3   2        .

So the last observation is defined by x1 in the same way as the third observation. The test set (testset==2) consists of 2 observations. Of these, the observation with x1=3 can be predicted from the training set (testset==1), but the observation with x1=4 cannot be predicted because x1=4 is not in the training set.
First, in version 11:

    version 11
    anova y x1 if testset==1
    predict yhat

This gives the following result in version 11:

    x1  testset  y   yhat
    1   1        12  12
    2   1        13  13
    3   1        14  14
    4   2        .   12
    3   2        .   14

Now in version 10:

    version 10
    anova y x1 if testset==1
    predict yhat

This gives the following result:

    x1  testset  y   yhat
    1   1        12  12
    2   1        13  13
    3   1        14  14
    4   2        .   .
    3   2        .   14

This problem is not fixed by the 'e(sample)' suggestion, because I do want to predict in the test set (outside e(sample)). However, I only want predictions for values of x1 that are used in the training set (testset==1).

From: Neil Shephard <nshephard@gmail.com>

On Wed, Dec 8, 2010 at 9:58 AM, Marnix Zoutenbier wrote:

> I see a difference in the way predict works between Stata10 and 11.
>
> Consider the following example
>
>     x1  testset  y
>     1   1        12
>     2   1        13
>     3   1        14
>     4   2        .
>
> And the commands
>
>     anova y x1 if testset==1
>     predict yhat
>
> The following is the result in version 11
>
>     x1  testset  y   yhat
>     1   1        12  12
>     2   1        13  13
>     3   1        14  14
>     4   2        .   12
>
> While in version 10 the following dataset results
>
>     x1  testset  y   yhat
>     1   1        12  12
>     2   1        13  13
>     3   1        14  14
>     4   2        .   .
>
> I prefer the version 10 way-of-working, because it gives me the opportunity
> to identify observations that are in the testset (testset==2) and not in
> the trainingset (testset==1).
>
> Is it possible to obtain the same result in version 11 as in version 10,
> other than switching with the version command before and after predict?

Yes, see the -help predict- page (http://www.stata.com/help.cgi?predict), items 6 and 7 in the Description section near the top...

predict can be used to make in-sample or out-of-sample predictions:

6. predict calculates the requested statistic for all possible observations, whether they were used in fitting the model or not. predict does this for the standard options 1 through 3 and generally does this for estimator-specific options 4.

7. predict newvar if e(sample), ... restricts the prediction to the estimation subsample.
So in your above example under Stata 11 you should use...

    predict yhat if e(sample)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
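One possible way to get the version 10 style result in Stata 11 without using the version command is sketched below. This is not something proposed in the thread, only a minimal sketch under stated assumptions: predict for every observation, then set the prediction back to missing for any level of x1 that never occurs in the training set. The data are the extended example from the thread; the helper variable name intrain is hypothetical.

```stata
* Extended example data from the thread
clear
input x1 testset y
1 1 12
2 1 13
3 1 14
4 2 .
3 2 .
end

* Fit on the training set only, then predict for all observations
anova y x1 if testset == 1
predict yhat

* intrain is 1 if this level of x1 occurs anywhere in the training set, else 0
egen byte intrain = max(testset == 1), by(x1)

* Discard predictions for levels of x1 that the model never saw
replace yhat = . if !intrain

list x1 testset y yhat intrain
```

With these data, intrain is 0 only for the x1=4 observation, so only that prediction is set to missing, while the test-set observation with x1=3 keeps its prediction. For models with several factors, the same flag would need to be built per factor (or per combination of factors), which this sketch does not cover.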

**Follow-Ups**:

- **RE: st: Predict in version 11** *From:* Nick Cox <n.j.cox@durham.ac.uk>

**References**:

- **st: Predict in version 11** *From:* "Marnix Zoutenbier" <Marnix.Zoutenbier@cqm.nl>
- **Re: st: Predict in version 11** *From:* Neil Shephard <nshephard@gmail.com>
- **Re: st: Predict in version 11** *From:* "Marnix Zoutenbier" <Marnix.Zoutenbier@cqm.nl>
- **RE: st: Predict in version 11** *From:* Nick Cox <n.j.cox@durham.ac.uk>
