Chat Q & A: Nonparametric regression—Estimation, inference, and effects

Participants of the webinar Nonparametric regression: Estimation, inference, and effects, which took place on May 9, 2018, asked StataCorp developers the following:

Question: Variable lbweight is also categorical. Why not add an i.lbweight?
Response: lbweight is the outcome variable. We usually don't specify factor-variable notation for outcome variables.
Question: Oh, so it's nonparametric logistic?
Response: It is not a logistic regression. It estimates the conditional mean of a binary variable, and those means can then be interpreted as probabilities.
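To illustrate the point about conditional means of a binary outcome, here is a minimal sketch in Python rather than Stata. It is not npregress's implementation: it uses a simple local-constant (Nadaraya-Watson) estimator with a Gaussian kernel, and the data, the function name kernel_mean, and the bandwidth are all hypothetical choices made for the example.

```python
import math

def kernel_mean(x0, xs, ys, bandwidth):
    """Local-constant (Nadaraya-Watson) estimate of E[y | x = x0], Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical data: x is a covariate, y is a 0/1 outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Each estimate is a weighted average of zeros and ones, so it lies in [0, 1]
# and can be read as an estimated probability that y = 1 near that x.
points = (2, 5, 9)
estimates = [kernel_mean(x0, xs, ys, bandwidth=1.5) for x0 in points]
for x0, p in zip(points, estimates):
    print(f"x = {x0}: estimated P(y = 1 | x) = {p:.3f}")
```

Because every estimate is a weighted average of observed zeros and ones, no logistic (or other) link function is needed to keep it between 0 and 1 in this simplified estimator.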
Question: It seems to me that "kernel" should be an option (following a comma) rather than inserted after the command name and before the model's dependent variable. Why the change in syntax for npregress?
Response: There are several estimation commands that have the method after the command name. For example, we have ivregress 2sls. kernel is one of several possible methods for doing nonparametric regression. We anticipate adding more methods for nonparametric regression in the future.
Question: Does npregress work in Stata 14?
Response: npregress is not available in Stata 14. It is a new command in Stata 15.
Question: Can npregress deal with complex design data with pweight?
Response: npregress does not support complex survey data or pweights.
Question: How do you account for endogeneity in the case we would like to make causal inference between y and x?
Response: npregress assumes that all the covariates are exogenous, like regress.
Question: Is there no way to have an equivalent of two-stage least squares with nonparametric regression?
Response: Stata has not yet implemented an estimator for nonparametric 2SLS. Another participant in the webinar pointed to the community-contributed nplate command described at https://sites.google.com/site/blaisemelly/home/computer-programs/nonparametric-mean-iv-estimation.
Question: Does this procedure work with survey (svy) data?
Response: npregress does not support complex survey data or pweights.
Question: I ran it on my data, and the bootstrap failed with N=312: "insufficient observations to compute bootstrap standard errors; no results will be saved". What sample size is needed?
Response: There is no rule for the number of observations required. The number of observations required depends on the number of covariates and distributions of your covariates. Look at the slides that Enrique gave describing how npregress takes an average of nearby observations. If there are too few observations in one of these cells, the estimation cannot proceed.
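The idea of averaging nearby observations, and why sparse regions cause trouble, can be sketched as follows. This is an illustration in Python under stated assumptions, not what npregress does internally: it uses a local-constant estimator with an Epanechnikov kernel, and the data, function names, and bandwidth are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1, zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def local_mean(x0, xs, ys, bandwidth):
    """Weighted average of outcomes near x0; None if no observation gets weight."""
    weights = [epanechnikov((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0:
        return None  # empty neighborhood: there is nothing to average
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical data with a gap in the covariate between 3 and 8.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 8.5, 9.0]
ys = [2.0, 2.4, 2.9, 3.1, 3.6, 7.9, 8.4, 9.2]

dense = local_mean(2.0, xs, ys, bandwidth=1.0)   # several neighbors within the bandwidth
sparse = local_mean(5.5, xs, ys, bandwidth=1.0)  # no neighbors within the bandwidth
print("estimate at x = 2.0:", dense)
print("estimate at x = 5.5:", sparse)
```

In this sketch, whether an estimate exists depends on where the covariate values fall relative to the bandwidth, not on the total number of observations. A bootstrap resample, which draws observations with replacement, can leave a thin region with even fewer distinct points than the original sample.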
Question: Will you get the same estimates using lincom as you do with margins?
Response: lincom has little use in nonparametric regression. Recall Enrique's slide with a slash through Beta. There are no interesting Betas to combine. You think in terms of expected means, and margins does that.
Question: In margins, what does "nowald" mean in "contrast(nowald)"?
Response: Specifying nowald will suppress an additional table that shows Wald test results.
Question: Is the effect of increased education a function of reduced smoking and drinking in better-educated women?
Response: The syntax of the final margins commands in the slides demonstrates how you can explore the full response surface for the answer to these types of questions.
Question: Are these marginal effects commands the same for parametric estimation?
Response: Yes. margins can be used for both parametric and nonparametric models.
Question: Can nonparametric regression be applied to survival data?
Response: Possibly yes, but most likely you will have censored data, and in that case npregress may not be appropriate. npregress does not handle censored observations. It would treat censored observations as actually observed.
Question: Can you show an example where estimating the function parametrically gives different answers than estimating it nonparametrically?
Response: Look at the example Enrique gave using lpoly. He showed that regress assuming a linear functional form gives a different answer than the nonparametric lpoly estimator. You would also get different results if you compared npregress with regress for that case.
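A small numerical sketch of the same contrast, written in Python rather than Stata, is given here. It does not reproduce the webinar example or the lpoly and regress commands: the data are hypothetical and noise-free, generated from y = x^2, and the nonparametric fit is a simple local-constant Gaussian-kernel average with an arbitrarily chosen bandwidth.

```python
import math

def ols_line(xs, ys):
    """Least-squares intercept and slope for a straight-line fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def kernel_mean(x0, xs, ys, bandwidth):
    """Local-constant kernel estimate of E[y | x = x0], Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical data from a curved mean function, y = x^2, on a grid from -2 to 2.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

intercept, slope = ols_line(xs, ys)
linear_at_0 = intercept + slope * 0.0           # prediction from the straight line
kernel_at_0 = kernel_mean(0.0, xs, ys, bandwidth=0.3)  # local average near x = 0
print(f"linear fit at x = 0: {linear_at_0:.3f}")
print(f"kernel fit at x = 0: {kernel_at_0:.3f}")
print("true mean at x = 0: 0.000")
```

Because the data are symmetric around zero, the straight line is essentially flat at the overall mean of y, so it misses the curvature, while the local average stays close to the true mean at x = 0.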
Question: I understand I can use npregress for continuous and categorical variables. Is that right?
Response: Yes, npregress can be used with continuous and categorical covariates.
Question: If the results of probit and npregress are similar, what would be the additional benefits of having npregress?
Response: I do not need to make any assumptions about the functional form. Think about the first example, when we had citations and fines. You could have fit different models, and they might have given you wrong answers. For the probit example, even if you knew it was a probit, you probably had no clue of the interactions. I knew what the true function was because I simulated it. Researchers usually do not have that luxury.
Question: So the specificity of npregress is to control for interactions?
Response: npregress goes after the mean function whatever its form is. So it "controls" for interactions and functional form if you want to think about it in that regard.
Question: How is nonparametric estimation superior to semiparametric estimation?
Response: When I think about semiparametric estimation of the mean function, I think about estimating a part parametrically and a part nonparametrically. So you are allowing some flexibility in some components and assuming a functional form in others. It is not better or worse; semiparametric estimation just makes more assumptions.
Question: Is it possible to use these insights to analyze the efficiency of a commercial bank?
Response: Possibly yes! However, we cannot say whether it's the best method for your particular purpose.

If you have additional questions about the webinar or about nonparametric regression in Stata, please contact our technical services department.