
Linear regression and influence

  • Ramsey regression specification error test for omitted variables
  • Cook and Weisberg test for heteroskedasticity
  • Variance-inflation factors
  • Cook's distance
  • COVRATIO
  • DFBETAs
  • DFITS
  • Diagonal elements of the hat matrix (leverage)
  • Residuals, standardized residuals, studentized residuals
  • Standard errors of the forecast, prediction, and residuals
  • Welsch distance
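
The first three items in the list are postestimation tests. Here is a minimal do-file sketch, assuming a Stata release in which these tests are run through estat after regress. It refits the model used in the example further down:

```stata
* Specification, heteroskedasticity, and collinearity checks after regress
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

estat ovtest    // Ramsey RESET for omitted variables (powers of the fitted values)
estat hettest   // Cook-Weisberg test for heteroskedasticity (default: fitted values)
estat vif       // variance-inflation factors, one per regressor
```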

Under the heading of least squares, Stata can fit ordinary regression models, instrumental-variables models, constrained linear regression, nonlinear least squares, and two-stage least-squares models. (Stata can also fit quantile regression models, which include median regression, that is, minimization of the sum of the absolute residuals.)
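
The following do-file sketch runs several of these estimators on the auto data used in the example below. The constraint (coefficient on weight fixed at 2) and the instrument (displacement for mpg) are arbitrary illustrative choices, not modeling advice. ivregress assumes Stata 10 or later:

```stata
* Several least-squares estimators on the auto data (model choices are illustrative only)
webuse auto, clear

regress price weight mpg                          // ordinary least squares

constraint define 1 weight = 2                    // hypothetical constraint: coefficient on weight = 2
cnsreg price weight mpg, constraints(1)           // constrained linear regression

ivregress 2sls price weight (mpg = displacement)  // 2SLS; displacement as instrument is arbitrary

qreg price weight mpg                             // median regression (least absolute deviations)
```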

After fitting a linear regression model, Stata can calculate predictions, residuals, standardized residuals, and studentized (jackknifed) residuals; the standard errors of the forecast, the prediction, and the residuals; the influence measures Cook's distance, COVRATIO, DFBETAs, DFITS, leverage, and Welsch distance; variance-inflation factors; specification tests; and tests for heteroskedasticity.
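
Each of these per-observation statistics is a predict option after regress. The do-file sketch below computes them. The new variable names are arbitrary, and dfbeta(foreign) requests the DFBETA for a single coefficient:

```stata
* Per-observation statistics after regress (new variable names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

predict yhat   if e(sample), xb              // linear prediction
predict res    if e(sample), residuals       // residuals
predict rsta   if e(sample), rstandard       // standardized residuals
predict rstu   if e(sample), rstudent        // studentized (jackknifed) residuals
predict sef    if e(sample), stdf            // standard error of the forecast
predict sep    if e(sample), stdp            // standard error of the prediction
predict ser    if e(sample), stdr            // standard error of the residual
predict lev    if e(sample), leverage        // diagonal elements of the hat matrix
predict cook   if e(sample), cooksd          // Cook's distance
predict cvr    if e(sample), covratio        // COVRATIO
predict dft    if e(sample), dfits           // DFITS
predict wd     if e(sample), welsch          // Welsch distance
predict dfb_fo if e(sample), dfbeta(foreign) // DFBETA for the foreign coefficient
```

These statistics are linked. For example, the squared standard error of the forecast equals the squared standard error of the prediction plus the squared root MSE, e(rmse).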

Among the fit diagnostic tools are added-variable plots (also known as partial-regression leverage plots, partial regression plots, or adjusted partial residual plots), component-plus-residual plots (also known as partial residual plots), augmented component-plus-residual plots (also known as augmented partial residual plots), leverage-versus-squared-residual plots (or L-R plots), residual-versus-fitted plots, and residual-versus-predictor plots (or independent variable plots). Each plot is drawn by typing a single command.
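
The do-file sketch below shows the one-command-per-plot interface after the model fitted in the example that follows. The choice of weight and mpg as the variables to plot is arbitrary:

```stata
* One command per diagnostic plot after regress (variables to plot chosen arbitrarily)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

avplot mpg        // added-variable plot for one regressor
avplots           // added-variable plots for every regressor, combined
cprplot weight    // component-plus-residual plot
acprplot weight   // augmented component-plus-residual plot
lvr2plot          // leverage versus normalized squared residual
rvfplot           // residuals versus fitted values
rvpplot weight    // residuals versus one predictor
```

Each command replaces the graph in the Graph window unless a name() option is given.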

For example, let's start with a dataset that contains the price, weight, mpg, and origin (foreign or U.S.) for 74 cars:

    . webuse auto
    (1978 Automobile Data)

    . gen forXmpg=foreign*mpg

    . regress price weight mpg forXmpg foreign

          Source |       SS       df       MS              Number of obs =      74
    -------------+------------------------------           F(  4,    69) =   21.22
           Model |   350319665     4  87579916.3           Prob > F      =  0.0000
        Residual |   284745731    69  4126749.72           R-squared     =  0.5516
    -------------+------------------------------           Adj R-squared =  0.5256
           Total |   635065396    73  8699525.97           Root MSE      =  2031.4

    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |   4.613589   .7254961     6.36   0.000     3.166264    6.060914
             mpg |   263.1875   110.7961     2.38   0.020     42.15527    484.2197
         forXmpg |  -307.2166   108.5307    -2.83   0.006    -523.7294   -90.70369
         foreign |   11240.33   2751.681     4.08   0.000     5750.878    16729.78
           _cons |  -14449.58    4425.72    -3.26   0.002    -23278.65    -5620.51
    ------------------------------------------------------------------------------

We created a new variable, forXmpg, the product of foreign and mpg, and then fitted our model with regress, Stata's linear regression command. All Stata estimation commands share the same syntax: the name of the dependent variable followed by the names of the independent variables. After estimation, we can review diagnostic plots:

    . rvfplot, yline(0) 
Figure 1

Typing rvfplot alone displays a residual-versus-fitted plot. We created the graph above by typing rvfplot, yline(0), which also draws a horizontal line at 0. The discernible pattern in the residuals indicates that our model has problems.
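
To see what rvfplot draws, you can build the same picture from predict and scatter. The sketch below refits the model above; the variable names fitted and resid are hypothetical:

```stata
* Rebuild the residual-versus-fitted plot from predict (variable names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

predict double fitted if e(sample), xb         // fitted values
predict double resid  if e(sample), residuals  // residuals = price - fitted
scatter resid fitted, yline(0)                 // same picture as rvfplot, yline(0)
```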

Here is how we obtain a leverage-versus-squared-residual plot:

    . lvr2plot
Figure 2
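
lvr2plot plots each observation's leverage against its normalized squared residual. The hand-built sketch below assumes that the normalization divides each squared residual by the residual sum of squares, e(rss), and places reference lines at the mean of each axis variable. Variable names are hypothetical:

```stata
* Build the L-R plot by hand (variable names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

predict double lev if e(sample), leverage
predict double res if e(sample), residuals

* Assumed normalization: squared residual divided by the residual sum of squares
generate double nres2 = res^2 / e(rss) if e(sample)

* Reference lines at the mean of each axis variable
summarize lev, meanonly
local mlev = r(mean)
summarize nres2, meanonly
local mnr2 = r(mean)

scatter lev nres2, yline(`mlev') xline(`mnr2') ///
    ytitle("Leverage") xtitle("Normalized residual squared")
```

Under that assumption, the leverages sum to the number of coefficients k and the normalized squared residuals sum to 1. The reference lines therefore sit at k/N and 1/N.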

avplot draws added-variable plots, both for variables currently in the model and variables not yet in the model:

    . avplot mpg
Figure 3
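
An added-variable plot for mpg is a scatter of two sets of residuals: from price regressed on the other regressors, and from mpg regressed on the same regressors. By the Frisch–Waugh–Lovell theorem, the slope of that scatter equals the full-model coefficient on mpg (263.1875 above). The sketch below builds the plot by hand. The variable names are hypothetical, and displacement is only an example of a variable not yet in the model:

```stata
* Added-variable plots by hand (variable names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign
scalar b_mpg = _b[mpg]

* A variable not yet in the model; displacement is only an example
avplot displacement

* Residuals of price and of mpg, each on the other regressors
quietly regress price weight forXmpg foreign
predict double e_price, residuals
quietly regress mpg weight forXmpg foreign
predict double e_mpg, residuals

* Slope of e_price on e_mpg reproduces the full-model coefficient on mpg
regress e_price e_mpg
display _b[e_mpg] "  " scalar(b_mpg)
scatter e_price e_mpg
```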

Added-variable plots are so useful that they are worth reviewing for every variable in the model:

    . avplots
Figure 4

The graph above is a single Stata image, created by typing avplots. The combined graph is useful here because our model has only four variables. Stata would draw it even if the model had 798 variables, but the individual graphs would then be too small to be useful. That is why there is also an avplot command, which draws the plot for one variable at a time.
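
When the combined graph is too crowded, a loop can draw one full-size avplot per regressor. In the sketch below, the graph names av_weight, av_mpg, and so on are hypothetical. name() is the standard graph option for keeping each graph in memory:

```stata
* One full-size added-variable plot per regressor, each kept under its own name
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

foreach v of varlist weight mpg forXmpg foreign {
    avplot `v', name(av_`v', replace)
}
```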

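Exploring the influence of observations in other ways is equally easy. For instance, we could obtain a new variable called cook containing Cook's distance and a variable called e containing the residuals. We could then list the observations whose Cook's distance exceeds 4/N (here 4/74), a common rule-of-thumb cutoff, by typing

    . predict cook if e(sample), cooksd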

    . predict e if e(sample), resid

    . list make price e cook if cook>4/74

         +--------------------------------------------------+ 
         | make                price           e       cook |
         |--------------------------------------------------| 
     12. | Cad. Eldorado      14,500     7271.96   .1492676 | 
     13. | Cad. Seville       15,906    5036.348   .3328515 |
     24. | Ford Fiesta         4,389    3164.872   .0638815 | 
     28. | Linc. Versailles   13,466    6560.912   .1308004 |
     42. | Plym. Arrow         4,647   -3312.968   .1700736 |
         +--------------------------------------------------+
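
Cook's distance combines the standardized residual r and the leverage h. With k coefficients including the constant, D = r²h / (k(1 − h)). The sketch below recomputes it by hand and lists the flagged observations from largest to smallest. The variable names are hypothetical, and the 4/N cutoff is a rule of thumb, not a formal test:

```stata
* Recompute Cook's distance by hand and list flagged observations (names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

predict double cook if e(sample), cooksd
predict double rsta if e(sample), rstandard   // standardized residual r
predict double lev  if e(sample), leverage    // leverage h

* D = r^2 * h / (k * (1 - h)), k = number of coefficients including the constant
local k = e(df_m) + 1
generate double cook_check = rsta^2 * lev / (`k' * (1 - lev)) if e(sample)

* Largest influence first; flag D > 4/N
gsort -cook
list make price cook if cook > 4/e(N) & !missing(cook)
```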

We could obtain all the DFBETAs and then list the four observations having the most negative influence on the foreign coefficient and the four observations having the most positive influence by typing

    . dfbeta
                        DFweight:  DFbeta(weight)
                           DFmpg:  DFbeta(mpg)
                       DFforXmpg:  DFbeta(forXmpg)
                       DFforeign:  DFbeta(foreign)

    . sort DFforeign

    . list make price foreign DFforeign in 1/4

         +--------------------------------------------------+
         | make                price    foreign   DFforeign |
         |--------------------------------------------------|
      1. | Plym. Arrow         4,647   Domestic   -.6622424 |
      2. | Cad. Eldorado      14,500   Domestic   -.5290519 |
      3. | Linc. Versailles   13,466   Domestic   -.5283729 |
      4. | Toyota Corona       5,719    Foreign    -.256431 |
         +--------------------------------------------------+

    . list make price foreign DFforeign in -4/l

         +---------------------------------------------+
         | make            price    foreign   DFfore~n |
         |---------------------------------------------|
     71. | Volvo 260      11,995    Foreign   .2318289 |
     72. | Plym. Champ     4,425   Domestic   .2371104 |
     73. | Peugeot 604    12,990    Foreign   .2552032 |
     74. | Cad. Seville   15,906   Domestic   .8243419 |
         +---------------------------------------------+
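
Assuming Stata's scaled definition, the DFBETA for a coefficient can be written as t·u / sqrt(U²(1 − h)). Here t is the studentized residual, h the leverage, u the residual from regressing that regressor on the other regressors, and U² the sum of the squared u. The sketch below recomputes the DFBETA for foreign this way and applies the common 2/sqrt(N) rule of thumb. It uses predict's dfbeta() option so the variable name does not depend on how the dfbeta command names its output. All new variable names are hypothetical:

```stata
* Recompute the DFBETA for foreign by hand (variable names are hypothetical)
webuse auto, clear
generate forXmpg = foreign*mpg
regress price weight mpg forXmpg foreign

predict double dfb_for if e(sample), dfbeta(foreign)  // Stata's DFBETA for foreign
predict double rstu    if e(sample), rstudent         // studentized residual t
predict double lev     if e(sample), leverage         // leverage h

* u: the part of foreign not explained by the other regressors
quietly regress foreign weight mpg forXmpg
predict double u_for, residuals
generate double u_for2 = u_for^2
summarize u_for2, meanonly
scalar ssu = r(sum)                                   // U^2

* DFBETA = t*u / sqrt(U^2*(1 - h))
generate double dfb_check = rstu*u_for / sqrt(scalar(ssu)*(1 - lev))

* Rule of thumb: flag |DFBETA| > 2/sqrt(N)
list make price foreign dfb_for if abs(dfb_for) > 2/sqrt(_N) & !missing(dfb_for)
```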

See New in Stata 10 for more about what was added in Stata Release 10.

© Copyright 1996–2008 StataCorp LP