markstat in its strict form
This iteration is for HTML and uses code fences, aka strict Markdown. There are gains and losses:
Gains
s/
blocks
Losses
This is meant to be a very simple exposition about modeling energy usage using Stata’s auto dataset. What makes the dataset special is that it is from 1978. Notable occurrences in 1978 were that Wayne Gretzky signed with the Indianapolis Racers of the World Hockey Association, and that the Boston Red Sox blew a 14-game lead to the Yankees. Eek. Oh, and homebrewing of beer was legalized in the U.S. To make things work more nicely, let’s pretend that the dataset is some sort of sample of measurements, so that talking about “average energy consumption” makes some sense.
Let’s open the auto dataset, and look at its structure.
. sysuse auto, clear
(1978 Automobile Data)
. describe
Contains data from /Applications/AAApplications/MathTools/Stata15/ado/base/a/au
> to.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2016 17:45
 size:         3,182                          (_dta has notes)
───────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
───────────────────────────────────────────────────────────────────────────────
Sorted by: foreign
We could use a codebook command here to look at all the variables, but it would take up too much space. Let’s do this instead:
. codebook, compact
Variable      Obs Unique      Mean   Min    Max  Label
───────────────────────────────────────────────────────────────────────────────
make           74     74         .     .      .  Make and Model
price          74     74  6165.257  3291  15906  Price
mpg            74     21   21.2973    12     41  Mileage (mpg)
rep78          69      5  3.405797     1      5  Repair Record 1978
headroom       74      8  2.993243   1.5      5  Headroom (in.)
trunk          74     18  13.75676     5     23  Trunk space (cu. ft.)
weight         74     64  3019.459  1760   4840  Weight (lbs.)
length         74     47  187.9324   142    233  Length (in.)
turn           74     18  39.64865    31     51  Turn Circle (ft.)
displacement   74     31  197.2973    79    425  Displacement (cu. in.)
gear_ratio     74     36  3.014865  2.19   3.89  Gear Ratio
foreign        74      2  .2972973     0      1  Car type
───────────────────────────────────────────────────────────────────────────────
For those unfamiliar with the system of weights and measures used in the United States (and Liberia), the important conversions to remember are that
One other oddity of the so-called traditional (or Standard or English or Imperial) system is that energy usage is measured in miles per gallon (mpg). This is not good for analysis, because mpg is the reciprocal of fuel used per distance, which makes the relationship between weight and mpg non-linear. This can be seen in the following graph:
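The command that produced the figure is not shown in this write-up. As a minimal sketch, assuming a scatterplot with a lowess smooth is close to what was drawn, something like the following should show the curvature. The axis titles are taken from the variable labels; the choice of `lowess` is an assumption, not necessarily the original method.

```stata
* Sketch only: the original graph command is not shown in the text.
* Intended to be run from a do-file (the /// continuation needs one).
sysuse auto, clear

* mpg against weight, with a lowess smooth to make any curvature visible
twoway (scatter mpg weight) (lowess mpg weight), ///
    ytitle("Mileage (mpg)") xtitle("Weight (lbs.)") legend(off)
```

If the smooth bends rather than following a straight line, a linear model of mpg on weight is a poor fit, which is the motivation for the transformation that follows.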
To make the analysis work better, we should make a variable measuring gallons used per 100 miles driven:
. gen gp100m = 100/mpg, before(mpg)
. label variable gp100m "Gallons per 100 miles"
One last conversion: 1 gallon per 100 miles is about 75/32 (≈ 2.344) liters per 100 km.
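For readers who want to check the conversion, here is a small sketch using the exact definitions of the US gallon (3.785411784 liters) and the mile (1.609344 km). The exact factor comes out near 2.352, so 75/32 is a convenient approximation. The variable name `lp100km` is hypothetical and is not used elsewhere in this analysis.

```stata
* Exact definitions: 1 US gallon = 3.785411784 L, 1 mile = 1.609344 km
display 3.785411784/1.609344    // exact factor, roughly 2.352
display 75/32                   // the approximation used in the text

* Hypothetical metric version of gp100m (assumes gp100m already exists)
generate lp100km = gp100m*3.785411784/1.609344
label variable lp100km "Liters per 100 km"
```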
Let’s take a look at various variables by whether the cars are from the US (domestic), or whether they are from outside the US (foreign). This was 1978, so country of manufacture mostly matched location of company. This is, of course, no longer the case.
. tabstat gp100m weight length turn displacement gear_ratio, ///
>         statistics( mean sd count ) by(foreign)
Summary statistics: mean, sd, N
  by categories of: foreign (Car type)
 foreign │    gp100m    weight    length      turn  displa~t  gear_r~o
─────────┼────────────────────────────────────────────────────────────
Domestic │  5.318155  3317.115  196.1346  41.44231  233.7115  2.806538
         │  1.224346  695.3637  20.04605  3.967582  85.26299  .3359556
         │        52        52        52        52        52        52
─────────┼────────────────────────────────────────────────────────────
 Foreign │  4.312848  2315.909  168.5455  35.40909  111.2273  3.507273
         │  1.144388  433.0035  13.68255  1.501082  24.88054  .2969076
         │        22        22        22        22        22        22
─────────┼────────────────────────────────────────────────────────────
   Total │   5.01928  3019.459  187.9324  39.64865  197.2973  3.014865
         │  1.279856  777.1936  22.26634  4.399354  91.83722  .4562871
         │        74        74        74        74        74        74
─────────┴────────────────────────────────────────────────────────────
This works, but it would be nice to have a table that makes the comparisons easier to see. For a simple example (with fewer statistics), we can use Ian Watson’s tabout, version 3, from http://tabout.net.au. To make it easier to rerun the command for different output types, the options for the command have been put in the file tabout_oneway.options.
Mean values for US and Non-US cars
| | Gp100M | Weight | Displacement | Gear Ratio |
| Domestic (70%) | 5.32 | 3,317.1 | 233.7 | 2.807 |
| Foreign (29%) | 4.31 | 2,315.9 | 111.2 | 3.507 |
| Total (100%) | 5.02 | 3,019.5 | 197.3 | 3.015 |
Source: auto.dta
If time permits, we should be able to make a more-complete version of this table.
Before modeling, we should check whether there could be collinearities among the predictors. [commented out here]
* graph matrix gp100m weight length turn displacement gear_ratio
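A numeric check is a compact alternative to the commented-out scatterplot matrix. The sketch below is an addition, not part of the original analysis: it assumes that pairwise correlations and variance inflation factors are acceptable stand-ins for eyeballing the matrix plot.

```stata
* Pairwise correlations among the candidate predictors
correlate weight length turn displacement gear_ratio

* Variance inflation factors require a fitted model first;
* "quietly" suppresses the regression table
quietly regress gp100m weight displacement gear_ratio foreign
estat vif
```

Large correlations, or large VIFs, would suggest that some predictors carry overlapping information.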
Finally, how about some modeling? Let’s first run a regression with many variables and then store the results.
. regress gp100m weight displacement gear_ratio foreign
      Source │       SS           df       MS      Number of obs   =        74
─────────────┼──────────────────────────────────   F(4, 69)        =     56.84
       Model │  91.7374232         4  22.9343558   Prob > F        =    0.0000
    Residual │  27.8388375        69  .403461414   R-squared       =    0.7672
─────────────┼──────────────────────────────────   Adj R-squared   =    0.7537
       Total │  119.576261        73  1.63803097   Root MSE        =    .63519
─────────────┬────────────────────────────────────────────────────────────────
      gp100m │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
      weight │   .0014428    .000216     6.68   0.000     .0010118    .0018737
displacement │   .0012388   .0021161     0.59   0.560    -.0029828    .0054603
  gear_ratio │  -.2037991   .3258603    -0.63   0.534    -.8538726    .4462744
     foreign │    .733736   .2301493     3.19   0.002     .2746007    1.192871
       _cons │   .8147969   1.239181     0.66   0.513    -1.657301    3.286895
─────────────┴────────────────────────────────────────────────────────────────
We can see that, as expected, heavier cars take more energy to move. Perhaps unexpectedly, non-US cars use more gas at the same weight. It appears that we can throw out both displacement and gear_ratio as predictors and fit a simpler model.
. regress gp100m weight foreign
      Source │       SS           df       MS      Number of obs   =        74
─────────────┼──────────────────────────────────   F(2, 71)        =    113.97
       Model │  91.1761694         2  45.5880847   Prob > F        =    0.0000
    Residual │  28.4000913        71  .400001287   R-squared       =    0.7625
─────────────┼──────────────────────────────────   Adj R-squared   =    0.7558
       Total │  119.576261        73  1.63803097   Root MSE        =    .63246
─────────────┬────────────────────────────────────────────────────────────────
      gp100m │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
      weight │   .0016254   .0001183    13.74   0.000     .0013896    .0018612
     foreign │   .6220535   .1997381     3.11   0.003     .2237871     1.02032
       _cons │  -.0734839   .4019932    -0.18   0.855    -.8750354    .7280677
─────────────┴────────────────────────────────────────────────────────────────
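Two individually non-significant coefficients do not by themselves show that both predictors can be dropped together, especially if they are correlated. A joint test on the larger model is one way to check. The sketch below is an addition to the original analysis and uses only built-in commands.

```stata
* Refit the larger model quietly, then test both coefficients jointly
quietly regress gp100m weight displacement gear_ratio foreign
test displacement gear_ratio

* Refit the simpler model so that it is again the active estimation result
quietly regress gp100m weight foreign
```

A large p-value from `test` would support dropping displacement and gear_ratio together.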
We can put these coefficients in a table, this time hiding the esttab command:
────────────────────────────────────────────
                      (1)             (2)
                   gp100m          gp100m
────────────────────────────────────────────
weight            0.00144***      0.00163***
                   (6.68)         (13.74)
displacement      0.00124
                   (0.59)
gear_ratio         -0.204
                  (-0.63)
foreign             0.734**         0.622**
                   (3.19)          (3.11)
_cons               0.815         -0.0735
                   (0.66)         (-0.18)
────────────────────────────────────────────
N                      74              74
────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
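Since the esttab command is hidden, here is a minimal sketch of one way to produce a table of this shape. It assumes the community-contributed estout package is installed; the stored-estimate names `full` and `simple` are hypothetical, and the exact options used for the table above are not known.

```stata
* esttab is part of the community-contributed estout package:
*   ssc install estout

quietly regress gp100m weight displacement gear_ratio foreign
estimates store full        // hypothetical name for the larger model

quietly regress gp100m weight foreign
estimates store simple      // hypothetical name for the simpler model

* With default settings esttab reports t statistics in parentheses
* and significance stars
esttab full simple
```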
From the simple model, cars from 40 years ago used about 0.163 more gallons per 100 miles for each extra 100 pounds of weight, on average. Also, non-US cars used about 0.622 more gallons per 100 miles than US cars of the same weight, on average.
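The per-100-pound figure is the weight coefficient multiplied by 100, because weight is recorded in pounds and the outcome is in gallons per 100 miles. A short sketch of how to get the rescaled estimates from Stata rather than by hand (an addition, not part of the original write-up):

```stata
quietly regress gp100m weight foreign

* Effect of an extra 100 lbs, with standard error and confidence interval
lincom 100*weight

* The same point estimates, formatted to three decimals
display "Extra gallons per 100 miles per 100 lbs: " %5.3f 100*_b[weight]
display "Foreign minus domestic, same weight:     " %5.3f _b[foreign]
```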