Factor-variable notation is a collection of prefixes and operators that allows us to specify regression models quickly and easily. We can distinguish between continuous and categorical variables, select reference categories, specify interactions between variables, and include polynomials of continuous variables. And factor-variable notation works with nearly all of Stata's regression commands such as **regress**, **probit**, **logit**, and **poisson**.

Let's begin by opening the **nhanes2l** dataset. Then let's **describe** and **summarize** the variables **bpsystol**, **hlthstat**, **diabetes**, **age**, and **bmi**.

```
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------
bpsystol          int    %9.0g                Systolic blood pressure
hlthstat         byte    %20.0g     hlth      Health status
diabetes         byte    %12.0g     diabetes  Diabetes status
age              byte    %9.0g                Age (years)
bmi             float    %9.0g                Body mass index (BMI)

. summarize bpsystol hlthstat diabetes age bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297
```

We are going to fit a series of linear regression models for the outcome variable **bpsystol**, which measures systolic blood pressure and ranges from 65 to 300 mmHg. **hlthstat** measures health status on a scale from 1 to 5. **diabetes** is an indicator of diabetes status coded 0 or 1. **age** records age in years, ranging from 20 to 74. And **bmi** measures body mass index, ranging from 12.4 to 61.1 kg/m^{2}.

Let's begin with a model including the predictor variable **hlthstat**. We suspect that **hlthstat** is a categorical variable because its description shows a value label named “hlth” and its summary has a minimum value of 1 and a maximum value of 5. Let's use **label list** to view the category labels.

```
. label list hlth
hlth:
           1 Excellent
           2 Very good
           3 Good
           4 Fair
           5 Poor
          .a Blank but applicable
```

**hlthstat** has five categories labeled Excellent, Very good, Good, Fair, and Poor. Stata's regression commands treat predictor variables as continuous by default, so we need to create indicator variables for each category of **hlthstat**. We could do this manually, but it is easier to use the “i.” prefix. The “i.” prefix is factor-variable notation that tells Stata a variable is categorical, and Stata will create temporary indicator variables for us automatically. Let's type **list hlthstat i.hlthstat in 1/10** to see how it works for the first 10 observations.

```
. list hlthstat i.hlthstat in 1/10

     +------------------------------------------------------------------+
     |                    1.         2.         3.         4.        5. |
     |  hlthstat    hlthstat   hlthstat   hlthstat   hlthstat  hlthstat |
     |------------------------------------------------------------------|
  1. | Very good           0          1          0          0         0 |
  2. | Very good           0          1          0          0         0 |
  3. |      Good           0          0          1          0         0 |
  4. |      Fair           0          0          0          1         0 |
  5. | Very good           0          1          0          0         0 |
     |------------------------------------------------------------------|
  6. |      Poor           0          0          0          0         1 |
  7. | Very good           0          1          0          0         0 |
  8. | Excellent           1          0          0          0         0 |
  9. | Very good           0          1          0          0         0 |
 10. |      Poor           0          0          0          0         1 |
     +------------------------------------------------------------------+
```

The first column lists the value of **hlthstat** for the first 10 observations in our dataset. The next five columns, named **1.hlthstat** through **5.hlthstat**, are temporary indicator variables that Stata created for us. Category 1 in **hlthstat** is labeled “Excellent”, so the indicator variable **1.hlthstat** will equal 1 when **hlthstat** equals “Excellent” and 0 otherwise. Category 2 in **hlthstat** is labeled “Very good”, so the indicator variable **2.hlthstat** will equal 1 when **hlthstat** equals “Very good” and 0 otherwise. The indicator variables **3.hlthstat**, **4.hlthstat**, and **5.hlthstat** follow the same pattern for “Good”, “Fair”, and “Poor”, respectively. Note that the indicator variables do not remain in the dataset after the command finishes running.
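If you do want indicator variables that stay in your dataset, one manual alternative is **tabulate** with its **generate()** option. This is a quick sketch; the stub name **hs** is just a name we chose for illustration:

```stata
* Create permanent 0/1 indicators hs1-hs5, one per category of hlthstat
* (hs is an arbitrary stub name; tabulate numbers the new variables itself)
tabulate hlthstat, generate(hs)

* Compare the new variables with the original for the first few observations
list hlthstat hs1-hs5 in 1/5
```

With the “i.” prefix, none of this bookkeeping is necessary.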

We can use the “i.” prefix with **regress** to treat **hlthstat** as a categorical predictor variable.

```
. regress bpsystol i.hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
   Very good |   2.981587   .6415165     4.65   0.000      1.72409    4.239083
        Good |   8.034913   .6230047    12.90   0.000     6.813703    9.256123
        Fair |   14.71925    .721698    20.40   0.000     13.30459    16.13392
        Poor |   16.42304   .9580047    17.14   0.000     14.54517    18.30092
             |
       _cons |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
------------------------------------------------------------------------------
```

The output includes a coefficient for the intercept, labeled “_cons”, as well as coefficients for “Very good”, “Good”, “Fair”, and “Poor”. The “Excellent” category was automatically omitted from the model and used as the comparison group, called the “reference category”. By default, Stata selects the category with the smallest value as the reference, estimates the mean of the outcome for that category, and labels it “_cons”. So the mean systolic blood pressure in the “Excellent” category is 124.3 mmHg. Each remaining coefficient is the difference between the mean outcome in that category and the mean in the reference category. For example, the coefficient for the “Poor” group is 16.4, so the mean systolic blood pressure in the “Poor” group is 16.4 mmHg higher than in the “Excellent” group.

We can select a different reference category using the “ib(#).” prefix, where “#” is the category number for the reference category. Let's use **hlthstat** category 5, “Poor”, as the reference category.

```
. regress bpsystol ib(5).hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
   Excellent |  -16.42304   .9580047   -17.14   0.000    -18.30092   -14.54517
   Very good |  -13.44146   .9500643   -14.15   0.000    -15.30377   -11.57915
        Good |   -8.38813    .937664    -8.95   0.000    -10.22613   -6.550127
        Fair |  -1.703789   1.005946    -1.69   0.090    -3.675638    .2680593
             |
       _cons |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------
```

The “Poor” category is now omitted from the output and “Excellent” is included. The coefficient for **_cons**, 140.7, is now the mean systolic blood pressure in the “Poor” group, and the mean systolic blood pressure in the “Excellent” group is 16.4 mmHg lower than the “Poor” group.

We can also use the prefix “ib(freq).” to select the category with the largest sample size as the reference. Let's type **tabulate hlthstat** to verify that the “Good” category has the largest sample size.

```
. tabulate hlthstat

     Health |
     status |      Freq.     Percent        Cum.
------------+-----------------------------------
  Excellent |      2,407       23.29       23.29
  Very good |      2,591       25.07       48.36
       Good |      2,938       28.43       76.79
       Fair |      1,670       16.16       92.95
       Poor |        729        7.05      100.00
------------+-----------------------------------
      Total |     10,335      100.00

. regress bpsystol ib(freq).hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
   Excellent |  -8.034913   .6230047   -12.90   0.000    -9.256123   -6.813703
   Very good |  -5.053326   .6107242    -8.27   0.000    -6.250464   -3.856189
        Fair |   6.684341   .6944701     9.63   0.000     5.323045    8.045637
        Poor |    8.38813    .937664     8.95   0.000     6.550127    10.22613
             |
       _cons |    132.354   .4180763   316.58   0.000     131.5345    133.1735
------------------------------------------------------------------------------
```

We can also use the prefix “ib(none).” to omit the reference category. This will display the mean outcome for each category when combined with the **noconstant** option.

```
. regress bpsystol ib(none).hlthstat, noconstant

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(5, 10330)     =  69083.04
       Model |   177379866         5  35475973.3   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.9710
-------------+----------------------------------   Adj R-squared   =    0.9709
       Total |   182684595    10,335  17676.3033   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
   Excellent |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
   Very good |   127.3007   .4451924   285.95   0.000      126.428    128.1733
        Good |    132.354   .4180763   316.58   0.000     131.5345    133.1735
        Fair |   139.0383   .5545276   250.73   0.000     137.9513    140.1253
        Poor |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------
```

The output tells us that the mean systolic blood pressure in the “Excellent” category is 124.3 and the mean systolic blood pressure in the “Poor” group is 140.7.

Binary variables are simply categorical variables with two categories, so everything we discussed above applies to binary variables. Binary variables are often coded as “0/1” indicator variables, but you should still use the “i.” prefix if you plan to use postestimation commands, such as **margins**, after you fit a regression model. Let's look at a few quick examples in the interest of completeness.
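For example, after fitting a model with **i.diabetes**, we can ask **margins** for the predicted mean outcome in each group. This is a quick sketch of the idea:

```stata
* The i. prefix makes margins aware of the categories of diabetes
regress bpsystol i.diabetes

* Predicted mean bpsystol for each diabetes group
margins diabetes
```

If **diabetes** were entered without the “i.” prefix, **margins** would treat it as continuous and could not report per-category means this way.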

Here is a model that includes **diabetes** as a binary predictor variable.

```
. regress bpsystol i.diabetes

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(1, 10347)     =    244.99
       Model |  130296.034         1  130296.034   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.0231
-------------+----------------------------------   Adj R-squared   =    0.0230
       Total |  5633280.05    10,348   544.38346   Root MSE        =    23.062

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
    Diabetic |   16.56328   1.058212    15.65   0.000     14.48898    18.63758
             |
       _cons |    130.088   .2323666   559.84   0.000     129.6325    130.5435
------------------------------------------------------------------------------
```

Let's use factor-variable notation to select people with diabetes as the reference category.

```
. regress bpsystol ib(1).diabetes

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(1, 10347)     =    244.99
       Model |  130296.034         1  130296.034   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.0231
-------------+----------------------------------   Adj R-squared   =    0.0230
       Total |  5633280.05    10,348   544.38346   Root MSE        =    23.062

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |  -16.56328   1.058212   -15.65   0.000    -18.63758   -14.48898
             |
       _cons |   146.6513   1.032385   142.05   0.000     144.6276     148.675
------------------------------------------------------------------------------
```

Let's fit a model with no intercept and no reference category.

```
. regress bpsystol ib(none).diabetes, noconstant

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(2, 10347)     > 99999.00
       Model |   177422292         2    88711146   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.9699
-------------+----------------------------------   Adj R-squared   =    0.9699
       Total |   182925276    10,349  17675.6475   Root MSE        =    23.062

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |    130.088   .2323666   559.84   0.000     129.6325    130.5435
    Diabetic |   146.6513   1.032385   142.05   0.000     144.6276     148.675
------------------------------------------------------------------------------
```

Stata's regression commands treat predictor variables as continuous by default. But you can use the “c.” prefix to tell Stata explicitly that a predictor variable should be treated as continuous. This will be necessary when you include continuous variables in interactions with other variables.

Here is a quick example treating **age** as a continuous predictor variable.

```
. regress bpsystol c.age

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(1, 10349)     =   3116.79
       Model |  1304200.02         1  1304200.02   Prob > F        =    0.0000
    Residual |  4330470.01    10,349  418.443328   R-squared       =    0.2315
-------------+----------------------------------   Adj R-squared   =    0.2314
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.456

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .6520775   .0116801    55.83   0.000     .6291823    .6749727
       _cons |   99.85603   .5909867   168.96   0.000     98.69758    101.0145
------------------------------------------------------------------------------
```

Factor-variable notation also includes two operators. The “#” operator specifies an interaction between two variables, and the “##” operator specifies both the main effects and interaction of two variables.

Let's fit a model that includes the main effects for **hlthstat** and **diabetes** and use the “#” operator to include their interaction.

```
. regress bpsystol i.hlthstat i.diabetes i.hlthstat#i.diabetes

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(9, 10325)     =     86.92
       Model |  396524.045         9  44058.2272   Prob > F        =    0.0000
    Residual |  5233449.31    10,325  506.871604   R-squared       =    0.0704
-------------+----------------------------------   Adj R-squared   =    0.0696
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.514

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
           hlthstat |
          Very good |   2.636051   .6417076     4.11   0.000      1.37818    3.893922
               Good |   7.648725   .6272209    12.19   0.000     6.419251      8.8782
               Fair |   13.50647   .7408272    18.23   0.000      12.0543    14.95863
               Poor |   14.77223   1.032484    14.31   0.000     12.74837     16.7961
                    |
           diabetes |
           Diabetic |   5.780232   4.618696     1.25   0.211    -3.273308    14.83377
                    |
  hlthstat#diabetes |
 Very good#Diabetic |   17.43339   5.726714     3.04   0.002     6.207924    28.65886
      Good#Diabetic |   4.023894   5.032308     0.80   0.424    -5.840404    13.88819
      Fair#Diabetic |   7.316062    4.97969     1.47   0.142    -2.445096    17.07722
      Poor#Diabetic |   3.445358    5.09316     0.68   0.499    -6.538222    13.42894
                    |
              _cons |   124.2614   .4611975   269.43   0.000     123.3574    125.1655
-------------------------------------------------------------------------------------
```

We could fit the same model using the “##” operator.

```
. regress bpsystol i.hlthstat##i.diabetes

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(9, 10325)     =     86.92
       Model |  396524.045         9  44058.2272   Prob > F        =    0.0000
    Residual |  5233449.31    10,325  506.871604   R-squared       =    0.0704
-------------+----------------------------------   Adj R-squared   =    0.0696
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.514

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
           hlthstat |
          Very good |   2.636051   .6417076     4.11   0.000      1.37818    3.893922
               Good |   7.648725   .6272209    12.19   0.000     6.419251      8.8782
               Fair |   13.50647   .7408272    18.23   0.000      12.0543    14.95863
               Poor |   14.77223   1.032484    14.31   0.000     12.74837     16.7961
                    |
           diabetes |
           Diabetic |   5.780232   4.618696     1.25   0.211    -3.273308    14.83377
                    |
  hlthstat#diabetes |
 Very good#Diabetic |   17.43339   5.726714     3.04   0.002     6.207924    28.65886
      Good#Diabetic |   4.023894   5.032308     0.80   0.424    -5.840404    13.88819
      Fair#Diabetic |   7.316062    4.97969     1.47   0.142    -2.445096    17.07722
      Poor#Diabetic |   3.445358    5.09316     0.68   0.499    -6.538222    13.42894
                    |
              _cons |   124.2614   .4611975   269.43   0.000     123.3574    125.1655
-------------------------------------------------------------------------------------
```

We can include interactions with continuous variables too.

```
. regress bpsystol i.diabetes##c.age

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(3, 10345)     =   1071.05
       Model |  1335031.79         3  445010.595   Prob > F        =    0.0000
    Residual |  4298248.26    10,345  415.490407   R-squared       =    0.2370
-------------+----------------------------------   Adj R-squared   =    0.2368
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.384

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
           diabetes |
           Diabetic |  -5.669005   4.952369    -1.14   0.252    -15.37661    4.038595
                    |
                age |   .6303981   .0119464    52.77   0.000     .6069808    .6538154
                    |
     diabetes#c.age |
           Diabetic |   .2233087   .0804934     2.77   0.006      .065526    .3810913
                    |
              _cons |   100.5111   .5969456   168.38   0.000     99.34096    101.6812
-------------------------------------------------------------------------------------
```

We can even include three-way and higher-order interactions using the “#” and “##” operators.

```
. regress bpsystol i.hlthstat##i.diabetes##c.age

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(19, 10315)    =    173.56
       Model |  1363865.23        19  71782.3807   Prob > F        =    0.0000
    Residual |  4266108.12    10,315  413.582949   R-squared       =    0.2423
-------------+----------------------------------   Adj R-squared   =    0.2409
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.337

--------------------------------------------------------------------------------------------
                  bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------------+----------------------------------------------------------------
                  hlthstat |
                 Very good |  -.2522701   1.571793    -0.16   0.872    -3.333289    2.828748
                      Good |  -1.269239   1.640212    -0.77   0.439    -4.484373    1.945895
                      Fair |  -1.892737   2.323042    -0.81   0.415    -6.446351    2.660877
                      Poor |  -1.470403   4.440142    -0.33   0.741    -10.17394    7.233137
                           |
                  diabetes |
                  Diabetic |   5.648359   16.10149     0.35   0.726    -25.91369    37.21041
                           |
         hlthstat#diabetes |
        Very good#Diabetic |   .6634293   26.12969     0.03   0.980    -50.55583    51.88269
             Good#Diabetic |  -16.56507   18.00713    -0.92   0.358    -51.86255     18.7324
             Fair#Diabetic |  -7.761426   18.83079    -0.41   0.680    -44.67343    29.15058
             Poor#Diabetic |  -5.055061   20.09251    -0.25   0.801    -44.44028    34.33016
                           |
                       age |   .5505586   .0261998    21.01   0.000      .499202    .6019153
                           |
            hlthstat#c.age |
                 Very good |    .026618   .0352546     0.76   0.450    -.0424879    .0957239
                      Good |    .084684   .0349617     2.42   0.015     .0161522    .1532157
                      Fair |   .1210264   .0438944     2.76   0.006     .0349849    .2070679
                      Poor |   .0900039   .0752338     1.20   0.232     -.057469    .2374768
                           |
            diabetes#c.age |
                  Diabetic |  -.1428421   .2867743    -0.50   0.618    -.7049754    .4192913
                           |
   hlthstat#diabetes#c.age |
        Very good#Diabetic |   .2297988   .4324672     0.53   0.595    -.6179209    1.077518
             Good#Diabetic |   .3910658    .316956     1.23   0.217    -.2302295    1.012361
             Fair#Diabetic |   .3139083   .3258971     0.96   0.335    -.3249132    .9527298
             Poor#Diabetic |     .26957   .3465917     0.78   0.437     -.409817     .948957
                           |
                     _cons |   102.2407   1.127687    90.66   0.000     100.0302    104.4512
--------------------------------------------------------------------------------------------
```

We have already learned that Stata treats predictor variables as continuous by default. But the opposite is true with interaction operators. Both “#” and “##” treat variables as categorical predictors if you do not specify a prefix. So typing **hlthstat##diabetes** would work. But typing **diabetes##age** would make a mess because **age** would be treated as a categorical variable by default. When in doubt, use the “i.” and “c.” prefixes to avoid mistakes.
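As a rule of thumb, spell out every term. A quick sketch contrasting the risky and the explicit forms:

```stata
* Risky: without prefixes, age is treated as categorical, and Stata would
* create one indicator variable for every distinct age in the data
* regress bpsystol diabetes##age

* Explicit and safe: diabetes is categorical, age stays continuous
regress bpsystol i.diabetes##c.age
```

The explicit version leaves no room for Stata to guess the wrong variable type.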

The prefixes also have a “distributive property” when used with parentheses. The syntax below treats **hlthstat** and **diabetes** as categorical predictors and fits a model that includes their main effects as well as their interactions with **age**. Note that the model will not include the interaction of **hlthstat** and **diabetes**.

```
. regress bpsystol i.(hlthstat diabetes)##c.age

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(11, 10323)    =    298.72
       Model |  1359359.05        11  123578.096   Prob > F        =    0.0000
    Residual |   4270614.3    10,323  413.698954   R-squared       =    0.2415
-------------+----------------------------------   Adj R-squared   =    0.2406
       Total |  5629973.35    10,334  544.800982   Root MSE        =     20.34

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
           hlthstat |
          Very good |  -.5801787    1.56339    -0.37   0.711    -3.644726    2.484369
               Good |  -1.453802   1.627043    -0.89   0.372    -4.643121    1.735517
               Fair |  -2.078403   2.286625    -0.91   0.363     -6.56063    2.403824
               Poor |  -.9296666   4.211361    -0.22   0.825     -9.18475    7.325417
                    |
           diabetes |
           Diabetic |  -5.664698   5.022147    -1.13   0.259    -15.50908    4.179683
                    |
                age |   .5433911   .0259983    20.90   0.000     .4924295    .5943527
                    |
     hlthstat#c.age |
          Very good |   .0382409    .034885     1.10   0.273    -.0301404    .1066222
               Good |   .0887067   .0345224     2.57   0.010      .021036    .1563773
               Fair |   .1300174   .0430386     3.02   0.003     .0456535    .2143813
               Poor |   .0888559   .0713922     1.24   0.213    -.0510867    .2287985
                    |
     diabetes#c.age |
           Diabetic |   .2067666   .0816404     2.53   0.011     .0467356    .3667976
                    |
              _cons |   102.4518   1.122841    91.24   0.000     100.2508    104.6528
-------------------------------------------------------------------------------------
```

We can also use the “#” and “##” operators to specify polynomial terms for continuous variables. For example, we may wish to fit a model that includes both **age** and the square of **age**. We can do this by interacting **age** with itself.

```
. regress bpsystol c.age##c.age

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(2, 10348)     =   1592.42
       Model |  1326071.99         2  663035.995   Prob > F        =    0.0000
    Residual |  4308598.04    10,348  416.370123   R-squared       =    0.2353
-------------+----------------------------------   Adj R-squared   =    0.2352
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.405

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0345687   .0859928     0.40   0.688    -.1339939    .2031312
             |
 c.age#c.age |   .0066366   .0009157     7.25   0.000     .0048417    .0084315
             |
       _cons |   112.2463   1.808325    62.07   0.000     108.7017     115.791
------------------------------------------------------------------------------
```

We could also include a term for **age** cubed by interacting **age** with itself twice.

```
. regress bpsystol c.age##c.age##c.age

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(3, 10347)     =   1065.37
       Model |   1329759.5         3  443253.167   Prob > F        =    0.0000
    Residual |  4304910.52    10,347  416.053979   R-squared       =    0.2360
-------------+----------------------------------   Adj R-squared   =    0.2358
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.397

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
                age |  -1.107037   .3929805    -2.82   0.005    -1.877355   -.3367196
                    |
        c.age#c.age |   .0329455   .0088844     3.71   0.000     .0155303    .0503607
                    |
  c.age#c.age#c.age |  -.0001879   .0000631    -2.98   0.003    -.0003116   -.0000642
                    |
              _cons |   112.2463   1.808325    62.07   0.000     108.7017     115.791
-------------------------------------------------------------------------------------
```

We can also include the square of **age** when we include an interaction of **age** with another variable.

```
. regress bpsystol i.diabetes##c.age c.age#c.age

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(4, 10344)     =    817.53
       Model |  1353111.75         4  338277.939   Prob > F        =    0.0000
    Residual |  4280168.29    10,344  413.782704   R-squared       =    0.2402
-------------+----------------------------------   Adj R-squared   =    0.2399
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.342

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
           diabetes |
           Diabetic |  -.8886553   4.994811    -0.18   0.859    -10.67945    8.902141
                    |
                age |   .0640567   .0865028     0.74   0.459    -.1055054    .2336188
                    |
     diabetes#c.age |
           Diabetic |   .1403559   .0813022     1.73   0.084    -.0190122    .2997239
                    |
        c.age#c.age |   .0061116   .0009246     6.61   0.000     .0042992    .0079239
                    |
              _cons |    111.823   1.812009    61.71   0.000     108.2711    115.3749
-------------------------------------------------------------------------------------
```

You can read more about factor-variable notation in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.

Read more in the *Stata Base Reference Manual*; see **[R] regress**. And in the *Stata User's Guide*, see **[U] 11 Factor variables**.