I don't think "punishment" was the original rationale for adjusted R^2, although it is often cited as one of its benefits. Rather, the sample R^2 is an upwardly biased estimate of the population R^2, especially in small samples, and adjusted R^2 corrects for that.

McClendon discusses this in "Multiple Regression and Causal Analysis", 1994, pp. 81-82.

Basically, he says that sampling error will essentially always make R^2 greater than zero: even if no variable has any effect in the population, R^2 will be positive in a sample. When there are no effects, the estimated coefficients will be positive in some samples and negative in others, but either way the sample R^2 comes out positive. Further, the more Xs there are for a given sample size, the more opportunity R^2 has to increase by chance.
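A quick simulation can illustrate the point. This is only a sketch under my own assumptions, not anything taken from McClendon: the function names, the choice of n = 20 observations and k = 5 predictors, normally distributed data, and the seed are all arbitrary. The outcome y is generated independently of every X, so the population R^2 is zero by construction.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])       # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid                          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # usual adjustment
    return r2, adj

def simulate(n=20, k=5, reps=2000, seed=0):
    """Draw y independently of X many times; return all (R^2, adjusted R^2) pairs."""
    rng = np.random.default_rng(seed)
    out = [
        r2_and_adjusted(rng.normal(size=(n, k)), rng.normal(size=n))
        for _ in range(reps)
    ]
    return np.array(out)

results = simulate()
mean_r2, mean_adj = results.mean(axis=0)
print(f"mean R^2          = {mean_r2:.3f}")
print(f"mean adjusted R^2 = {mean_adj:.3f}")
print(f"share of samples with R^2 > 0: {(results[:, 0] > 0).mean():.3f}")
```

With these settings the average R^2 should land near k/(n-1) = 5/19 even though nothing has an effect, while the average adjusted R^2 should sit close to zero. Raising k with n held fixed should push the average R^2 higher still.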

So adjusted R^2 wasn't primarily designed to "punish" you for mindlessly including extraneous variables (although it has that effect); it was meant to correct for the inherent upward bias in regular R^2.
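For reference, the usual textbook form of the adjustment, with n observations and k predictors plus an intercept (the symbols n and k are my notation, not necessarily McClendon's), is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Assuming normal errors and no true effects, the expected value of R^2 is k/(n-1); substituting that into the formula gives an expected adjusted R^2 of zero. When the population R^2 is not zero, the adjustment reduces the bias but does not necessarily remove it entirely.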

