
As others have indicated, there are _much_ better regression-based approaches to what appears to be the underlying research question here. There is also a separate issue of whether it is better to work with wage or its logarithm. In contrast, here I focus on the pure Stata issues of generating variables with conditional means, as they are needed in other circumstances (e.g. for graphics).

Consider Chiara's code:

    bys occupation: egen wage=mean(lwage)
    ge wage1f=wage if occupation==1 & fem==1
    ge wage1m=wage if occupation==1 & fem==0

She wants the mean for each occupation and each gender. But the first statement mixes the genders together. For that reason the next two statements cannot identify separate means for different genders. The results will be the same for both genders and a given occupation.

This code would do what Chiara seems to want. I switch to a generic response -y-:

    bysort occupation : egen y_f = mean(y / (fem == 1))
    bysort occupation : egen y_m = mean(y / (fem == 0))

Note that dividing by 1 gives the numerator and dividing by 0 gives missing. Missings are ignored by -egen- in this case.

See also

Cox, N.J. 2011. Compared with .... Stata Journal 11(2): 305-314

Abstract. Many problems in data management center on relating values to values in other observations, either within a dataset as a whole or within groups such as panels. This column reviews some basic Stata techniques helpful for such tasks, including the use of subscripts, summarize, by:, sum(), cond(), and egen. Several techniques exploit the fact that logical expressions yield 1 when true and 0 when false. Dividing by zero to yield missings is revealed as a surprisingly valuable device.

On Tue, Mar 27, 2012 at 12:41 PM, Chiara Mussida <cmussida@gmail.com> wrote:
> I have to calculate the difference of mean log wages of men and women.
> My dataset contains the variable lwage which is the log of wages. I
> tried to generate the mean wage (also by occupation, for a more
> detailed difference):
>
> bys occupation: egen wage=mean(lwage)
> ge wage1f=wage if occupation==1 & fem==1
> ge wage1m=wage if occupation==1 & fem==0
> ge diff1=wage1m - wage1f if occupation==1
>
> but this gave me a variable diff1 with no observations, since the mean
> lwage for men is missing when the mean lwage for men it is not, and
> viceversa. Again, if I do replace the missing values of men and women
> with 0 this gives me false results (I know there is a difference
> between missing and 0).
>
> How should I get my variable diff= mean(lwage men) - mean(lwage
> women)? Total difference and/or difference by occupation.
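The division device in the reply can be tried on a small made-up dataset. This is only a sketch: the data values and the names diff and first are illustrative assumptions, not Chiara's data, while y, fem, occupation, y_f and y_m follow the reply. Because both means are stored in every observation of an occupation, the difference is defined for men and women alike, which avoids the all-missing diff1 of the original attempt.

```stata
* Toy data (hypothetical values), for illustration only
clear
input occupation fem y
1 1 2
1 1 3
1 0 4
1 0 6
2 1 1
2 0 5
end

* y / (fem == 1) equals y for women and is missing for men,
* so egen's mean() averages over women only (and vice versa below)
bysort occupation : egen y_f = mean(y / (fem == 1))
bysort occupation : egen y_m = mean(y / (fem == 0))

* Both means exist in every observation of an occupation,
* so the difference is never missing here
generate diff = y_m - y_f

* Show one line per occupation
egen first = tag(occupation)
list occupation y_f y_m diff if first
```

With these toy values, occupation 1 has a female mean of 2.5 and a male mean of 5, so diff is 2.5 there. Dropping the bysort occupation : prefix from the two egen statements should give the overall means and hence the total difference.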

