You are not going to gain anything, except that pleasing reviewers is
good for your career. A dummy that equals 1 for only 10% of the sample
means that there is not as much information in your data as you would
hope, but no amount of statistical trickery will create information
that is not present in your data.

Conceptually, what the reviewer asked you to do seems to correspond to
propensity score matching, and there are tools available in Stata for
such an analysis; see: -findit propensity score-. There are good
reasons for using propensity score matching (and equally good reasons
for not doing so; it all depends on the exact nature of your research
question, your data, etc.), but a sparse dummy is not one of them.

Hope this helps,
Maarten

On Tue, May 1, 2012 at 10:22 AM, Andrea Rispoli <andrea.rspl@gmail.com> wrote:
> Dear Stas,
> Thank you. This is the request of a reviewer. Would you recommend that
> I simply choose a random sample?
>
> On Tue, May 1, 2012 at 3:13 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> So why exactly do you want to do this? You will only lose in
>> precision, provided your model is OK; if it is badly misspecified,
>> then God only knows how your coefficients could jump around, so you
>> probably should not trust either specification, anyway.
>>
>> On Mon, Apr 30, 2012 at 6:44 PM, Andrea Rispoli <andrea.rspl@gmail.com> wrote:
>>> Dear Statalisters,
>>> I am running a regression model: y=f(x, age, size) where x is a dummy
>>> variable that can take value 1 or 0.
>>> Since in my sample x=1 for 10% of the sample and x=0 for 90% of the
>>> sample, I would like to identify a random subsample among the group
>>> x=0 so that it is more "comparable" in terms of size with the
>>> subsample for which x=1.
>>>
>>> My problem is that I would like the selected subsample (in which
>>> x=0) to match the characteristics of the first subsample (x=1) on the
>>> other dimensions (e.g. age and size).
>>> For instance, if I take the subsample x=1, mean of age = 37, mean of size = 45.
>>> I would like to randomly select the second subsample (x=0), so that
>>> mean of age = 37, mean of size = 45, as is the case in the first
>>> subsample (x=1).
>>>
>>> Do you have any suggestions on how I could achieve such a result in Stata?
>>>
>>> Thank you very much in advance for all your help!!!
>>> Kind Regards
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.

--
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
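
The point made in the thread, that cutting the x=0 group down to the size of the x=1 group only costs precision, can be illustrated with a small simulation. The sketch below is not from the thread and rests on several assumptions: it uses Python rather than Stata, the data are made up (1000 observations, a true effect of 0.5, standard normal noise), and it compares a plain difference in means instead of the regression with age and size. The helper name diff_in_means_se is hypothetical.

```python
import random
import statistics


def diff_in_means_se(treated, control):
    """Standard error of mean(treated) - mean(control), unequal variances."""
    return (statistics.variance(treated) / len(treated)
            + statistics.variance(control) / len(control)) ** 0.5


rng = random.Random(2012)  # fixed seed so the run is reproducible

# Assumed, made-up data: x = 1 for 10% of 1000 observations.
# The true effect of x on y is 0.5; the noise is standard normal.
y_treated = [0.5 + rng.gauss(0, 1) for _ in range(100)]
y_control = [rng.gauss(0, 1) for _ in range(900)]

# Option 1: use every x = 0 observation.
se_full = diff_in_means_se(y_treated, y_control)

# Option 2: keep a random x = 0 subsample as large as the x = 1 group.
y_control_sub = rng.sample(y_control, len(y_treated))
se_sub = diff_in_means_se(y_treated, y_control_sub)

print(f"SE using all 900 x=0 observations:  {se_full:.3f}")
print(f"SE using 100 sampled x=0 obs.:      {se_sub:.3f}")
```

With unit noise variance the theoretical standard errors are sqrt(1/100 + 1/900), about 0.105, for the full sample and sqrt(1/100 + 1/100), about 0.141, for the balanced subsample, so discarding x=0 observations makes the estimate noisier without adding any information about the x=1 group.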

