Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | <carlo.lazzaro@tiscalinet.it> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: R: qnorm and ttest question |
Date | Fri, 3 Feb 2012 10:44:10 +0100 |
Dear Stata,

Please take a look at the Statalist FAQ #2, point 3 ("Please note that many members are less inclined to answer anonymous emails, sometimes to the point of ignoring them on principle.").

As far as your query is concerned, I agree with David Hoaglin that with 20,000 observations you should not come across any problems with the t-test. However, another approach is to calculate a bootstrap p-value (please see the code below):

----------------------------code begins----------------------------
* Build a toy dataset: two groups of 50, with a few outliers at 100 hours
drop _all
set obs 100
g group=1 in 1/50
replace group=2 in 51/100
g worked_hour=(60*runiform())
replace worked_hour=100 in 45/47
by group, sort: swilk worked_hour
replace worked_hour=100 in 95/100
by group, sort: swilk worked_hour

* Observed t statistic (Welch test, unequal variances)
ttest worked_hour, by(group) unequal
return list
scalar t=r(t)

* Impose the null hypothesis: shift each group so that both share the combined mean
g comb_mean=((r(mu_1)*r(N_1))+(r(mu_2)*r(N_2)))/(r(N_1)+r(N_2))
sum worked_hour if group==1, meanonly
replace worked_hour= worked_hour-r(mean)+comb_mean if group==1
sum worked_hour if group==2, meanonly
replace worked_hour= worked_hour-r(mean)+comb_mean if group==2

* Resample within groups; the statistic is named t so that the saved dataset contains a variable t
bootstrap t=r(t), reps(10000) nodots strata(group) ///
    saving(C:\Users\user\Desktop\bootstrap.dta, every(1) double replace) ///
    seed(12345) : ttest worked_hour, by(group) unequal

* Bootstrap p-value: share of resampled |t| at least as large as the observed |t|
use "C:\Users\user\Desktop\bootstrap.dta", clear
generate indicator = abs(t)>=abs(scalar(t))
summarize indicator, meanonly
display "p_bootstrap = " r(mean)
----------------------------code ends----------------------------

HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Stata
Sent: Thursday, 2 February 2012 21:10
To: statalist@hsphsun2.harvard.edu
Subject: st: qnorm and ttest question

Hello, I am trying to see whether the data for "total worked hour in the past week" are normally distributed or not.
I used qnorm and got a graph in which most of the dots fall on or close to the line, but the left tail is above the line because "worked-hour" is always non-negative. What should I say about this distribution?

I want to do a ttest on 2 groups. Is it correct that they should be normally distributed in order for the ttest result to be valid? Can I apply the CLT and treat them as normally distributed, given that my sample is greater than 20,000? I have tried sktest and they did not pass the test. Any advice on how to handle this problem?

Thanks.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/