

From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Missing observations
Date: Fri, 21 Jun 2013 08:34:55 +0100

You started out with what looked like a data management question about -drop-, a topic I think I understand. Now this is a question about analysing your data.

I have never worked with returns -- indeed I cannot even remember the formula for a return. But your problem is now, if I understand you correctly, comparing time series of returns calculated over different time scales. Given the serial and scale dependence here, _none_ of the standard machinery of t tests, Mann-Whitney U tests, bootstrapping etc. carries over.

Whoever is telling you to do otherwise should be able to explain to you why I am wrong and it is legitimate to treat returns as independent. Why anyone would study returns if they thought that is beyond me.

I find it difficult to believe that no literature exists, but you should be able to understand why I don't know what it is.

Nick
njcoxstata@gmail.com

On 20 June 2013 19:44, Csaba Kertai <csaba.kertai@hotmail.co.uk> wrote:
> Thank you Nick. Could you let me know what is not clear about this, please? Let me explain what I want to do in another way. I have 9 variables, each having a different number of values. These 9 variables are return variables (e.g. 1-year raw return, 2-year raw return etc.) and I need to compare the means/medians/25th/75th/90th percentiles and the percentage of positive values (within one 'group') of these variables to see whether, say, the median difference between the 1-yr raw return 'group' and the 2-yr raw return 'group' is significant. For this, I have to use traditional parametric tests (i.e. the t-test) and non-parametric bootstrapping.
>
> Could you help me with this, please? I've been scouring the Internet for a solution to testing percentile differences but it seems that there's not much on this particular issue.
> There are basically three things I cannot get my head round: how to test the median difference of 2 'groups' (tried 'signrank' and 'signtest' but these tests are paired tests), the percentiles difference of two 'groups', and the difference of the percentage of positive values between 2 'groups'.
>
> So you say that one solution could be to stack the 9 variables on top of each other and then group them by, say, inserting a second column (grouping variable) with numbers that will identify the 9 groups?
>
> Thank you
>
>
> ----------------------------------------
>> Subject: Re: st: Missing observations
>> From: njcoxstata@gmail.com
>> Date: Thu, 20 Jun 2013 18:29:32 +0100
>> To: statalist@hsphsun2.harvard.edu
>>
>> This really isn't clear to me, but it may be that -var1- and -var2- should be stacked on top of each other.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>> On 20 Jun 2013, at 15:41, Csaba <csaba.kertai@hotmail.co.uk> wrote:
>>
>>> Nick,
>>>
>>> Thank you for your reply. Yes, you are right, I muddled up observations with values. I meant to write values, not observations. My problem is that if I use 'drop if missing(var2)' that will drop values for each variable in my data set.
>>>
>>> I need to compare the means/medians of 2 variables. Var1 has 1125 non-missing values, var2 has 169 non-missing values. I might be doing something wrong but when I try using bootstrapping I get a message saying that I should drop any missing values as bootstrapping cannot distinguish between missing and non-missing values. That's why I want to drop missing values for Var2. Basically, I want to achieve the same result as with the unpaired two-sample mean comparison test but with bootstrapping.
>>>
>>> Thanks a lot!
>>>
>>> On 20 Jun 2013, at 12:32, Nick Cox <njcoxstata@gmail.com> wrote:
>>>
>>>> -drop- as used here drops entire observations (outside Stata
>>>> observations are known as rows, cases, records). You seem to be under
>>>> the impression that there is an operation
>>>>
>>>> drop missing values
>>>>
>>>> that is somehow different from
>>>>
>>>> -drop- observations
>>>>
>>>> but I don't know what that would look like.
>>>>
>>>> In your example if -var2- has only 169 non-missing values (_not_
>>>> observations) then
>>>>
>>>> drop if missing(var2)
>>>>
>>>> will leave precisely 169 observations. I don't understand how that is
>>>> a surprise or what else you want.
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 20 June 2013 11:17, Csaba Kertai <csaba.kertai@hotmail.co.uk> wrote:
>>>>
>>>>> I need a bit of help with dropping missing observations. If I use 'drop if missing(var)' or drop if 'var'==. etc. many other observations are dropped as well. More precisely, var1 has 1125 observations and var2 has 169 observations. I want to drop missing observations for var2 but if I use drop if var2==. then this will keep only 169 observations for each variable. I only want to drop values that are missing.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
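
For readers following the data management part of this thread, here is a minimal sketch of the point about -drop- and of the stacking idea. It uses a made-up toy dataset rather than the poster's data, and the names -ret-, -group- and -positive- are hypothetical, chosen only for illustration. It covers reshaping and descriptive summaries only. It does not address the significance-testing question, for which the dependence caveat raised above still applies.

```stata
* Toy data (invented): var1 has 5 non-missing values, var2 has only 2
clear
input var1 var2
 0.10 0.05
 0.20 .
-0.30 0.15
 0.40 .
 0.50 .
end

* -drop- removes whole observations, so var1 loses values as well
preserve
drop if missing(var2)
count                        // 2 observations remain, for both variables
restore

* Stack var1 and var2 into a single column; -stack- creates _stack,
* which records the source (1 = var1, 2 = var2)
stack var1 var2, into(ret) clear
rename _stack group

* Each observation now holds one value, so dropping missings
* no longer discards values of the other variable
drop if missing(ret)
count                        // 7 observations: 5 from var1, 2 from var2

* Descriptive summaries by group
tabstat ret, by(group) statistics(n mean p25 p50 p75 p90)

* Share of positive values within each group
generate byte positive = ret > 0
tabulate group positive, row
```

With nine return variables the same pattern would apply by listing all nine in the -stack- varlist, again with a single name in into(), assuming the variables are all numeric.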

