# st: RE: AW: RE: AW: questions about panel data analysis and outliers.

From: "Nick Cox"
Subject: st: RE: AW: RE: AW: questions about panel data analysis and outliers.
Date: Mon, 17 May 2010 12:59:24 +0100

```"Not hard" here could be read as "easy": although I do try to use
simpler words if they serve the purpose, the tone is not the same.

Anyway, here is a Stata example:

clear
set obs 100
set seed 2803
gen y = _n + 10 * rnormal()
gen x = _n
scatter y x
* y = 0 is unremarkable on its own, as is x = 90; only the pair is unusual
replace y = 0 if x == 90
scatter y x
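
* A possible follow-up, not part of the original example (a sketch):
* the altered point sits inside the range of y and of x taken one at
* a time, but it should show a large residual from the fit of y on x.
* The variable name res is chosen here only for illustration.
summarize y x
regress y x
predict res, residuals
list x y res if x == 90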

Nick
n.j.cox@durham.ac.uk

<http://en.wikipedia.org/wiki/Sidney_Morgenbesser>

"During a lecture the Oxford linguistic philosopher J. L. Austin made
the claim that although a double negative in English implies a positive
meaning, there is no language in which a double positive implies a
negative. To which Morgenbesser responded in a dismissive tone, "Yeah,
yeah." (Some have it quoted as "Yeah, right.")"

Martin Weiss
============

"True, although it's not hard to find outliers on a bivariate
distribution which aren't outliers on either marginal."

The "double negation" is baffling me. Is it hard or not to find them?

Nick Cox
========

True, although it's not hard to find outliers on a bivariate
distribution which aren't outliers on either marginal.

1. Omit the putative outlier and see how much difference it makes.

2. Decide you should be using a transformation or non-identity link
function.
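
* One way step 1 might look (an illustrative sketch; the simulated
* data and the choice of -regress- are assumptions, not from the post):
clear
set obs 100
set seed 2803
gen x = _n
gen y = _n + 10 * rnormal()
replace y = 0 if x == 90
* fit with and without the putative outlier, then compare coefficients
regress y x
regress y x if x != 90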

Martin Weiss
============

"My second question is: how can I estimate correlation without
outliers?"

You can qualify on the candidate variables not being outliers based on
their univariate distribution, if that is what you mean:

*************
sysuse auto, clear
qui su mpg,d
gen byte within=inrange(mpg, r(p5), r(p95))
qui su weight,d
gen byte within2=inrange(weight, r(p5), r(p95))
corr mpg weight if within & within2
corr mpg weight
*************

Amatoallah ouchen
=================

I have panel data (T=3 and N=45) and I want to perform a robust
regression, so I would like to know whether it is OK to treat this
simply as a cross-sectional analysis (because my time series is so
short).
What do you think about that?
My second question is: how can I estimate correlation without outliers?

