st: RE: RE: Regression across variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: Regression across variables
Date   Wed, 12 Nov 2003 13:37:07 -0000

You're correct. I misread this problem. 
I have a new problem in that I have 
to guess what the Excel syntax does, 
but it looks fairly transparent. 

You should -reshape-, I suggest. 

. reshape long affxr2tag@_x_at, i(array_id) j(tag) string 

Put the controls in a variable, e.g. with 

. egen control = fill(0.25 0.5 1 2 4 0.25 0.5 1 2 4) 

or with -repeat()- from -egenmore- on SSC 

. egen control = repeat(), v(0.25 0.5 1 2 4)

then 

. bysort array_id : regress affxr2tag_x_at control 

-statsby- could be vital here. 
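
A minimal sketch of the -reshape- plus -statsby- route, under some 
assumptions: the six arrays from the quoted listing are typed in by hand, 
the full variable names are taken to be affxr2taga_x_at to affxr2tage_x_at, 
the prefix form of -statsby- (Stata 9 or later) is used, and the names 
tag, control, slope and rsq are only illustrative. The concentration is 
derived from the tag letter, so it does not depend on sort order. 

```stata
clear
input long array_id affxr2taga_x_at affxr2tagb_x_at affxr2tagc_x_at affxr2tagd_x_at affxr2tage_x_at
930877 12.4 22.7 51.5 108   293.5
930878  7.6 13   53.1  99   244.2
930898 17.7 37   90.4 198   436.6
930879 11.5 18.2 55.7 114   277.8
930884 11.3 24.1 56.6 126.7 301.3
930885 13.3 19.8 57   139   270.1
end

* wide -> long: one observation per array and tag (a, ..., e)
reshape long affxr2tag@_x_at, i(array_id) j(tag) string

* concentration doubles with each tag: a = 0.25, b = 0.5, ..., e = 4
generate double control = 0.25 * 2^(strpos("abcde", tag) - 1)

* one regression per array; keep only the slope and R-squared
statsby slope=_b[control] rsq=e(r2), by(array_id) clear : regress affxr2tag_x_at control

list array_id slope rsq
```

This leaves one observation per array, which could be merged back onto 
the wide data by array_id if the slope and fit are wanted alongside the 
original intensities. 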

Alternatively, 

1. Jeroen Weesie wrote a -slope()- for -egen-. 

. findit _gslope 

I don't think it's quite what your problem needs. 

2. Nick Winter wrote a -corr()- for -egen-. That's 
in -egenmore- from SSC. 

I'd still check the linearity carefully by 
looking at a series of graphs. 
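
One way such a series of graphs might be drawn, as a sketch only: it 
assumes the long layout produced by -reshape-, uses two of the quoted 
arrays as example data, and the names tag and control are illustrative. 
Each panel shows one array's signal against concentration with the 
least-squares line overlaid. 

```stata
clear
input long array_id affxr2taga_x_at affxr2tagb_x_at affxr2tagc_x_at affxr2tagd_x_at affxr2tage_x_at
930877 12.4 22.7 51.5 108 293.5
930878  7.6 13   53.1  99 244.2
end

* wide -> long, then attach the concentration for each tag letter
reshape long affxr2tag@_x_at, i(array_id) j(tag) string
generate double control = 0.25 * 2^(strpos("abcde", tag) - 1)

* one panel per array: data points plus fitted straight line
twoway (scatter affxr2tag_x_at control) (lfit affxr2tag_x_at control), by(array_id)
```

Systematic curvature in the points around the fitted line would suggest 
that a straight-line fit is not a good summary for that array. 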

Nick 
n.j.cox@durham.ac.uk 

Wallace, John
> 
> Thanks for your reply, Nick
> I was trying to keep my examples general in the belief that it would be
> more broadly useful for others, but for clarity's sake, here's a more
> explicit example.
> 
> Some of the developmental arrays made by my company have probes
> complementary (in the DNA sense) to control reagents at specific
> concentrations in the sample fluid.  One way to measure the quality of
> the arrays is to perform a regression of signal for those probes
> against the known concentration of the control reagents in the sample.
> I've found that the slope and r-squared of the least-squares linear
> regression correlate nicely with other measures of array quality, but
> computing the fit isn't trivial.  At the moment I export the probe
> intensities from the analysis software into Excel, line them up against
> the concentrations for the control reagents, and use Excel's SLOPE(y,x)
> and RSQ(y,x) functions to get the parameters I'm looking for.
> I would prefer to do that in Stata, for all the reasons we love Stata.
> The data looks like:
> 
>        array_id   a~a_x_at   a~b_x_at   a~c_x_at   a~d_x_at   a~e_x_at
>   1.     930877       12.4       22.7       51.5        108      293.5
>   2.     930878        7.6         13       53.1         99      244.2
>   3.     930898       17.7         37       90.4        198      436.6
>   4.     930879       11.5       18.2       55.7        114      277.8
>   5.     930884       11.3       24.1       56.6      126.7      301.3
>   6.     930885       13.3       19.8         57        139      270.1
> 
> The variable names are truncated from affxr2taga_x_at, affxr2tagb_x_at,
> etc.
> 
> The controls are at the following concentrations:
> TagA  0.25 E-12M (i.e. 250 femtomolar)
> TagB  0.5 E-12M
> TagC  1.0 E-12M
> TagD  2.0 E-12M
> TagE  4.0 E-12M
> 
> So, in Excel I would have cells like
> 	A	B	C	D	E	
> R1	0.25	0.5	1.0	2.0	4.0
> R2	12.4	22.7	51.5	108	293.5
> 
> And in column F I would use =SLOPE(A2:E2,A1:E1) to get the slope of the
> linear regression and =RSQ(A2:E2,A1:E1) to get the coefficient of
> determination.
> 
> In Stata terms, each observation would get a value in new variables
> "slope" and "fit".  I've seen some egen functions like rmean() or rsd()
> that work at the observation level like that, calculating values in new
> variables from a function performed "across" variables for each
> observation.
> 
> One approach I thought about was using -xpose- to switch observations
> with variables, then generating a new variable "conc" and doing a plain
> ol' regression of array_id vs conc.  That's less attractive though,
> because xpose mangles your dataset (even using the ,varnames option, you
> can't get the original variable names back by running -xpose- again).
> 
> It seems to me, from reading your earlier replies, that you think I'd
> like to, for example, calculate how much the 6 measures of a~a_x_at
> correlate with a constant of 0.25.  That's not the case; I'm interested
> in how the slope of (a-e vs pM) varies from array to array.