
From: "Wallace, John" <John_Wallace@affymetrix.com>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Regression across variables
Date: Tue, 11 Nov 2003 15:20:16 -0800

Thanks for your reply, Nick. I was trying to keep my examples general in the belief that it would be more broadly useful for others, but for clarity's sake, here's a more explicit example.

Some of the developmental arrays made by my company have probes complementary (in the DNA sense) to control reagents at specific concentrations in the sample fluid. One way to measure the quality of the arrays is to perform a regression of signal for those probes against the known concentration of the control reagents in the sample. I've found that the slope and r-squared of the least-squares linear regression correlate nicely with other measures of array quality, but computing the fit isn't trivial.

At the moment I export the probe intensities from the analysis software into Excel, line them up against the concentrations for the control reagents, and use Excel's Slope(y,x) and Rsq(y,x) functions to get the parameters I'm looking for. I would prefer to do that in Stata, for all the reasons we love Stata.

The data looks like:

      array_id   a~a_x_at   a~b_x_at   a~c_x_at   a~d_x_at   a~e_x_at
  1.    930877       12.4       22.7       51.5        108      293.5
  2.    930878        7.6         13       53.1         99      244.2
  3.    930898       17.7         37       90.4        198      436.6
  4.    930879       11.5       18.2       55.7        114      277.8
  5.    930884       11.3       24.1       56.6      126.7      301.3
  6.    930885       13.3       19.8         57        139      270.1

The variable names are truncated from affxr2taga_x_at, affxr2tagb_x_at, etc.

The controls are at the following concentrations:

  TagA: 0.25 E-12M (i.e. 250 femtomolar)
  TagB: 0.5 E-12M
  TagC: 1.0 E-12M
  TagD: 2.0 E-12M
  TagE: 4.0 E-12M

So, in Excel I would have cells like

         A       B       C       D       E
  R1     0.25    0.5     1.0     2.0     4.0
  R2     12.4    22.7    51.5    108     293.5

And in column F I would use =SLOPE(A2:E2,A1:E1) to get the slope of the linear regression and =RSQ(A2:E2,A1:E1) to get the coefficient of determination. In Stata terms, each observation would get a value in new variables "slope" and "fit".
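A minimal Stata sketch of the row-wise calculation described above, offered as one possible approach rather than a definitive one. It relies on the standard closed-form results for a simple least-squares fit, slope = Sxy/Sxx and r-squared = Sxy^2/(Sxx*Syy), which are the quantities Excel's SLOPE and RSQ return. The variable names for tags C to E are assumed to follow the pattern of the two full names given in the post, and the helper variables ybar, sxy and syy are hypothetical names introduced here.

```stata
* Sketch only: row-wise slope and r-squared of signal vs. concentration.
* ASSUMPTION: the five probe variables carry these full names (only the
* first two are spelled out in the post; the rest follow the pattern).
local yvars affxr2taga_x_at affxr2tagb_x_at affxr2tagc_x_at ///
            affxr2tagd_x_at affxr2tage_x_at
local conc  0.25 0.5 1 2 4          // concentrations in pM, same order
local k : word count `conc'

* Mean and sum of squares of the concentrations (constants for every array)
local xsum = 0
foreach x of local conc {
    local xsum = `xsum' + `x'
}
local xbar = `xsum' / `k'
local sxx = 0
foreach x of local conc {
    local sxx = `sxx' + (`x' - `xbar')^2
}

* Row mean of the signals; rowmean() was called rmean() in older Statas
egen double ybar = rowmean(`yvars')

* Accumulate cross-products and the signal sum of squares within each row
generate double sxy = 0
generate double syy = 0
forvalues i = 1/`k' {
    local x : word `i' of `conc'
    local y : word `i' of `yvars'
    quietly replace sxy = sxy + (`x' - `xbar') * (`y' - ybar)
    quietly replace syy = syy + (`y' - ybar)^2
}

generate double slope = sxy / `sxx'             // signal units per pM
generate double fit   = sxy^2 / (`sxx' * syy)   // coefficient of determination
drop ybar sxy syy
```

Because everything is done with -generate- and -replace-, the dataset keeps its one-row-per-array layout. An array with a missing probe value ends up with missing slope and fit, since the running sums become missing.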
I've seen some egen commands like rmean() or rsd() that work at the observation level like that, calculating values in new variables from a function performed "across" variables for each observation.

One approach I thought about was using -xpose- to switch observations with variables, then generating a new variable "conc" and doing a plain ol' regression of array_id vs conc. That's less attractive though, because -xpose- mangles your dataset (even using the ,varnames option, you can't get the original variable names back by running -xpose- again).

It seems to me, from reading your earlier replies, that you think I'd like to, for example, calculate how much the 6 measures of a~a_x_at correlate with a constant of 0.25. That's not the case; I'm interested in how the slope of (a-e vs pM) varies from array to array.

-JW

-----Original Message-----
From: Nick Cox [mailto:n.j.cox@durham.ac.uk]
Sent: Tuesday, November 11, 2003 11:33 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: Regression across variables

Don't be misled; I am not a statistician myself and indeed have no formal training in it worth that name. However, whatever is posted on Statalist is open to challenge by anyone who can expose error and/or put forward a better solution, irrespective of background.

As I understand it, your molarity values are not variables at all, but constants which act as gold standards or targets for your variables. Whether it makes sense to combine the analyses is difficult to say without understanding the experimental set-up. There is much advantage in a unified analysis, especially if in some sense the errors behave similarly across molarities, but deciding that might be helped by an initial exploratory analysis, such as

. dotplot A B C D

Things might look simpler on a log scale.

Nick
n.j.cox@durham.ac.uk

Wallace, John
> Thanks Nick - any implication of non-orthodoxy is purely my ignorance in
> these matters. My formal stat background is pretty weak.
> What I was trying to show is that there is in effect a variable orthogonal to
> the matrix of observations (the Molarity value) that I would like to regress
> the row of values for each observation against the row of Molarity values
> (rather than the column of A values against the column of B values, for
> example).
>
> The question would be how to introduce the molarity values into the dataset
> (each variable corresponds to a concentration level that is being tested)
> and how to tell stata to use it in the regression.
>
> If the answer is the same, I'll just have to plug away and see if I can
> figure out how my mental picture fits into what you said.
>
> I appreciate the help!

Nick Cox
> As I understand it, this is more orthodox than you imply, and you could
> think of the analysis as a series of regressions, except that you have no
> covariates, at least that you're showing us. That's not fatal, however.
>
> . regress A
>
> says in effect estimate the mean of A, and much of the output you get is
> based on the assumption that A follows, or should follow, a normal
> (Gaussian, central) distribution.
>
> Following that with
>
> . test _cons = 0.5
>
> is, perhaps, a long-winded way of going
>
> . ttest A = 0.5
>
> except that if you do have covariates, the -regress- framework is the one
> on which you can build. Ronan Conroy's paper in SJ 2(3) 2002 is a very nice
> example of this principle.
>
> Having said that, the assumption of normality is important. It wouldn't
> surprise me if the distributions were skewed and (say) gamma-like, so that
> -glm- is then a better framework.

Wallace, John
> > Hi Statalisters. I'm trying to get Stata to perform a regression in a data
> > structure different from the usual yvar xvar arrangement.
> > I'll diagram the data set to show what I mean:
> >
> > Molarity    0.5    1      2      3
> >
> > Variable    A      B      C      D
> > Observ1     .22    .45    .99    1.4
> > Observ2     .23    .5     .98    1.5
> > Observ3     .19    .38    1.1    1.42
> >
> > Molarity in this case would be the constant associated with each variable.
> > The observations are measurements of the system attempting to quantify the
> > molarity. The idea would be to generate additional variables that contain
> > the various regression results of the observations vs Molarity.
> >
> > My data set at this point is just variable name against observation number.
> > I don't know how to associate each variable with the corresponding molarity,
> > or how to tell Stata to perform a regression in this way. Do I have to
> > -reshape- or is there another way?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
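On the -reshape- question raised in the original post, here is a hedged sketch of one way it could go, using the array data from the later message rather than the A to D toy example. It assumes array_id uniquely identifies each row and that the probe variables follow the affxr2tag?_x_at naming pattern; the names signal, tag, conc and fits are hypothetical, and -merge 1:1- needs a reasonably recent Stata. Run it as a single do-file so the temporary file survives until the merge.

```stata
* Sketch only: one regression per array via -reshape long- and -statsby-.
* ASSUMPTIONS: array_id is unique; probes are named affxr2tag{a..e}_x_at.
preserve

* Wide to long: one row per array per tag; tag holds "a" ... "e"
reshape long affxr2tag@_x_at, i(array_id) j(tag) string
rename affxr2tag_x_at signal

* Attach the known concentration (pM) to each tag
generate double conc = .
replace conc = 0.25 if tag == "a"
replace conc = 0.5  if tag == "b"
replace conc = 1    if tag == "c"
replace conc = 2    if tag == "d"
replace conc = 4    if tag == "e"

* Fit signal on conc separately for each array, keeping slope and r-squared
statsby slope=_b[conc] fit=e(r2), by(array_id) clear: regress signal conc

* Save the per-array results, restore the wide data, and attach the results
tempfile fits
save `fits'
restore
merge 1:1 array_id using `fits', nogenerate
```

The -preserve- and -restore- pair leaves the original wide layout and variable names intact, which sidesteps the -xpose- problem mentioned in the thread, at the cost of running a separate -regress- for every array.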

**Follow-Ups**: **st: RE: RE: Regression across variables** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

