

From |
Steve Nakoneshny <scnakone@ucalgary.ca> |

To |
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |

Subject |
st: unexplained discrepancy between |

Date |
Wed, 21 Dec 2011 10:53:35 -0700 |

Hello all,

We are working with a dataset of biomarker expression data. A colleague created some dummy variables using the median value as a dichotomous cut point for high / low expression. We also felt that this process would lend itself extremely well to using a loop.

Here is the code we wrote / executed:

--- begin code ---
local y cnratio_min_p cnratio_max_p cnratio_mean_p

//create the median cut points for each min, max and mean ratio
foreach x of local y {
    egen `x'_median = median(`x')
    label variable `x'_median "Median Cut Point `x'"
}

*** create hilo vars with loop ***
local x cnminhilo cnmaxhilo cnmeanhilo
foreach var of local x {
    gen `var' = .
    foreach val in `y' {
        replace `var' = 1 if `val' > `val'_median & `val' < .
        replace `var' = 0 if `val' <= `val'_median
    }
}

*** Here's the old school way to create hilo variables for each of cn min max and mean ***
gen cnminhilo_jcd = .
replace cnminhilo_jcd=1 if cnratio_min_p > cnratio_min_p_median & cnratio_min_p < .
replace cnminhilo_jcd=0 if cnratio_min_p <= cnratio_min_p_median
label variable cnminhilo_jcd "CN Ratio > Median Cutpoint of Min"
tab cnminhilo_jcd,m

gen cnmaxhilo_jcd = .
replace cnmaxhilo_jcd =1 if cnratio_max_p > cnratio_max_p_median & cnratio_max_p < .
replace cnmaxhilo_jcd =0 if cnratio_max_p <= cnratio_max_p_median
label variable cnmaxhilo_jcd "CN Ratio > Median Cutpoint of Max"
tab cnmaxhilo_jcd,m

gen cnmeanhilo_jcd = .
replace cnmeanhilo_jcd =1 if cnratio_mean_p > cnratio_mean_p_median & cnratio_mean_p < .
replace cnmeanhilo_jcd =0 if cnratio_mean_p <= cnratio_mean_p_median
label variable cnmeanhilo_jcd "CN Ratio > Median Cutpoint of Mean"
tab cnmeanhilo_jcd,m
--- end code ---

We then crosstabbed the results from each method to validate the results and found some discrepancies. Here is the output:

--- begin code ---
. tab cnminhilo cnminhilo_jcd,m

           |  CN Ratio > Median Cutpoint of
           |               Min
 cnminhilo |         0          1          . |     Total
-----------+---------------------------------+----------
         0 |        51          6          0 |        57
         1 |         6         50          0 |        56
         . |         0          0         13 |        13
-----------+---------------------------------+----------
     Total |        57         56         13 |       126

. tab cnmaxhilo cnmaxhilo_jcd,m

           |  CN Ratio > Median Cutpoint of
           |               Max
 cnmaxhilo |         0          1          . |     Total
-----------+---------------------------------+----------
         0 |        50          7          0 |        57
         1 |         7         49          0 |        56
         . |         0          0         13 |        13
-----------+---------------------------------+----------
     Total |        57         56         13 |       126

. tab cnmeanhilo cnmeanhilo_jcd,m

           |  CN Ratio > Median Cutpoint of
           |              Mean
cnmeanhilo |         0          1          . |     Total
-----------+---------------------------------+----------
         0 |        57          0          0 |        57
         1 |         0         56          0 |        56
         . |         0          0         13 |        13
-----------+---------------------------------+----------
     Total |        57         56         13 |       126
--- end code ---

We then explored the data and found that the 6 obs where cnminhilo==1 & cnminhilo_jcd==0 were incorrectly coded in cnminhilo. The same held true for the other discrepancies in cnminhilo and cnmaxhilo.

We've looked at the syntax of the loop and cannot see any differences between it and the longer hand-coding method used. We're at a loss to explain why and how these discrepancies arose. If it helps at all, all variables used here are stored as floats and we're using Stata/IC 11.2 for Mac.

Hopefully someone can help enlighten us.

Thanks,
Steve
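One possible reading of the discrepancy, offered as a hedged sketch rather than as the thread's resolution: in the looped version, the inner `foreach val in `y'` runs through all three source variables for every target variable, so each of cnminhilo, cnmaxhilo and cnmeanhilo is overwritten in turn and ends up coded from the last source, cnratio_mean_p. That would be consistent with cnmeanhilo agreeing perfectly with cnmeanhilo_jcd while the other two do not. A minimal sketch of a one-to-one pairing follows; the local names `sources`, `targets`, `n`, `i`, `src` and `tgt` are illustrative and not from the original post, and the `*_median` variables are assumed to exist already from the `egen` loop.

--- begin code ---
* Sketch only: pair each source variable with its own target, one-to-one,
* instead of nesting one loop inside the other.
local sources cnratio_min_p cnratio_max_p cnratio_mean_p
local targets cnminhilo cnmaxhilo cnmeanhilo

local n : word count `sources'
forvalues i = 1/`n' {
    local src : word `i' of `sources'
    local tgt : word `i' of `targets'
    * 1 if above the median, 0 if at or below, missing where the source is missing
    * (assumes `tgt' does not already exist; drop it first if it does)
    gen `tgt' = `src' > `src'_median if !missing(`src')
}
--- end code ---

Crosstabbing each `tgt` against its `_jcd` counterpart, as in the output above, would be one way to check whether this pairing removes the off-diagonal cells.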

**Follow-Ups**: **st: RE: unexplained discrepancy between** *From:* Nick Cox <n.j.cox@durham.ac.uk>
