Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Sergiy Radyakin <serjradyakin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to automate sorting and how to automate extracting info from a sort
Date: Wed, 22 Dec 2010 19:25:32 -0500

Anjie,

two points of concern are:

1) interpretation: Eric's code and mine differ in how they interpret the "second smallest numeric value". For a column of 1,1,1,2,2,2,3,3,3 my code returns 2 (second smallest unique value), Eric's returns 1 (second smallest value).

2) starting point: You wrote "Imagine you have a very large matrix" and "Ultimately, you want to create a table 865 by 2 table". From this I concluded that you start with (and want to obtain) a matrix, not a dataset. If you are starting with a dataset, things get easier:

    sysuse census, clear
    keep pop*
    set obs `=_N+1'
    quietly {
        ds
        foreach v in `r(varlist)' {
            quietly levelsof `v', local(levs)
            quietly replace `v'=`:word 2 of `levs'' in `=_N' // again second smallest unique value
        }
    }
    keep in `=_N'
    list

I am still not sure why you expect a 2xN matrix as a result.

Best, Sergiy Radyakin

On Wed, Dec 22, 2010 at 3:45 PM, Eric Booth <ebooth@ppri.tamu.edu> wrote:
> Anjie -
>
> I missed your statement:
>> Ultimately, you want to create a table 865 by 2 table, where for each column, you have that one value (which is, again, the second smallest numeric value of all the rows in that column).
>
> add this to end of my example to get the data you describe:
>
> ****!
> //added//
> drop po*
> keep in 1
> li
> **!
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
> Office: +979.845.6754
> Fax: +979.845.0249
> http://ppri.tamu.edu
>
> On Dec 22, 2010, at 2:03 PM, Eric Booth wrote:
>
>> <>
>>
>> This approach would give you new variables (prefixed with "m_") that are constants equal to the second smallest value in each column/variable of interest:
>>
>> ****************!
>> sysuse census, clear
>> keep pop*
>> **
>> ds
>> foreach v in `r(varlist)' {
>>     sort `v'
>>     tempvar i j
>>     g `i' = _n
>>     g `j' = `v' if `i'==2
>>     egen m_`v' = max(`j')
>> }
>> **
>> sort pop
>> tabstat m_*
>> l pop m_pop in 1/5
>> ***************!
>>
>> - Eric
>>
>> On Dec 22, 2010, at 12:21 PM, Anjanette Chan Tack wrote:
>>
>>> Hi Statalist --
>>>
>>> I was wondering if you large dataset manipulation masters could advise me on a smart way to do the following:
>>>
>>> Imagine you have a very large matrix, with 865 columns and thousands of rows. What you want from this matrix is to get the second smallest numeric value of all the rows within a column, for each column. Ultimately, you want to create a table 865 by 2 table, where for each column, you have that one value (which is, again, the second smallest numeric value of all the rows in that column).
>>>
>>> To do this in Stata the slow way (which is the only way I know), you could start with the first column, sort the values of the rows, pull out the value of the entry in the second row and record it in the 865 by 2 table you want to build. Then you repeat for the second column, the third, and so on. This is a very repetitive and mindless process. Is there a way to tell Stata, or some other program, to sort each column individually, to pull out the value of the second row, and to produce a nice table with this information for all columns?
>>>
>>> Presently I have 8 datasets that I'd have to do this for. So using the slow way, I'd have to do this little operation 865*8 times. You can imagine that I would be very happy to find a way out of doing this by hand. If you have a smart way to automate this process, your help would save me from a very miserable time.
>>>
>>> Let me know if you have any suggestions. I have access to Stata, so if you can think of a command to automate this with, it would be excellent. If there are other programs that could do this work, and they aren't too hard to learn to use for this task, I'd be willing to try that as well. I can also turn these giant matrices into any format (csv, tab delimited, dbf files). The files generated are too large for Excel to display them in full, but I can also have them generated in pieces (e.g. the matrix for one dataset broken into 4 files). This wouldn't affect the sorting, as the sorting goes by column.
>>>
>>> Thanks for your help!
>>>
>>> Anjie.
>>>
>>> -------------------------------
>>> Anjanette M. Chan Tack
>>> PhD student
>>> University of Chicago Department of Sociology

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
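The two readings of "second smallest" discussed in this thread can be made concrete with a small sketch. It is written in Python rather than Stata, since the original question was open to other programs, and it is only an illustration under stated assumptions: the function names `second_smallest` and `second_smallest_by_column` and the sample table are hypothetical, each column is assumed to be already loaded as a list of numbers, and missing values are not handled.

```python
def second_smallest(values, unique=True):
    """Return the second smallest entry of values.

    unique=True  -> second smallest distinct value
    unique=False -> second entry after sorting, duplicates kept
    """
    pool = sorted(set(values)) if unique else sorted(values)
    if len(pool) < 2:
        raise ValueError("need at least two values")
    return pool[1]


def second_smallest_by_column(columns, unique=True):
    """Map each column name to its second smallest value."""
    return {name: second_smallest(vals, unique) for name, vals in columns.items()}


# The column used as an example in the thread.
column = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(second_smallest(column, unique=True))   # 2
print(second_smallest(column, unique=False))  # 1

# Hypothetical two-column table, one result per column.
table = {"a": [5, 3, 3, 9], "b": [10, 40, 20, 30]}
print(second_smallest_by_column(table, unique=True))   # {'a': 5, 'b': 20}
print(second_smallest_by_column(table, unique=False))  # {'a': 3, 'b': 20}
```

The two readings only disagree when the smallest value is repeated, as in column "a" above, so it is worth deciding which one is wanted before running either approach on the full data.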

**References**:

- **st: how to automate sorting and how to automate extracting info from a sort** *From:* Anjanette Chan Tack <amc75@uchicago.edu>
- **Re: st: how to automate sorting and how to automate extracting info from a sort** *From:* Eric Booth <ebooth@ppri.tamu.edu>
- **Re: st: how to automate sorting and how to automate extracting info from a sort** *From:* Eric Booth <ebooth@ppri.tamu.edu>
