

From: Bryan Sayer <bsayer@chrr.osu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Creating a second output data set
Date: Fri, 09 Sep 2011 18:08:41 -0400

Great, thanks! I tend to think more along the lines of FORTRAN.

Basically, in the loop, margprob looks like this:

    replace margprob[J] = margprob[J] + jointprob
    replace margprob[K] = margprob[K] + jointprob

where J and K are the observation numbers in the data set in memory. Does this work?

Bryan Sayer
Monday to Friday, 8:30 to 5:00
Phone: (614) 442-7369
FAX: (614) 442-7329
BSayer@chrr.osu.edu

On 9/9/2011 5:25 PM, Nick Cox wrote:

Your code shades between Stata and incomplete Stata, as you will know.

However, a key principle here is that -postfile- never sees the locals in your calling program. It just gets passed their values. That's not a problem. It's the way to get round the basic fact that one program's locals are invisible to another program.

Also these two lines definitely won't work

    local N_total
    egen double `N_total'=total(`count')

The first defines the local N_total as blank, which is equivalent to not defining it at all. So, Stata will read the second line as

    egen double = total(`count')

which will fail, as no new variable name is supplied.

That said, there is no need to create a variable just to hold a total.

    su `count', meanonly

will leave r(sum) in memory and the value of that can be put somewhere appropriate, into a local or a scalar or directly into another file.

On Fri, Sep 9, 2011 at 8:49 PM, Bryan Sayer <bsayer@chrr.osu.edu> wrote:

So I am still a bit confused about how -postfile- works when I want to preserve the data in memory.

Specifically, how I generate the variables that I want in my -postfile- output versus the new one I do want to add to the data set in memory. I'm thinking I want to use a local (maybe macro?) variable for my results that go to -postfile-? In other words, how do I distinguish variables between the two files.

Also, how do I accumulate results for my new variable that goes in my memory data set. I need to accumulate a sum for two observations in memory on each post to -postfile-.
Here is what I have so far, but with the last part calculating the marginal probability (note that the joint probability calculation should be on one line):

    program jointprob
    args design infile outfile psu count margprob
    tempvar psu1 psu2 pi_one pi_joint
    tempfile results

    /* set up the file with the joint probabilities */
    postfile `results' `psu1' `psu2' using "`outfile'" ,replace

    /* get the number of observations and the total count */
    local N=_N
    local N_total
    egen double `N_total'=total(`count')

    quietly {
    /* read the input data set and create combinations of N items taken 2 at a time, without replacement */
    forvalues J = 1/`N'{
        forvalues K = 1/`N'{
            if `K'>`J'{
                psu1=`psu'[`J']
                psu2=`psu'[`K']
                pi_joint=(`count[`J']'*`count[`K']'/`N_total') * ((1/(`N_total'-`count[`J']')+(1/(`N_total'-`count[`K']'))
                post `results' psu1 psu2 pi_joint
            }
        }
    }
    }

Bryan Sayer
Monday to Friday, 8:30 to 5:00
Phone: (614) 442-7369
FAX: (614) 442-7329
BSayer@chrr.osu.edu

On 9/7/2011 9:44 AM, Roger Newson wrote:

-postfile- will still work if there is an existing dataset in the memory. However, the new dataset will be built in a file.

Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 07/09/2011 14:40, Bryan Sayer wrote:

-postfile- will post my results, but my reading of how it works seems to indicate that my original data set cannot be open at the same time. The examples appear to me to clear the existing data set from memory.
Admittedly, this is without me having tried anything yet, but am I not reading it correctly?

What I need to do is a double loop through the input data set, outputting a record on each iteration of each loop. So I need the input data set open in memory, and a second file to post the results to.

Are there any examples of something similar?

Thanks!

Bryan Sayer
Monday to Friday, 8:30 to 5:00
Phone: (614) 442-7369
FAX: (614) 442-7329
BSayer@chrr.osu.edu

On 9/6/2011 4:58 PM, Roger Newson wrote:

I think you are looking for the -postfile- utility. In Stata, type

    help postfile

to find out more.

HTH.

Best wishes

Roger

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 06/09/2011 21:53, Bryan Sayer wrote:

I need to create an output data set that will differ in the content and number of observations from the input file. The observations will be created one at a time, based on the input data set.

Specifically, I am creating all combinations of N objects taken two at a time. I will probably also do permutations.

The input data set (to start with) consists of N records with two variables, the primary sampling unit (PSU) and a size variable associated with the PSU (a count variable).

I want to create two output data sets. One is each combination of PSU with the associated joint probability.
The second has the same structure as the input data set but includes the marginal probability, calculated as the sum of the joint probabilities associated with the PSU (which are accumulated as each combination is created).

The part I am stuck on is how to output the data set of combinations. Can someone point me to a program that outputs a file as calculations are made?

(For those interested, this is for probability proportional to size (PPS) sampling. See, for example, Levy and Lemeshow, "Sampling of Populations", chapter 11.)

Here is an example of one stratum:

Input data set (with marginal probability added)

    District      Size    pi(i)
    LUWEERO     12,466    0.916858
    KAMPALA      3,459    0.542857
    TORORO       2,815    0.448739
    KAMULI         549    0.091546
    Total       19,289

Output data set:

    COMBINATIONS       pi(i,j)
    LUWEERO,KAMPALA    0.468854
    LUWEERO,TORORO     0.377069
    LUWEERO,KAMULI     0.070934
    KAMPALA,TORORO     0.062531
    KAMPALA,KAMULI     0.011473
    TORORO,KAMULI      0.009139

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
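A minimal sketch of how the pieces discussed in this thread might fit together, not a definitive implementation. It takes the total from r(sum) after -summarize, meanonly-, posts each pair to a second data set with -postfile-, and accumulates the marginal probabilities in the data set that stays in memory. The variable names (district, size, margprob, psu1, psu2, pi_joint) and the use of a tempfile for the pairs are assumptions made for illustration. The formula follows the one in the code posted above, and the data are the four districts from the original post. Note that `replace margprob[J] = ...` is not Stata syntax; restricting -replace- with `in` is one way to update a single observation.

```stata
* Hypothetical example data: the four districts from the original post
clear
input str10 district long size
"LUWEERO" 12466
"KAMPALA"  3459
"TORORO"   2815
"KAMULI"    549
end

tempname results N_total pjk      // temporary names for the post handle and two scalars
tempfile pairs                    // assumed output file; use a permanent name to keep it

* Total size from r(sum); no extra variable is needed
summarize size, meanonly
scalar `N_total' = r(sum)
local N = _N

* Marginal probabilities accumulate in the data set in memory
generate double margprob = 0

* The pairs are written to a separate file; the data in memory are untouched
postfile `results' str10 psu1 str10 psu2 double pi_joint using "`pairs'", replace

forvalues J = 1/`N' {
    local J1 = `J' + 1
    forvalues K = `J1'/`N' {      // K > J only; the range is empty when J = N
        scalar `pjk' = (size[`J'] * size[`K'] / `N_total') * ///
            (1/(`N_total' - size[`J']) + 1/(`N_total' - size[`K']))
        post `results' (district[`J']) (district[`K']) (`pjk')
        quietly replace margprob = margprob + `pjk' in `J'
        quietly replace margprob = margprob + `pjk' in `K'
    }
}
postclose `results'

* Input data with marginal probabilities added
list district size margprob, noobs

* Look at the posted pairs, then return to the data in memory
preserve
use "`pairs'", clear
list, noobs
restore
```

-post- receives the values of the expressions in parentheses, so the scalar and the subscripted variables never need to be visible to -postfile- itself. For the first pair (LUWEERO, KAMPALA) the formula works out to about 0.4689, in line with the table in the original post.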

