Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Creating a second output data set
Date: Fri, 9 Sep 2011 22:25:57 +0100

Your code shades between Stata and incomplete Stata, as you will know.

However, a key principle here is that -postfile- never sees the locals in your calling program. It just gets passed their values. That's not a problem. It's the way to get round the basic fact that one program's locals are invisible to another program.

Also, these two lines definitely won't work:

local N_total
egen double `N_total'=total(`count')

The first defines the local N_total as blank, which is equivalent to not defining it at all. So, Stata will read the second line as

egen double = total(`count')

which will fail, as no new variable name is supplied.

That said, there is no need to create a variable just to hold a total.

su `count', meanonly

will leave r(sum) in memory and the value of that can be put somewhere appropriate, into a local or a scalar or directly into another file.

On Fri, Sep 9, 2011 at 8:49 PM, Bryan Sayer <[email protected]> wrote:
> So I am still a bit confused about how -postfile- works when I want to
> preserve the data in memory. Specifically, how I generate the variables
> that I want in my -postfile- output versus the new one I do want to add to
> the data set in memory.
>
> I'm thinking I want to use a local (maybe macro?) variable for my results
> that go to -postfile-? In other words, how do I distinguish variables
> between the two files.
>
> Also, how do I accumulate results for my new variable that goes in my memory
> data set. I need to accumulate a sum for two observations in memory on each
> post to -postfile-.
>
> Here is what I have so far, but with the last part calculating the marginal
> probability (note that the joint probability calculation should be on one
> line):
>
> program jointprob
>     args design infile outfile psu count margprob
>     tempvar psu1 psu2 pi_one pi_joint
>     tempfile results
>     /* set up the file with the joint probabilities */
>     postfile `results' `psu1' `psu2' using "`outfile'" ,replace
>     /* get the number of observations and the total count */
>     local N=_N
>     local N_total
>     egen double `N_total'=total(`count')
>
>     quietly {
>         /* read the input data set and create combinations of N items
>            taken 2 at a time, without replacement */
>         forvalues J = 1/`N'{
>             forvalues K = 1/`N'{
>                 if `K'>`J'{
>                     psu1=`psu'[`J']
>                     psu2=`psu'[`K']
>
>                     pi_joint=(`count[`J']'*`count[`K']'/`N_total') *
>                     ((1/(`N_total'-`count[`J']')+(1/(`N_total'-`count[`K']'))
>                     post `results' psu1 psu2 pi_joint
>                 }
>             }
>         }
>     }
>
>
> Bryan Sayer
> Monday to Friday, 8:30 to 5:00
> Phone: (614) 442-7369
> FAX: (614) 442-7329
> [email protected]
>
>
> On 9/7/2011 9:44 AM, Roger Newson wrote:
>>
>> -postfile- will still work if there is an existing dataset in the
>> memory. However, the new dataset will be built in a file.
>>
>> Best wishes
>>
>> Roger
>>
>>
>> Roger B Newson BSc MSc DPhil
>> Lecturer in Medical Statistics
>> Respiratory Epidemiology and Public Health Group
>> National Heart and Lung Institute
>> Imperial College London
>> Royal Brompton Campus
>> Room 33, Emmanuel Kaye Building
>> 1B Manresa Road
>> London SW3 6LR
>> UNITED KINGDOM
>> Tel: +44 (0)20 7352 8121 ext 3381
>> Fax: +44 (0)20 7351 8322
>> Email: [email protected]
>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>> Departmental Web page:
>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>
>> Opinions expressed are those of the author, not of the institution.
>>
>> On 07/09/2011 14:40, Bryan Sayer wrote:
>>>
>>> -postfile- will post my results, but my reading of how it works seems to
>>> indicate that my original data set cannot be open at the same time. The
>>> examples appear to me to clear the existing data set from memory.
>>>
>>> Admittedly, this is without me having tried anything yet, but am I not
>>> reading it correctly?
>>>
>>> What I need to do is a double loop through the input data set,
>>> outputting a record on each iteration of each loop. So I need the input
>>> data set open in memory, and a second file to post the results to.
>>>
>>> Are there any examples of something similar?
>>>
>>> Thanks!
>>>
>>> Bryan Sayer
>>>
>>>
>>> On 9/6/2011 4:58 PM, Roger Newson wrote:
>>>>
>>>> I think you are looking for the -postfile- utility. In Stata, type
>>>>
>>>> help postfile
>>>>
>>>> to find out more.
>>>>
>>>> HTH.
>>>>
>>>> Best wishes
>>>>
>>>> Roger
>>>>
>>>> On 06/09/2011 21:53, Bryan Sayer wrote:
>>>>>
>>>>> I need to create an output data set that will differ in the content and
>>>>> number of observations from the input file. The observations will be
>>>>> created one at a time, based on the input data set.
>>>>>
>>>>> Specifically, I am creating all combinations of N objects taken two
>>>>> at a time. I will probably also do permutations.
>>>>>
>>>>> The input data set (to start with) consists of N records with two
>>>>> variables, the primary sampling unit (PSU) and a size variable
>>>>> associated with the PSU (a count variable). I want to create two output
>>>>> data sets. One is each combination of PSU with the associated joint
>>>>> probability. The second has the same structure as the input data set
>>>>> but includes the marginal probability, calculated as the sum of the
>>>>> joint probabilities associated with the PSU (which are accumulated as
>>>>> each combination is created).
>>>>>
>>>>> The part I am stuck on is how to output the data set of combinations.
>>>>> Can someone point me to a program that outputs a file as calculations
>>>>> are made?
>>>>>
>>>>> (For those interested, this is for probability proportional to size
>>>>> (PPS) sampling. See, for example, Levy and Lemeshow, "Sampling of
>>>>> Populations", chapter 11).
>>>>>
>>>>> Here is an example of one stratum:
>>>>>
>>>>> Input data set (with marginal probability added)
>>>>>
>>>>> District      Size      pi(i)
>>>>> LUWEERO     12,466   0.916858
>>>>> KAMPALA      3,459   0.542857
>>>>> TORORO       2,815   0.448739
>>>>> KAMULI         549   0.091546
>>>>> Total       19,289
>>>>>
>>>>>
>>>>> Output data set:
>>>>>
>>>>> COMBINATIONS        pi(I,j)
>>>>> LUWEERO,KAMPALA    0.468854
>>>>> LUWEERO,TORORO     0.377069
>>>>> LUWEERO,KAMULI     0.070934
>>>>> KAMPALA,TORORO     0.062531
>>>>> KAMPALA,KAMULI     0.011473
>>>>> TORORO,KAMULI      0.009139
>>>>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
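
Editorial sketch, not part of the original thread. The points in Nick Cox's reply at the top of this message (get the total from r(sum) after -summarize, meanonly-, and pass values rather than local names to -post-) might be put together roughly as follows for the one-stratum example in the first post. This is a minimal sketch under stated assumptions, not the poster's final program: the variable names district, size and margprob and the output file name jointprob.dta are made up for illustration, and the joint probability formula is transcribed from the quoted code.

```stata
* Sketch only: district, size, margprob and jointprob.dta are assumed names.
clear
input str8 district double size
"LUWEERO" 12466
"KAMPALA"  3459
"TORORO"   2815
"KAMULI"    549
end

* -summarize, meanonly- leaves the total in r(sum); no new variable is needed
summarize size, meanonly
local N_total = r(sum)
local N = _N

* marginal probabilities accumulate in the dataset that stays in memory
generate double margprob = 0

* -postfile- takes a handle name and the names of the NEW variables;
* the results go to a file, and the data in memory are left alone
tempname results pij
postfile `results' str8 psu1 str8 psu2 double pi_joint ///
    using "jointprob.dta", replace

* all combinations of N items taken 2 at a time (k > j)
forvalues j = 1/`N' {
    local first = `j' + 1
    forvalues k = `first'/`N' {
        * joint probability, as in the quoted code, written on one line
        scalar `pij' = (size[`j'] * size[`k'] / `N_total') * (1/(`N_total' - size[`j']) + 1/(`N_total' - size[`k']))

        * -post- receives values, each expression in parentheses
        post `results' (district[`j']) (district[`k']) (`pij')

        * add the joint probability to both members of the pair
        quietly replace margprob = margprob + `pij' in `j'
        quietly replace margprob = margprob + `pij' in `k'
    }
}
postclose `results'

list district size margprob
```

Worked by hand for the LUWEERO, KAMPALA pair, the formula gives about 0.4689, in line with the 0.468854 in the original post. The posted file can be inspected without disturbing the data in memory via -describe using "jointprob.dta"-, or loaded later with -use-.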

