Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Creating a second output data set

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Creating a second output data set
Date	Fri, 9 Sep 2011 22:25:57 +0100

Your code shades between Stata and incomplete Stata, as you will know.

However, a key principle here is that -postfile- never sees the locals
in your calling program. It just gets passed their values. That's not
a problem. It's the way to get round the basic fact that one program's
locals are invisible to another program.
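A minimal sketch of that point follows. It is not from the thread: it uses Stata's shipped auto dataset purely for illustration, and the names h, out, obs and p are arbitrary. The local is expanded to its value before -post- runs, and the data in memory are untouched until the posted file is loaded.

```stata
* Sketch: locals are replaced by their values before -post- sees the line.
sysuse auto, clear                    // stand-in data for illustration
tempname h
tempfile out
postfile `h' obs price using "`out'", replace
forvalues i = 1/3 {
    local p = price[`i']              // a local visible only to this do-file
    post `h' (`i') (`p')              // -post- receives e.g. (1) (4099), not "p"
}
postclose `h'
describe, short                       // the auto data are still in memory
use "`out'", clear                    // only now load the posted results
list
```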

Also, these two lines definitely won't work:

       local N_total
       egen double `N_total'=total(`count')

The first defines the local N_total as blank, which is equivalent to
not defining it at all. So, Stata will read the second line as

egen double = total(`count')

which will fail, as no new variable name is supplied.
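A short sketch of the failure and of what was presumably intended (an assumption; the auto dataset and its variable price are stand-ins): an empty local substitutes nothing, whereas -tempvar- supplies a valid temporary variable name.

```stata
* Sketch: an empty local substitutes nothing at all.
local N_total
display "[`N_total']"                 // prints [] because N_total is empty
* So the -egen- line would reach Stata with no new variable name.
* Presumably intended (an assumption): a temporary variable from -tempvar-.
sysuse auto, clear                    // stand-in data for illustration
tempvar N_total
egen double `N_total' = total(price)
display `N_total'[1]                  // the total, repeated in every obs
```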

That said, there is no need to create a variable just to hold a total.

su `count', meanonly

will leave r(sum) in memory and the value of that can be put somewhere
appropriate, into a local or a scalar or directly into another file.
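Putting these pieces together, here is one possible rewrite of the program: a sketch under stated assumptions, not a tested solution. The data are the example stratum from the original post typed in by hand; the argument order psu count margprob outfile and the names district, size and pi_i are hypothetical; the joint probability is the formula from the quoted code, pi(j,k) = (x_j*x_k/T) * (1/(T - x_j) + 1/(T - x_k)), with T the total size. Locals carry values into -post-, -summarize, meanonly- supplies the total, and the marginal probability is accumulated in memory with -replace ... in-.

```stata
* Sketch: variable names and argument order are illustrative only.
clear
input str10 district size
"LUWEERO" 12466
"KAMPALA"  3459
"TORORO"   2815
"KAMULI"    549
end

capture program drop jointprob
program define jointprob
    args psu count margprob outfile
    tempname h
    quietly {
        * total size via -summarize, meanonly-; no extra variable needed
        summarize `count', meanonly
        local T = r(sum)
        local N = _N
        local Nm1 = `N' - 1
        generate double `margprob' = 0
        postfile `h' str20 psu1 str20 psu2 double pi_joint ///
            using "`outfile'", replace
        * all pairs j < k, i.e. N items taken 2 at a time
        forvalues j = 1/`Nm1' {
            local k0 = `j' + 1
            forvalues k = `k0'/`N' {
                local xj = `count'[`j']
                local xk = `count'[`k']
                local pij = (`xj'*`xk'/`T') * (1/(`T'-`xj') + 1/(`T'-`xk'))
                post `h' (`psu'[`j']) (`psu'[`k']) (`pij')
                * accumulate the marginal for both members of the pair
                replace `margprob' = `margprob' + `pij' in `j'
                replace `margprob' = `margprob' + `pij' in `k'
            }
        }
        postclose `h'
    }
end

tempfile joint
jointprob district size pi_i "`joint'"
list, noobs                           // input data, now with pi_i
preserve
use "`joint'", clear
list, noobs                           // one row per pair, with pi_joint
restore
```

Because each joint probability is added to both members of its pair, the marginals should sum to 2 (the sample size for this design), which gives a quick check against the table in the original post.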

On Fri, Sep 9, 2011 at 8:49 PM, Bryan Sayer <[email protected]> wrote:

> So I am still a bit confused about how -postfile- works when I want to
> preserve the data in memory.  Specifically, how I generate the variables
> that I want in my -postfile- output versus the new one I do want to add to
> the data set in memory.
>
> I'm thinking I want to use a local (maybe macro?) variable for my results
> that go to -postfile-?  In other words, how do I distinguish variables
> between the two files?
>
> Also, how do I accumulate results for my new variable that goes in my memory
> data set.  I need to accumulate a sum for two observations in memory on each
> post to -postfile-.
>
> Here is what I have so far, but without the last part calculating the marginal
> probability (note that the joint probability calculation should be on one
> line):
>
> program jointprob
> args design infile outfile psu count margprob
> tempvar psu1 psu2 pi_one pi_joint
> tempfile results
>        /* set up the file with the joint probabilities */
>        postfile `results' `psu1' `psu2' using "`outfile'" ,replace
>        /* get the number of observations and the total count */
>        local N=_N
>        local N_total
>        egen double `N_total'=total(`count')
>
> quietly {
>        /* read the input data set and create combinations of N items
>           taken 2 at a time, without replacement */
>        forvalues J = 1/`N'{
>                forvalues K = 1/`N'{
>                        if `K'>`J'{
>                                psu1=`psu'[`J']
>                                psu2=`psu'[`K']
>
>  pi_joint=(`count[`J']'*`count[`K']'/`N_total') *
> ((1/(`N_total'-`count[`J']')+(1/(`N_total'-`count[`K']'))
>                                post `results' psu1 psu2 pi_joint
>                                }
>                }
>        }
> }
>
>
> Bryan Sayer
> Monday to Friday, 8:30 to 5:00
> Phone: (614) 442-7369
> FAX:  (614) 442-7329
> [email protected]
>
>
> On 9/7/2011 9:44 AM, Roger Newson wrote:
>>
>> -postfile- will still work if there is an existing dataset in the
>> memory. However, the new dataset will be built in a file.
>>
>> Best wishes
>>
>> Roger
>>
>>
>> Roger B Newson BSc MSc DPhil
>> Lecturer in Medical Statistics
>> Respiratory Epidemiology and Public Health Group
>> National Heart and Lung Institute
>> Imperial College London
>> Royal Brompton Campus
>> Room 33, Emmanuel Kaye Building
>> 1B Manresa Road
>> London SW3 6LR
>> UNITED KINGDOM
>> Tel: +44 (0)20 7352 8121 ext 3381
>> Fax: +44 (0)20 7351 8322
>> Email: [email protected]
>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>> Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>
>>
>> Opinions expressed are those of the author, not of the institution.
>>
>> On 07/09/2011 14:40, Bryan Sayer wrote:
>>>
>>> -postfile- will post my results, but my reading of how it works seems to
>>> indicate that my original data set cannot be open at the same time. The
>>> examples appear to me to clear the existing data set from memory.
>>>
>>> Admittedly, this is without me having tried anything yet, but am I not
>>> reading it correctly?
>>>
>>> What I need to do is a double loop through the input data set,
>>> outputting a record on each iteration of each loop. So I need the input
>>> data set open in memory, and a second file to post the results to.
>>>
>>> Are there any examples of something similar?
>>>
>>> Thanks!
>>>
>>> Bryan Sayer
>>>
>>>
>>> On 9/6/2011 4:58 PM, Roger Newson wrote:
>>>>
>>>> I think you are looking for the -postfile- utility. In Stata, type
>>>>
>>>> help postfile
>>>>
>>>> to find out more.
>>>>
>>>> HTH.
>>>>
>>>> Best wishes
>>>>
>>>> Roger
>>>>
>>>> On 06/09/2011 21:53, Bryan Sayer wrote:
>>>>>
>>>>> I need to create an output data set that will differ in the content and
>>>>> number of observations from the input file. The observations will be
>>>>> created one at a time, based on the input data set.
>>>>>
>>>>> Specifically, I am creating all combinations of N objects taken two
>>>>> at a time. I will probably also do permutations.
>>>>>
>>>>> The input data set (to start with) consists of N records with two
>>>>> variables, the primary sampling unit (PSU) and a size variable
>>>>> associated with the PSU (a count variable). I want to create two output
>>>>> data sets. One is each combination of PSU with the associated joint
>>>>> probability. The second has the same structure as the input data set
>>>>> but includes the marginal probability, calculated as the sum of the joint
>>>>> probabilities associated with the PSU (which are accumulated as each
>>>>> combination is created).
>>>>>
>>>>> The part I am stuck on is how to output the data set of combinations.
>>>>> Can someone point me to a program that outputs a file as calculations
>>>>> are made?
>>>>>
>>>>> (For those interested, this is for probability proportional to size
>>>>> (PPS) sampling. See, for example, Levy and Lemeshow, "Sampling of
>>>>> Populations", chapter 11.)
>>>>>
>>>>> Here is an example of one stratum:
>>>>>
>>>>> Input data set (with marginal probability added)
>>>>>
>>>>> District       Size     pi(i)
>>>>> LUWEERO      12,466  0.916858
>>>>> KAMPALA       3,459  0.542857
>>>>> TORORO        2,815  0.448739
>>>>> KAMULI          549  0.091546
>>>>> Total        19,289
>>>>>
>>>>>
>>>>> Output data set:
>>>>>
>>>>> COMBINATIONS      pi(i,j)
>>>>> LUWEERO,KAMPALA  0.468854
>>>>> LUWEERO,TORORO   0.377069
>>>>> LUWEERO,KAMULI   0.070934
>>>>> KAMPALA,TORORO   0.062531
>>>>> KAMPALA,KAMULI   0.011473
>>>>> TORORO,KAMULI    0.009139
>>>>>
>>>>>
>>>>>
>>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/statalist/faq
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>
>
>

Follow-Ups:
- Re: st: Creating a second output data set
  - From: Bryan Sayer <[email protected]>

References:
- st: Creating a second output data set
  - From: Bryan Sayer <[email protected]>
- Re: st: Creating a second output data set
  - From: Roger Newson <[email protected]>
- Re: st: Creating a second output data set
  - From: Bryan Sayer <[email protected]>
- Re: st: Creating a second output data set
  - From: Roger Newson <[email protected]>
- Re: st: Creating a second output data set
  - From: Bryan Sayer <[email protected]>
