Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: add up variable / quantile


From   "Scharnigg, Stan (Stud. SBE)" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: add up variable / quantile
Date   Thu, 14 Apr 2011 17:04:14 +0200

I have now tried a different approach. I created two new variables (var1, var2), and I want to use
them to create a third variable (var3). var3 should be 1 if var1 has the same value as var2.

So I tried this:
gen var3 = 0
replace var3 = 1 if var1==var2

However, this is not working. Is something like this possible in Stata? 
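
The two lines above are valid Stata, so if var3 does not come out as expected, the cause is more likely in the data than in the syntax. A minimal sketch, assuming var1 and var2 are numeric (the toy values below are made up for illustration): two common culprits are float-versus-double precision and missing values, since Stata evaluates missing == missing as true.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input float var1 double var2
1   1
0.1 0.1
.   .
2   3
end

* One-line equivalent of the gen/replace pair: 1 if equal, 0 otherwise.
* Note that 0.1 stored as float is not equal to 0.1 stored as double.
gen byte var3 = (var1 == var2)

* Variant that compares at float precision and leaves the result
* missing when either input is missing
gen byte var3_safe = (float(var1) == float(var2)) if !missing(var1, var2)

list
```

Comparing var3 with var3_safe in the listing shows whether precision or missing values are responsible; -compare var1 var2- gives a similar summary on real data.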
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: Thursday 14 April 2011 13:10
To: [email protected]
Subject: Re: st: add up variable / quantile

Doing anything by year will necessarily split observations for
different accounts into different groups. You may need to sort by
account and then year and then what you want to compare will be
adjacent.

Or do something like this

egen p2000 = total(performance * (year == 2000)), by(accountid)
egen p2001 = total(performance * (year == 2001)), by(accountid)
gen diff = p2001 - p2000
egen tag = tag(accountid)
list accountid p2000 p2001 diff if tag

Nick
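
The sort-based route mentioned above (sort by account, then year, so that the values to compare are adjacent) could look like the following. This is a minimal sketch, assuming one observation per account and year and the variable names accountid, year and performance from the -egen- example; the toy data are made up.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input accountid year performance
1 2000  .10
1 2001  .05
2 2000 -.20
2 2001  .30
end

* Within each account, sorted by year, subtract the previous year's value.
* The -if- condition keeps diff missing for the first year of each account
* and whenever the preceding observation is not the immediately prior year.
bysort accountid (year): gen diff = performance - performance[_n-1] if year == year[_n-1] + 1

list, sepby(accountid)
```

Unlike the -egen- version, this keeps the data in long form and works for every pair of consecutive years at once, at the cost of assuming at most one observation per account and year.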

On Thu, Apr 14, 2011 at 12:00 PM, Scharnigg, Stan (Stud. SBE)
<[email protected]> wrote:
> I have to do this to get some extra descriptive statistics. However, I am wondering if you have
> any tips on how I could do the following:
>
> - I want to know how the top performing group of 2000 performed in e.g. 2001
>
> simplified example, I have five variables:
> AccountID
> Top_quantile_2000
> gross_performance
> year //
> new_accountID_2000 // only the accountIDs of the top performing group in 2000
>
> If I do something like:
> by year: summarize gross_performance_years if new_accountID_2000 >=12 & new_accountID_2000 <=42313 // value 12 and 42313 are lowest/highest accountIDs in that variable
>
> I only get a result for 2000 and 0 observations for the next year (2001). However, most of these accountIDs are also active in 2001. Do you have any suggestions
> for code that could help me?  Thank you very much.
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
> Sent: Thursday 14 April 2011 8:17
> To: [email protected]
> Subject: Re: st: add up variable / quantile
>
> Good. I had in my mind "Why not use -xtile- anyway?", but sorry, that
> didn't make it to my previous post.
>
> Nick
>
> On Wed, Apr 13, 2011 at 10:26 PM, Scharnigg, Stan (Stud. SBE)
> <[email protected]> wrote:
>> Thank you, that was indeed the problem. I solved it with the xtile command.
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
>> Sent: Wednesday 13 April 2011 22:07
>> To: '[email protected]'
>> Subject: RE: st: add up variable / quantile
>>
>> I don't follow this well, but I see that you are including a comparison with r(p75), which is presumably thought to be left over from some previous command.
>>
>> However, r-class results are ephemeral and don't stick around forever. In particular, they get overwritten by -egen-, which does its own internal -count- at some point.
>>
>> If, however, r(p75) is undefined, then it's treated as missing and your comparison would be whether values were missing or greater than that, which is evidently not true for your data.
>>
>> I rarely understand why people who have quantitative data want to degrade it to indicator variables. That sounds most unstatistical to me.
>>
>> That aside, I think you need to reissue whatever command produced the r(p75) before you try to use it.
>>
>> Nick
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Scharnigg, Stan (Stud. SBE)
>> Sent: 13 April 2011 20:42
>> To: [email protected]
>> Subject: RE: st: add up variable / quantile
>>
>> I still have a problem with this.
>>
>> My goal is to identify the top quantile for each year (6 years in total), each in a separate variable.
>>
>> What I did so far:
>> -----------------------------------------------------------
>> gen year=year(newdate) // create years instead of normal dates
>> egen gross_performance_years=total(gross_performance), by(accountID year) // create gross_performance per year
>> egen tag_year=tag(RekeningID year) // tag a 1 for each year per accountID
>> -----------------------------------------------------------
>>
>> This all works fine. However, I now want to create the top quantile variable for each year, so I did the following:
>>
>> gen topq_2000=gross_performance_years >=r(p75) & year==y(2000)
>>
>> However, this doesn't work. I only get "0" as value.
>>
>> I also tried this:
>>
>> generate topq_2000 = 0
>> replace topq_2000 = 1 if gross_performance_year >=r(p75) & tag_year==1 & year==2002
>>
>> but without success.
>> Does anybody have some tips on how I can do this?
>>
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
>> Sent: Tuesday 29 March 2011 12:44
>> To: [email protected]
>> Subject: Re: st: add up variable / quantile
>>
>> On Tue, Mar 29, 2011 at 11:30 AM, Scharnigg, Stan (Stud. SBE)
>> <[email protected]> wrote:
>>> Look at the help for -egen-. You want
>>>
>>> egen total_gross = total(gross), by(accountID)
>>> -----------
>>>
>>> Thank you, but I have some additional questions:
>>>
>>> A. I have data for 6 years (72 months). What if I want to add up the gross_performance for, e.g., the first 12 months? I guess I need to
>>> create different variables for different time periods, but I am not sure how to do that. One laborious possibility would be to create a different dataset
>>> for every period, but I guess there might also be another solution.
>>>
>>> accountID   gross_performance   date
>>> 1           -.1                 jan_00
>>> 1            .2                 febr_00
>>> 3            .1                 jan_00
>>> 3            .1                 febr_00
>>
>> You can specify -if- on an -egen- command. Different summaries will in
>> general require new variables (but not new datasets).
>>
>>> B. If I use egen total_gross = total(gross_performance), by(accountID) I get many duplicate values. In some cases I have
>>> 72 duplicate values. What is the best way to delete the duplicate values, so that they won't show up when I run some tests? I don't
>>> think that setting them to "0" is an option then.
>>
>> You can use
>>
>> egen tag = tag(accountID)
>>
>> and then add
>>
>> ... if tag
>>
>> to commands to ensure that each summary is used once only. You cannot
>> delete (in Stata -drop-) without losing other information.
>>
>> Alternatively, -collapse- will yield a reduced dataset with one
>> observation for each account.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/