Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: merging same variables within a dataset


From   Eric Booth <ebooth@ppri.tamu.edu>
To   "<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: merging same variables within a dataset
Date   Fri, 19 Aug 2011 21:08:49 +0000

<>

Quick note:

Just to keep the code example similar to my first example, I kept the 

foreach p in Position {

which before was

foreach p in Position Company {

even though we don't need the `p' loop since there is only one element (Position) to consider.  This might cause confusion if someone just looks at the latest posting, but I was trying to make minimal changes to the code so that Dmitriy could track the changes he requested.

- Eric

On Aug 19, 2011, at 4:01 PM, Eric Booth wrote:

> <>
> 
> Sorry for the delay -- I've been away from the office.  In case you haven't solved your problem yet:
> 
> You could change the "by Name:" parts of the code in my example to "by Name Company", where appropriate, however, this isn't as straightforward after the data are reshaped.  So, I think a faster solution is to tack on the company to the name and -bysort- by that combined variable to avoid making big changes to my example:
> 
> 
> **************!
> clear
> inp str12(Name Position) Salary Bonus ///
> Compensation str10(Company)
> John  Manager    10  10 20 a
> Jack  Manager    20  20 30 a
> Brian  Director   10  10 40  c
> Amy  Manager    20  20 30   d
> John  Director    10  10 20  a
> Amy  Director    20  30 20   d
> Amy  Other 	10	12	30 f
> end
> 
> //added//
>   replace Name = Name + " - " + Company
> 
> **Position and Companies in one String:
> g i = _n
> bys Name: g j = _n
> qui su j
> loc max `r(max)'
> reshape wide Position, i(i) j(j)
> 
> foreach p in Position {
> forval n = 1/`max' {
> gsort Name -`p'`n'
> by Name: carryforward `p'`n', replace
> }
> **
> egen `p'_all = concat(`p'*) , punct(" ")
> replace `p'_all = trim(`p'_all)
> drop `p'?
> **fix duplicates in _all vars:
> forval n = 1/`=_N' {
> 	loc j = `p'_all[`n']
> 	loc j: list uniq local(j)
> 	replace `p'_all = "`j'" in `n'
> 	}
> }
> 
> 
> **keep totals by Name:
> collapse (sum) Salary Bonus Compensation  ///
> 	(first) Position_all Company, by(Name)
> 
> 
> //added//
> g len = length(Name)
> replace Name = substr(Name, 1, len-3)
> drop len
> **************!
> - Eric
> 
> 
> On Aug 15, 2011, at 9:27 AM, Dmitriy Glumov wrote:
> 
>> Eric,
>> 
>> Thank you for your response, it was very helpful. From your example,
>> how would I separate data for two different Amys (one from company d,
>> the second one from company f) without aggregating it? In other words,
>> what should I change in order for Stata to add up the numbers for two
>> Amys from company d but keep Amy from f separate? Once again, thank
>> you for the consideration and, if anyone can help solve this problem,
>> it would be very much appreciated.
>> 
>> Dmitri
>> 
>> On Fri, Aug 12, 2011 at 4:10 PM, Eric Booth <ebooth@ppri.tamu.edu> wrote:
>>> <>
>>> 
>>> I'd use -reshape- + -collapse-, not -merge-.
>>> You'll need to get carryforward for this example (-findit carryforward-):
>>> **************!
>>> clear
>>> inp str12(Name Position) Salary Bonus ///
>>> Compensation str10(Company)
>>> John  Manager    10  10 20 a
>>> Jack  Manager    20  20 30 a
>>> Brian  Director   10  10 40  c
>>> Amy  Manager    20  20 30   d
>>> John  Director    10  10 20  a
>>> Amy  Director    20  30 20   d
>>> Amy  Other      10      12      30 f
>>> end
>>> 
>>> 
>>> **Position and Companies in one String:
>>> g i = _n
>>> bys Name: g j = _n
>>> qui su j
>>> loc max `r(max)'
>>> reshape wide Position Company, i(i) j(j)
>>> 
>>> foreach p in Position Company {
>>> forval n = 1/`max' {
>>> gsort Name -`p'`n'
>>> by Name: carryforward `p'`n', replace
>>> }
>>> **
>>> egen `p'_all = concat(`p'*) , punct(" ")
>>> replace `p'_all = trim(`p'_all)
>>> drop `p'?
>>> }
>>> 
>>> 
>>> **keep totals by Name:
>>> collapse (sum) Salary Bonus Compensation  ///
>>>       (first) Position_all Company_all, by(Name)
>>> **************!
>>> - Eric
>>> __
>>> Eric A. Booth
>>> Public Policy Research Institute
>>> Texas A&M University
>>> ebooth@ppri.tamu.edu
>>> 
>>> On Aug 12, 2011, at 1:33 PM, Dmitriy Glumov wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have just started using Stata and have ran into a problem. I would
>>>> like Stata to identify observations within a dataset that have the
>>>> same name, and add their respective values. To give an example, the
>>>> sample of the dataset looks something like this:
>>>> 
>>>> Name Position Salary Bonus Compensation Company
>>>> John  Manager    10       10         20                   a
>>>> Jack  Manager    20       20         30                   a
>>>> Brian  Director    10       10         40                   c
>>>> Amy  Manager    20       20         30                   d
>>>> John  Director     10       10         20                   a
>>>> Amy  Director     20       30         20                   d
>>>> 
>>>> 
>>>> And I want it to end up looking like this:
>>>> 
>>>> Name   Position              Salary Bonus Compensation Company
>>>> 
>>>> John  Manager/Director       20       20         40                   a
>>>> Jack  Manager                   20       20         30                   a
>>>> Brian  Director                   10       10         40                    c
>>>> Amy  Manager/Director      40       50         50                    d
>>>> 
>>>> In short, I would like Stata to identify same names and then add
>>>> salary, bonus, and compensation together, while also keeping the
>>>> company name and position. It is best if the company displays only one
>>>> label during the merge ("a" instead of "aa" or "a-a", etc) but, as far
>>>> as position, either merging or appending or just displaying either
>>>> name would be fine (so ManagerDirector or just one of the names would
>>>> also work), I just don't want it to create an error during the
>>>> transformation process.
>>>> 
>>>> Thank you for your consideration and any help with this would be
>>>> greatly appreciated.
>>>> 
>>>> Dmitri
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index