Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Eric Booth <ebooth@ppri.tamu.edu> |

To |
"<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu> |

Subject |
Re: st: merging same variables within a dataset |

Date |
Fri, 19 Aug 2011 21:08:49 +0000 |

<> Quick note: Just to keep the code example similar to my first example, I kept the foreach p in Position { which before was foreach p in Position Company { even though we don't need the `p' loop since there is only one element (Position) to consider. This might cause confusion if someone just looks at the latest posting, but I was trying to make minimal changes to the code so that Dmitriy could track the changes he requested. - Eric On Aug 19, 2011, at 4:01 PM, Eric Booth wrote: > <> > > Sorry for the delay -- I've been away from the office. In case you haven't solved your problem yet: > > You could change the "by Name:" parts of the code in my example to "by Name Company", where appropriate, however, this isn't as straightforward after the data are reshaped. So, I think a faster solution is to tack on the company to the name and -bysort- by that combined variable to avoid making big changes to my example: > > > **************! > clear > inp str12(Name Position) Salary Bonus /// > Compensation str10(Company) > John Manager 10 10 20 a > Jack Manager 20 20 30 a > Brian Director 10 10 40 c > Amy Manager 20 20 30 d > John Director 10 10 20 a > Amy Director 20 30 20 d > Amy Other 10 12 30 f > end > > //added// > replace Name = Name + " - " + Company > > **Position and Companies in one String: > g i = _n > bys Name: g j = _n > qui su j > loc max `r(max)' > reshape wide Position, i(i) j(j) > > foreach p in Position { > forval n = 1/`max' { > gsort Name -`p'`n' > by Name: carryforward `p'`n', replace > } > ** > egen `p'_all = concat(`p'*) , punct(" ") > replace `p'_all = trim(`p'_all) > drop `p'? > **fix duplicates in _all vars: > forval n = 1/`=_N' { > loc j = `p'_all[`n'] > loc j: list uniq local(j) > replace `p'_all = "`j'" in `n' > } > } > > > **keep totals by Name: > collapse (sum) Salary Bonus Compensation /// > (first) Position_all Company, by(Name) > > > //added// > g len = length(Name) > replace Name = substr(Name, 1, len-3) > drop len > **************! > - Eric > > > On Aug 15, 2011, at 9:27 AM, Dmitriy Glumov wrote: > >> Eric, >> >> Thank you for your response, it was very helpful. From your example, >> how would I separate data for two different Amys (one from company d, >> the second one from company f) without aggregating it? In other words, >> what should I change in order for Stata to add up the numbers for two >> Amys from company d but keep Amy from f separate? Once again, thank >> you for the consideration and, if anyone can help solve this problem, >> it would be very much appreciated. >> >> Dmitri >> >> On Fri, Aug 12, 2011 at 4:10 PM, Eric Booth <ebooth@ppri.tamu.edu> wrote: >>> <> >>> >>> I'd use -reshape- + -collapse-, not -merge-. >>> You'll need to get carryforward for this example (-findit carryforward-): >>> **************! >>> clear >>> inp str12(Name Position) Salary Bonus /// >>> Compensation str10(Company) >>> John Manager 10 10 20 a >>> Jack Manager 20 20 30 a >>> Brian Director 10 10 40 c >>> Amy Manager 20 20 30 d >>> John Director 10 10 20 a >>> Amy Director 20 30 20 d >>> Amy Other 10 12 30 f >>> end >>> >>> >>> **Position and Companies in one String: >>> g i = _n >>> bys Name: g j = _n >>> qui su j >>> loc max `r(max)' >>> reshape wide Position Company, i(i) j(j) >>> >>> foreach p in Position Company { >>> forval n = 1/`max' { >>> gsort Name -`p'`n' >>> by Name: carryforward `p'`n', replace >>> } >>> ** >>> egen `p'_all = concat(`p'*) , punct(" ") >>> replace `p'_all = trim(`p'_all) >>> drop `p'? >>> } >>> >>> >>> **keep totals by Name: >>> collapse (sum) Salary Bonus Compensation /// >>> (first) Position_all Company_all, by(Name) >>> **************! >>> - Eric >>> __ >>> Eric A. Booth >>> Public Policy Research Institute >>> Texas A&M University >>> ebooth@ppri.tamu.edu >>> >>> On Aug 12, 2011, at 1:33 PM, Dmitriy Glumov wrote: >>> >>>> Hello, >>>> >>>> I have just started using Stata and have ran into a problem. I would >>>> like Stata to identify observations within a dataset that have the >>>> same name, and add their respective values. To give an example, the >>>> sample of the dataset looks something like this: >>>> >>>> Name Position Salary Bonus Compensation Company >>>> John Manager 10 10 20 a >>>> Jack Manager 20 20 30 a >>>> Brian Director 10 10 40 c >>>> Amy Manager 20 20 30 d >>>> John Director 10 10 20 a >>>> Amy Director 20 30 20 d >>>> >>>> >>>> And I want it to end up looking like this: >>>> >>>> Name Position Salary Bonus Compensation Company >>>> >>>> John Manager/Director 20 20 40 a >>>> Jack Manager 20 20 30 a >>>> Brian Director 10 10 40 c >>>> Amy Manager/Director 40 50 50 d >>>> >>>> In short, I would like Stata to identify same names and then add >>>> salary, bonus, and compensation together, while also keeping the >>>> company name and position. It is best if the company displays only one >>>> label during the merge ("a" instead of "aa" or "a-a", etc) but, as far >>>> as position, either merging or appending or just displaying either >>>> name would be fine (so ManagerDirector or just one of the names would >>>> also work), I just don't want it to create an error during the >>>> transformation process. >>>> >>>> Thank you for your consideration and any help with this would be >>>> greatly appreciated. >>>> >>>> Dmitri >>>> * >>>> * For searches and help try: >>>> * http://www.stata.com/help.cgi?search >>>> * http://www.stata.com/support/statalist/faq >>>> * http://www.ats.ucla.edu/stat/stata/ >>> >>> >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/statalist/faq >>> * http://www.ats.ucla.edu/stat/stata/ >>> >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: merging same variables within a dataset***From:*Dmitriy Glumov <glumovdm@gmail.com>

**Re: st: merging same variables within a dataset***From:*Eric Booth <ebooth@ppri.tamu.edu>

**Re: st: merging same variables within a dataset***From:*Dmitriy Glumov <glumovdm@gmail.com>

**Re: st: merging same variables within a dataset***From:*Eric Booth <ebooth@ppri.tamu.edu>

- Prev by Date:
**Re: st: merging same variables within a dataset** - Next by Date:
**Re: st: mac editor for big files** - Previous by thread:
**Re: st: merging same variables within a dataset** - Next by thread:
**Re: st: merging same variables within a dataset** - Index(es):