
From: gjhxmu@sina.com
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: Re: RE: st: RE: a specific data management problem
Date: Wed, 23 Dec 2009 17:47:48 +0800

Martin, thank you very much again.

The solution worked once I changed "bys id: gen order=_n" to "bys id (item): gen order=_n". There is nothing wrong with the strings.

The question in my last posting, "how to judge whether the value of a string variable is the same in every group", is a separate data management problem that has nothing to do with the one above. For example:

input id str6 x
1 a
1 b
2 a
2 a
end

I want to check whether x is the same within every id. How can I do that?

Thank you for any help.

Best regards,
Rose

----- Original Message -----
From: Martin Weiss <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: a specific data management problem
Date: 2009-12-23 17:32:56

<>

True, my solution depended critically on the assumption that

1) every orphan, i.e. group with only one observation, should be kept.
2) groups with more than one observation have the "total" observation in position 1 (_n==1)

Any departure from this rule will indeed cause problems. What is the rule in your data?

Re strings, what is the problem there? I split the strings into tokens and used the first one to form my groups. Where does this approach lead to errors?

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of gjhxmu@sina.com
Sent: Wednesday, 23 December 2009 10:19
To: statalist
Subject: Re: st: RE: a specific data management problem

Martin, thank you very much for your help.

There is something wrong with the solution. It seems that the variable order generated within each id is not correct if I change the order of the data I input.

clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

The result is as follows, which is not what I expect.
+-----------------------------------+
| id                  item   amount |
|-----------------------------------|
|  1   material includes:A      550 |
|  1                 labor      400 |
|  1         manufacturing      200 |
|-----------------------------------|
|  2      labor includes:a      300 |
|  2      labor includes:b      200 |
|  2              material      800 |
|-----------------------------------|
|  3                 labor      600 |
|  3              material      700 |
+-----------------------------------+

By the way, another problem is how to judge whether the value of a string variable is the same in every group.

Thank you for any help.

Best regards,
Rose

----- Original Message -----
From: Martin Weiss <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: a specific data management problem
Date: 2009-12-23 15:51:57

<>

*******
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

bys id: gen order=_n
split item
bys id item1 (order): egen subtotal=total((_n>1)*amount)
bys id item1: gen byte keepobs=_N==1
bys id item1: replace keepobs=_n==1 & amount!=subtotal
bys id item1 (order): gen byte first=amount[1]==subtotal[1]
bys id item1 (order): gen byte dummy=(_n!=1) & (first)
keep if keepobs | dummy
sort id order
drop item1 item2 subtotal keepobs first dummy order
l, noo sepby(id)
*******

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of gjhxmu@sina.com
Sent: Wednesday, 23 December 2009 06:48
To: statalist
Subject: st: a specific data management problem

Dear statalists,

I encountered a data management problem. Let me take an excerpt of my data to clarify my problem.
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

A feature of the data is that the item(s) for which details are given vary from id to id. What I expect is as follows. Within each id, if the sum of the detailed items equals the related total, drop the total observation and keep the detailed ones. Otherwise, keep the total observation and drop the detailed ones. Specifically, the result for the above data is

1 material 1000
1 labor 400
1 manufacturing 200
2 material 800
2 labor includes:a 300
2 labor includes:b 200
3 labor 600
3 material 700

Could anyone help me? Thank you very much.

Best regards,
Rose

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
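
For the separate question raised at the top of this message (whether x takes the same value within every id), one common approach is to sort x within id and compare the first and last sorted values: if they agree, every value in between must agree too. The following is only a minimal sketch using the example data from that question; the variable name same is hypothetical and not part of the original thread.

```stata
clear
input id str6 x
1 a
1 b
2 a
2 a
end

* Sort x within each id. Inside a by-group, x[1] is the smallest value
* and x[_N] the largest, so they are equal only if x is constant in that id.
* "same" is an illustrative name chosen for this sketch.
bysort id (x): gen byte same = x[1] == x[_N]

* Show the groups in which x is not constant.
list id x if !same, sepby(id)
```

With these data, same is 0 for both observations of id 1 and 1 for both observations of id 2. Note that an empty string counts as a value like any other here, so a group mixing "" and "a" would be flagged as not constant.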

Follow-Ups: RE: RE: st: RE: a specific data management problem (From: "Martin Weiss" <martin.weiss1@gmx.de>)


© Copyright 1996–2014 StataCorp LP