
Re: RE: st: RE: a specific data management problem

From	[email protected]
To	statalist<[email protected]>
Subject	Re: RE: st: RE: a specific data management problem
Date	Wed, 23 Dec 2009 17:47:48 +0800

Martin,

Thank you very much again.
The solution worked once I changed "bys id: gen order=_n" to "bys id (item): gen order=_n".
There is nothing wrong with the string handling.
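As a minimal sketch of why that change probably helps (using the reordered example data from earlier in this thread): sorting on item within id places a bare total such as "material" ahead of its detail lines such as "material includes:A", whatever order the data were entered in.

```stata
clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

* Within each id, sort on item so that a total line precedes its detail lines
bysort id (item): gen order = _n
list id item order, sepby(id) noobs
```

Note that order now reflects the alphabetical order of item within id, not the order of entry, so a final "sort id order" lists the items alphabetically within each id.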
The question in my last posting, "how to judge whether the value of a string variable is the same in every group", is a separate data management problem that has nothing to do with the one above.

For example:

input id str6 x
1 a
1 b
2 a
2 a
end
I want to check whether x takes the same value within every id. How can I do that?
Thank you for any help.
Best regards,

Rose
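
One possible approach to the question above, offered as a minimal sketch and not as a definitive answer: sort x within id and compare the first and last values of each group. If they agree, every value in between must agree too. The variable name same is an assumption introduced here for illustration.

```stata
clear
input id str6 x
1 a
1 b
2 a
2 a
end

* After sorting x within id, a group is constant exactly when
* its first and last values are equal
bysort id (x): gen byte same = x[1] == x[_N]
list, sepby(id) noobs
```

Here same is 0 for id 1 (values a and b) and 1 for id 2 (both a). Empty strings count as a value like any other, so a group mixing "" and "a" would be flagged as not constant.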


----- Original Message -----
From: Martin Weiss <[email protected]>
To: <[email protected]>
Subject: RE: st: RE: a specific data management problem
Date: 2009-12-23 17:32:56



True, my solution depended critically on two assumptions:

1) every orphan, i.e. group with only one observation, should be kept.

2) in groups with more than one observation, the "total" observation comes
first (_n==1)

Any departure from this rule will indeed cause problems. What is the rule in
your data?
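
As a hedged sketch, the second assumption could be tested before running the solution. The check below assumes that a total line consists of the bare item name only, so that it equals the first token item1 produced by split. That reading is taken from the example data and may not hold for the full dataset.

```stata
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
end

* Record the order in which observations were entered
gen long order = _n
* item1 holds the first token of item, e.g. "material"
split item
* In groups with several observations, the first one should be the bare total line
bysort id item1 (order): assert item == item1 if _n == 1 & _N > 1
```

If the assert fails, some group does not start with its total line, and the data would need sorting (for example on item within id) before the solution is applied.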

Re strings, what is the problem there? I split the strings into tokens and
used the first one to form my groups. Where does this approach lead to
errors?


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Wednesday, 23 December 2009 10:19
To: statalist
Subject: Re: st: RE: a specific data management problem

Martin,
thank you very much for your help.
There is something wrong with the solution. The variable order generated
within each id seems to be incorrect if I change the order in which I input
the data.

clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

The result is as follows, which is not what I expect.


+-----------------------------------+
| id   item                  amount |
|-----------------------------------|
|  1   material includes:A      550 |
|  1   labor                    400 |
|  1   manufacturing            200 |
|-----------------------------------|
|  2   labor includes:a         300 |
|  2   labor includes:b         200 |
|  2   material                 800 |
|-----------------------------------|
|  3   labor                    600 |
|  3   material                 700 |
+-----------------------------------+



By the way, another problem is how to check whether the value of a string
variable is the same within every group.

Thank you for any help.
Best regards,
Rose

----- Original Message -----
From: Martin Weiss <[email protected]>
To: <[email protected]>
Subject: st: RE: a specific data management problem
Date: 2009-12-23 15:51:57



*******
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end


bys id: gen order=_n
split item
bys id item1 (order): egen subtotal=total((_n>1)*amount)
bys id item1: gen byte keepobs=_N==1
bys id item1: replace keepobs=_n==1 & amount!=subtotal
bys id item1 (order): gen byte first=amount[1]==subtotal[1]
bys id item1 (order): gen byte dummy=(_n!=1) & (first)
keep if keepobs | dummy
sort id order
drop item1 item2 subtotal keepobs first dummy order
l, noo sepby(id)


*******


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Wednesday, 23 December 2009 06:48
To: statalist
Subject: st: a specific data management problem

Dear Statalisters,
I have encountered a data management problem. Let me take an excerpt of my
data to clarify it.

clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

The characteristic of the data is that the item(s) for which details are
given vary from id to id.
What I expect is as follows. Within each id, if the sum of the detailed items
equals the related total, drop the total observation and keep the detailed
ones. Otherwise, keep the total observation and drop the detailed ones.

Specifically, the expected result for the above data is:
1 material 1000
1 labor 400
1 manufacturing 200
2 material 800
2 labor includes:a 300
2 labor includes:b 200
3 labor 600
3 material 700

Could anyone help me? Thank you very much.

Best regards,
Rose.


*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


Follow-Ups:
- RE: RE: st: RE: a specific data management problem
  - From: "Martin Weiss" <[email protected]>
