# RE: RE: st: RE: a specific data management problem

 From "Nick Cox" <[email protected]> To <[email protected]> Subject RE: RE: st: RE: a specific data management problem Date Mon, 4 Jan 2010 20:25:45 -0000

```
In addition, note the FAQ

How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

That is the same problem, modulo an operator.

Nick
[email protected]

Martin Weiss 23 December 2009

"I want to check whether x is the same in every id. How to do it ?"

-sort- and friends do operate on strings, so just:

*************
clear*
input id str6 x
1 a
1 b
2 a
2 a
3 s
3 a
3 f
4 e
5 e
5 t
5 q
end

bys id (x): gen byte allthesame=x[1]==x[_N]
l if allthesame, noo sepby(id)
l if !allthesame, noo sepby(id)
*************

[email protected]

Thank you very much again.
The solution worked once I changed "bys id: gen order=_n" to
"bys id (item): gen order=_n".
There is nothing wrong with the strings.
The question in my last posting, "how to judge whether the value of a string
variable is the same in every group", is a separate data management problem
that has nothing to do with the one above.

For example:

input id str6 x
1 a
1 b
2 a
2 a
end
I want to check whether x is the same in every id. How to do it ?

From: Martin Weiss <[email protected]>

True, my solution depended critically on the assumptions that

1) every orphan, i.e. group with only one observation, should be kept.

2) groups with more than one observation have the "total" observation
first (_n==1)

Any departure from these rules will indeed cause problems. What is the
rule in

Re strings, what is the problem there? I split the strings into tokens and
used the first one to form my groups. Where does this approach lead to
errors?

[email protected]

Thank you very much for your help.
There is something wrong with the solution. It seems that the variable
"order" generated within each id is not correct if I change the order in
which the data are input.

clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

The result is as follows, which is not what I expected.

+-----------------------------------+
| id                  item   amount |
|-----------------------------------|
|  1   material includes:A      550 |
|  1                 labor      400 |
|  1         manufacturing      200 |
|-----------------------------------|
|  2      labor includes:a      300 |
|  2      labor includes:b      200 |
|  2              material      800 |
|-----------------------------------|
|  3                 labor      600 |
|  3              material      700 |
+-----------------------------------+

By the way, another problem is how to judge whether the value of a string
variable is the same in every group.

From: Martin Weiss <[email protected]>

*******
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

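* row position within each id (each total is assumed to precede its details)
bys id: gen order=_n
* split item on spaces; item1 is the first word, e.g. "material" or "labor"
split item
* sum of the detail amounts: every row after the first in each id-item1 group
bys id item1 (order): egen subtotal=total((_n>1)*amount)
* keep singleton groups, and totals that do not match their details
bys id item1: gen byte keepobs=_N==1
bys id item1: replace keepobs=_n==1 & amount!=subtotal
* where the total matches its details, keep the detail rows instead
bys id item1 (order): gen byte first=amount[1]==subtotal[1]
bys id item1 (order): gen byte dummy=(_n!=1) & (first)
keep if keepobs | dummy
* restore the original order and drop the helper variables
sort id order
drop item1 item2 subtotal keepobs first dummy order
l, noo sepby(id)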

[email protected]

I have encountered a data management problem. Let me take an excerpt of my
data to clarify it.

clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

A feature of the data is that the item(s) for which details are given vary
from id to id.
What I want is the following. Within each id, if the sum of the detailed
items equals the related total, drop the total observation and keep the
detailed ones. Otherwise, keep the total observation and drop the detailed
ones.

Specifically, the desired result for the data above is
1 material 1000
1 labor 400
1 manufacturing 200
2 material 800
2 labor includes:a 300
2 labor includes:b 200
3 labor 600
3 material 700

```
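The within-group check discussed in the thread can be run end to end on the toy data Martin Weiss posted. What follows is a minimal sketch, not something the original authors wrote: the helper variable `tag` and the closing `assert` are additions for illustration only. One reading of "modulo an operator" is that the FAQ flags groups that differ, so the comparison there would use `!=` where this check uses `==`.

```stata
clear
input id str6 x
1 a
1 b
2 a
2 a
3 s
3 a
3 f
4 e
5 e
5 t
5 q
end

* sort x within id; a group is constant exactly when its
* first and last sorted values agree
bysort id (x): gen byte allthesame = x[1] == x[_N]

* tag is a hypothetical helper (not in the original posts):
* it marks one row per id so each group is reported once
egen byte tag = tag(id)
list id allthesame if tag, noobs

* sanity check on the toy data: only ids 2 and 4 are constant
assert allthesame == inlist(id, 2, 4)
```

Sorting on `x` within `id` is what makes comparing `x[1]` with `x[_N]` sufficient: if the smallest and largest values agree, every value in between does too. Changing `==` to `!=` would flag the groups that differ instead.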