From "Lijun Song" To Subject st: confusion about psmatch2 Date Sat, 27 Mar 2004 11:44:20 -0500

```Hi,

As a beginner at Stata, I am confused about the psmatch2.

Suppose I am interested in the causal effect of college (a 1-0
treatment indicator) on income. I also assume that race, sex, and
family background (such as Mother and Father's SEI) will influence the
treatment assignment. I also could not deny that race,sex, and family
background also influence the outcome of interests, income directly.

Then, after regressing income on college, I arrive at an estimated
coffecient. After that, I use "psmatch2 college race sex masei pasei".
I think the causal effect estimated by OLS estimator should be higher
than those by Propensity Score Matching right?

But my results show that the causal effect estimated by OLS is smaller
than these by Propensity Score Matching. Why?

In addition, the causal effect estimated by OLS is ATT or ATE?

Thanks.

Lijun

---------- Original message ----------
From: n.j.cox@durham.ac.uk
To: statalist@hsphsun2.harvard.edu
Sent: Saturday, March 27, 2004 3:54:26 PM
Subject: st: RE: Date Data/Asserts

You identify the problems very clearly, and you
are aware of some useful tools here. However,
a variable so messy is not going to surrender
faced with just a single weapon of mess dissection.

Here's one way of proceeding.

I copied your data and argued as follows.

1. The value is most commonly the first word
of the string, but we want it as a number.

. gen fvalue = real(word(ferritin,1))
(140 missing values generated)

2. The date is most commonly the second word
of the string, interpreted as a (recent) daily date.

. gen fdate = date(word(ferritin,2),"mdy",2050)
(162 missing values generated)

3. I see lots of values with "U"s. I want to check
that when "U" occurs it occurs only as itself.

. assert ferritin == "U" if index(ferritin, "U")

Here -assert- says nothing. No news is good news,
i.e. Stata assents to the assertion. (In this
case, all "U"s are bad news, i.e. no information
at all.)

4. So I know all about the "U" problem. What values
are missing, but not because of "U"?

. list ferritin fvalue if mi(fvalue) & ferritin != "U"

+------------------------+
|      ferritin   fvalue |
|------------------------|
178. |        normal        . |
225. |                      . |
286. | >1400 3/24/99        . |
324. |   > 2K 1/7/02        . |
+------------------------+

You have to decide what to do about these.

5. What dates are missing, but not because of
these?

. list ferritin fdate if mi(fdate) & ferritin != "U"

+---------------------------+
|          ferritin   fdate |
|---------------------------|
66. |         5760 5/02       . |
70. |              4968       . |
98. |         4782 6/02       . |
117. |         1137 4/02       . |
121. |         3237 3/01       . |
|---------------------------|
126. |         1060 8/02       . |
148. | 1422 ng/ml 3/1998       . |
154. |         1014 8/01       . |
178. |            normal       . |
209. |  646 mg/ml 9/2000       . |
|---------------------------|
210. |              1938       . |
212. |               127       . |
225. |                         . |
278. |          129 6/00       . |
289. |              2000       . |
|---------------------------|
300. |              4995       . |
301. |         2748 9/02       . |
302. |        187 3/2000       . |
303. |               489       . |
305. |        3564 11/01       . |
|---------------------------|
307. |          862 1990       . |
324. |       > 2K 1/7/02       . |
333. |         3039 2/02       . |
338. |   1777 ng/ml 5/02       . |
352. |           40 3/02       . |
|---------------------------|
357. |               953       . |
+---------------------------+

Similarly, you have to decide what to do here. There
are, it seems, various kinds of problem:

* No date supplied.
* A year only supplied.
* A month and year only supplied.
* A day, month and year supplied.
* More than two words in the string.

In fact, we would have been better off
trying to extract the date from
-word(ferritin,-1)-, and that would
yield one further daily date.

You can use -assert- to test for truth
or falsity. Thus

. assert 42 == int(42)

asserts that 42 is an integer. More
usefully

. assert x == int(x)

asserts that -x- contains integer
values only, as only for integers
is -x- unchanged by applying the -int()-
function.

Similarly

. assert real(date(stringvar, "mdy", 2050)) < .

asserts that all values of stringvar can
be treated as daily dates. By itself -assert- has
very little syntax, and using it requires knowledge
of other parts of Stata.

Nick
n.j.cox@durham.ac.uk

Ward Hagar
>
> I am new to Stata and new to databases. I've been trying to
> get an access database "Stata ready". I can insheet it well
> and have "cleaned up" most of the variables. However, I've
> hit a snag with one variable named "Ferritin", reproduced
> below for those interested. (n= 362).
>
> My goal is to separate the value form the date.
>
> But note:
> 1. Most have a four digit value
> 2. Many have a date of the test in an inconsistent format
> 3. Many have "U" for unknown
> 4. One has "normal" for a value
> 5. A few have the ">" operator before the value
> 6. One has the units for the measurement
>
>
> I've tried the date() functions, split, and word() and all
> cause some other problem.
>
> This is important because the ferritin is the first of a list
> of similarly (mis)coded variables.
>
> Is there a Stata-esque approach to this, or am I left with
> line-by-line cleanup?
>
> Second question is whether ASSERT can be used to test whether
> a value is a date, integer, string, etc?
>
> Many thanks.
>
> . l ferritin
>
>      +-------------------+
>      |          ferritin |
>      |-------------------|
>   1. |       540 11/6/02 |
>   2. |                 U |

> 348. |                 U |
> 349. |                 U |
> 350. |       132 2/20/02 |
>      |-------------------|
> 351. |                 U |
> 352. |           40 3/02 |
> 353. |        97 3/29/02 |
> 354. |      6151 5/22/02 |
> 355. |                 U |
>      |-------------------|
> 356. |       1390 5/8/02 |
> 357. |               953 |
> 358. |         54 8/2/02 |
> 359. |      2779 8/12/02 |
> 360. |                 U |
>      |-------------------|
> 361. |                 U |
> 362. |                 U |
>      +-------------------+

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```