RE: st: Match two samples in stata

From   "Sarah Edgington" <[email protected]>
To   <[email protected]>
Subject   RE: st: Match two samples in stata
Date   Thu, 26 Jul 2012 15:26:18 -0700

Are you sure they matched the samples as opposed to just stratifying the
models?  If you want to run your models on some particular subset of size or
industry categories, that's trivially easy.  Just keep the observations that
match the criteria you're interested in.
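
A minimal sketch of that kind of subsetting, assuming a single data file with hypothetical variables issued (0/1), size_cat, and industry. The category codes are made up for illustration.

```stata
* Hypothetical variable names and category codes, for illustration only.

* Option 1: restrict the estimation sample with an if qualifier.
logit issued i.size_cat if industry == 3

* Option 2: keep only the observations of interest, then work on them.
* preserve/restore brings the full data back afterwards.
preserve
keep if industry == 3 & size_cat == 2
summarize issued
restore
```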

Actually matching is more complicated and you're going to need a very clear
definition of your exact matching rules to do it.

Probably you'll have to decide what your rules for matching are and then
write a dofile that implements them.  How you do it depends a lot on what
exactly you want to do.  Just saying you want to match isn't specific enough
to advise you.

Some things to consider:
Does each firm need a unique matching firm?  If so, you will need to have at
least as many firms of each size and industry in your non-issuing sample as
you have in your bond issuing sample.  
Do matches have to be exact or are you looking for firms that are close to
the same size or from a similar industry?  If you require that the matches
be exact that's probably easier to develop the rules for, except you have to
think carefully about what the implications of not having a match are.

You haven't provided any information about what these variables actually
look like.  Is size continuous or categorical? (if it's continuous you
probably won't be able to match on exact size and might want to consider
categorizing it first).  How many different industries do you have?  Do you
have a lot of sizeXindustry cross categorizations that have a small number
of firms in one or the other of your samples?
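
One way to check these things, sketched under the assumption that both samples sit in one file with hypothetical variables size (continuous), industry (numeric codes), and issuer (1 = issuer, 0 = non-issuer). The number of size groups and the cutoff of 5 firms are arbitrary choices.

```stata
* Hypothetical variable names; the 5 groups and the cutoff of 5 are arbitrary.

* Cut continuous size into quintiles.
xtile size_cat = size, nquantiles(5)

* Count firms in each size-by-industry cell, separately for the two samples.
bysort issuer: tabulate size_cat industry

* Flag cells where either sample has fewer than 5 firms.
bysort size_cat industry: egen n_issuers = total(issuer == 1)
bysort size_cat industry: egen n_nonissuers = total(issuer == 0)
generate byte thin_cell = min(n_issuers, n_nonissuers) < 5
tabulate thin_cell
```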

I'm going to assume for a moment that you want exact matches between the two
samples, that each firm will be matched to one unique firm in the other
sample, and that you've decided to just drop any firms you can't generate
matches for.  Do not mistake this for a recommendation that you choose these
rules.  It's just an example.  You have to decide what rules you want to
use.

One way to do such a match would be to generate two data files.  One
contains firms from sample A and one contains firms from sample B.
If you don't care about anything other than matching on firm size and
industry you could just do a many to many merge between the two files using
size and industry as your match variables.  This will result in a single
line of data for each match.  What I would do is something like this:

Create a file that is just firm ID, industry, and size for sample a.  Do the
same for sample b, but give the firm ID variable a different name in the
sample b file.
Do a many to many merge.  Keep only the matches.
Then gen match_id=_n
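
A caveat before sketching this in code: Stata's merge m:m pairs rows in sequence within each key group and reuses the last row of the shorter group, so on its own it does not guarantee that each firm gets a unique partner (joinby, by contrast, forms every pairwise combination). The sketch below is one way to get one-to-one pairs instead: number the firms within each size-by-industry cell in random order and merge 1:1 on that number. The file names sample_a and sample_b, the variables firm_id, size_cat, and industry, and the seed are all assumptions, and it assumes one row per firm.

```stata
* Hypothetical file and variable names; assumes one row per firm in each file.
set seed 12345          // arbitrary seed so the random ordering is reproducible

* Sample B: keep the matching variables and number firms within each cell.
use firm_id size_cat industry using sample_b, clear
rename firm_id firm_id_b
generate double u = runiform()
sort size_cat industry u firm_id_b
by size_cat industry: generate seq = _n
drop u
tempfile b
save `b'

* Sample A: same steps, with its own name for the firm ID.
use firm_id size_cat industry using sample_a, clear
rename firm_id firm_id_a
generate double u = runiform()
sort size_cat industry u firm_id_a
by size_cat industry: generate seq = _n
drop u

* The k-th firm in a cell of A is paired with the k-th firm in the same
* cell of B. keep(match) drops firms that have no partner.
merge 1:1 size_cat industry seq using `b', keep(match) nogenerate

generate long match_id = _n
keep match_id firm_id_a firm_id_b size_cat industry
save matches, replace
```

Random ordering within cells is only one possible rule. Any other tie-breaking rule would go in the sort line.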

You would then have a data set that gives you a unique identifier for each
match and the id numbers of the two firms that match.  You can then write
some code that matches this unique match identifier back to your files using
the firm IDs for each sample.  Then you could append them together.  I'm not
sure what you'll do with it then because you haven't specified what you want
your final analysis file to look like.  Presumably the theory is that you
rerun your model using only observations that had matches, though I'm still
at a loss about why you would do that.
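
A sketch of that last step, assuming a hypothetical file matches.dta that holds match_id, firm_id_a, and firm_id_b, and files sample_a.dta and sample_b.dta with one row per firm_id. With firm-year data the merges would need to be 1:m rather than 1:1.

```stata
* Hypothetical names: matches.dta holds match_id, firm_id_a, firm_id_b;
* sample_a.dta and sample_b.dta each hold one row per firm_id.

* Matched firms from sample B.
use match_id firm_id_b using matches, clear
rename firm_id_b firm_id
merge 1:1 firm_id using sample_b, keep(match) nogenerate
generate byte from_a = 0
tempfile matched_b
save `matched_b'

* Matched firms from sample A.
use match_id firm_id_a using matches, clear
rename firm_id_a firm_id
merge 1:1 firm_id using sample_a, keep(match) nogenerate
generate byte from_a = 1

* Stack the two matched samples; match_id links the two firms in each pair.
append using `matched_b'
sort match_id from_a
```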

I'm still not convinced this is a good idea at all.  But that's one way to
produce one particular kind of match.  At a certain point you're going to
have to do the work of defining what you want in very precise terms before
you can figure out exactly what to tell Stata to do.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 2:46 PM
To: [email protected]
Subject: RE: st: Match two samples in stata

Dear Sarah,

I am following some other papers that matched their samples based on size and
industry. I totally agree with you that if I match the two samples I won't
be able to test the impact of size and industry on the probability of
issuing bonds. However, I would like to use the matched comparison as a
robustness check, as other papers do. Please let me know if you know of any
command for this. I appreciate your help.


-----Original Message-----
From: [email protected] on behalf of Sarah Edgington
Sent: Thu 26/07/2012 21:38
To: [email protected]
Subject: RE: st: Match two samples in stata
I don't understand why you need a comparison group.  
If all you want to do is predict which companies issue bonds, you just need
to run the regression.  You'll want to control for size and industry in that
regression.

Let's assume for a moment that you're just regressing whether a bond was
issued on size and industry.  If you constrain the sample of non-issuers to
have the same distribution of those variables as the sample of issuers how
does that help you?  You won't be able to determine whether companies in
certain industries were more likely to issue bonds or whether large
companies were more likely to issue bonds because you've already constrained
the distributions of those two variables to be equal across your two outcome
categories.  Why would you want to do that?


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 1:19 PM
To: [email protected]
Subject: RE: st: Match two samples in stata

Hi Sarah,

Many thanks for your great help.

Actually, I am trying to define a comparison group for a particular sample
of firms that issue bonds. In particular, I want to match non-issuers (comparison
sample) with issuers (test sample) based on size and industry to find the
probability of issuing bonds. Simply, I want to run a logit model to find
the probability of issuing bonds, but need to define a comparison group
before running the regression. If this is clear, please let me know your
suggestion and which command would be useful to define a comparison sample.
I do not need to use PSM. 


-----Original Message-----
From: [email protected] on behalf of Sarah Edgington
Sent: Thu 26/07/2012 19:33
To: [email protected]
Subject: RE: st: Match two samples in stata
I think you need to define what you mean by "matching" in this context.  As
you've noted, you've received some answers about propensity score matching
in particular.  That's one strategy for matching samples.  It's not the only
one, though.
The problem seems to be that you haven't made it clear what your end goal
is.  Are you trying to define a comparison group for a particular sample?  
How to do the matching really depends on what you're going to do with the
information afterward.  From some of your previous posts it sounds like
whether a company issued a bond is the outcome you're interested in
studying.  If that's the case you don't want to match the bond issuers with
non-bond issuers, you want to model the process.  Probably by running a
logit (or probit) model as suggested in previous threads.  In that case you
would be controlling for size and industry but there's no matching involved.
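
For concreteness, a minimal sketch of such a model, assuming hypothetical variables issued (0/1), size (continuous), and industry (numeric category codes):

```stata
* Hypothetical variable names: issued (0/1), size (continuous),
* industry (categorical, stored as nonnegative integer codes).
logit issued size i.industry

* Average marginal effect of size on the probability of issuing.
margins, dydx(size)
```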

If you define your research question and what strategy you want to use to
answer it this list may be able to help you implement that strategy.  But so
far you haven't done that, and you can't really expect people to magically
know what kind of matching you want to do or what your data needs to look
like to meet your goals.

I could certainly tell you how to write code that would take your sample A
and select from sample B the first firm of the same size and industry.
That's unlikely to actually be useful for the vast majority of research
questions and if whether a company issues bonds is the outcome you want to
investigate then it's almost certainly the wrong strategy completely.
Nonetheless, it can be easily enough done.  You'd still have to deal with
questions like:  what do you do when a firm in sample A doesn't have an
exact match in sample B?  That complication is one of the reasons strategies
like propensity score matching tend to be popular.

If you're unclear about what your question is and what kind of analysis you
want to do you will probably benefit a great deal from figuring that out
before you tackle the question of how to write the relevant Stata code.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 8:44 AM
To: [email protected]
Subject: RE: st: Match two samples in stata

Hi Ronnie,

The answers provided were for Propensity Score Matching, but now I am
looking at very simple matching. I would appreciate it if you could forward
any previous answer related to this, in case it was ignored.


-----Original Message-----
From: [email protected] on behalf of Ronnie Babigumira
Sent: Thu 26/07/2012 16:29
To: [email protected]
Subject: Re: st: Match two samples in stata

Others have provided useful answers to your questions about matching yet
somehow I feel that you have ignored these answers. Did you for example look
at the references Adam shared? 

Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the
implementation of propensity score matching. Journal of Economic Surveys,
22, 31-72.

Stuart, E. A. (2010). Matching methods for causal inference: A review and a
look forward. Statistical Science, 25, 1-21.



On Thursday, July 26, 2012 at 6:22 PM, KASHEFIPOUR E. wrote:

> Hi all,
> I have two samples and want to match them with their size and industry.
> Is there any way to do it in stata?
> Best,
> Eln

*   For searches and help try:
