

From: "Sarah Edgington" <sedging@ucla.edu>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: Match two samples in stata

Date: Thu, 26 Jul 2012 15:26:18 -0700

Are you sure they matched the samples, as opposed to just stratifying the models? If you want to run your models on some particular subset of size or industry categories, that's trivially easy: just keep the observations that match the criteria you're interested in.

Actually matching is more complicated, and you're going to need a very clear definition of your exact matching rules to do it. Probably you'll have to decide what your rules for matching are and then write a do-file that implements them. How you do it depends a lot on what exactly you want to do; just saying you want to match isn't specific enough to advise you. Some things to consider:

- Does each firm need a unique matching firm? If so, you will need to have at least as many firms of each size and industry in your non-issuing sample as you have in your bond-issuing sample.
- Do matches have to be exact, or are you looking for firms that are close to the same size or from a similar industry? If you require that the matches be exact, that's probably easier to develop the rules for, except that you have to think carefully about what the implications of not having a match are.
- You haven't provided any information about what these variables actually look like. Is size continuous or categorical? (If it's continuous, you probably won't be able to match on exact size and might want to consider categorizing it first.)
- How many different industries do you have? Do you have a lot of size-by-industry cross-categorizations that have a small number of firms in one or the other of your samples?

I'm going to assume for a moment that you want exact matches between the two samples, that each firm will be matched to one unique firm in the other sample, and that you've decided to just drop any firms you can't generate matches for. Do not mistake this for a recommendation that you choose these rules; it's just an example. You have to decide what rules you want to use. One way to do such a match would be to generate two data files.
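The feasibility questions above (is size continuous, and does each size-by-industry cell have at least as many non-issuers as issuers?) can be checked before any matching is attempted. The following is a minimal Python sketch of that check, not Stata code and not something posted in the thread; the records, field names, and the `SIZE_CUTS` cut-points are hypothetical. It bands a continuous size measure into categories, counts firms per size-band and industry cell in each sample, and reports how many issuers in each cell would be left without a unique non-issuer partner. In Stata, the banding step could be done with `xtile` or with `egen`'s `cut()` function.

```python
from collections import Counter

# Hypothetical size cut-points (for example, total assets in $m).
SIZE_CUTS = (100.0, 1000.0)


def size_band(size, cuts=SIZE_CUTS):
    """Band index = number of cut-points at or below `size` (0, 1, 2 here)."""
    return sum(1 for cut in cuts if size >= cut)


def cell_counts(firms):
    """Count firms in each (size band, industry) cell."""
    return Counter((size_band(f["size"]), f["industry"]) for f in firms)


def shortfall_cells(issuers, non_issuers):
    """Cells where non-issuers are too few for a unique 1:1 match.

    Maps each such cell to the number of issuers that would go unmatched.
    """
    need = cell_counts(issuers)
    have = cell_counts(non_issuers)  # a missing cell counts as 0
    return {cell: n - have[cell] for cell, n in need.items() if have[cell] < n}


# Toy records, for illustration only.
issuers = [
    {"id": "A1", "industry": "manuf", "size": 250.0},
    {"id": "A2", "industry": "manuf", "size": 300.0},
    {"id": "A3", "industry": "retail", "size": 50.0},
]
non_issuers = [
    {"id": "B1", "industry": "manuf", "size": 500.0},
    {"id": "B2", "industry": "retail", "size": 20.0},
    {"id": "B3", "industry": "retail", "size": 80.0},
]

print(shortfall_cells(issuers, non_issuers))  # → {(1, 'manuf'): 1}
```

In this toy data the middle-size manufacturing cell has two issuers but only one non-issuer, so a one-to-one exact match would have to drop an issuer there. That is the kind of case the matching rules need to settle in advance.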
One file contains firms from sample A and the other contains firms from sample B. If you don't care about anything other than matching on firm size and industry, you could just do a many-to-many merge between the two files using size and industry as your match variables. This will result in a single line of data for each match. What I would do is something like this:

1. Create a file that is just firm ID, industry, and size for sample A.
2. Do the same for sample B, but give the firm ID variable a different name in the sample B file.
3. Do a many-to-many merge.
4. Keep only the matches.
5. Then `gen match_id=_n`.

You would then have a data set that gives you a unique identifier for each match and the ID numbers of the two firms that match. You can then write some code that matches this unique match identifier back to your files using the firm IDs for each sample. Then you could append them together.

I'm not sure what you'll do with it then, because you haven't specified what you want your final analysis file to look like. Presumably the theory is that you rerun your model using only observations that had matches, though I'm still at a loss about why you would do that. I'm still not convinced this is a good idea at all. But that's one way to produce one particular kind of match. At a certain point you're going to have to actually do the work of defining what you want in very precise terms before you can figure out exactly what to tell Stata to do.

-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 2:46 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Match two samples in stata

Dear Sarah,

I am following some other papers that matched their samples based on size and industry. I totally agree with you that if I match the two samples I won't be able to test the impact of size and industry on the probability of issuing bonds.
However, I would like to use the matched comparison as a robustness check, as other papers do. Please let me know if you know of any command. I appreciate your help.

Best,
Eln

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu on behalf of Sarah Edgington
Sent: Thu 26/07/2012 21:38
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Match two samples in stata

I don't understand why you need a comparison group. If all you want to do is predict which companies issue bonds, you just need to run the regression. You'll want to control for size and industry in that model.

Let's assume for a moment that you're just regressing whether a bond was issued on size and industry. If you constrain the sample of non-issuers to have the same distribution of those variables as the sample of issuers, how does that help you? You won't be able to determine whether companies in certain industries were more likely to issue bonds, or whether large companies were more likely to issue bonds, because you've already constrained the distributions of those two variables to be equal across your two outcome categories. Why would you want to do that?

-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 1:19 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Match two samples in stata

Hi Sarah,

Many thanks for your great help. Actually, I am trying to define a comparison group for a particular sample that issues bonds. In particular, I want to match non-issuers (comparison sample) with issuers (test sample) based on size and industry to find the probability of issuing bonds. Simply put, I want to run a logit model to find the probability of issuing bonds, but I need to define a comparison group before running the regression. If this is clear, please let me know your suggestion and which command would be useful to define a comparison sample. I do not need to use PSM.
Best,
Eln

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu on behalf of Sarah Edgington
Sent: Thu 26/07/2012 19:33
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Match two samples in stata

Eln,

I think you need to define what you mean by "matching" in this context. As you've noted, you've received some answers about propensity score matching in particular. That's one strategy for matching samples, but it's not the only one.

The problem seems to be that you haven't made it clear what your end goal is. Are you trying to define a comparison group for a particular sample? How to do the matching really depends on what you're going to do with the information afterward. From some of your previous posts it sounds like whether a company issued a bond is the outcome you're interested in studying. If that's the case, you don't want to match the bond issuers with non-bond issuers; you want to model the process, probably by running a logit (or probit) model as suggested in previous threads. In that case you would be controlling for size and industry, but there's no matching involved.

If you define your research question and what strategy you want to use to answer it, this list may be able to help you implement that strategy. But so far you haven't done that, and you can't really expect people to magically know what kind of matching you want to do or what your data needs to look like to meet your goals.

I could certainly tell you how to write code that would take your sample A and select from sample B the first firm of the same size and industry. That's unlikely to actually be useful for the vast majority of research questions, and if whether a company issues bonds is the outcome you want to investigate, then it's almost certainly the wrong strategy completely. Nonetheless, it can be done easily enough. You'd still have to deal with questions like: what do you do when a firm in sample A doesn't have an exact match in sample B?
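The "first firm of the same size and industry" rule described here can be sketched as follows. This is a hypothetical Python illustration, not Stata code; it assumes size has already been categorized, that each sample-B firm may be used at most once, and that unmatched sample-A firms are simply reported rather than resolved. The function and field names are made up for the example.

```python
def first_exact_matches(sample_a, sample_b, keys=("size", "industry")):
    """Pair each sample-A firm with the first unused sample-B firm that has
    identical values on `keys`.

    Returns (matches, unmatched): a list of (a_id, b_id) pairs and a list of
    sample-A ids that found no partner.
    """
    used = set()
    matches, unmatched = [], []
    for a in sample_a:
        key = tuple(a[k] for k in keys)
        partner = next(
            (b for b in sample_b
             if b["id"] not in used and tuple(b[k] for k in keys) == key),
            None,
        )
        if partner is None:
            unmatched.append(a["id"])  # the case the matching rules must settle
        else:
            used.add(partner["id"])  # each sample-B firm is used at most once
            matches.append((a["id"], partner["id"]))
    return matches, unmatched


# Toy records with size already categorized (hypothetical).
sample_a = [
    {"id": "A1", "size": "large", "industry": "manuf"},
    {"id": "A2", "size": "large", "industry": "manuf"},
    {"id": "A3", "size": "small", "industry": "retail"},
]
sample_b = [
    {"id": "B1", "size": "small", "industry": "retail"},
    {"id": "B2", "size": "large", "industry": "manuf"},
    {"id": "B3", "size": "small", "industry": "retail"},
]

matches, unmatched = first_exact_matches(sample_a, sample_b)
print(matches)    # → [('A1', 'B2'), ('A3', 'B1')]
print(unmatched)  # → ['A2']
```

Reversing the order of sample B gives A3 the partner B3 instead of B1, which shows that a bare "first match" rule depends on an arbitrary sort order.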
That complication is one of the reasons strategies like propensity score matching tend to be popular. If you're unclear about what your question is and what kind of analysis you want to do, you will probably benefit a great deal from figuring that out before you tackle the question of how to write the relevant Stata code.

-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of KASHEFIPOUR E.
Sent: Thursday, July 26, 2012 8:44 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Match two samples in stata

Hi Ronnie,

The answers provided were for propensity score matching, but now I am looking at very simple matching. I would appreciate it if you could forward any previous answer related to this, in case it was ignored.

Best

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu on behalf of Ronnie Babigumira
Sent: Thu 26/07/2012 16:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Match two samples in stata

Eln,

Others have provided useful answers to your questions about matching, yet somehow I feel that you have ignored these answers. Did you, for example, look at the references Adam shared?

- Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22, 31-72.
- Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25, 1-21.

Ronnie

--
010100100110111101101110011011100110100101100101

On Thursday, July 26, 2012 at 6:22 PM, KASHEFIPOUR E. wrote:
> Hi all,
>
> I have two samples and want to match them with their size and
> industry. Is there any way to do it in stata?
>
> Best,
> Eln

* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
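The two-file procedure in the reply at the top of this message (one file of firm ID, industry, and size per sample, with the ID renamed in the second file; join on size and industry; keep the matches; `gen match_id=_n`; merge the identifier back and append) can be mimicked in Python as a sketch of the logic. Everything here is illustrative: the toy records and the field names `id_a` and `id_b` are hypothetical. In Stata, the step that forms every pair of firms sharing size and industry corresponds to `joinby`; `merge m:m` does not do this, since it pairs observations sequentially within each group. Because an all-pairs join can use the same firm in several matches, the sketch adds one extra, hypothetical rule that keeps each firm in at most one pair, and then checks the point made in the 21:38 reply: after one-to-one exact matching, issuers and matched non-issuers have identical size-by-industry distributions, so those variables can no longer explain who issued bonds.

```python
from collections import Counter
from itertools import product

KEYS = ("size", "industry")


def pairwise_join(file_a, file_b):
    """Every (A, B) pair with identical size and industry, numbered like
    `gen match_id=_n`; firms with no partner simply do not appear."""
    rows = []
    for a, b in product(file_a, file_b):
        if all(a[k] == b[k] for k in KEYS):
            rows.append({"id_a": a["id_a"], "id_b": b["id_b"],
                         "size": a["size"], "industry": a["industry"]})
    for n, row in enumerate(rows, start=1):
        row["match_id"] = n
    return rows


def one_to_one(rows):
    """Hypothetical extra rule: keep a pair only if neither firm is used yet.

    Within an exact-match cell every A firm can pair with every B firm, so
    this greedy pass keeps min(#A, #B) pairs per cell.
    """
    used_a, used_b, kept = set(), set(), []
    for row in rows:
        if row["id_a"] not in used_a and row["id_b"] not in used_b:
            used_a.add(row["id_a"])
            used_b.add(row["id_b"])
            kept.append(row)
    return kept


def attach_and_append(file_a, file_b, kept):
    """Merge match_id back onto each sample by firm ID, drop unmatched firms,
    and stack both samples with an `issuer` flag (1 = sample A)."""
    match_a = {r["id_a"]: r["match_id"] for r in kept}
    match_b = {r["id_b"]: r["match_id"] for r in kept}
    stacked = [{**f, "match_id": match_a[f["id_a"]], "issuer": 1}
               for f in file_a if f["id_a"] in match_a]
    stacked += [{**f, "match_id": match_b[f["id_b"]], "issuer": 0}
                for f in file_b if f["id_b"] in match_b]
    return stacked


def cell_distribution(stacked, issuer):
    """Count firms per (size, industry) cell within one outcome group."""
    return Counter((r["size"], r["industry"]) for r in stacked
                   if r["issuer"] == issuer)


# Toy records (hypothetical); the ID variable is named differently per file.
file_a = [
    {"id_a": "A1", "size": "large", "industry": "manuf"},
    {"id_a": "A2", "size": "large", "industry": "manuf"},
    {"id_a": "A3", "size": "small", "industry": "retail"},
    {"id_a": "A4", "size": "small", "industry": "mining"},
]
file_b = [
    {"id_b": "B1", "size": "large", "industry": "manuf"},
    {"id_b": "B2", "size": "large", "industry": "manuf"},
    {"id_b": "B3", "size": "large", "industry": "manuf"},
    {"id_b": "B4", "size": "small", "industry": "retail"},
]

rows = pairwise_join(file_a, file_b)
kept = one_to_one(rows)
stacked = attach_and_append(file_a, file_b, kept)
print(len(rows), len(kept))  # → 7 3
print(cell_distribution(stacked, 1) == cell_distribution(stacked, 0))  # → True
```

A4 (no non-issuer in its cell) and B3 (a surplus non-issuer) drop out. The surviving `match_id` values are 1, 5 and 7 because the one-to-one filter discards some of the numbered pairs; they are still unique identifiers.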

