Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: st: Merging annual data and monthly data few months after year end
From
"Abdalla, Ahmed" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: Merging annual data and monthly data few months after year end
Date
Mon, 20 Jan 2014 13:22:34 +0000
Dear statalist
Though I was able to merge and tsset my data, i thought (as Nick suggested) to check carefully my duplicates.
I found something very surprising:
duplicates tag permno yr mth, gen(isdup)
duplicates tag, gen(dup)
These commands creates tags for 90% of my data observations that are not really duplicates at all (I can see that they are not duplicates). For example, I get:
Permno date yr mth price isdup dup
12005 30dec1988 1988 12 10.75 4288 4288
12005 31jan1989 1989 1 10.5 4288 4288
12005 28feb1989 1989 2 10.25 4288 4288
Why does this occur?
________________________________________
From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
Sent: 19 January 2014 11:56
To: [email protected]
Subject: Re: st: Merging annual data and monthly data few months after year end
There is no formal recipe or solution here. You need to inspect the
duplicates and try to see why they occur. The answers could be
anything from repetitions of identical observations in the original
data entry, through some simple error in data entry, to a logic error
in what you are trying to do. The answer lies in the original data
files you used, which we can't see.
Nick
[email protected]
On 19 January 2014 01:46, Abdalla, Ahmed <[email protected]> wrote:
> I get the first error message when I merge, and when I skip the merge and run the rest of the code, I get the second error message when I tsset.
> I think the duplicates arise because some firms change their fiscal year end from a year to another. However, this should not be a problem because I sort my data using a time variable of yr and mth, and the month will still differ even if the year is the same!
> Is the code correct? do you think I can do something more? The " duplicates list " indicates that I have duplicates but even when I drop them -duplicates drop- I still get the same error messages as mentioned above when I run the code!
>
> what do you think ?
>
>
>
> ________________________________________
> From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
> Sent: 19 January 2014 01:39
> To: [email protected]
> Subject: Re: st: Merging annual data and monthly data few months after year end
>
> As I wrote in an earlier post
>
> "In running a block of code, Stata should stop at the first error, so
> where does that occur precisely?"
>
> A further key point seems that to be that you have duplicates in the
> -merge-d dataset that is what you need to look at carefully. Checking
> that no duplicates occur in the datasets to be merged is not
> sufficient.
>
> Also, you need to be trying to understand why duplicates are occurring
> in the individual datasets, and whether they are duplicates on all
> variables. Otherwise you may be discarding important information.
> Nick
> [email protected]
>
>
> On 19 January 2014 01:30, Abdalla, Ahmed <[email protected]> wrote:
>> I read it again carefully and edited my code accordingly (below). Unfortunately, I get the same two error messages!!
>> Can you help? Thanks in advance.
>>
>> use ccm.dta,clear
>> format datadate %d
>> * yr and mth were not reported in the original data in both datasets, however I wrote them in my example above*
>> gen yr=year(datadate)
>> gen mth=month(datadate)
>> rename lpermno permno
>> drop if permno==.
>> duplicates tag permno yr mth, gen(isdup)
>> edit if isdup
>> drop isdup
>> sort permno yr mth
>> save ccm_2.dta,replace
>>
>> use crsp.dta,clear
>> format date %d
>> gen yr=year(date)
>> gen mth=month(date)
>> drop if permno==.
>> duplicates tag permno yr mth, gen(isdup)
>> edit if isdup
>> drop isdup
>> sort permno yr mth
>>
>> merge 1:1 permno yr mth using ccm_2.dta
>> gen firmid= group(permno)
>> gen timeid = ym(yr, mth)
>> tsset firmid timeid
>> gen prc3=f3.prc
>> keep if _merge==3
>>
>>
>>
>>
>> ________________________________________
>> From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
>> Sent: 19 January 2014 01:02
>> To: [email protected]
>> Subject: Re: st: Merging annual data and monthly data few months after year end
>>
>> The FAQ does explain: given a report of repeated time values, you can
>> use -duplicates- to see what they are.
>>
>> I too spent much time, writing -duplicates- and writing about it to
>> try to make this easy for people.
>> Nick
>> [email protected]
>>
>>
>> On 19 January 2014 00:55, Abdalla, Ahmed <[email protected]> wrote:
>>> Thanks Nick, I followed your suggestions and run my code in pieces to see what error messages I get:
>>> 1- I get the error message : variables permno yr mth do not uniquely identify observations in the master data, when I run the merge command
>>> 2- Also, repeated time values within panel , when I try to -tsset- my panel !!
>>>
>>> I also used gen timeid = ym(yr, mth) as you suggests , but still get same error messages!
>>> I read the FAQ but I really can't figure out how to debug the code. Do you have another code or any other suggestions?
>>> I spent much time in this and can't really fix it!
>>>
>>> Thanks
>>>
>>>
>>>
>>> ________________________________________
>>> From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
>>> Sent: 18 January 2014 11:25
>>> To: [email protected]
>>> Subject: Re: st: Merging annual data and monthly data few months after year end
>>>
>>> Thanks for the extra details. The -merge- shouldn't produce duplicates
>>> on -permno-, -yr- and -mth- because the whole point is to match
>>> observations on those variables.
>>>
>>> In running a block of code, Stata should stop at the first error, so
>>> where does that occur precisely? The -tsset-? Or before?
>>>
>>> Note that
>>>
>>> egen timeid = group(yr mth)
>>>
>>> won't work well with gaps and in any case is much less direct than
>>>
>>> gen timeid = ym(yr, mth)
>>>
>>> which will produce monthly dates.
>>>
>>> There is generic advice in the FAQ
>>>
>>> How do I deal with a report of repeated time values within panel?
>>>
>>> Dealing with reports of repeated time values within panel
>>>
>>> http://www.stata.com/support/faqs/data-management/repeated-time-values/index.html
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 18 January 2014 02:36, Abdalla, Ahmed <[email protected]> wrote:
>>>> I get an error message that variable does not uniquely identify observations. I understand that the problem might be because in the ccm.dta (annual data) each observation id (permno) has accounting data for one month for each fiscal year, which is the month at which the fiscal year ends. In the other data set, crsp.dta , each observation id (permno) has stock price data for the 12 months in each year. I need to link the annual accounting data at the fiscal year end (in the month the fiscal year ends) to the stock price at the same months and three months thereafter.
>>>> I read thread you mentioned before posting my question, but still can't figure out how to solve my problem?
>>>> Any suggestions will be much appreciated!
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
>>>> Sent: 17 January 2014 20:43
>>>> To: [email protected]
>>>> Subject: Re: st: Merging annual data and monthly data few months after year end
>>>>
>>>> "doesn't work" is not informative.
>>>>
>>>> However, the recent thread
>>>> http://www.stata.com/statalist/archive/2014-01/msg00589.html raised a
>>>> similar-sounding problem.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 17 January 2014 18:35, Abdalla, Ahmed <[email protected]> wrote:
>>>>> Dear Statalist
>>>>> I have two databases:
>>>>> a) ccm.dta that includes "annual" accounting data at fiscal year end for a set of firms over 20 years (however, unbalanced).
>>>>>
>>>>> Permno (firm identifier) datadate yr mth x1 x2
>>>>>
>>>>> 10001 30june1988 1988 6
>>>>> 10001 30june1989 1989 6
>>>>> 10001 31december1990 1990 12
>>>>> 10002 30april1988 1988 4
>>>>> 10002 30april1989 1989 4
>>>>> 10002 31may 1990 5
>>>>>
>>>>> Note that a company may change its fiscal year end. A company that reports accounting data at the end of June this year may report data at December in the next year (or sometimes at the same year).
>>>>>
>>>>> b) crsp.dta that includes "monthly"stock prices for the same set of firms over the same period (however, unbalanced)
>>>>>
>>>>> Permno (firm identifier) datadate yr mth x3(stockprice)
>>>>> 10001 29january1988 1988 1
>>>>> 10001 29february1988 1988 2
>>>>> 10001 31march1988 1988 3
>>>>> 10001 29april1988 1988 4
>>>>> 10001 31may1988 1988 5
>>>>> 10001 30june1988 1988 6
>>>>> 10001 31july1988 1988 7
>>>>> .... ....
>>>>> .... ...31dec1988 1988 12
>>>>>
>>>>> 10001 29january1989 1989 1
>>>>> 10001 29february1989 1989 2
>>>>> 10001 31march1989 1989 3
>>>>> 10001 29april1989 1989 4
>>>>> 10001 31may1989 1989 5
>>>>> 10001 30june1989 1989 6
>>>>> 10001 31july1989 1989 7
>>>>> .... ....
>>>>> .... ...31dec1989 1989 8
>>>>>
>>>>> 10002 29january1988 1988 1
>>>>> 10002 29february1988 1988 2
>>>>> 10002 31march1988 1988 3
>>>>> 10002 29april1988 1988 4
>>>>> 10002 31may1988 1988 5
>>>>> 10002 30june1988 1988 6
>>>>> 10002 31july1988 1988 7
>>>>> .... ....
>>>>> .... ...31dec1989 1988 12
>>>>>
>>>>>
>>>>> I want to merge both data sets and link annual accounting data in ccm.dta to stock prices reported at the end of fiscal year and three months after the end of the fiscal year. Here is the code I used:
>>>>>
>>>>> use ccm.dta
>>>>> duplicates drop
>>>>> format datadate %d
>>>>> * yr and mth were not reported in the original data in both datasets, however I wrote them in my example above*
>>>>> gen yr=year(datadate)
>>>>> gen mth=month(datadate)
>>>>> drop if permno==.
>>>>> sort permno yr mth
>>>>> save ccm_2.dta
>>>>>
>>>>> use crsp.dta,clear
>>>>> duplicates drop
>>>>> format date %d
>>>>> gen yr=year(date)
>>>>> gen mth=month(date)
>>>>> drop if permno==.
>>>>> sort permno yr mth
>>>>> merge 1:1 permno yr mth using ccm_2.dta
>>>>> egen firmid= group(permno)
>>>>> egen timeid=group(yr mth)
>>>>> tsset firmid timeid
>>>>> gen prc3=f3.prc
>>>>> keep if _merge==3
>>>>>
>>>>>
>>>>> The code didn't work and gives error messages that I couldn't fix and especially when I "tsset" my panel.
>>>>>
>>>>> Would you please suggest a proper code to merge these two datasets and link x1 x2 (annual data) to x3 (stock prices) at fiscal year end and three months after fiscal year end ?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/