
st: Fwd: tscollap query


From   Christopher F Baum <[email protected]>
To   [email protected]
Subject   st: Fwd: tscollap query
Date   Tue, 27 Jan 2004 07:03:29 -0500



Begin forwarded message:

From: Restrepo J <[email protected]>
Date: January 27, 2004 6:57:53 AM EST
To: Christopher F Baum <[email protected]>
Subject: RE: tscollap query

This is great! Thank you very much for your help and directions.
Jorge

-----Original Message-----
From: Christopher F Baum [mailto:[email protected]]
Sent: 27 January 2004 02:09
To: Restrepo J
Subject: Re: tscollap query

Dear Jorge

Thanks for the clarifying information. I don't think there's anything fancy
needed here. Say that your 19,000 records of raw data are coded with the
calendar day of the event and the location. Then create three new Stata date
variables: the week in which the event took place, the month, the year.
Those are all simple applications of Stata's date functions. Then you can
readily collapse the raw data on any one of those indicators, as you say,
independently of the location (for the whole country). That can be done with
just the collapse command. If you wanted to generate a similar
lower-frequency time series for each region, you could collapse on both the
date variable and the region indicator. So it is not really an issue of
reducing the frequency of a variable (which is what tscollap does--changing
monthly inflation to quarterly or annual, let's say) but rather grouping the
raw data and averaging them (or summing them, or counting them) over a lower
data frequency -- really a matter of "binning", but binning by calendar date
intervals. Similar to the kind of work that people do with hospital-stay
data, which can be aggregated over time, regions, etc.
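
A rough sketch of the steps described above, not a definitive recipe. It assumes a recent Stata and uses made-up toy data; the variable names day, region, victims and events are hypothetical stand-ins for whatever the real dataset contains.

```stata
* Toy event data: one row per event (values invented for illustration)
clear
input str10 day str6 region victims
"03jan2003" "north" 2
"05jan2003" "south" 0
"17jan2003" "north" 1
"02feb2003" "north" 3
"20feb2003" "south" 1
end

* Convert the string calendar day to a Stata daily date
generate date = date(day, "DMY")
format date %td

* Lower-frequency date variables derived from the daily date
generate week  = wofd(date)
generate month = mofd(date)
generate year  = year(date)
format week %tw
format month %tm

* Country-wide monthly series: total victims and number of events
preserve
collapse (sum) victims (count) events=date, by(month)
list
restore

* Region-by-month panel: collapse on both the date variable and the region
collapse (sum) victims (count) events=date, by(region month)
list
```

Swapping month for week or year in by() gives the other frequencies. Note that a region-month with no events simply does not appear in the collapsed data, so some further step (for example -fillin-) may be needed if a balanced panel with zero counts is wanted.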

Best wishes
Kit

On Jan 26, 2004, at 3:52 PM, Restrepo J wrote:

Thank you very much for your prompt reply! Maybe a short explanation
of my dataset would clarify my question.

This is a crime dataset that I collected with other colleagues over
the last year. It includes more than 19 thousand events for which we
have around 20 variables, including the day in which the event took
place and the location.
I would need to associate each event with a time period and a spatial
code, which I plan to do with -tabulate-. What I really want then is
to generate a panel from those events for n regions and t months (or,
say, days).

My problem is the degree of "aggregation" for each one of these
dimensions.
Regarding time, I need to reduce the frequency of variables using sum
and count in order to obtain daily, weekly, monthly, and yearly
variables independently of the location (for the whole country). Later
I can generate panels by region and month, say.

Would it then be possible to use tscollap or a modification of it to
generate daily and monthly variables from the initial time code in
each event? Is there another command to "generate" panels from event data?

Many thanks.

Jorge A Restrepo

-----Original Message-----
From: Kit Baum [mailto:[email protected]]
Sent: 26 January 2004 17:42
To: Restrepo J
Cc: [email protected]
Subject: Re: tscollap query

Dear Jorge

tscollap is just a convenience, "hard wiring" various features of
-collapse- to take advantage of what we know about calendar-time data.
Since -collapse- can generate arbitrary subsets of a dataset, where
you specify exactly how the data are to be collapsed, I should think
that you can use -collapse- to deal with your event-oriented data.
I'm not sure how, though, you plan to "aggregate the variables on each
event according to different time periodicity (days, weeks, months,
quarter, years) and/or by spatial clusters (both defined by the user and
statistically generated)". It seems to me that an observation in the
result data would either be identified by a time period, or by an event,
but could not very well be associated with both; and how would you deal
with a result dataset in which the observations either belong to a time
period or to a specific event?

Best wishes
Kit

On Jan 26, 2004, at 12:24 PM, Restrepo J wrote:

Dear Professor Baum:

I am taking the liberty of writing you regarding the program you
wrote in Stata for collapsing time series. Nick Cox pointed to it in
statalist after my query in Statalist:

I am working with a relatively large data set organised by events
(19,000 in total). Each event has time and spatial descriptors and
several variables.

My query is: Is there a command-routine-program for Stata to
aggregate the variables on each event according to different time
periodicity (days, weeks, months, quarter, years) and/or by spatial
clusters (both defined by the user and statistically generated)? I would
appreciate any kind of guidance on this question.

It seems that your program in combination with the command Collapse
would allow me to do most of what I need, as I would be able to
aggregate from monthly to lower frequencies. The problem is that I
have my original data in "event" form, i.e. a collection of daily
events that I would need also to collapse to daily and monthly
frequencies. Do you know of a way of doing this? Is it possible to do
this with your programme or with a variation of it?

Best regards,


Jorge Alberto Restrepo
____________________________
Department of Economics
Royal Holloway-University of London
Egham Hill, Egham, Surrey
TW20 0EX, United Kingdom


