
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: logistic regression to assume equal number of observations


From   "Delahanty, Ryan" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: logistic regression to assume equal number of observations
Date   Tue, 19 Jul 2011 05:33:11 -0500

I have a case-control (cc) dataset of about 10,000 people (5,000 cases and 5,000 controls).
Each person may have experienced any of 10,000 possible events.
Each event has a state from 0 to 4, where 0 means the person did not experience it.
In addition, each person was collected at one of two sites.
Some people have experienced more events than others; this count is stored in total_events.

Suppose the data look like this:
Person  cc  event  event_state  site  total_events
1       1   1      1            1     3
1       1   10     2            1     3
1       1   100    1            1     3
2       0   10     4            0     2
2       0   1000   3            0     2
3       1   100    1            1     1
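Because cc, site and total_events are constant within a person, the sparse file can be viewed as two pieces: a person-level table (one line per person) plus the event lines themselves. Below is a minimal Python sketch on the six example lines, assuming the column order shown. It splits them this way and checks that total_events equals the number of lines each person has. The names `ROWS` and `split_sparse` are hypothetical, chosen only for illustration.

```python
# Sketch: split the sparse person-event file into a person-level table
# and a per-event index. Column order follows the example table:
# person, cc, event, event_state, site, total_events.

# The six example lines from the post (hypothetical name ROWS).
ROWS = [
    (1, 1, 1, 1, 1, 3),
    (1, 1, 10, 2, 1, 3),
    (1, 1, 100, 1, 1, 3),
    (2, 0, 10, 4, 0, 2),
    (2, 0, 1000, 3, 0, 2),
    (3, 1, 100, 1, 1, 1),
]


def split_sparse(rows):
    """Return (persons, events).

    persons maps person -> (cc, site, total_events); these values are
    constant within a person, so they are stored only once.
    events maps event -> {person: event_state} for the people who
    actually experienced that event (event_state != 0).
    """
    persons = {}
    events = {}
    for person, cc, event, state, site, total in rows:
        attrs = (cc, site, total)
        # A person's attributes must agree across all of their lines.
        if persons.setdefault(person, attrs) != attrs:
            raise ValueError(f"inconsistent attributes for person {person}")
        events.setdefault(event, {})[person] = state
    return persons, events


persons, events = split_sparse(ROWS)

# total_events should equal the number of event lines per person.
for person, (_, _, total) in persons.items():
    n_lines = sum(person in by_person for by_person in events.values())
    assert n_lines == total, (person, n_lines, total)

print(persons)         # {1: (1, 1, 3), 2: (0, 0, 2), 3: (1, 1, 1)}
print(sorted(events))  # [1, 10, 100, 1000]
```

The person table has only as many rows as there are people, so it stays small no matter how many events exist.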

Lines do not exist for every person-event combination, only for those a person actually experienced (i.e. event_state != 0). I'm trying to run a separate logistic regression for each event with -logistic- (Stata/SE 11.0, Windows 64-bit), using a do-file of the following form:
// one model per event; Stata 11 factor variables handle i.event_state without -xi-
// -capture noisily- shows the output but keeps the loop running if an event has too few observations
forvalues k = 1/10000 {
    capture noisily logistic cc i.event_state site total_events if event == `k'
}

For the most common events the number of observations is close to 10,000 (nearly everyone has experienced the event), but for most events it falls somewhere between 2 and 10,000. The problem is that a file with one line for every person-event combination would be prohibitively large (10,000 people × 10,000 events = 100 million lines). I need another way to run these regressions that gives results identical to what I would get from that full file, where every regression has 10,000 observations. For example, person 2 above has no line for event 1, so their event_state for event 1 is implicitly 0, yet I want them included in that regression, as with everyone else whose event_state is 0. How can I do this without adding lines for every instance of event_state = 0? Suggestions for restructuring the data are also welcome, as long as the equivalent regressions can still be run and the file does not become prohibitively large.
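One way to get the full-file design without ever storing 100 million lines is to build the dense data for one event at a time. Start from a person-level table (one line per person, carrying cc, site and total_events), merge in that event's lines, and set event_state to 0 for everyone without a line. Each regression then sees all people, just as `if event==k` would on the full file, while only one event's worth of rows (one per person) exists at any moment. Below is a minimal Python sketch of that merge, assuming the data are already split into a person table and per-event lines. `dense_rows` and `per_event` are hypothetical names, and the model fit itself is left out. In Stata the analogous step would roughly be a per-event -merge- on person followed by -replace event_state = 0 if missing(event_state)-; that port is an untested assumption.

```python
# Sketch: build the dense design for one event at a time, so only one row
# per person is ever materialised instead of persons x events rows.
from typing import Dict, Iterable, Iterator, List, Tuple

Persons = Dict[int, Tuple[int, int, int]]  # person -> (cc, site, total_events)
Events = Dict[int, Dict[int, int]]         # event -> {person: event_state}
Row = Tuple[int, int, int, int, int]       # (person, cc, event_state, site, total_events)


def dense_rows(persons: Persons, events: Events, event: int) -> List[Row]:
    """Rows for one event, with event_state = 0 filled in for every person
    who has no line for that event (what `if event==k` would select from
    the full person x event file)."""
    observed = events.get(event, {})
    return [
        (p, cc, observed.get(p, 0), site, total)
        for p, (cc, site, total) in sorted(persons.items())
    ]


def per_event(persons: Persons, events: Events,
              event_ids: Iterable[int]) -> Iterator[Tuple[int, List[Row]]]:
    """Yield (event, rows) one event at a time; each batch would be passed
    to the logistic fit and then discarded."""
    for event in event_ids:
        yield event, dense_rows(persons, events, event)


# The example data from the post, already split into the two pieces.
persons = {1: (1, 1, 3), 2: (0, 0, 2), 3: (1, 1, 1)}
events = {1: {1: 1}, 10: {1: 2, 2: 4}, 100: {1: 1, 3: 1}, 1000: {2: 3}}

for event, rows in per_event(persons, events, [1, 10, 100, 1000]):
    print(event, [r[2] for r in rows])  # event_state for persons 1, 2, 3
# 1 [1, 0, 0]
# 10 [2, 4, 0]
# 100 [1, 0, 1]
# 1000 [0, 3, 0]
```

Every batch has the same number of rows as the person table, which is the "equal number of observations" the regressions need, and peak memory is one batch rather than the whole person × event grid.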

Sorry, I don't know of a standard example dataset that would illustrate this better. Any help is appreciated.



© Copyright 1996–2018 StataCorp LLC