
From: "dmka" <[email protected]>
To: <[email protected]>
Subject: st: Data Mining/Statistical Modeling Question
Date: Sat, 12 Mar 2005 18:51:59 -0600

Hi Users:

I have historical monthly cash inflows at the account level from Jan 2001 to date. Each account has a monthly cash inflow; in some months it is zero, in others it is not. Each account has a beginning balance (debt), and the monthly cash inflows are payments against that beginning balance. There are additional account-level attributes, such as region, age of the account since the bankruptcy file date, whether the account has a payment plan, whether the account is paid in full or still has a balance, whether the account is a Chapter 7 or Chapter 13 bankruptcy, etc.

The data set contains accounts with a positive beginning balance, a zero beginning balance, and a missing beginning balance. Furthermore, on the cash inflow side, there are accounts that are paying off their beginning balances (and so have a positive outstanding balance), accounts that have fully paid (zero outstanding balance), accounts that have overpaid (negative outstanding balance), and accounts that have never paid anything (the full beginning balance is outstanding).

I want to develop a model to predict monthly cash inflows.

My first question: since I have data from Jan 2001 to date, how do I draw a sample from these data? Do I take the most recent month and randomly pick accounts from it, since I will be predicting monthly cash inflows? Do I sample the two or three most recent months and take the mean monthly cash inflow? Or do I sample all accounts from Jan 2001 to the present and use a mean monthly payment?

Since my dependent variable is continuous, I plan to fit a multiple linear regression model. However, I have one problem: each account has a beginning balance, and over time the balance declines and so does the cash inflow. Eventually the account is paid off and the balance is zero. Is a linear regression model appropriate?

Should my dependent variable be normalized by the beginning balance, so that I would be predicting a recovery rate? Or should I normalize it by the prior month's balance?

Furthermore, should I use all the accounts in the pool, i.e. those with a zero beginning balance, those with a positive beginning balance, and those with a missing beginning balance? Should I use only the accounts that have paid something, and ignore those that have overpaid or fully paid their balances?

In short, the additional attributes are recorded as of the time the account declared bankruptcy. Thereafter, I have only the monthly payments and, of course, the remaining balance.

Thanks,
Doyle.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

