# st: Data Mining/Statistical Modeling Question

From: "dmka"; Subject: st: Data Mining/Statistical Modeling Question; Date: Sat, 12 Mar 2005 18:51:59 -0600

```
Hi Users:

I have historical monthly cash inflows at the account level from Jan 2001
to date. Each account has a monthly cash inflow series: in some months the
account has a cash inflow, in other months it has none. Each account has a
beginning balance (debt). The monthly cash inflows are payments against
the beginning balance. There are additional attributes at the account
level, such as region, age of the account since the bankruptcy filing date,
whether the account has a payment plan or not, whether the account is paid
in full or still has a balance, whether the account is a Chapter 7 or
Chapter 13 bankruptcy, etc. The data set has accounts with a positive
beginning balance, a zero beginning balance, and a missing beginning
balance. Furthermore, on the cash inflow side, there are accounts that are
paying off the beginning balance (hence have a positive outstanding
balance), accounts that have fully paid (zero outstanding balance),
accounts that have overpaid (negative outstanding balance), and accounts
that have never paid anything (the full beginning balance is outstanding).

I want to develop a model to predict monthly cash inflows. My first question
is as follows: since I have data from Jan 2001 to date, how do I draw a
sample from these data? Do I pick the most recent month and randomly pick
accounts from that month, since I will be predicting monthly cash inflows?
Do I sample the two most recent months, or the three most recent months,
and take the mean monthly cash inflow? Do I sample all the accounts from
Jan 2001 to the present and use a mean monthly payment? What I want is to
predict my cash inflows on a monthly basis. Since my dependent variable is
continuous, I plan to fit a multiple linear regression model. However, I
have one problem here: each account has a beginning balance, and over time
the outstanding balance declines and so does the cash inflow. Eventually
the account is paid off and the balance is zero. Is a linear regression
model appropriate? Should my dependent variable be normalized by the
beginning balance, so that I would be predicting a recovery rate? Should I
normalize it by the balance of the prior month instead?

Furthermore, should I use all the accounts in the pool, i.e., those with a
zero beginning balance, those with a positive beginning balance, and those
with a missing beginning balance? Or should I use only those accounts that
have paid something, and ignore those that have overpaid or fully paid
their balances?

In short, I have additional attributes as of the time the account declared
bankruptcy. Thereafter, I have only the monthly payments and, of course,
the remaining balance data.

Thanks,

Doyle.

```
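The two normalizations asked about in the post (payment divided by the beginning balance, versus payment divided by the prior month's outstanding balance) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a recommended model: the function name `payment_rates`, the dictionary keys, and the example figures are all hypothetical and do not come from the original message. Accounts with a missing or non-positive beginning balance get no rate here, which is one possible convention among several.

```python
from typing import Dict, List, Optional


def payment_rates(
    beginning_balance: Optional[float], payments: List[float]
) -> List[Dict[str, Optional[float]]]:
    """Build one row per account-month with two candidate dependent variables.

    rate_on_beginning: payment / beginning balance (a recovery-rate style measure)
    rate_on_prior:     payment / balance outstanding at the start of the month
    Both are None when the denominator is missing or not positive (assumed rule).
    """
    rows: List[Dict[str, Optional[float]]] = []
    balance = beginning_balance  # outstanding balance at the start of the month
    for month, paid in enumerate(payments, start=1):
        usable = beginning_balance is not None and beginning_balance > 0
        rate_on_beginning = paid / beginning_balance if usable else None
        rate_on_prior = paid / balance if usable and balance > 0 else None
        if balance is not None:
            balance -= paid  # can go negative for overpaid accounts
        rows.append(
            {
                "month": month,
                "payment": paid,
                "rate_on_beginning": rate_on_beginning,
                "rate_on_prior": rate_on_prior,
                "balance_after": balance,
            }
        )
    return rows


if __name__ == "__main__":
    # Hypothetical account: 1000 owed, skips month 2, overpays in month 5.
    for row in payment_rates(1000.0, [100.0, 0.0, 300.0, 600.0, 50.0]):
        print(row)
```

In this layout each account contributes one row per month, so the zero-payment months and the paid-off or overpaid months stay visible in the data instead of being averaged away; whether to keep or drop those rows is exactly the sampling question raised in the post.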