Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Stas Kolenikov <[email protected]>
To: [email protected]
Subject: Re: st: latent factor model with stochastic gradient method/alternating least squares
Date: Wed, 6 Oct 2010 09:05:50 -0500

Looks like you are attacking the Netflix challenge. Haven't they closed it? ;) You can cast this as a -gllamm- problem, but with thousands of raters/items it won't be very fast. Otherwise, you can program this with -moptimize-, although you would have to figure out the appropriate scaling of your latents.

On Tue, Oct 5, 2010 at 4:22 PM, Dimitriy V. Masterov <[email protected]> wrote:
> Suppose I observe lots of consumer ratings (explicit or implicit) of
> thousands of items, which may themselves be combinations of other
> items. The rating matrix is pretty sparse because most consumers only
> try/rate a few of the items. I would like to model how item ratings
> are related to each other for the purpose of making recommendations of
> new items to try. I am thinking of this as a missing rating problem.
> The characteristics of the items are many and are not easily modelled
> with fewer dimensions.
>
> My approach to this problem is to map consumers and items into a joint
> latent factor space of dimension f, so that consumer-item interactions
> are modeled as inner products in that space. Each item i is associated
> with a vector q_i in R^f, which measures the extent to which that item
> possesses the latent factors. The vector p_u in R^f measures the
> interest of the consumer in each of the latent factors.
>
> I would like to model the rating for item i by consumer u as:
>
> r_ui = mu + b_u + b_i + b_u*b_i + q_i'*p_u,
>
> where mu is a constant which is the same for all products, b_u is a
> user fixed effect, b_i is an item fixed effect, and the inner product
> of q and p captures the consumer's overall interest in the item's
> latent characteristics. The fixed effects are meant to capture the
> idea that some items may be more popular and the fact that some users
> may rate more harshly, and that these may interact. For example, a
> popular item may be judged to be especially poor by a harsh critic.
>
> For a given f, I would like to find b_i, b_u, mu, and the vectors q_i
> and p_u to minimize the sum of squared residuals:
>
> sum[(r_ui - mu - b_u - b_i - b_u*b_i - q_i'*p_u)^2] for all items and
> users that are observed.
>
> I would like to use these parameters to estimate the rankings of
> products that have not been sampled by some consumers.
>
> I believe it is possible to estimate these parameters with stochastic
> gradient descent optimization or with alternating least squares. Does
> anyone know if those methods are possible with Mata/Stata or if
> there's a way to recast this problem in another way?
>
> Dimitriy Masterov
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
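For readers wondering what the stochastic gradient approach in the question looks like, here is a minimal sketch written in Python rather than Mata so that the loops are explicit; it is not code from either poster. It fits r_ui = mu + b_u + b_i + b_u*b_i + q_i'*p_u by visiting only the observed ratings. The function names (fit_sgd, predict, rmse), the learning rate lr, the random starting values, and the L2 penalty reg are illustrative assumptions. The penalty is not part of the objective in the message (reg=0 gives the plain sum of squared residuals), but it is one common way to pin down the scaling of the latents Stas mentions: q_i'*p_u is unchanged if q_i is multiplied by c and p_u divided by c, while the penalty is not.

```python
import random


def predict(mu, bu, bi, p, q):
    """Model from the message: r_ui = mu + b_u + b_i + b_u*b_i + q_i'p_u."""
    return mu + bu + bi + bu * bi + sum(pk * qk for pk, qk in zip(p, q))


def fit_sgd(ratings, n_users, n_items, f=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Stochastic gradient descent on squared error over observed cells only.

    ratings: list of (u, i, r) triples with 0-based user/item indices.
    reg: L2 penalty on b_u, b_i, p_u, q_i (an added assumption; reg=0
    recovers the unpenalised sum of squared residuals).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # start mu at the grand mean
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    # Small random starting factors: if p and q both start at zero they stay there.
    P = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in random order each epoch
        for u, i, r in data:
            bu, bi, p, q = b_user[u], b_item[i], P[u], Q[i]
            e = r - predict(mu, bu, bi, p, q)  # residual for this single rating
            # Partial derivatives of the prediction:
            # d/dmu = 1, d/db_u = 1 + b_i, d/db_i = 1 + b_u, d/dp_u = q_i, d/dq_i = p_u
            mu += lr * e
            b_user[u] = bu + lr * (e * (1.0 + bi) - reg * bu)
            b_item[i] = bi + lr * (e * (1.0 + bu) - reg * bi)
            for k in range(f):
                pk, qk = p[k], q[k]  # use pre-update values for both factor vectors
                p[k] = pk + lr * (e * qk - reg * pk)
                q[k] = qk + lr * (e * pk - reg * qk)
    return mu, b_user, b_item, P, Q


def rmse(ratings, mu, b_user, b_item, P, Q):
    """Root mean squared error over the given (u, i, r) triples."""
    sq = [(r - predict(mu, b_user[u], b_item[i], P[u], Q[i])) ** 2 for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


if __name__ == "__main__":
    # Made-up toy ratings (user, item, rating); cells not listed are unobserved.
    ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 1), (3, 0, 5), (3, 2, 2)]
    mu, b_user, b_item, P, Q = fit_sgd(ratings, n_users=4, n_items=3)
    print("training RMSE:", rmse(ratings, mu, b_user, b_item, P, Q))
    # Predicted rating of item 2 by user 0, who has not rated it.
    print(predict(mu, b_user[0], b_item[2], P[0], Q[2]))
```

Each epoch costs time proportional to the number of observed ratings, so the sparse rating matrix never has to be filled in; the same loops should translate directly to Mata.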

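The alternating least squares route asked about in the question works because the model is linear in each block of parameters once the other block is held fixed: with b_u and p_u fixed, r_ui - mu - b_u = b_i*(1 + b_u) + q_i'*p_u is an ordinary regression of the adjusted ratings on (1 + b_u, p_u), one small regression per item, and symmetrically for users. The sketch below, again in Python and not taken from either poster, alternates those per-item and per-user regressions with a closed-form update of mu. The names (fit_als, ridge, solve), the small hand-written dense solver, and the ridge penalty lam (not in the original objective) are assumptions made for illustration.

```python
import random


def predict(mu, bu, bi, p, q):
    """r_ui = mu + b_u + b_i + b_u*b_i + q_i'p_u, as in the message."""
    return mu + bu + bi + bu * bi + sum(pk * qk for pk, qk in zip(p, q))


def solve(A, b):
    """Solve A x = b for a small dense square A by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[k]] for k, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, lam):
    """Minimise sum (y - X w)^2 + lam*|w|^2 via the normal equations (X'X + lam*I) w = X'y."""
    d = len(X[0])
    A = [[sum(x[a] * x[c] for x in X) + (lam if a == c else 0.0) for c in range(d)]
         for a in range(d)]
    rhs = [sum(x[a] * yv for x, yv in zip(X, y)) for a in range(d)]
    return solve(A, rhs)


def fit_als(ratings, n_users, n_items, f=2, lam=0.1, sweeps=20, seed=0):
    """Alternate exact ridge updates of the item block, the user block, and mu."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_user, b_item = [0.0] * n_users, [0.0] * n_items
    P = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_items)]
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(sweeps):
        # Item step: regress r_ui - mu - b_u on (1 + b_u, p_u) to get (b_i, q_i).
        for i, obs in by_item.items():
            if obs:
                w = ridge([[1.0 + b_user[u]] + P[u] for u, _ in obs],
                          [r - mu - b_user[u] for u, r in obs], lam)
                b_item[i], Q[i] = w[0], w[1:]
        # User step: the mirror image, regress r_ui - mu - b_i on (1 + b_i, q_i).
        for u, obs in by_user.items():
            if obs:
                w = ridge([[1.0 + b_item[i]] + Q[i] for i, _ in obs],
                          [r - mu - b_item[i] for i, r in obs], lam)
                b_user[u], P[u] = w[0], w[1:]
        # mu step: unpenalised, so it is the mean of the remaining residuals.
        mu = sum(r - predict(0.0, b_user[u], b_item[i], P[u], Q[i])
                 for u, i, r in ratings) / len(ratings)
    return mu, b_user, b_item, P, Q


if __name__ == "__main__":
    # Made-up toy ratings (user, item, rating); cells not listed are unobserved.
    ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 1), (3, 0, 5), (3, 2, 2)]
    mu, b_user, b_item, P, Q = fit_als(ratings, n_users=4, n_items=3)
    # Predicted rating of item 2 by user 0, who has not rated it.
    print(predict(mu, b_user[0], b_item[2], P[0], Q[2]))
```

Each update exactly minimises the penalised objective over its own block, so that objective cannot increase from one sweep to the next. In Mata, each per-item and per-user step is a small (f+1)-dimensional linear solve that the built-in solvers (for example cholsolve()) could handle in place of the hand-written solve() above.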
Follow-ups: "Re: st: latent factor model with stochastic gradient method/alternating least squares", from "Dimitriy V. Masterov" <[email protected]>

References: "st: latent factor model with stochastic gradient method/alternating least squares", from "Dimitriy V. Masterov" <[email protected]>
