

From: "Dimitriy V. Masterov" <dvmaster@gmail.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: st: latent factor model with stochastic gradient method/alternating least squares
Date: Tue, 5 Oct 2010 17:22:45 -0400

Suppose I observe lots of consumer ratings (explicit or implicit) of thousands of items, which may themselves be combinations of other items. The rating matrix is pretty sparse because most consumers only try/rate a few of the items. I would like to model how item ratings are related to each other for the purpose of recommending new items to try. I am thinking of this as a missing rating problem. The items have many characteristics, and these are not easily modelled with fewer dimensions.

My approach is to map consumers and items into a joint latent factor space of dimension f, so that consumer-item interactions are modeled as inner products in that space. Each item i is associated with a vector q_i in R^f, which measures the extent to which that item possesses the latent factors. Each consumer u is associated with a vector p_u in R^f, which measures that consumer's interest in each of the latent factors.

I would like to model the rating for item i by consumer u as:

r_ui = mu + b_u + b_i + b_u*b_i + q_i'*p_u,

where mu is a constant that is the same for all products, b_u is a user fixed effect, b_i is an item fixed effect, and the inner product of q_i and p_u captures the consumer's overall interest in the item's latent characteristics. The fixed effects are meant to capture the idea that some items may be more popular and that some users may rate more harshly, and that these may interact. For example, a popular item may be judged to be especially poor by a harsh critic.

For a given f, I would like to find b_i, b_u, mu, and the vectors q_i and p_u to minimize the sum of squared residuals:

sum[(r_ui - mu - b_u - b_i - b_u*b_i - q_i'*p_u)^2]

taken over all user-item pairs that are observed. I would like to use these parameters to estimate the rankings of products that have not been sampled by some consumers. I believe it is possible to estimate these parameters with stochastic gradient descent optimization or with alternating least squares.
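To make the stochastic gradient idea concrete, here is a minimal sketch of how the objective above might be minimized. It is written in plain Python rather than Mata, purely as an illustration. The function names (predict, sse, fit_sgd), the learning rate lr, the number of epochs, the optional ridge penalty reg (off by default and not part of the model as stated), the starting values, and the toy ratings are all assumptions made for this example.

```python
import random

def predict(mu, bu, bi, p, q, u, i):
    """Predicted rating: mu + b_u + b_i + b_u*b_i + q_i'p_u."""
    dot = sum(pk * qk for pk, qk in zip(p[u], q[i]))
    return mu + bu[u] + bi[i] + bu[u] * bi[i] + dot

def sse(ratings, mu, bu, bi, p, q):
    """Sum of squared residuals over the observed (user, item, rating) triples."""
    return sum((r - predict(mu, bu, bi, p, q, u, i)) ** 2 for u, i, r in ratings)

def fit_sgd(ratings, n_users, n_items, f=2, lr=0.01, reg=0.0, epochs=200, seed=0):
    """Hypothetical SGD fit; reg is an optional L2 penalty (an assumption)."""
    rng = random.Random(seed)
    # Assumed starting values: mu at the mean rating, zero fixed effects,
    # small random latent factors.
    mu = sum(r for _, _, r in ratings) / len(ratings)
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    p = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observed ratings in random order
        for u, i, r in order:
            e = r - predict(mu, bu, bi, p, q, u, i)  # current residual
            b_u, b_i = bu[u], bi[i]                  # keep pre-update values
            # Each step moves against the gradient of e^2; the factor of 2
            # is absorbed into lr.
            mu += lr * e
            bu[u] += lr * (e * (1 + b_i) - reg * b_u)
            bi[i] += lr * (e * (1 + b_u) - reg * b_i)
            for k in range(f):
                p_uk, q_ik = p[u][k], q[i][k]
                p[u][k] += lr * (e * q_ik - reg * p_uk)
                q[i][k] += lr * (e * p_uk - reg * q_ik)
    return mu, bu, bi, p, q

# Toy data (made up): (user, item, rating) triples; most cells are unobserved.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
fit = fit_sgd(ratings, n_users=3, n_items=3)
missing = predict(*fit, 0, 2)  # estimated rating for user 0 on unrated item 2
```

Note that mu, b_u and b_i are not separately identified without a normalization, so only the fitted ratings should be interpreted, not the individual parameters. An alternating least squares version would instead hold the item parameters fixed while solving for the user parameters, and then swap.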
Does anyone know whether those methods can be implemented in Mata/Stata, or whether there is a way to recast this problem?

Dimitriy Masterov

**Follow-Ups**: **Re: st: latent factor model with stochastic gradient method/alternating least squares**, *From:* Stas Kolenikov <skolenik@gmail.com>
