Blog: Machine Learning Equations by Saurabh Verma

Blog: Bayesian Sets.


Bayesian Sets Graphical Model:

Bayesian Sets is a simple graphical model used to expand a set. For instance, suppose we are given a small set of words (or items), which we refer to as the "seed" set, and we wish to expand this set to include all similar words from a given text corpus. We can then employ Bayesian Sets, which ranks each item in the corpus by how strongly it belongs with the seed set.

Let $\mathcal{D}$ be a data set of items, and let $\mathbf{x} \in \mathcal{D}$ be an item from this set. Assume the user provides a query set $\mathcal{D}_c$, which is a small subset of $\mathcal{D}$. Bayesian Sets ranks each item $\mathbf{x}$ by

$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})} = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}.$$

This probability ratio can be interpreted as the ratio of the joint probability of observing $\mathbf{x}$ and $\mathcal{D}_c$ to the probability of independently observing $\mathbf{x}$ and $\mathcal{D}_c$.

Assume that the parameterized model is $p(\mathbf{x} \mid \boldsymbol{\theta})$, where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_J)$ are the parameters, as shown in the figure above. Here, we assume that $\mathbf{x}$ is represented by a binary feature vector $\mathbf{x} = (x_1, \ldots, x_J)$ with $x_j \in \{0, 1\}$, and $\theta_j$ is the parameter associated with feature $j$. Then, for each $j$ (note: the $j$-th vector component is $x_j$, and bold letters denote vectors), the features are modeled as independent Bernoulli variables:

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{j=1}^{J} \theta_j^{x_j} (1 - \theta_j)^{1 - x_j}.$$

The conjugate prior for the parameters of a Bernoulli distribution is the Beta distribution:

$$p(\boldsymbol{\theta} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j - 1} (1 - \theta_j)^{\beta_j - 1}.$$
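Following the derivation in the Bayesian Sets paper (reference 1 below), the score has a closed form under this Beta-Bernoulli model. For a query set $\mathcal{D}_c = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}\}$, define the posterior hyperparameters $\tilde{\alpha}_j = \alpha_j + \sum_{i=1}^{N} x_j^{(i)}$ and $\tilde{\beta}_j = \beta_j + N - \sum_{i=1}^{N} x_j^{(i)}$. Then

$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})} = \prod_{j=1}^{J} \frac{\alpha_j + \beta_j}{\alpha_j + \beta_j + N} \left(\frac{\tilde{\alpha}_j}{\alpha_j}\right)^{x_j} \left(\frac{\tilde{\beta}_j}{\beta_j}\right)^{1 - x_j}.$$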

The log of the score is linear in the features of $\mathbf{x}$:

$$\log \mathrm{score}(\mathbf{x}) = c + \sum_{j=1}^{J} q_j\, x_j,$$

where $c = \sum_{j} \big[\log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j\big]$ and $q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j$.

Thus, Bayesian Sets essentially performs feature selection: each feature $j$ contributes a weight $q_j$ to the score, and ranking the entire corpus reduces to a single sparse matrix-vector multiplication.
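As a concrete illustration, here is a minimal NumPy sketch of this scoring procedure (not the authors' code; the function name bayesian_sets_scores and the argument names X, seed_idx, c are my own). It sets the Beta prior from the empirical feature means, as suggested in reference 1, and ranks binary items by the linear log-score above.

```python
import numpy as np

def bayesian_sets_scores(X, seed_idx, c=2.0):
    """Log-score every row of a binary matrix against a seed set.

    X        : (n_items, n_features) 0/1 matrix.
    seed_idx : indices of the query (seed) items.
    c        : prior scale, alpha = c * mean, beta = c * (1 - mean).
    """
    X = np.asarray(X, dtype=float)
    # Clip the means so features that are always 0 or always 1 do not produce log(0).
    mean = np.clip(X.mean(axis=0), 1e-6, 1.0 - 1e-6)
    alpha = c * mean
    beta = c * (1.0 - mean)

    # Posterior hyperparameters after observing the N seed items.
    seed = X[seed_idx]
    N = seed.shape[0]
    alpha_t = alpha + seed.sum(axis=0)        # alpha-tilde
    beta_t = beta + N - seed.sum(axis=0)      # beta-tilde

    # log score(x) = const + sum_j q_j * x_j  (linear in the features)
    const = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
                   + np.log(beta_t) - np.log(beta))
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)
    return const + X @ q

# Tiny usage example with a toy 0/1 feature matrix.
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0]])
scores = bayesian_sets_scores(X, seed_idx=[0, 1])
print(np.argsort(-scores))   # items ranked by affinity to the seed set
```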

For models where $p(\mathbf{x} \mid \boldsymbol{\theta})$ is parameterized using exponential families, we obtain a similar expression, but it may or may not be linear in the features of $\mathbf{x}$.

Exponential Families:
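As a sketch of the general case (following the exponential-family treatment in reference 1): if the likelihood is in the exponential family with natural parameter $\boldsymbol{\theta}$ and sufficient statistics $\mathbf{u}(\mathbf{x})$, and the prior is its conjugate with normalizing function $h(\eta, \boldsymbol{\nu})$,

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = f(\mathbf{x})\, g(\boldsymbol{\theta})\, \exp\{\boldsymbol{\theta}^{\top} \mathbf{u}(\mathbf{x})\}, \qquad p(\boldsymbol{\theta} \mid \eta, \boldsymbol{\nu}) = h(\eta, \boldsymbol{\nu})\, g(\boldsymbol{\theta})^{\eta} \exp\{\boldsymbol{\theta}^{\top} \boldsymbol{\nu}\},$$

then the score reduces to a ratio of these normalizing constants:

$$\mathrm{score}(\mathbf{x}) = \frac{h\big(\eta + 1,\, \boldsymbol{\nu} + \mathbf{u}(\mathbf{x})\big)\; h\big(\eta + N,\, \boldsymbol{\nu} + \textstyle\sum_{i} \mathbf{u}(\mathbf{x}^{(i)})\big)}{h(\eta, \boldsymbol{\nu})\; h\big(\eta + N + 1,\, \boldsymbol{\nu} + \mathbf{u}(\mathbf{x}) + \textstyle\sum_{i} \mathbf{u}(\mathbf{x}^{(i)})\big)}.$$

Whether the log score is linear in the features then depends on the particular family; for binary data under the Beta-Bernoulli model it is, as shown above.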

References:

  1. Ghahramani, Zoubin, and Katherine A. Heller. "Bayesian Sets." Advances in Neural Information Processing Systems (NIPS), 2005.
  2. Verma, Saurabh, and Estevam R. Hruschka Jr. "Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction." Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Springer, 2012.