Blog: Machine Learning Equations by Saurabh Verma

Blog: Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs (NIPS 2017).

• Tutorials

How can we convert a graph into a Feature Vector?

A graph is a fundamental but complicated structure to work with from a machine learning point of view. Think about it: a machine learning model usually requires inputs in the form of mathematical objects like vectors, matrices, or sequences of real numbers, and then produces the desired outputs. So, the question is how we can convert a graph into a mathematical object that is suitable for performing machine learning tasks such as classification or regression on graphs.

For the past year, I have been working on developing machine learning models for graphs in order to solve two specific problems:

1) Graph Classification: Given a collection of graphs $\{G_1, \dots, G_N\}$ and associated graph labels $\{y_1, \dots, y_N\}$ where $y_i \in \{1, \dots, C\}$ ($C$ is the number of classes), we need to come up with a learning model or function $f: G \rightarrow y$ that can classify or predict the label associated with a graph.

2) Node Classification: Given a graph $G = (V, E, A)$, where $V$ is the vertex set, $E$ is the edge set & $A$ is the adjacency matrix, and labels $y_v \in \{1, \dots, C\}$ associated with each node $v \in V$, our task is to classify or predict the labels on each node. That is, come up with a learning function $f: V \rightarrow \{1, \dots, C\}$.
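To make the question above concrete, here is a minimal sketch (my own toy illustration, not the method from the paper) that converts a graph, given as an adjacency matrix, into a fixed-length feature vector using a normalized degree histogram; the number of bins is an arbitrary choice.

```python
import numpy as np

def degree_histogram_features(A, num_bins=10):
    """Map a graph (adjacency matrix A) to a fixed-length feature vector.

    Toy baseline: histogram the node degrees into `num_bins` bins and
    normalize, so graphs of different sizes become comparable vectors.
    """
    A = np.asarray(A)
    degrees = A.sum(axis=1)                      # degree of each node
    max_deg = max(degrees.max(), 1)
    hist, _ = np.histogram(degrees, bins=num_bins, range=(0, max_deg))
    return hist / hist.sum()                     # normalized histogram

# Example: a 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(degree_histogram_features(A))              # same length for any graph size
```

Such a fixed-length vector can then be fed to any standard classifier for the graph classification task; node classification instead needs per-node features.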

Blog: Stability of Learning Algorithms Part 3.

• Tutorials

Extending to Ranking Algorithms

Blog: Stability of Learning Algorithms Part 2.

• Tutorials

Extending to Randomized Algorithms

Let $r$ be a set of random parameters.

Let's assume that our randomized algorithm $A$ has uniform stability $\beta$ if, for all training sets $S$ of size $m$, all $i \in \{1, \dots, m\}$, and all points $z$, it satisfies

$$\mathbb{E}_{r}\big[\, |\ell(A_{S,r}, z) - \ell(A_{S^{\setminus i}, r}, z)| \,\big] \leq \beta,$$

where the expectation is over the random parameters $r$ and $S^{\setminus i}$ denotes $S$ with the $i$-th sample removed.
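As a rough numerical illustration of this definition (my own sketch, not part of the original post), one can train a learner on $S$ and on $S$ with one point removed, then compare losses at a fixed test point; for a randomized algorithm the gap would additionally be averaged over the random parameters $r$. The sketch below uses closed-form ridge regression as a deterministic stand-in.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def leave_one_out_loss_gap(X_train, y_train, x_test, y_test, lam=1.0):
    """Max over i of |loss(A_S, z) - loss(A_{S minus i}, z)| at a fixed z.

    A crude empirical proxy: the true uniform stability takes a supremum
    over all training sets S and all points z.
    """
    w_full = ridge_fit(X_train, y_train, lam)
    base_loss = (x_test @ w_full - y_test) ** 2
    gaps = []
    for i in range(len(y_train)):
        mask = np.arange(len(y_train)) != i
        w_i = ridge_fit(X_train[mask], y_train[mask], lam)
        gaps.append(abs((x_test @ w_i - y_test) ** 2 - base_loss))
    return max(gaps)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
# Stronger regularization (larger lam) should shrink the gap, mirroring
# the usual O(1/(lam*m)) stability of regularized ERM.
print(leave_one_out_loss_gap(X[:-1], y[:-1], X[-1], y[-1], lam=10.0))
```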

Blog: Stability of Learning Algorithms Part 1.

• Tutorials

This is a walk-through of how to derive generalization bounds based on the stability of algorithms. Understanding the proof helps to extend or modify the results according to your needs. For instance, I had a loss function which was unbounded, and since your loss function must be bounded to apply the generalization results, I was stuck. So, this is my effort to state the proof as clearly as possible along with the assumptions needed.

Blog: Dirichlet Distributions.

• Tutorials

The Dirichlet distribution is a distribution over distributions. More specifically, it is a distribution over pmfs (probability mass functions). You can imagine a bag of dice, where each die has a corresponding pmf (over its six possible outcomes). Picking a die is then like sampling a particular pmf from a distribution. The probability of picking a die, which yields a pmf, comes from the Dirichlet distribution.

Let $\mathbf{q} = (q_1, \dots, q_K)$ be a pmf (or a point in the simplex $\Delta_K$), where $q_k \geq 0$ and $\sum_{k=1}^{K} q_k = 1$. Here $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_K)$ is the Dirichlet parameter and $\alpha_k > 0$. Then, the probability density of $\mathbf{q}$ is given by:

$$p(\mathbf{q} \mid \boldsymbol{\alpha}) = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_k\right)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} q_k^{\alpha_k - 1}$$
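As a quick sanity check of the "bag of dice" picture (my own sketch; the symmetric parameter $\boldsymbol{\alpha} = (2, \dots, 2)$ is an arbitrary choice), the snippet below samples a few pmfs over six outcomes from a Dirichlet and evaluates the log of the density formula above.

```python
import numpy as np
from math import lgamma

def dirichlet_logpdf(q, alpha):
    """Log-density of the pmf q under Dir(alpha), per the formula above."""
    log_norm = lgamma(np.sum(alpha)) - sum(lgamma(a) for a in alpha)
    return log_norm + np.sum((alpha - 1.0) * np.log(q))

alpha = np.full(6, 2.0)                    # symmetric Dirichlet parameter
rng = np.random.default_rng(0)
dice = rng.dirichlet(alpha, size=3)        # three "dice": pmfs over 6 faces
print(dice.sum(axis=1))                    # each sampled pmf sums to 1
print([dirichlet_logpdf(q, alpha) for q in dice])
```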

Graphical Model:

Suppose $\mathbf{X} = \{x_1, \dots, x_N\}$ is a set of samples drawn from a pmf $\mathbf{q}$, where $x_i \in \{1, \dots, K\}$. For example, $\mathbf{X}$ could be a sequence of outcomes of a die ($K = 6$). Let $\mathbf{q} \sim \text{Dir}(\boldsymbol{\alpha})$ be the pmf. Then:

$$p(\mathbf{q} \mid \mathbf{X}, \boldsymbol{\alpha}) \propto p(\mathbf{X} \mid \mathbf{q})\, p(\mathbf{q} \mid \boldsymbol{\alpha})$$

Let $N_k$ be the number of outcomes in the sequence of samples $\mathbf{X}$ that are equal to event $k$, where $k \in \{1, \dots, K\}$, and let $\mathbf{N} = (N_1, \dots, N_K)$.

Therefore,

$$p(\mathbf{q} \mid \mathbf{X}, \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} q_k^{N_k + \alpha_k - 1}, \quad \text{i.e.}\quad \mathbf{q} \mid \mathbf{X}, \boldsymbol{\alpha} \sim \text{Dir}(\boldsymbol{\alpha} + \mathbf{N})$$
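Here is a minimal sketch of the conjugate update above (my own illustration; the uniform prior and the particular sequence of rolls are assumptions): counting the outcomes and adding the counts to the prior parameter gives the posterior Dirichlet.

```python
import numpy as np

alpha = np.ones(6)                        # prior Dir(1,...,1): uniform over pmfs
rolls = np.array([2, 5, 5, 1, 5, 3, 5])   # observed die outcomes in {0,...,5}

counts = np.bincount(rolls, minlength=6)  # N_k = number of times outcome k occurred
alpha_post = alpha + counts               # conjugacy: posterior is Dir(alpha + N)

rng = np.random.default_rng(0)
q_samples = rng.dirichlet(alpha_post, size=1000)
print(q_samples.mean(axis=0))             # close to (alpha + N) / sum(alpha + N)
```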

Blog: Bayesian Sets.

• Tutorials

Bayesian Sets Graphical Model:

Bayesian sets are simple graphical models used to expand a set. For instance, suppose you are given a set of a few words (or items), which we refer to as "seeds," and we wish to expand this set to include all similar words from a given text corpus. Then, we can employ Bayesian sets, which rank each item based on how strongly it belongs with the seed set.

Let $\mathcal{D}$ be a data set of items, and $\mathbf{x} \in \mathcal{D}$ be an item from this set. Assume the user provides a query set $\mathcal{D}_s$ which is a small subset of $\mathcal{D}$. Bayesian sets rank each item $\mathbf{x}$ by

$$\text{score}(\mathbf{x}) = \frac{p(\mathbf{x}, \mathcal{D}_s)}{p(\mathbf{x})\, p(\mathcal{D}_s)}$$

This probability ratio can be interpreted as the ratio of the joint probability of observing $\mathbf{x}$ and $\mathcal{D}_s$ to the probability of independently observing $\mathbf{x}$ and $\mathcal{D}_s$.

Assume that the parameterized model is $p(\mathbf{x} \mid \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ are the parameters, as shown in the figure above. Here, we assume that $\mathbf{x}$ is represented by a binary feature vector $\mathbf{x} = (x_1, \dots, x_J)$, and $\theta_j$ is the weight associated with feature $j$. Then,

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{j=1}^{J} p(x_j \mid \theta_j)$$

For each $j$ (Note: vector components are written as $x_j$ and bold letters denote vectors), each feature follows a Bernoulli distribution:

$$p(x_j \mid \theta_j) = \theta_j^{x_j} (1 - \theta_j)^{1 - x_j}$$

The conjugate prior for the parameters of a Bernoulli distribution is the Beta distribution:

$$p(\theta_j \mid \alpha_j, \beta_j) = \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j - 1} (1 - \theta_j)^{\beta_j - 1}$$
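Putting the pieces together, here is a sketch of the resulting closed-form ranking score for binary data (following the standard Beta-Bernoulli derivation of Bayesian sets; setting the hyperparameters from empirical feature frequencies, and the toy data below, are my own assumptions).

```python
import numpy as np

def bayesian_sets_scores(X, seed_idx, alpha=None, beta=None):
    """Rank items by p(x, D_s) / (p(x) p(D_s)) under a Beta-Bernoulli model.

    X          : (num_items, num_features) binary matrix
    seed_idx   : indices of the query/seed set D_s
    alpha, beta: per-feature Beta hyperparameters (here set from empirical
                 feature frequencies as one common choice).
    Returns log-scores; higher means the item fits better with the seeds.
    """
    X = np.asarray(X, dtype=float)
    if alpha is None:
        alpha = 2.0 * X.mean(axis=0) + 1e-6
    if beta is None:
        beta = 2.0 * (1.0 - X.mean(axis=0)) + 1e-6

    seeds = X[seed_idx]
    N = len(seed_idx)
    alpha_t = alpha + seeds.sum(axis=0)          # posterior alpha given seeds
    beta_t = beta + N - seeds.sum(axis=0)        # posterior beta given seeds

    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = (np.log(alpha_t) - np.log(alpha)
         - np.log(beta_t) + np.log(beta))
    return c + X @ q                             # log score per item

# Toy example: 5 items, 4 binary features; seeds are items 0 and 1
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
print(bayesian_sets_scores(X, seed_idx=[0, 1]))
```

Items whose active features overlap strongly with the seed items receive the highest log-scores.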

Blog: Why Regularized Auto-Encoders learn Sparse Representation?

• PaperList

This paper has some great insights to offer on the design of autoencoders. As the title suggests, it addresses the question of whether regularization helps in learning a sparse representation of the data, from a theoretical standpoint.

Here is the setup: we have an input $\mathbf{x} \in \mathbb{R}^n$ which is mapped to a latent space via the autoencoder's encoder $\mathbf{h} = f(W\mathbf{x} + \mathbf{b})$, where $f$ is the encoder activation function, $W$ is the weight matrix, $\mathbf{b}$ is the encoder bias, and $\mathbf{h}$ is the hidden representation, i.e. the outputs of the hidden units.

For the analysis, the paper assumes that the decoder is linear, i.e. $\hat{\mathbf{x}} = W'\mathbf{h}$ decodes back the encoded hidden representation, and that the loss is the squared loss. Therefore, for learning the parameters of the autoencoder, the objective function is:

$$\min_{W, W', \mathbf{b}} \; \mathbb{E}_{\mathbf{x}}\Big[\, \big\|\mathbf{x} - W' f(W\mathbf{x} + \mathbf{b})\big\|_2^2 \,\Big] + \lambda\, R(W, \mathbf{b}, \mathbf{x})$$

where $R(\cdot)$ is the regularization term with weight $\lambda$.
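To make the setup concrete, here is a small NumPy sketch (my own; the sigmoid activation, an L1 penalty on the hidden units standing in for the generic regularizer, and the penalty weight are all assumptions) that evaluates the regularized reconstruction objective with a linear decoder.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_ae_objective(X, W, b, W_dec, lam=0.1):
    """Squared reconstruction error with a linear decoder plus an
    L1 penalty on the hidden representation (one possible regularizer).

    X     : (num_samples, n) data matrix
    W, b  : encoder weights (m, n) and bias (m,)
    W_dec : linear decoder weights (n, m)
    """
    H = sigmoid(X @ W.T + b)            # hidden representations h = f(Wx + b)
    X_hat = H @ W_dec.T                 # linear decoding
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    penalty = lam * np.mean(np.sum(np.abs(H), axis=1))
    return recon + penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # roughly whitened data
W = rng.normal(scale=0.1, size=(50, 20))
b = np.zeros(50)
print(regularized_ae_objective(X, W, b, W_dec=W.T.copy()))
```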

Now we are interested in the sparsity of $\mathbf{h}$, the hidden representation.

Besides the above, the paper makes two more assumptions.

Assumption 1: Data is drawn from a distribution for which $\mathbb{E}[\mathbf{x}] = \mathbf{0}$ and $\mathbb{E}[\mathbf{x}\mathbf{x}^{\top}] = I$, where $I$ is the identity matrix.

Basically, it assumes that the data is whitened, which is reasonable in some cases. There is one more assumption, made from an analysis point of view, which is needed for the derivation of the theorems; we will see that later.

Finally, for a given data sample $\mathbf{x}$, each hidden unit $h_j$ gets activated if its pre-activation is greater than a threshold $\tau$:

$$h_j \text{ is active} \iff W_j\mathbf{x} + b_j > \tau,$$

where $W_j$ is the $j$-th row of $W$.
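As a simple way to probe this (my own sketch; placing the threshold at zero is an assumption), one can count the fraction of hidden units whose pre-activation exceeds the threshold, which serves as a basic sparsity measure.

```python
import numpy as np

def fraction_active(X, W, b, threshold=0.0):
    """Fraction of hidden units whose pre-activation Wx + b exceeds
    the threshold, averaged over samples -- a simple sparsity measure."""
    pre = X @ W.T + b                   # pre-activations, shape (num_samples, m)
    return np.mean(pre > threshold)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))         # roughly whitened data (Assumption 1)
W = rng.normal(scale=0.1, size=(50, 20))
b = -0.05 * np.ones(50)                 # more negative bias -> fewer active units
print(fraction_active(X, W, b))
```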

Blog: Learning Convolutional Networks for Graphs

• PaperList

For the past couple of months, I have been wondering about how convolutional networks can be applied to graphs. As we know, convolutional networks have become the state of the art in image classification and also in natural language processing. It is easy to see how convolutional networks exploit the locality and translation invariance properties of an image. People have now also figured out how to exploit those properties efficiently in NLP using convolutional networks. It is time to look at other, more general, domains such as graphs, where the notion of locality or a receptive field needs to be defined. At this point, we can start thinking about graph neighborhood concepts, as they are going to play a major role in connecting with the receptive fields of convolutional networks.

Reading List: ICML Papers 2016

• PaperList

To make a habit of reading more papers, I have decided to write comments, sometimes quick and sometimes detailed, on the papers which I find interesting and related to my research area. Hopefully, this will come in handy in solving my own research problems.

Starting with ICML 2016 Papers: