Monotonic Neural Network

There are many situations in which one would like to recover a monotonic function from (noisy) data. In a regression setting, this is usually called an isotonic regression model. It can be useful, for example, when modeling a relationship of the type (happiness) ~ F(income) for some unknown function F that one can quite safely assume to be monotonically increasing.

How do we build a neural architecture such that, for any set of neural weights $\theta \in \mathbb{R}^{\Theta}$, the neural network represents a function $F_\theta : \mathbb{R}^d \to \mathbb{R}$ that is increasing along each of the $d$ coordinates? That is an old problem, and there are quite a few existing solutions. Here, I will describe the basic approach of the 1997 paper (1) by J. Sill.

It is based on the fact that if $f_1(x), f_2(x), \ldots, f_K(x)$ are $K \geq 1$ increasing and continuous functions, then so are their pointwise minimum and pointwise maximum (indeed, if $x \leq y$ coordinate-wise, then $f_k(x) \leq f_k(y)$ for every $k$, and taking the minimum or maximum over $k$ preserves this inequality). In other words, the two functions $m(x)$ and $M(x)$ defined as

$$m(x) = \min_{k=1,\ldots,K} \; f_k(x)$$

and

$$M(x) = \max_{k=1,\ldots,K} \; f_k(x)$$

are also both increasing and continuous functions. To construct an “increasing” neural architecture, one can consequently try to build $K \geq 1$ relatively simple increasing functions $f_1(x), \ldots, f_K(x)$ and define the output of the neural network as their minimum,

$$F(x) = \min_{k=1,\ldots,K} \; f_k(x).$$

Now, to construct each one of these simple increasing functions $f_k(x)$, one can choose $G \geq 1$ vectors with non-negative coordinates $\omega^{k,1}, \ldots, \omega^{k,G} \in \mathbb{R}^d_+$ and biases $b^{k,1}, \ldots, b^{k,G} \in \mathbb{R}$ and define

$$f_k(x) = \max_{g=1,\ldots,G} \; \langle \omega^{k,g}, x \rangle + b^{k,g}$$

where $\langle u, v \rangle = u_1 v_1 + \cdots + u_d v_d$ is the usual Euclidean dot product. To implement this, one can use a soft-plus operation, i.e. $\omega^{k,g} = \mathrm{SoftPlus}(\widetilde{\omega}^{k,g})$ for an unconstrained parameter $\widetilde{\omega}^{k,g}$, or anything equivalent, to make sure that these weights are non-negative. Below, I have implemented this with $K=20$ and $G=10$, initialized all the weights randomly from a centred Gaussian with unit variance, and repeated the experiment $100$ times.
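The post does not show the code or name the framework; here is a minimal sketch of the min-of-max architecture described above, written in PyTorch (my assumption), with the soft-plus reparameterization of the weights. The class name `MonotonicNet` and the tensor shapes are my own choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicNet(nn.Module):
    """Min-of-max monotonic network in the spirit of Sill (1997).

    Output: F(x) = min_k max_g <softplus(W[k, g]), x> + b[k, g],
    which is increasing in every coordinate of x.
    """

    def __init__(self, dim: int, K: int = 20, G: int = 10):
        super().__init__()
        # Unconstrained parameters drawn from a centred unit-variance Gaussian;
        # the soft-plus in forward() makes the effective weights non-negative.
        self.W = nn.Parameter(torch.randn(K, G, dim))
        self.b = nn.Parameter(torch.randn(K, G))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, dim)
        w = F.softplus(self.W)                               # (K, G, dim), non-negative
        affine = torch.einsum("bd,kgd->bkg", x, w) + self.b  # (batch, K, G)
        groups = affine.max(dim=2).values                    # max over g -> (batch, K)
        return groups.min(dim=1).values                      # min over k -> (batch,)
```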

Now, we can try to implement a standard regression, but with the constraint that the function is increasing (i.e. isotonic regression). It suffices to minimize the standard Mean Squared Error (MSE) with a monotonic neural net. The figure below shows the dynamics of learning on a simple 1D dataset $\{(x_i, y_i)\}_{i=1}^N$ with a standard SGD optimizer.
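And a minimal sketch of the isotonic-regression fit itself, assuming the `MonotonicNet` module above; the synthetic noisy monotone dataset and the learning rate are my own choices, since the post does not give them:

```python
import torch

# Synthetic 1D dataset: a noisy increasing relationship (illustrative only).
torch.manual_seed(0)
N = 200
x = torch.linspace(-3.0, 3.0, N).unsqueeze(1)          # (N, 1)
y = torch.tanh(x).squeeze(1) + 0.1 * torch.randn(N)    # (N,)

model = MonotonicNet(dim=1, K=20, G=10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(5000):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)   # standard MSE
    loss.backward()
    optimizer.step()
    if step % 1000 == 0:
        print(f"step {step:5d}  mse {loss.item():.4f}")
```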

There are variants of this approach for designing neural networks that always represent convex functions; I'll try to show some simulations in a future blog post. These monotonic networks are useful in many settings, and in a future post I will use them to implement a (deep) quantile-regression model…


References

  1. Sill, J. (1997). Monotonic networks. Advances in neural information processing systems, 10.

Tags
neuralnet

Date
November 20, 2022