7.10.3. Gaussian Processes on latent representations
Although the representational power of a Gaussian process is quite high due to the use of kernels, a single kernel may not be sufficient to represent very complicated relationships. Kernel engineering, i.e. combining multiple kernels, can improve the representational power, but may still be insufficient. The kernel evaluation is bound to the original \(d\)-dimensional space \(\mathbb{R}^d\), yet the Euclidean distance used by stationary kernels may not be the best way of measuring similarity between the samples \(x_i \in \mathbb{R}^d\) on which the kernel approximation relies. Thus, a transformation \(f_t: \mathbb{R}^d \rightarrow \mathbb{R}^q\) to a latent space \(\mathbb{R}^q\) can be deployed to improve the effectiveness of the Euclidean distance on the new latent samples \(u_i = f_t(x_i)\).
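To make this concrete, the following minimal sketch (not taken from any of the cited works) evaluates a stationary RBF kernel on latent samples \(u_i = f_t(x_i)\) instead of the raw inputs. The transformation `f_t` here is a hypothetical fixed random projection used purely for illustration; in the approaches discussed below it is learned.

```python
import numpy as np

def rbf_kernel(U, lengthscale=1.0, variance=1.0):
    """Stationary RBF kernel evaluated via pairwise Euclidean distances."""
    sq_dists = np.sum(U**2, 1)[:, None] + np.sum(U**2, 1)[None, :] - 2 * U @ U.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
d, q, n = 10, 2, 5                       # original dim, latent dim, number of samples
X = rng.normal(size=(n, d))              # samples x_i in R^d

# Hypothetical transformation f_t: R^d -> R^q (a fixed random nonlinear map here;
# in practice f_t is learned, e.g. by stacked GP layers or a neural network).
W = rng.normal(size=(d, q)) / np.sqrt(d)
U = np.tanh(X @ W)                       # latent samples u_i = f_t(x_i)

K_original = rbf_kernel(X)               # kernel on the original representation
K_latent = rbf_kernel(U)                 # kernel on the latent representation
print(K_original.shape, K_latent.shape)  # (5, 5) (5, 5)
```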
Both [2] and [5] propose deep \(\mathcal{GP}\)s, where multiple \(\mathcal{GP}\)s are stacked and the outputs of the previous layer \(\mathcal{GP}_{i-1}\) are used as the inputs of the current layer \(\mathcal{GP}_{i}\). In other words, \(l\) \(\mathcal{GP}\)s are stacked, where the first layers \(\mathcal{GP}_{1, \dots, l-1}\) correspond to the transformation function \(f_t(\cdot)\) and the posterior of the last layer corresponds to the predicted posterior. The two publications differ in how training and posterior prediction are formulated: [2] uses a variational posterior, which assumes independence between the outputs of each layer, whereas [5] performs doubly stochastic variational inference, where the assumption of independence is dropped.
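As an illustration of the stacking idea only (a generative sketch of the composed prior, not the variational training procedures of [2] or [5]), the NumPy example below draws latent inputs from a first-layer \(\mathcal{GP}\) prior and then defines a second \(\mathcal{GP}\) on those latent samples; the dimensions and lengthscales are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(1)
n, d, q = 50, 3, 2
X = rng.normal(size=(n, d))
jitter = 1e-6 * np.eye(n)

# Layer GP_1: draw q independent functions from a GP prior over X; their
# values form the latent inputs of the next layer (the role of f_t).
K1 = rbf_kernel(X, X) + jitter
U = rng.multivariate_normal(np.zeros(n), K1, size=q).T   # shape (n, q)

# Layer GP_2: the final GP is defined on the latent samples U, not on X.
K2 = rbf_kernel(U, U) + jitter
f = rng.multivariate_normal(np.zeros(n), K2)             # one draw from the composed prior
print(U.shape, f.shape)                                  # (50, 2) (50,)
```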
Besides stacking \(\mathcal{GP}\)s, various neural network architectures have also been proposed for the transformation \(f_t\). Most importantly, exact [6] and approximate [7] deep kernel learning were proposed, where a deep neural network (DNN) with a decreasing number of neurons is used as \(f_t(\cdot)\) to reduce the dimensionality. This approach is especially useful if the original representation contains redundant features, such as in images.
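A minimal PyTorch sketch of this idea is given below; the network architecture, its layer sizes, and the training loop are illustrative assumptions rather than the exact setup of [6] or [7]. A DNN \(f_t\) maps the inputs to a low-dimensional latent space, an RBF kernel is evaluated on the latent samples, and kernel and network parameters are trained jointly by maximizing the exact GP log marginal likelihood.

```python
import torch
import torch.nn as nn

class DeepRBFKernel(nn.Module):
    """RBF kernel evaluated on features extracted by a small DNN (the role of f_t)."""
    def __init__(self, d_in, q=2):
        super().__init__()
        # Dimensionality-reducing feature extractor (illustrative sizes).
        self.f_t = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(),
                                 nn.Linear(32, q))
        self.log_lengthscale = nn.Parameter(torch.zeros(()))
        self.log_variance = nn.Parameter(torch.zeros(()))

    def forward(self, X):
        U = self.f_t(X)                   # latent samples u_i = f_t(x_i)
        sq = torch.cdist(U, U).pow(2)     # pairwise squared Euclidean distances
        return self.log_variance.exp() * torch.exp(-0.5 * sq / self.log_lengthscale.exp()**2)

# Joint training sketch: maximize the exact GP log marginal likelihood
# log N(y | 0, K + sigma^2 I) w.r.t. both kernel and network parameters.
torch.manual_seed(0)
X, y = torch.randn(40, 10), torch.randn(40)
kernel, noise = DeepRBFKernel(d_in=10), torch.tensor(0.1)
opt = torch.optim.Adam(kernel.parameters(), lr=1e-2)
for _ in range(100):
    K = kernel(X) + noise * torch.eye(len(X))
    dist = torch.distributions.MultivariateNormal(torch.zeros(len(X)), covariance_matrix=K)
    loss = -dist.log_prob(y)              # negative log marginal likelihood
    opt.zero_grad(); loss.backward(); opt.step()
```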
Moreover, variational autoencoders with exact [1] and sparse [3] \(\mathcal{GP}\)s were also proposed. Since variational autoencoders seek to learn independent Gaussians as latent variables [4], using them in a \(\mathcal{GP}\) context becomes more straightforward while providing the advantages of probabilistic treatment. Finally, [] derive an exact equivalence between \(\mathcal{GP}\)s and DNNs, followed by a proposal to place a \(\mathcal{GP}\) prior on the DNN, yielding neural network Gaussian processes. As such, \(f_t(\cdot)\) consists of all latent layers of the DNN before the output layer.
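For the neural network Gaussian process view, the induced \(\mathcal{GP}\) covariance can be computed layer by layer. The sketch below implements the standard kernel recursion for an infinitely wide, fully connected ReLU network (using the closed-form arc-cosine expectation); the depth and the weight/bias variances `sigma_w2`, `sigma_b2` are illustrative assumptions, not values from the literature discussed above.

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sigma_w2=1.0, sigma_b2=0.1):
    """NNGP kernel of an infinitely wide fully connected ReLU network.

    Layer recursion: K_{l+1}(x, x') = sigma_b^2
        + sigma_w^2 * E_{f ~ N(0, K_l)}[relu(f(x)) relu(f(x'))],
    which has a closed form for ReLU (arc-cosine kernel of degree 1).
    """
    # Base case: covariance induced by the (linear) input layer.
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        K = sigma_b2 + (sigma_w2 / (2 * np.pi)) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
    return K

X = np.random.default_rng(2).normal(size=(5, 8))
print(nngp_relu_kernel(X).shape)   # (5, 5) -- GP prior covariance induced by the DNN
```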
- [1] Francesco Paolo Casale, Adrian V. Dalca, Luca Saglietti, Jennifer Listgarten, and Nicoló Fusi. Gaussian process prior variational autoencoders. In NeurIPS, 2018.
- [2] Andreas Damianou and Neil D. Lawrence. Deep Gaussian processes. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, volume 31 of Proceedings of Machine Learning Research, 207–215. PMLR, 2013. URL: https://proceedings.mlr.press/v31/damianou13a.html.
- [3] Metod Jazbec, Matt Ashman, Vincent Fortuin, Michael Pearce, Stephan Mandt, and Gunnar Rätsch. Scalable Gaussian process variational autoencoders. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, 3511–3519. PMLR, 2021. URL: https://proceedings.mlr.press/v130/jazbec21a.html.
- [4] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. 2014. arXiv:1312.6114.
- [5] Hugh Salimbeni and Marc Deisenroth. Doubly stochastic variational inference for deep Gaussian processes. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper/2017/file/8208974663db80265e9bfe7b222dcb18-Paper.pdf.
- [6] Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, and Eric P. Xing. Deep kernel learning. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, 370–378. PMLR, 2016. URL: https://proceedings.mlr.press/v51/wilson16.html.
- [7] Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, and Eric P. Xing. Stochastic variational deep kernel learning. In NIPS, 2016.