Deep neural networks (DNNs), which have the flexibility to learn good top-layer representations, have eclipsed shallow kernel methods, which lack that flexibility. Here, we take inspiration from DNNs to develop the deep kernel machine. Optimizing the deep kernel machine objective is equivalent to exact Bayesian inference (or noisy gradient descent) in an infinitely wide Bayesian neural network or deep Gaussian process that has been scaled carefully to retain representation learning. Our work thus has important implications for the theoretical understanding of neural networks. In addition, we show that the deep kernel machine objective has more desirable properties and better performance than other choices of objective. Finally, we conjecture that the deep kernel machine objective is unimodal. We give a proof of unimodality for linear kernels and, for the nonlinear case, report a number of experiments in which all deep kernel machine initializations we tried converged to the same solution.