Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods


In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. A key challenge is to find an ingenious yet cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses partial Hessian information as much as possible. By further exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to a stationary point and a local superlinear convergence rate are established under mild assumptions. Numerical results on logistic regression, deep autoencoder networks, and deep convolutional neural networks show that our proposed method is competitive with state-of-the-art methods.
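The core idea of splitting the Hessian into a known cheap part and a quasi-Newton approximation of the expensive part can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function names, the dense linear solve, and the plain BFGS update are simplifying assumptions (the paper uses structured low-rank or Kronecker-factored approximations to keep the direction computation affordable).

```python
import numpy as np

def structured_qn_direction(grad, H_cheap, B):
    """Solve (H_cheap + B) d = -grad, combining the exactly known
    'cheap' Hessian part with a quasi-Newton approximation B of the
    expensive part. A dense solve is used here purely for clarity."""
    return np.linalg.solve(H_cheap + B, -grad)

def bfgs_update(B, s, y):
    """Standard BFGS update of B from step s and gradient difference y.
    The update is skipped when the curvature condition s^T y > 0 fails,
    which can happen in the nonconvex setting."""
    sy = s @ y
    if sy <= 1e-10:
        return B  # keep B positive definite by skipping the update
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy
```

On a quadratic with Hessian `H_cheap + B`, the direction returned by `structured_qn_direction` is the exact Newton step, so a single update reaches the minimizer.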

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021
Hongyu Chen 陈泓宇
PhD Student at IDSS