Open Conference Systems, StatPhys 27 Main Conference

On the role of wide flat minima in multi-layer neural networks
Enrico Maria Malatesta, Carlo Baldassi, Carlo Lucibello, Gabriele Perugini, Fabrizio Pittorino, Riccardo Zecchina

Building: Edificio San Jose
Room: Aula Magna
Date: 2019-07-08 06:15 PM – 06:30 PM
Last modified: 2019-06-10

Abstract


It has recently been shown that the space of solutions of one-layer neural network architectures with binary weights contains regions of very high local entropy density. Moreover, the solutions found by many algorithms are typically located in these regions, i.e. they are surrounded by an exponential number of other solutions. I will discuss the role of these regions in large neural networks with one hidden layer, showing in particular how a different choice of activation function (e.g. ReLU) can increase the local entropy. This helps algorithms converge faster and find solutions that generalize better than isolated ones. I will also present new results on the critical capacity of these networks, which further motivate the use of deep architectures over shallow ones.
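As a rough illustration of the local-entropy picture described above (not the authors' method), the sketch below enumerates all binary weight vectors of a tiny perceptron at an illustrative, hypothetical size (N = 15 weights, P = 10 random patterns) and, for each zero-error solution, counts how many other solutions lie within a given Hamming radius. Wide flat minima correspond to solutions with many nearby solutions; isolated solutions have few.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, small enough for exhaustive enumeration.
# N is odd so the pre-activation X @ w is never exactly zero.
N, P = 15, 10
X = rng.choice([-1, 1], size=(P, N))  # random +/-1 input patterns
y = rng.choice([-1, 1], size=P)       # random +/-1 labels

def is_solution(w):
    # A weight vector is a solution if it classifies every pattern correctly.
    return np.all(np.sign(X @ w) == y)

# Enumerate all 2^N binary weight vectors and keep the solutions.
all_w = np.array(list(itertools.product([-1, 1], repeat=N)))
solutions = all_w[[is_solution(w) for w in all_w]]

def local_count(w0, d):
    # Number of *other* solutions within Hamming distance d of w0;
    # a crude finite-size proxy for the local entropy around w0.
    dists = (solutions != w0).sum(axis=1)
    return int((dists <= d).sum()) - 1  # exclude w0 itself

if len(solutions) > 0:
    d = 3  # illustrative radius
    counts = [local_count(w, d) for w in solutions]
    print(f"{len(solutions)} solutions out of {2**N} weight vectors")
    print(f"neighbors within Hamming radius {d}: "
          f"min={min(counts)}, max={max(counts)}")
```

At these toy sizes the contrast between the most and least surrounded solutions is only suggestive; the exponential separation between dense regions and isolated solutions emerges in the large-N limit studied in the talk.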