RMS Prop & Adam Optimization

RMS Prop

RMS Prop stands for Root Mean Square Propagation; like momentum, it is another algorithm that can speed up gradient descent.

Implementation

Initialize Sdw = 0, Sdb = 0
On iteration t:
    Compute dW, db on the current mini-batch
    Sdw = beta * Sdw + (1 - beta) * dW^2      (squaring is element-wise)
    Sdb = beta * Sdb + (1 - beta) * db^2
    W = W - alpha * dW / sqrt(Sdw)
    b = b - alpha * db / sqrt(Sdb)

*To avoid numerical problems (division by zero or by a very small number), we usually replace sqrt(Sdw) and sqrt(Sdb) with sqrt(Sdw + epsilon) and sqrt(Sdb + epsilon) respectively.
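
As a concrete illustration, here is a minimal NumPy sketch of a single RMS Prop update step for one layer's parameters, following the pseudocode above. The function name rmsprop_update and the default values for alpha and beta are illustrative choices, not part of the notes.

import numpy as np

def rmsprop_update(W, b, dW, db, Sdw, Sdb, alpha=0.001, beta=0.9, epsilon=1e-8):
    # Exponentially weighted average of the element-wise squared gradients
    Sdw = beta * Sdw + (1 - beta) * dW ** 2
    Sdb = beta * Sdb + (1 - beta) * db ** 2
    # Scale the step by the root mean square; epsilon guards against division by zero
    W = W - alpha * dW / np.sqrt(Sdw + epsilon)
    b = b - alpha * db / np.sqrt(Sdb + epsilon)
    return W, b, Sdw, Sdb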

Adam Optimization

Many optimization algorithms fail to generalize across problems; RMS Prop and Adam are among the few that have been shown to work well on a wide range of deep learning architectures.

The Adam optimization algorithm essentially combines momentum and RMS Prop.

Adam: Adaptive Moment Estimation

Implementation

Vdw = 0, Sdw = 0, Vdb = 0, Sdb = 0
On iteration t:
    Compute dW, db on the current mini-batch
    Vdw = beta1 * Vdw + (1 - beta1) * dW
    Vdb = beta1 * Vdb + (1 - beta1) * db
    Sdw = beta2 * Sdw + (1 - beta2) * dW^2
    Sdb = beta2 * Sdb + (1 - beta2) * db^2
    Vdw_corrected = Vdw / (1 - beta1^t)
    Vdb_corrected = Vdb / (1 - beta1^t)
    Sdw_corrected = Sdw / (1 - beta2^t)
    Sdb_corrected = Sdb / (1 - beta2^t)
    W = W - alpha * Vdw_corrected / sqrt(Sdw_corrected + epsilon)
    b = b - alpha * Vdb_corrected / sqrt(Sdb_corrected + epsilon)
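
Below is a corresponding NumPy sketch of one Adam update step for a single layer, mirroring the pseudocode above. The function name adam_update is an illustrative choice; t is the 1-indexed iteration counter used in the bias correction.

import numpy as np

def adam_update(W, b, dW, db, Vdw, Vdb, Sdw, Sdb, t,
                alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Momentum-style first-moment estimates of the gradients
    Vdw = beta1 * Vdw + (1 - beta1) * dW
    Vdb = beta1 * Vdb + (1 - beta1) * db
    # RMS Prop-style second-moment estimates (element-wise squared gradients)
    Sdw = beta2 * Sdw + (1 - beta2) * dW ** 2
    Sdb = beta2 * Sdb + (1 - beta2) * db ** 2
    # Bias correction compensates for the zero initialization of V and S
    Vdw_corrected = Vdw / (1 - beta1 ** t)
    Vdb_corrected = Vdb / (1 - beta1 ** t)
    Sdw_corrected = Sdw / (1 - beta2 ** t)
    Sdb_corrected = Sdb / (1 - beta2 ** t)
    # Combined update: momentum direction scaled by the RMS of the gradients
    W = W - alpha * Vdw_corrected / np.sqrt(Sdw_corrected + epsilon)
    b = b - alpha * Vdb_corrected / np.sqrt(Sdb_corrected + epsilon)
    return W, b, Vdw, Vdb, Sdw, Sdb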

Hyperparameter choices

alpha : needs to be tuned
beta1 : 0.9 (recommended default for the first-moment term)
beta2 : 0.999 (recommended default for the second-moment term)
epsilon : 10^-8 (rarely needs to be tuned)
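
For example, these defaults could be passed to the adam_update sketch above. The loop, parameter shapes, and random stand-in gradients below are purely illustrative and not taken from the notes.

import numpy as np

# Toy parameters for one layer; shapes and gradients are placeholders
W, b = np.random.randn(3, 2), np.zeros((3, 1))
Vdw, Vdb = np.zeros_like(W), np.zeros_like(b)
Sdw, Sdb = np.zeros_like(W), np.zeros_like(b)

for t in range(1, 101):  # t starts at 1 so the bias correction is well defined
    dW, db = np.random.randn(*W.shape), np.random.randn(*b.shape)  # stand-in gradients
    W, b, Vdw, Vdb, Sdw, Sdb = adam_update(W, b, dW, db, Vdw, Vdb, Sdw, Sdb, t,
                                           alpha=0.001,  # alpha still needs tuning in practice
                                           beta1=0.9, beta2=0.999, epsilon=1e-8)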