Learning rate: 0.001, Momentum: 0.6, Batch size: 16, Hidden units: 50, Epochs: 100
Maximum training accuracy: 93%; maximum testing accuracy: 82%

Answer 4:
Different learning rates:
[Plot: Test Accuracy (Epochs: 100, Batch size: 16, Hidden units: 50, LR: 0.001, Momentum: 0.6)]
[Plot: Test Accuracy (Epochs: 100, Batch size: 16, Hidden units: 50, LR: 0.002, Momentum: 0.6)]
Different batch sizes:
Different hidden units:
[Plot: Test Accuracy (Epochs: 100, Batch size: 16, Hidden units: 10, LR: 0.001, Momentum: 0.6)]
[Plot: Test Accuracy (Epochs: 100, Batch size: 16, Hidden units: 30, LR: 0.001, Momentum: 0.6)]
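The configuration listed above can be sketched roughly as follows. This is a minimal stand-in, not the assignment's actual implementation: the toy data, the ReLU activation, the softmax cross-entropy loss, and the input/output sizes are all assumptions; only the hidden width, learning rate, momentum, and batch size come from the settings above (epochs are reduced so the demo runs quickly).

```python
import numpy as np

# Hedged sketch: one-hidden-layer network trained with mini-batch SGD +
# momentum. Hyperparameters from the report; everything else is assumed.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 50, 10            # input/output sizes assumed
lr, momentum, batch_size, epochs = 0.001, 0.6, 16, 10

# Toy data standing in for the real training set; labels are a simple
# deterministic function of the inputs so the network has something to learn.
X = rng.standard_normal((256, n_in))
y = np.argmax(X[:, :n_out], axis=1)

W1 = rng.standard_normal((n_in, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_out)) * 0.01
b2 = np.zeros(n_out)
vel = [np.zeros_like(p) for p in (W1, b1, W2, b2)]

def forward(Xb):
    h = np.maximum(0, Xb @ W1 + b1)            # ReLU hidden layer
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return h, probs

def mean_loss():
    _, probs = forward(X)
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

losses = [mean_loss()]
for epoch in range(epochs):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        h, probs = forward(Xb)
        # Softmax cross-entropy gradient w.r.t. the logits.
        dlogits = probs.copy()
        dlogits[np.arange(len(yb)), yb] -= 1
        dlogits /= len(yb)
        dh = (dlogits @ W2.T) * (h > 0)        # backprop through ReLU
        grads = [Xb.T @ dh, dh.sum(axis=0), h.T @ dlogits, dlogits.sum(axis=0)]
        # Classic momentum update.
        for i, (param, g) in enumerate(zip((W1, b1, W2, b2), grads)):
            vel[i] = momentum * vel[i] - lr * g
            param += vel[i]
    losses.append(mean_loss())
```

With the small learning rate above, the training loss should fall slowly but steadily over the epochs.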
The maximum accuracy of the neural network in this assignment was about 82% on the test set, which is very good for a network with a single hidden layer and a small training set of 10,000 examples. I think that without using a convolutional neural network, the accuracy of a single-hidden-layer network would peak at around 85%.
The learning rate seems to be a very sensitive parameter: increasing it beyond a certain limit causes the network to stop learning altogether, because the updates bounce/skip past the local optimum.
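This overshoot effect can be illustrated on a 1-D quadratic (a hypothetical stand-in for the real loss surface): gradient descent on f(x) = 0.5*a*x^2 contracts toward the minimum only while lr < 2/a, and diverges beyond that because each step jumps past the optimum by more than it started with.

```python
# Hypothetical 1-D illustration of learning-rate overshoot.
# Gradient of f(x) = 0.5 * a * x**2 is a * x, so each step maps
# x -> (1 - lr * a) * x; stability requires |1 - lr * a| < 1.
def run(lr, a=10.0, x=1.0, steps=50):
    for _ in range(steps):
        x -= lr * a * x
    return abs(x)

small = run(lr=0.05)   # 0.05 < 2/10: contracts toward the minimum
large = run(lr=0.25)   # 0.25 > 2/10: each step overshoots, diverges
```

The same threshold behaviour is what makes a slightly-too-large learning rate fail completely rather than just train a bit worse.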
A very small number of hidden units (e.g. 10) causes both the training and test accuracy to be low, because the network is not expressive enough. However, increasing the number of hidden units beyond a certain point does not further increase the training/test accuracy. Moreover, the variance was larger when using fewer hidden units.
Increasing the number of hidden units from 10 to 30 had a larger effect on the test accuracy than increasing them from 30 to 50.
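The diminishing returns from extra hidden units can be sketched with a simplified stand-in: a fixed random ReLU feature layer of varying width, with only the output weights fitted by least squares. The target function, widths, and setup here are illustrative assumptions, not the assignment's task, but they show the same pattern: too few units underfit badly, while gains shrink as width grows.

```python
import numpy as np

# Hypothetical capacity illustration: fit y = sin(3x) using a random
# ReLU hidden layer of width n_hidden, training only the output weights.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)[:, None]
y = np.sin(3 * x).ravel()

def fit_error(n_hidden):
    W = rng.standard_normal((1, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.maximum(0, x @ W + b)                 # random ReLU features
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.sqrt(((H @ w_out - y) ** 2).mean())  # RMSE on the fit

errs = {n: fit_error(n) for n in (2, 10, 50)}
```

A 2-unit network can only produce a few linear pieces and misses the curve badly; by 50 units the residual error is already small, so adding more units buys little.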
A batch size of 16 worked best in my neural network implementation, while a batch size of 64 (the maximum tried) performed the worst. There also seems to be less variance in the test accuracy when using a smaller batch size.
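One factor behind the batch-size trade-off can be shown directly: the mini-batch gradient is a noisy estimate of the full gradient, and its standard deviation shrinks roughly as 1/sqrt(batch size). The toy loss below is an illustrative assumption, not the assignment's network, but the scaling is general.

```python
import numpy as np

# Hedged toy experiment: spread of the mini-batch gradient estimate
# versus batch size, for per-sample loss 0.5 * (w*x - x)**2.
rng = np.random.default_rng(0)
X = rng.standard_normal(10_000)
w = 0.5
per_sample_grad = (w * X - X) * X     # d/dw of the per-sample loss

def grad_std(batch_size, trials=2_000):
    # Sample many mini-batches and measure how much their mean
    # gradients scatter around the full-data gradient.
    idx = rng.integers(0, len(X), size=(trials, batch_size))
    return per_sample_grad[idx].mean(axis=1).std()

std16, std64 = grad_std(16), grad_std(64)   # expect std16 ≈ 2 * std64
```

So a batch of 16 gives noisier updates than a batch of 64 but four times as many of them per epoch; that extra update count (and perhaps the regularizing effect of the noise) is one plausible reason the smaller batch trained better here.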