- Batch size: Batch size is a critical parameter when you're working with Recurrent Neural Networks. Plain feed-forward networks and Convolutional Neural Networks are usually less sensitive to it. That said, you can often expect slightly better performance and more stable convergence with a larger batch size, especially if you're using a simple gradient descent algorithm like SGD. Adaptive optimizers like RMSprop and Adam aren't affected as much by the batch size.
- Number of epochs: The usual practice is to divide your data into three parts - train, validation and test. You train on the train set and keep checking performance on the validation set, holding the test set back for a final evaluation. When the validation performance stops improving, you stop training. Judging purely by the train performance is bound to make your model overfit.
- How to know if the model is overfitting: A simple way is to watch the difference between your train error and your validation error. The difference is initially quite small, as your model has just begun to train. Sometimes your validation performance might even be better. But as the number of epochs increases, your validation performance starts to lag behind, and it's up to you to choose when to stop training. It's generally a sign of overfitting if your validation accuracy has peaked while your train accuracy keeps improving.
- Number of hidden layers: You can choose to have as many hidden layers as you want. Deeper networks have been shown empirically to work better than shallow ones for image classification, and ResNet variants with more than 1,000 layers have been trained. But deeper networks bring a few problems that need to be handled carefully. You need to choose your activation function properly, otherwise you'll run into vanishing and exploding gradients. ReLU is generally a good choice for image classification. You also need to make sure your model doesn't overfit: deeper networks benefit from dropout, weight and activity regularization, and data augmentation. Adding noise to your images can help the model generalize better and find more stable optima. These are just a few tips that come to mind right now, and there are a lot of other things that can be done. There isn't any fixed rule that says the number of neurons needs to be 2/3 of the input size.
The easiest technique though? Use pre-trained networks like VGG-16, Inception or ResNet. They've already been trained on large datasets such as ImageNet and will generally give you very good out-of-the-box performance.
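The stopping rule from the "Number of epochs" point and the train/validation gap from the overfitting point can be sketched in a few lines of plain Python. This is only a minimal illustration, not a definitive recipe: the helper names `early_stopping_epoch` and `generalization_gap`, the `patience` and `min_delta` parameters, and the loss values are all assumptions made up for the example. In a real project the losses would come from your training loop.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Hypothetical helper: stops once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    If that never happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the best one.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


def generalization_gap(train_losses, val_losses):
    """Per-epoch difference between validation loss and train loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


# Made-up loss curves: the train loss keeps falling, while the
# validation loss bottoms out at epoch 3 and then rises again.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.17, 0.13]
val_losses = [0.95, 0.70, 0.58, 0.52, 0.53, 0.55, 0.58, 0.62]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
gaps = generalization_gap(train_losses, val_losses)

print("best epoch:", best_epoch)
print("stop epoch:", stop_epoch)
print("gap at first and last epoch:", round(gaps[0], 2), round(gaps[-1], 2))
```

With these made-up curves the validation loss is lowest at epoch 3 and training stops at epoch 6, while the gap between validation and train loss keeps widening. You would normally keep the weights saved at the best epoch rather than the ones from the stop epoch. Most deep learning frameworks ship a ready-made version of this rule, so the hand-written loop is only there to show the idea.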