- The training might be diverging rather than converging, causing the cost to grow without bound until it overflows to `inf` and then becomes `NaN`.
- Invalid operations, such as dividing by zero or taking the log of zero, may be happening inside the cost computation (see the stable-loss sketch after this list). This often occurs when the learning rate is set too high.
- Please refer to the following thread to learn more: https://community.deeplearning.ai/t/when-does-mse-becomes-nan/385197
- Exploding or vanishing gradients, which can sometimes be managed with batch normalization (see the optimizer sketch after this list).
- A learning rate that's too high. Lowering it can often stabilize the training process.
- Operations that produce NaN during training, such as division by zero or the logarithm of zero.
- NaN values in your input data or errors in preprocessing (such as normalizing a feature with zero variance), which can propagate NaNs through the network (see the data-check sketch after this list). Refer to the following thread to mitigate this issue: https://stackoverflow.com/questions/66381703/linear-regression-contain-nan-values
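One common fix for the divide-by-zero / log-of-zero case is to clamp the values going into those operations. Below is a minimal sketch of that idea using NumPy; the helper names (`safe_log`, `binary_cross_entropy`) and the epsilon values are my own illustrative choices, not something from the threads linked above.

```python
import numpy as np

def safe_log(x, eps=1e-12):
    # Clip inputs away from zero so log(0) never produces -inf/NaN.
    return np.log(np.clip(x, eps, None))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clamp predictions into (eps, 1 - eps) so neither log term blows up.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * safe_log(y_pred) + (1 - y_true) * safe_log(1.0 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([1.0, 0.0, 0.8])  # exact 0/1 predictions would normally cause log(0)
print(binary_cross_entropy(y_true, y_pred))  # finite value instead of NaN
```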
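For the exploding-gradient and high-learning-rate cases, lowering the learning rate and clipping the gradient norm are common mitigations. The sketch below shows where those settings go in a Keras model; the model architecture, layer sizes, and the specific values (`learning_rate=1e-4`, `clipnorm=1.0`) are assumptions for illustration, not recommendations from the original answer.

```python
import tensorflow as tf

# Hypothetical small regression model, just to show where the settings live.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),  # helps stabilize activations/gradients
    tf.keras.layers.Dense(1),
])

# A smaller learning rate plus gradient-norm clipping keeps individual
# updates from overshooting into inf/NaN territory.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```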
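Finally, for NaNs coming from the data or from zero-variance normalization, it helps to validate the inputs and guard the standard deviation before dividing. This is a minimal sketch under my own assumptions (the function name `check_and_normalize` and the `eps` guard are illustrative):

```python
import numpy as np

def check_and_normalize(X, eps=1e-8):
    # Fail fast if NaN/inf values are already present in the raw data.
    if not np.all(np.isfinite(X)):
        raise ValueError("Input contains NaN or inf values; clean or impute them first.")
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against zero-variance columns: dividing by std == 0 would create NaNs.
    return (X - mean) / np.maximum(std, eps)

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])  # second column has zero variance
print(check_and_normalize(X))  # finite output instead of NaN in the constant column
```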