To test my understanding of neural networks and deep learning, I took what I learned from the Deep Learning specialization on Coursera, along with the code I developed for its assignments, and used it to tackle the Titanic Kaggle competition.
Over the last few months I have been following an amazing course taught by Andrew Ng for deeplearning.ai on Coursera.
This course introduces you to deep learning from the beginning. You only need some basic knowledge of linear algebra and programming to get started; nothing major, but it will help.
So far I have taken the first three courses:
- Course 1 Neural Networks and Deep Learning
- Course 2 Improving Deep Neural Networks
- Course 3 Structured Machine Learning Projects
Kaggle provides two .csv files listing the passengers of the Titanic. One file is for training the neural network: it has the information about each passenger, including whether they survived the sinking. The other file is the test set: it has the same information about each passenger, but not whether they survived. Kaggle then asks you to upload a .csv file with a survival prediction for every passenger in the test set.
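As a rough sketch of that last step (this is not the code I actually used), the submission file can be written with Python's standard csv module. The column names PassengerId and Survived are the ones Kaggle expects for this competition. The function name write_submission and the 0.5 threshold are assumptions made for the illustration.

```python
import csv
import io

def write_submission(passenger_ids, probabilities, out_file, threshold=0.5):
    """Write one 0/1 survival prediction per passenger in Kaggle's format.

    `threshold` is an assumed cut-off: probabilities above it count as survived.
    """
    writer = csv.writer(out_file)
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, probability in zip(passenger_ids, probabilities):
        writer.writerow([passenger_id, int(probability > threshold)])

# Small demo writing to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_submission([892, 893], [0.2, 0.9], buffer)
```

To write a real file, pass something like open("submission.csv", "w", newline="") instead of the in-memory buffer.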
The information provided is:
- Ticket class
- Age in years
- # of siblings / spouses aboard the Titanic
- # of parents / children aboard the Titanic
- Ticket number
- Passenger fare
- Cabin number
- Port of Embarkation
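These columns have to become numbers before a network can use them. The sketch below shows one possible encoding; it is an assumption for illustration, not necessarily what mono_layer.py does. The keys (Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) are the column names in Kaggle's train.csv. The default age of 30, the port codes, and the function names are hypothetical choices.

```python
import numpy as np

# Hypothetical numeric codes for the three ports of embarkation.
EMBARKED_CODES = {"S": 0.0, "C": 1.0, "Q": 2.0}

def encode_passenger(row, default_age=30.0):
    """Turn one CSV row (a dict of strings) into a list of floats.

    Missing ages are replaced by `default_age`, an assumed placeholder.
    """
    age = float(row["Age"]) if row.get("Age") else default_age
    fare = float(row["Fare"]) if row.get("Fare") else 0.0
    return [
        float(row["Pclass"]),
        1.0 if row["Sex"] == "female" else 0.0,
        age,
        float(row["SibSp"]),
        float(row["Parch"]),
        fare,
        EMBARKED_CODES.get(row.get("Embarked", ""), 0.0),
    ]

def build_matrix(rows):
    """Stack passengers as columns, shape (n_features, m), and standardize each feature."""
    X = np.array([encode_passenger(r) for r in rows], dtype=float).T
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True) + 1e-8  # avoid division by zero
    return (X - mean) / std
```

The rows can come from csv.DictReader over the training file. Standardizing the features keeps columns with large values, such as the fare, from dominating the others.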
From this information I trained a neural network with a single hidden layer, with ReLU activation on the hidden layer and sigmoid on the output node.
A single hidden layer neural network consists of 3 layers: input, hidden and output.
The input layer holds all the input values, in our case numerical representations of ticket number, fare, sex, age and so on.
The hidden layer is where most of the calculation happens. Every perceptron unit takes the values from the input layer, multiplies them by initially random weights, and adds the results together. This intermediate result is not yet ready to leave the perceptron: it first has to be activated by a function, in this case ReLU.
The third and last layer is the output layer. It takes the outputs of all the hidden-layer perceptrons as input, multiplies them by its own initially random weights, and adds them together; the result is then activated by a sigmoid function. This layer outputs a value between zero and one, which is the estimated probability that the passenger survives.
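The three layers described above can be sketched in a few lines of numpy. This is a minimal illustration rather than the actual contents of mono_layer.py. The names init_params and forward, the 0.01 scaling of the random weights, and the zero-initialized biases are my assumptions.

```python
import numpy as np

def init_params(n_x, n_h, seed=0):
    """Small random weights and zero biases for a n_x -> n_h -> 1 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * 0.01,
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((1, n_h)) * 0.01,
        "b2": np.zeros((1, 1)),
    }

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """X has shape (n_x, m): one column per passenger."""
    Z1 = params["W1"] @ X + params["b1"]   # hidden layer: weighted sum
    A1 = relu(Z1)                          # hidden layer: ReLU activation
    Z2 = params["W2"] @ A1 + params["b2"]  # output layer: weighted sum
    A2 = sigmoid(Z2)                       # output layer: value in (0, 1)
    return A2, (Z1, A1)
```

Each column of A2 is the network's survival estimate for the passenger in the same column of X.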
To train the network we rely on gradient descent and backpropagation of the gradients. The output values are compared with the correct answers to compute the value of a predefined error function. That error is then fed back through the network. Using this information, the algorithm adjusts the initially random weights of each connection so that the value of the error function decreases by a small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to a state where the error is small. At that point one would say that the network has learned a certain target function.
To adjust the weights properly, one applies a general method for non-linear optimization called gradient descent. The network calculates the derivative of the error function with respect to the network weights, and changes the weights so that the error decreases (thus going downhill on the surface of the error function). For this reason, backpropagation can only be applied to networks with differentiable activation functions. ReLU is not differentiable at exactly zero, but in practice a fixed slope is simply chosen at that point.
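Here is a compact sketch of that training loop in numpy, for illustration only. I assume binary cross-entropy as the error function, which the text above does not specify. The function name train and its default values are also hypothetical.

```python
import numpy as np

def train(X, Y, n_h=4, learning_rate=0.1, iterations=1000, seed=0):
    """Gradient descent for a one-hidden-layer network (ReLU, then sigmoid).

    X: (n_x, m) inputs, Y: (1, m) labels in {0, 1}.
    Assumes binary cross-entropy as the error function.
    """
    rng = np.random.default_rng(seed)
    n_x, m = X.shape
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.01
    b2 = np.zeros((1, 1))
    eps = 1e-12  # keeps log() away from zero
    costs = []
    for _ in range(iterations):
        # Forward pass.
        Z1 = W1 @ X + b1
        A1 = np.maximum(0.0, Z1)
        Z2 = W2 @ A1 + b2
        A2 = 1.0 / (1.0 + np.exp(-Z2))
        cost = -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))
        costs.append(float(cost))
        # Backward pass: derivatives of the cost with respect to each parameter.
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (Z1 > 0)  # ReLU slope: 1 where Z1 > 0, else 0
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Gradient descent step: move each parameter downhill.
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs
```

The returned costs list holds one value per iteration, so it can be plotted whole or sampled at regular intervals.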
The neural network code is in this file: mono_layer.py. The whole network is handled through matrix calculations done with numpy, and it doesn’t use any library for the neural network itself (such as TensorFlow).
In my first tests I noticed that small changes in the hyper-parameters greatly affect how quickly training reaches a low cost. Here I show a graph for random values of the learning rate and of the number of units in the hidden layer. I ran only 6 tests to keep the visualization simple. Here are the costs, sampled 10 times over 25’000 iterations.
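Random values like these can be drawn as in the sketch below. The ranges are assumptions for illustration: learning rates between 0.0001 and 0.1, picked on a logarithmic scale, and between 2 and 20 hidden units. They are not the exact values behind the graph, and the function name sample_hyperparameters is hypothetical.

```python
import numpy as np

def sample_hyperparameters(n_tests=6, seed=0):
    """Draw random (learning rate, hidden units) pairs, one per test.

    The ranges are assumed: learning rate in [1e-4, 1e-1], hidden units in 2..20.
    """
    rng = np.random.default_rng(seed)
    configs = []
    for _ in range(n_tests):
        exponent = rng.uniform(-4, -1)  # log scale covers small rates evenly
        configs.append({
            "learning_rate": float(10 ** exponent),
            "hidden_units": int(rng.integers(2, 21)),  # upper bound is exclusive
        })
    return configs
```

Sampling the exponent rather than the learning rate itself means that values like 0.001 and 0.01 are as likely to appear as values near 0.1.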
This graph didn’t convince me much, so I then plotted every cost value to see how the cost behaved on its way down.
Here you can see how the cost behaves when every value is plotted rather than a sample. All the tests reach a satisfactory Test_accuracy of around 80%.
Being deeply interested in this graph, I decided to go for 50’000 iterations.
The accuracy after 50’000 iterations seems to rise for most tests, so I decided to continue to 100’000 iterations.
In the 100’000 iterations graph we can see that the improvement starts to stagnate. Test 3 reaches the best absolute Test_accuracy of 81.84%, but test 2 loses most of its improvement. The algorithm has been over-fitting in the last 50’000 iterations.
Here the values stabilize a lot, but the accuracy didn’t really improve.
This is just the first post I will write about this test and neural networks in general. I submitted predictions for the test set provided by Kaggle before doing these performance studies and scored 0.76555 (position 5913 on the leaderboard). Then I submitted predictions based on what I learned from the new tests and scored 0.79425 (position 2374 on the leaderboard).
Also published on Medium.