Neural Networks

OVERVIEW

Neural networks are a class of machine learning models loosely inspired by the human brain. They are composed of layers of interconnected nodes, or neurons, that process and transform data. Each neuron receives input signals, computes a weighted sum of them, applies an activation function, and passes the result to the next layer of neurons until the final output is produced.

The neurons in a neural network are organized into layers, with each layer applying its own transformation to the output of the layer before it. The input layer receives the data and passes it to the first hidden layer, which then passes its result on to subsequent hidden layers. The final output layer produces the network's prediction or classification.
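The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is only an illustration: the function names (relu, neuron, dense_layer) and all weights and inputs are made up for the example and are not taken from the project code.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def dense_layer(inputs, weight_rows, biases, activation=relu):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Illustrative values only (not learned from any dataset).
x = [1.0, 2.0]
hidden = dense_layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
output = dense_layer(hidden, [[2.0, 1.0]], [0.5])
print(hidden, output)
```

Each layer's output list becomes the next layer's input list, which is all that "passing data forward" means in practice.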

Neural networks are typically trained with gradient descent, using a process called backpropagation to compute how much each weight and bias contributed to the error. During training, the network is presented with a set of labeled examples and adjusts its weights and biases to minimize the difference between its predicted output and the true output. This process is repeated many times until the network's predictions are accurate enough.
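To make the weight-adjustment loop concrete, here is a minimal sketch that trains a single sigmoid neuron with gradient descent on a toy problem (logical OR). It is a simplification: a full network applies the same idea to every layer via the chain rule. The function names, learning rate, and epoch count are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, labels, lr=0.5, epochs=1000):
    """Fit one sigmoid neuron by per-sample gradient descent on cross-entropy loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = p - y  # gradient of the loss with respect to the pre-activation
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b

def predict(w, b, x):
    """Threshold the neuron's output probability at 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy labeled examples: logical OR.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]
w, b = train_neuron(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```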

Neural networks have many applications in machine learning, including image and speech recognition, natural language processing, and predictive analytics. They can be used for both supervised and unsupervised learning tasks, and can handle large and complex datasets.

Figure 1 - Sample Neural Network

DATA PREP AND CODE

Labeled data is necessary for supervised learning because it allows the model to learn the relationship between the input and output variables. In supervised learning, we train the model on data in which each data point is associated with a label or target variable, so the expected output is known for each input or feature set.

The model uses the labeled data to learn the underlying patterns or relationships between the input and output variables, and then uses this knowledge to make predictions on new, unseen data. Without labeled data, the model cannot learn that relationship and hence cannot make accurate predictions on new data.

Figures 2 and 3 below show the raw data available to us for analysis and its transformation into clean data.

The raw dataset is not clean or formatted in a way that lets a Neural Network be trained on it directly. We dropped features, fixed data type mismatches, and modified column values to make the dataset suitable for the model.
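The three kinds of cleaning mentioned above (dropping features, fixing data types, recoding column values) can be sketched as follows. The column names, the sample rows, and the W/L coding are hypothetical and exist only to make the example runnable; they are not the real schema or the actual cleaning code.

```python
# Hypothetical raw rows; column names and values are illustrative only.
raw_rows = [
    {"Season": "2001-02", "Goals": "3", "Shots": "14", "Referee": "A. Name", "Result": "W"},
    {"Season": "2001-02", "Goals": "0", "Shots": "7", "Referee": "B. Name", "Result": "L"},
]

DROP = {"Referee"}            # features assumed irrelevant to the model
NUMERIC = {"Goals", "Shots"}  # numbers stored as text in the raw file
LABELS = {"W": 1, "L": 0}     # recode the outcome as a numeric label

def clean_row(row):
    """Drop unwanted columns, cast numeric text to int, and encode the label."""
    cleaned = {}
    for key, value in row.items():
        if key in DROP:
            continue
        if key in NUMERIC:
            cleaned[key] = int(value)
        elif key == "Result":
            cleaned[key] = LABELS[value]
        else:
            cleaned[key] = value
    return cleaned

clean_rows = [clean_row(r) for r in raw_rows]
print(clean_rows)
```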

The final dataset (Figure 3) is a combination of 23 of these cleaned datasets for each league. As can be seen, the combined dataset has dimensions 945 x 6.

Now that the dataset is ready, we need to split it into training and testing datasets.

Splitting data into training and testing sets is necessary in supervised learning to evaluate the performance of the model. The purpose of training a model is to make it learn from the given data so that it can make accurate predictions on unseen data. However, if the model is overfitted, it will perform well on the training data but poorly on the testing data, which defeats the purpose of creating a model in the first place.

To detect overfitting, the data is split into two sets: the training set and the testing set. The model is trained on the training set and evaluated on the testing set. This way, the model is tested on data it has not seen before, and its performance on the testing set can be used to estimate its performance on new, unseen data.
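A shuffled split of this kind can be sketched with the standard library alone. The split sizes here are an assumption: 95 test rows matches the number of matches counted in the confusion matrix later in this write-up, but how the original split was made is not stated. The function name and seed are illustrative.

```python
import random

def train_test_split(rows, n_test, seed=42):
    """Shuffle row indices once, then cut them into two non-overlapping parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Stand-in for the 945 rows of the cleaned dataset.
rows = list(range(945))
train_rows, test_rows = train_test_split(rows, n_test=95)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, so repeated runs evaluate on the same held-out rows.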

Holding data out is also useful for hyperparameter tuning. Hyperparameters are settings that are chosen before training a model, and they can significantly impact the performance of the model. Different hyperparameters can be compared by evaluating the model on held-out data and selecting the ones that lead to the best performance. Ideally this comparison uses a separate validation set carved out of the training data, because choosing hyperparameters on the testing set makes the final test score an optimistic estimate.
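A common safeguard when tuning is to score each candidate setting on a validation set taken from the training data, leaving the testing set untouched until the end. The sketch below shows only the selection loop. To keep it short it uses a tiny 1-D k-nearest-neighbour classifier as a stand-in model (not a neural network), with k as the hyperparameter, and the data points are made up.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, rows, k):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)

# Made-up (feature, label) pairs for illustration.
train = [(0.1, 0), (0.2, 0), (0.35, 1), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
validation = [(0.15, 0), (0.3, 0), (0.65, 1), (0.8, 1)]

# Score every candidate on the validation set and keep the best one.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```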

Creating a disjoint split when creating test and train splits is important for several reasons:

Detecting overfitting: When a model is trained on a dataset, it may learn to memorize the specific data points and relationships within that dataset, rather than learning more generalizable patterns. This can lead to overfitting, where the model performs well on the training data but poorly on new data. By creating a disjoint split where the test set contains only data that the model has not seen during training, we can evaluate the model's ability to generalize to new data and notice overfitting when it occurs.

Evaluating model performance: When testing a model's performance, we want to know how well it will perform on new, unseen data. By creating a disjoint split, we can evaluate the model's performance on a set of data that it has not seen during training, which gives us a more accurate estimate of how the model will perform in the real world.

Improving model selection: When comparing the performance of different models, we want to ensure that they are being evaluated on the same set of data. By creating a disjoint split, we can ensure that all models are being evaluated on the same set of test data, which allows for a fair comparison of their performance.

Overall, creating a disjoint split when creating test and train splits is essential for evaluating and comparing machine learning models, as it allows us to test their ability to generalize to new data and to detect overfitting.
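Disjointness is cheap to verify before training. A minimal sketch, assuming each row carries some unique identifier (the integer IDs below are hypothetical):

```python
def find_overlap(train_ids, test_ids):
    """Return the identifiers that appear in both splits (empty if disjoint)."""
    return sorted(set(train_ids) & set(test_ids))

# Hypothetical row identifiers.
clean_split = find_overlap([1, 2, 3], [4, 5])
leaky_split = find_overlap([1, 2, 3], [3, 4])
print(clean_split, leaky_split)
```

Any non-empty result means some rows would be both trained on and tested on, which inflates the test score.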

The labels a neural network (NN) requires depend on the task it is used for. In general, neural networks require labeled data, which means that the data used for training the network must have a corresponding label or output value associated with each input. For example, in a binary classification problem, each data point is labeled as either a 0 or a 1, and the neural network is trained to correctly classify each data point as one of these two labels. Similarly, in a multi-class classification problem, each data point is labeled with a specific class label, and the neural network is trained to correctly classify each data point into one of the multiple classes.
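The two labeling schemes can be illustrated with a short sketch. The outcome codes "W", "D" and "L" are assumptions for the example; one-hot vectors are a common way to present multi-class labels to a network, though the write-up does not say which encoding was used.

```python
def encode_binary(outcomes, positive="W"):
    """Binary labels: 1 for the positive class, 0 for everything else."""
    return [1 if o == positive else 0 for o in outcomes]

def one_hot(outcomes):
    """Multi-class labels: one vector per row, with a single 1 at the class index."""
    classes = sorted(set(outcomes))
    index = {c: i for i, c in enumerate(classes)}
    vectors = [[1 if index[o] == i else 0 for i in range(len(classes))] for o in outcomes]
    return classes, vectors

binary = encode_binary(["W", "L", "W"])
classes, vectors = one_hot(["W", "D", "L"])
print(binary, classes, vectors)
```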

In addition, the type of neural network being used can also affect the labeling requirements. For instance, in supervised learning tasks, such as classification and regression problems, the labeled data is used to train the neural network to produce accurate outputs given input data. In contrast, in unsupervised learning tasks, such as clustering and anomaly detection, labeled data may not be required, and the neural network learns patterns in the input data without a specific output to compare against.

Figure 2 - Raw data before transformation

Figure 3 - Cleaned Data

Figure 4 - Training dataset

Figure 5 - Testing Dataset

RESULTS

Applying a Neural Network to a soccer dataset can provide valuable insights and uncover previously unknown trends in the data. It can capture how the variables in a dataset relate to one another and how they contribute to predicting a target variable. Two key tools for evaluating the performance of a Neural Network classifier are accuracy, which measures the proportion of correctly predicted instances in the dataset, and the confusion matrix, which breaks the predictions down into true positives, true negatives, false positives, and false negatives.
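For a binary problem, the four cells of a confusion matrix can be tallied directly from the true and predicted labels. The sketch below uses made-up labels, with 1 assumed to be the positive class; the function name is illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Made-up labels for illustration.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
print(counts, accuracy)
```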

For the dataset at hand, we used a simple Neural Network with the following attributes:

1. The input layer takes in 5 inputs with the activation function ReLU.

2. Next, we have added 2 hidden layers to this network, the first one with 3 neurons and the second one with 2 neurons, both using the ReLU activation function.

3. Finally, we have the output layer with only one neuron, also making use of the ReLU activation function.

Figure 6 below gives an overview of the architecture of our neural network.

Figure 6 - Neural Network Architecture
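As a rough illustration of this architecture, the sketch below builds a network with the same shape in plain Python and runs one forward pass. It reads the description as 5 input features feeding layers of 3, 2, and 1 neurons, all with ReLU. The weights are random and the input vector is made up, so this is not the trained model, and the project's actual Python code may be organized differently.

```python
import random

LAYER_SIZES = [5, 3, 2, 1]  # inputs, hidden 1, hidden 2, output

def init_network(sizes, seed=0):
    """Random weights and zero biases for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Propagate x through every layer, applying ReLU after each weighted sum."""
    for weights, biases in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

def count_parameters(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

network = init_network(LAYER_SIZES)
output = forward(network, [0.5, 1.0, 0.0, 2.0, 1.5])  # made-up feature vector
n_params = count_parameters(network)
print(output, n_params)
```

Under this reading the network has 29 trainable parameters: (5 x 3 + 3) + (3 x 2 + 2) + (2 x 1 + 1).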

The data we have is for the club 'Manchester United', and we have analyzed their match outcome trends over the past 23 seasons.

Now, let us have a look at the performance of our Neural Network.

Confusion Matrix

Figures 7 and 8 - Model Evaluation

Figures 7 and 8 showcase the results for our simple neural network. As can be seen in Figure 7, we ran the network for a total of 40 epochs. As the epochs increase, the model learns features and attributes in the data, which in turn leads to better accuracy and lower loss. The model started with an accuracy of 24% at epoch 1, reached 75.5% at the halfway point (epoch 20), and finally attained an accuracy of 85.18% by the end of training.

Figure 8 on the right-hand side shows the accuracy of our model along with the confusion matrix. From the confusion matrix, the following can be inferred:

Accuracy: The overall accuracy of the model is calculated as (TP + TN) / (TP + TN + FP + FN). Here TP = 60, TN = 21, FP = 3, and FN = 11, so the accuracy is (60 + 21) / (60 + 21 + 3 + 11) = 81 / 95 = 0.85 or 85%.
Precision: Precision is the proportion of true positive predictions out of all positive predictions made by the model. It is calculated as TP / (TP + FP). In this case, precision is 60 / (60 + 3) = 0.95 or 95%.
Recall: Recall is the proportion of true positive predictions out of all actual positive instances. It is calculated as TP / (TP + FN). In this case, recall is 60 / (60 + 11) = 0.85 or 85%.
F1-Score: The F1-score is the harmonic mean of precision and recall. It is calculated as 2 * ((precision * recall) / (precision + recall)). In this case, the F1-score is 2 * ((0.95 * 0.85) / (0.95 + 0.85)) = 0.90 or 90%.
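The four calculations above can be reproduced with a short script. The counts are read from the precision and recall formulas (TP = 60, FP = 3, FN = 11), which leaves TN = 21 from the accuracy expression. The function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

accuracy, precision, recall, f1 = classification_metrics(tp=60, tn=21, fp=3, fn=11)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.853 precision=0.952 recall=0.845 f1=0.896
```

Computing F1 from the unrounded precision and recall gives 0.896, which rounds to the same 90% reported above.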

Figure 9 - Learning Curve

The figure above displays what is known as the "learning curve." It represents the relationship between the number of training iterations (epochs) and the model's loss function.

As can be seen, in the initial epochs the model is not trained enough and its predictions are not accurate, so the loss is high. As the number of epochs increases, the model learns more patterns in the data, which leads to a decrease in the loss. The curve eventually flattens out, indicating that additional epochs no longer reduce the loss appreciably and the model has reached its best performance on this data.
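The point where the curve flattens can also be located programmatically. The sketch below flags a plateau once the loss has failed to improve by more than a small margin for a few consecutive epochs, which is the same idea behind early stopping. The loss values are made up for illustration and are not the values from Figure 9; min_delta and patience are assumed thresholds.

```python
def plateau_epoch(losses, min_delta=0.01, patience=3):
    """Return the last epoch (1-based) with a meaningful improvement before the
    loss stalls for `patience` consecutive epochs, or None if it never stalls."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch - patience
    return None

# Made-up loss values: steep improvement at first, then a flat tail.
losses = [0.90, 0.70, 0.55, 0.45, 0.40, 0.395, 0.393, 0.392, 0.392]
flat_from = plateau_epoch(losses)
print(flat_from)
```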

Therefore, we can conclude that the model is improving and learning from the data. It is essential to monitor the learning curve to ensure that the model is not over-fitting or under-fitting the data.

CONCLUSION

In this study, we utilized the concepts of Neural Networks to predict the outcome of soccer matches for a particular team, based on a dataset containing various match statistics. We made use of a simple neural network having 5 input variables, 2 hidden layers and a single output layer.

The confusion matrix for our neural network showed that the model does not make perfect predictions: it misclassified 14 of the 95 test matches (3 false positives and 11 false negatives).

Also, we were able to determine the importance of features being used in our dataset.

Based on the results of the neural network model trained on a soccer dataset to predict the win/loss outcome of a match, we can conclude that the model is reasonably accurate, with an accuracy of 85%. Moreover, the model has a good recall of 85%, which means that it is able to correctly identify the majority of the positive outcomes (i.e., wins) in the dataset. The F1-score of the model, which measures the balance between precision and recall, is also high at 90%. Overall, these metrics suggest that the model is effective in predicting the win/loss outcomes of soccer matches.

These results suggest that Neural Networks can be a powerful tool for predicting the outcome of soccer matches based on various statistics. Even with a simple neural network, we achieved a decent accuracy. One important factor to consider is that Neural Networks tend to perform well only when the dataset is relatively large. Further research could investigate the impact of different features on the prediction accuracy, and more complex networks could be implemented for a more in-depth analysis.