WHAT IS OVERFITTING?

Overfitting in a neural network

In this post, we"ll discuss what it means when a Model is said khổng lồ be overfitting.

We"ll also cover some techniques we can use khổng lồ try to lớn reduce overfitting when it happens.

Overfitting occurs when our model becomes really good at being able to classify or predict on data that was included in the training set, but is not as good at classifying data that it wasn't trained on. So essentially, the model has overfit the data in the training set.

How to spot overfitting

We can tell if the model is overfitting based on the metrics that are given for our training data and validation data during training. We previously saw that when we specify a validation set during training, we get metrics for the validation accuracy and loss, as well as the training accuracy and loss.

If the validation metrics are considerably worse than the training metrics, then that is an indication that our model is overfitting.

We can also get an idea that our model is overfitting if, during training, the model's metrics were good, but when we use the model to predict on test data, it doesn't accurately classify the data in the test set.

The concept of overfitting boils down to the fact that the model is unable to generalize well. It has learned the features of the training set extremely well, but if we give the model any data that slightly deviates from the exact data used during training, it's unable to generalize and accurately predict the output.
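To make this concrete, here is a minimal sketch of how we might compare the training and validation metrics in Keras. The model architecture and the randomly generated train_samples and train_labels arrays are placeholder assumptions purely for illustration:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder data: 1000 samples with 20 features each, and binary labels
train_samples = np.random.rand(1000, 20)
train_labels = np.random.randint(2, size=1000)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Hold out 20% of the training data as a validation set
history = model.fit(train_samples, train_labels, validation_split=0.2, epochs=30, verbose=0)

# A large gap between these two numbers is a sign of overfitting
print('Final training accuracy:  ', history.history['accuracy'][-1])
print('Final validation accuracy:', history.history['val_accuracy'][-1])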

Reducing overfitting

Overfitting is an incredibly common issue. How can we reduce it? Let's look at some techniques.

Adding more data to the training set

The easiest thing we can do, as long as we have access to it, is to add more data. The more data we can train our model on, the more it will be able to learn from the training set. Also, with more data, we're hoping to be adding more diversity to the training set as well.

For example, if we train a model to classify whether an image is an image of a dog or a cat, and the model has only seen images of larger dogs, like Labs, Golden Retrievers, and Boxers, then in practice if it sees a Pomeranian, it may not do so well at recognizing that a Pomeranian is a dog.

If we add more data to this model to encompass more breeds, then our training data will become more diverse, and the model will be less likely to overfit.

Data augmentation

Another technique we can deploy to reduce overfitting is to use data augmentation. This is the process of creating additional augmented data by reasonably modifying the data in our training set. For image data, for example, we can make these modifications by:

Cropping
Rotating
Flipping
Zooming

We"ll cover more on the concept of data augmentation in a later post. There is also a data augmentation post in the Keras series.

The general idea of data augmentation is that it allows us to add more data to our training set that is similar to the data that we already have, but is just reasonably modified to some degree so that it's not the exact same.

For example, if most of our dog images were dogs facing to the left, then it would be a reasonable modification to add augmented flipped images so that our training set would also have dogs that faced to the right.
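As a sketch of what this can look like in code, Keras ships an ImageDataGenerator class that applies these kinds of modifications on the fly. The specific parameter values below, and the images and labels arrays they would be applied to, are illustrative assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each parameter corresponds to one of the modifications listed above
# (the shift parameters act as a rough stand-in for cropping)
datagen = ImageDataGenerator(
    rotation_range=20,      # rotate by up to 20 degrees
    horizontal_flip=True,   # e.g., flip left-facing dogs to face right
    zoom_range=0.15,        # zoom in or out by up to 15%
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# images: array of shape (num_images, height, width, channels); labels: matching array
# Augmented batches are then generated on the fly during training:
# model.fit(datagen.flow(images, labels, batch_size=32), epochs=10)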

Reduce the complexity of the model

Something else we can do to reduce overfitting is to reduce the complexity of our model. We could reduce complexity by making simple changes, like removing some layers from the model, or reducing the number of neurons in the layers. This may help our model generalize better to data it hasn't seen before.
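For instance, here is a hypothetical before-and-after in Keras, where the layer counts and neuron counts are made-up numbers purely for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A more complex model that may overfit a small dataset
complex_model = Sequential([
    Dense(256, activation='relu', input_shape=(20,)),
    Dense(256, activation='relu'),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid'),
])

# A simpler alternative: fewer layers and fewer neurons per layer
simple_model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])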

Dropout

The last tip we'll cover for reducing overfitting is to use something called dropout. The general idea behind dropout is that, if you add it to a model, it will randomly ignore some subset of nodes in a given layer during training, i.e., it drops out the nodes from the layer. Hence, the name dropout. This will prevent these dropped-out nodes from participating in producing a prediction on the data.

This technique may also help our model to generalize better to data it hasn't seen before. We'll cover the full concept of dropout as a regularization technique in another post, and there we'll understand why this makes sense.
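In Keras, adding dropout is as simple as inserting Dropout layers between the existing layers. A minimal sketch, where the architecture and the 0.5 dropout rate are arbitrary choices for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dropout(0.5),  # randomly drop 50% of this layer's outputs during training
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])

Note that dropout is only active during training; at prediction time, all nodes participate.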

Underfitting is next

Hopefully now we understand the concept of overfitting, why it occurs, and how we can reduce it if we see it happening in our models. In the next post, we'll explore the concept of underfitting. I'll see ya there!