A multi-label classification problem is one in which a list of target labels is associated with every row of input. An example of this would be the various tags associated with Medium articles. In this article we will be labeling satellite images. But why is this important?
Deforestation in the Amazon basin contributes to reduced biodiversity, habitat loss, climate change, and other devastating effects. Combating deforestation requires detection and understanding of markers of human activity over large regions of Earth. Utilizing automated analysis of satellite images to detect these markers can enable faster and more effective responses to activity that indicates or precedes deforestation.
Now that we know that labeling these images is important, let's find out how we actually do it and what metric we use to evaluate it.
The full Jupyter notebook can be found here.
Importing the library
from fastai.vision import *
As usual we start by importing the fastai library. Since we are working with image data we use fastai.vision. Next, let's take a look at how our target labels are stored.
df = pd.read_csv(path/'train_v2.csv')
df.head()
The target labels are stored in a CSV file, with the labels for each image separated by white space. Note that in multi-label classification the number of labels varies from image to image; every image will not have the same number of labels.
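As a minimal illustration of that storage format (the rows below are made up, not taken from the actual Planet data), the space-separated tags column can be split into Python lists like this:

```python
import io
import csv

# A tiny made-up sample in the same shape as train_v2.csv
sample = io.StringIO(
    "image_name,tags\n"
    "train_0,haze primary\n"
    "train_1,agriculture clear primary water\n"
)

reader = csv.DictReader(sample)
# Split each image's tag string on white space into a list of labels
labels = {row["image_name"]: row["tags"].split() for row in reader}

print(labels["train_0"])       # ['haze', 'primary']
print(len(labels["train_1"]))  # 4 -- images carry different numbers of tags
```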
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
In the first step, we apply a number of transforms to our images. Why do we transform our images? Transforms add variety to the data set, which helps the model generalize better. If we were designing a face detection algorithm, for example, we could randomly flip some images horizontally to get a richer data set.
In the case of satellite images, however, we can also flip vertically, since satellite images have no inherent orientation. The max_lighting and max_zoom parameters are set by trial and error to values that work well.
max_warp changes the perspective of the picture. It is handy for data sets like pets and cars, which clearly can be viewed from different perspectives; you can look at a dog from above, or at eye level when you are close to the ground playing with it. A satellite, however, always views the ground from the same perspective: high above the ground, from space. Adding a perspective warp to the training set would therefore make it unrepresentative of real satellite images, so we turn it off.
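Conceptually, the horizontal and vertical flips described above are just reversals of the pixel grid. A toy sketch with a 2x3 "image" (nested lists standing in for pixel values) makes this concrete:

```python
# Toy 2x3 "image": each inner list is one row of pixel values.
img = [[1, 2, 3],
       [4, 5, 6]]

def flip_vert(image):
    """Vertical flip: reverse the order of the rows."""
    return image[::-1]

def flip_horiz(image):
    """Horizontal flip: reverse each row left-to-right."""
    return [row[::-1] for row in image]

print(flip_vert(img))   # [[4, 5, 6], [1, 2, 3]]
print(flip_horiz(img))  # [[3, 2, 1], [6, 5, 4]]
```

For a face photo only flip_horiz produces a plausible image; for a satellite tile both do, which is why flip_vert=True is safe here.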
np.random.seed(42)
src = (ImageItemList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(sep=' '))
In the second step, we start by setting a random seed. This seed makes the random train/validation split the same every time, and hence makes our results reproducible. We then read the image list from the CSV file, hold out 20% of the images as a validation set, and split the multiple labels on white space.
data = (src.transform(tfms, size=128)
        .databunch(num_workers=0).normalize(imagenet_stats))
Finally (gosh!) we create our data bunch. The reason for keeping this step separate is that it lets us experiment with different image sizes, which is a good thing to do. We can train a model on smaller images, save the weights, and reuse them to train a data bunch of larger images for better results.
Viewing our data
Model architecture and training
We download and use a pretrained resnet50 model (50 layers) that has been trained to recognize thousands of categories of images.
arch = models.resnet50
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = create_cnn(data, arch, metrics=[acc_02, f_score], model_dir='/tmp/models')
We use 2 metrics for our model. The first is accuracy and the second is f_score (which is what Kaggle uses to score this competition).
Any classifier is going to produce some false positives and some false negatives. How do you weigh those two things against each other? You combine them into a single number that balances them well. There are lots of ways to do that, one of which is the f_score.
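To make that balancing act concrete, here is a small sketch of the F-beta formula computed from raw counts (the counts are invented for illustration). The Planet competition uses F2 (beta = 2), which weights recall, i.e. missed labels, more heavily than precision:

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from raw counts; beta > 1 weights recall over precision."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: 8 true positives, 2 false positives, 4 false negatives
# precision = 0.8, recall = 0.667, so F2 lands closer to recall
score = fbeta_score(8, 2, 4)
```

With beta = 1 the same formula reduces to the familiar F1, the harmonic mean of precision and recall.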
To understand thresh=0.2, consider an example. Suppose we are designing a digit classifier (0–9); our model would output 10 probabilities, one per digit, and we would select the maximum as our prediction. In multi-label classification, however, we pick a threshold instead, and every label whose probability exceeds that threshold is predicted.
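The difference between the two decision rules can be sketched in a few lines of plain Python (the class names and probabilities below are hypothetical, not actual model output):

```python
# Hypothetical class names and per-class probabilities for one image
classes = ["agriculture", "clear", "cloudy", "primary", "water"]
probs   = [0.85,          0.34,    0.05,     0.91,      0.22]

# Single-label rule: pick only the highest-probability class (argmax)
single = classes[max(range(len(probs)), key=probs.__getitem__)]

# Multi-label rule: keep every class whose probability clears the threshold
thresh = 0.2
multi = [c for c, p in zip(classes, probs) if p > thresh]

print(single)  # 'primary'
print(multi)   # ['agriculture', 'clear', 'primary', 'water']
```

Note how the argmax rule would discard 'agriculture' even at 0.85 probability, which is exactly why a threshold is used instead.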
lr = 0.01
Finally we set the learning rate, train a little, alter the learning rate (more on this in a future article) and train some more. We can then plot our training and validation losses.
Always remember that if your training loss is higher than your validation loss, you are underfitting. You can either train for more epochs or try a different learning rate. In my notebook, my final f_score is 0.9285, which on the public leaderboard would place me somewhere in the top 100–120 participants. With more learning and fine tuning, one can do better.