4/7/22, 2:53 PM A Guide to Self-Supervised Learning in Computer Vision
PUBLISHED ON OCTOBER 31, 2021
IN DEVELOPERS CORNER
(HTTPS://ANALYTICSINDIAMAG.COM/CATEGORY/DEVELOPERS_CORNER/)
A Guide to Self-Supervised Learning in Computer Vision
BY YUGESH
VERMA (HTTPS://ANALYTICSINDIAMAG.COM/AUTHOR/YUGESH-
VERMAANALYTICSINDIAMAG-COM/)
Self-supervised learning methods have emerged rapidly over the last few years, and
models trained with them have addressed many of the problems posed by unlabeled
data. These methods have shown strong results in fields such as computer vision and
natural language processing. In this article, we discuss the self-supervised
learning methods used in computer vision, and in particular contrastive learning, a
self-supervised approach to the data labelling problem in computer vision. The
major points to be discussed in this article are listed below.
https://analyticsindiamag.com/a-guide-to-self-supervised-learning-in-computer-vision/#:~:text=In computer vision%2C the basic,objects presented in the i… 1/15
Table of Contents
1. Why Self-supervised Learning is Needed?
2. The Contrastive Learning
3. Contrastive Predictive Coding (CPC)
4. Instance Discrimination Methods
Let us begin the discussion by understanding why self-supervised learning is
needed.
Why Self-supervised Learning is Needed?
Supervised learning
(https://analyticsindiamag.com/an-introduction-to-supervised-deep-learning-for-non-techies/)
techniques require a lot of labeled data. Labelling data is often costly and
time-consuming, especially in computer vision, where tasks such as object detection
must be performed. Image segmentation
(https://analyticsindiamag.com/how-to-do-image-segmentation-using-deeplab/)
tasks additionally require every small detail of the image to be annotated and
labeled. Unlabeled data, on the other hand, is available in abundance.
The basic idea behind self-supervised learning is to make a model learn the
important representations of the data from a pool of unlabelled data. In this
learning method the model supervises itself: the training signal is derived from
the data rather than from human annotation. The learned representations can then be
used for supervised learning tasks with only a few labels. In computer vision, the
supervised learning task can be as simple as image classification, or it can be
semantic segmentation, which is a complex task in the computer vision
(https://analyticsindiamag.com/swin-transformers/) field.
In the field of natural language processing, transformer models such as BERT
(https://analyticsindiamag.com/rethinking-search-age-of-ai-search-engine/) and T5
have produced strong results. These models are also built on the idea of
self-supervised learning: they are first pretrained on a large amount of
unlabelled data and then fine-tuned on supervised tasks with a small amount of
labeled data. Similarly, in the field of computer vision there are models that
follow the idea of self-supervised learning. In this article we are going to
introduce some of them.
In computer vision, the basic idea behind self-supervised learning is to create a
model that solves some fundamental computer vision task using only the input image
data and, while solving it, learns from the structure of the objects present in the
image. There are many self-supervised learning methods, but in computer vision the
contrastive method seems to be more successful than the others. Hence, the next
section of this article introduces the contrastive learning method.
The Contrastive Learning
To understand contrastive learning, refer to the image below, in which a model
using the contrastive learning method has a function f() that takes an input a and
gives the output f(a).
Suppose there are two types of inputs, positive and negative. If two similar inputs
(a1 and a2 in the image below, which form a positive pair) are passed to the
function f(), their outputs should be similar, and both should be dissimilar to the
output for a negative input. This is the basic statement of any contrastive
learning approach.
Positive (similar) inputs can be two sections of the same image or two frames from
the same video. Negative (dissimilar) inputs come from different data, for example
a section of a different image.
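The contrastive statement above can be illustrated with a small sketch. The three vectors below are made-up stand-ins for the outputs f(a) of an encoder (they are not taken from any trained model), and cosine similarity is used here as one common choice of similarity measure.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors (1 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical encoder outputs f(a), invented for illustration only.
f_a1 = np.array([0.9, 0.1, 0.4])    # section 1 of image A
f_a2 = np.array([0.8, 0.2, 0.5])    # section 2 of image A (positive for a1)
f_b = np.array([-0.3, 0.9, -0.2])   # section of a different image B (negative)

pos_sim = cosine_similarity(f_a1, f_a2)  # should be high
neg_sim = cosine_similarity(f_a1, f_b)   # should be low

print(f"positive pair similarity: {pos_sim:.3f}")
print(f"negative pair similarity: {neg_sim:.3f}")
```

A contrastive method trains f() so that the first number is pushed up and the second is pushed down.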
Contrastive Predictive Coding (CPC)
The general idea behind contrastive predictive coding (CPC) is to take a few upper
rows from a coarse grid over the image and predict a few lower rows from them. To
make this prediction, the model needs to learn the structure of the image and the
objects in it. For example, by seeing the face of a cat, the model can predict that
the lower part of the image contains the cat's four legs.
Suppose this modelling task is divided into three parts:
1. Divide the image into a grid of overlapping cells. For example, a 256 x 256
image can be divided into a 7×7 grid of 64 px cells, where each cell overlaps
its neighboring cells by 32 px.
2. Encode each grid cell into a vector. With the image from the first step, each
cell can be encoded into a 1024-dimensional vector, so that the whole image is
converted into a 7x7x1024 tensor.
3. Use an autoregressive generative model to predict the lower rows of the image
from the encoded upper rows. For example, the upper 3 rows of the 7x7x1024
tensor can be used to predict the last 3 rows. The PixelCNN model can be used
for this step.
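The grid arithmetic in the first two steps can be checked with a short NumPy sketch. It assumes a grayscale 256 x 256 image filled with random values, and it replaces the learned encoder with a fixed random linear projection (a hypothetical stand-in, chosen only so the shapes can be shown). The autoregressive model of the third step is not implemented.

```python
import numpy as np

def extract_patches(image, patch=64, stride=32):
    """Cut a 2-D image into a grid of overlapping patch x patch cells."""
    h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    return np.stack([
        np.stack([
            image[r * stride:r * stride + patch, c * stride:c * stride + patch]
            for c in range(cols)
        ])
        for r in range(rows)
    ])  # shape: (rows, cols, patch, patch)

rng = np.random.default_rng(0)
image = rng.random((256, 256))            # stand-in for a real image

# Step 1: 64 px cells with a 32 px stride give (256 - 64) / 32 + 1 = 7 cells per side.
patches = extract_patches(image)

# Step 2: stand-in encoder (random projection, NOT a trained network).
W = rng.standard_normal((64 * 64, 1024)) / 64.0
features = patches.reshape(7, 7, -1) @ W  # shape: (7, 7, 1024)

# Step 3 (inputs only): upper rows are the context, lower rows the targets.
context_rows = features[:3]
target_rows = features[-3:]

print(patches.shape, features.shape, context_rows.shape, target_rows.shape)
```

With a 32 px stride, the right half of each cell is identical to the left half of its right-hand neighbor, which is the overlap described in the first step.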
The image below illustrates these steps.
Image source (https://arxiv.org/abs/1807.03748)
To train such a model, we need a loss function
(https://analyticsindiamag.com/ultimate-guide-to-loss-functions-in-tensorflow-keras-api-with-python-implementation/)
that scores the patch predictions; more formally, a measure of similarity that is
high for positive pairs and low for negative pairs. The loss is computed over a set
X of N patches, where X contains 1 positive sample and N-1 negative samples. It is
based on noise-contrastive estimation: the model has to pick the positive sample
out from among the negative (noise) samples. This loss function is called InfoNCE,
where NCE stands for noise-contrastive estimation.
Image source (https://paperswithcode.com/method/infonce)
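As a rough illustration of how such a loss behaves, here is a minimal NumPy sketch of InfoNCE for a single context vector. It assumes a plain dot product as the score between the context and each candidate, which is a simplification made for this example; the function name `info_nce` and the toy vectors are illustrative and not part of any library.

```python
import numpy as np

def info_nce(context, candidates, positive_index=0):
    """InfoNCE loss for one context vector and N candidate vectors.

    The score of each candidate is its dot product with the context
    (an assumption made for this sketch). The loss is the negative
    log-softmax probability assigned to the positive candidate.
    """
    scores = candidates @ context                 # shape: (N,)
    scores = scores - scores.max()                # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return float(-log_probs[positive_index])

context = np.array([1.0, 0.0])
candidates = np.array([
    [1.0, 0.0],    # positive sample: aligned with the context
    [0.0, 1.0],    # negative sample
    [-1.0, 0.0],   # negative sample
])

loss_correct = info_nce(context, candidates, positive_index=0)
loss_wrong = info_nce(context, candidates, positive_index=2)
print(f"loss when the positive scores highest: {loss_correct:.4f}")
print(f"loss when the positive scores lowest:  {loss_wrong:.4f}")
```

The loss is small when the positive sample receives the highest score among the N candidates and grows as negatives score higher. When all scores are equal, it is log N.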
Optimizing this loss will result in a function for estimating the density ratio, which
is:
Image source (https://paperswithcode.com/method/infonce)
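In the notation of the CPC paper (https://arxiv.org/abs/1807.03748), where c_t is the context, x_{t+k} is the positive sample, X is the set of N samples, and f_k is a positive scoring function, the InfoNCE loss and the density ratio it estimates can be written as:

```latex
\mathcal{L}_N = -\,\mathbb{E}_{X}\left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right]

f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}
```

In that paper, f_k is modelled as a log-bilinear function, f_k(x_{t+k}, c_t) = exp(z_{t+k}^T W_k c_t), where z_{t+k} is the encoding of x_{t+k}.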
The InfoNCE loss has the same form as the categorical cross-entropy (log softmax)
loss, which is often used to compare predictions with actual values: here, the
model has to classify which of the N samples is the positive one. The idea of CPC
was introduced in the paper “Representation learning with contrastive predictive
coding (https://arxiv.org/abs/1807.03748),” which also compares the accuracy of
models using different techniques, as shown in the table below.
Method                                   Accuracy (%)
Motion Segmentation                      27.6
Exemplar                                 31.5
Relative Position                        36.2
Colorization                             39.6
Contrastive Predictive Coding (CPC)      48.7
The table above shows the performance of the CPC method for representation
learning, which is still far from that of many supervised learning models: a
ResNet-50 trained with 100% of the labels on ImageNet reaches 76.5% top-1
accuracy. Instance discrimination methods build on CPC to increase the accuracy of
the model. The next section of the article introduces them.
Instance Discrimination Methods
As we have seen, CPC is applied to parts of an image. In comparison, the basic idea
of instance discrimination methods is to apply the contrastive approach of CPC to
whole images. An image and its augmented version form a positive pair and should
have similar representations, while any other image, together with its augmented
versions, should have a different representation.
Image source (http://cdn.linkresearcher.com/46af3lum-l5oa-g10q-mip4-fn87rog3)
The motivation for image augmentation is that a representation learned by the model
should not vary under augmentation. The augmented image can be a horizontal flip, a
random crop, a change in the color channels, etc. Augmentation changes the image,
but the class information learned by the model should not change.
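A minimal sketch of such augmentations, using only NumPy, is shown below. The crop size, flip probability, and per-channel scaling range are arbitrary values chosen for illustration, and the per-channel scaling is just one simple way to alter the color channels.

```python
import numpy as np

def augment(image, rng, crop_size=24):
    """Return a randomly cropped, randomly flipped, color-scaled view of an image.

    image: array of shape (height, width, 3) with values in [0, 1].
    """
    h, w, _ = image.shape
    # Random crop: pick the top-left corner of a crop_size x crop_size window.
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    view = image[top:top + crop_size, left:left + crop_size]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        view = view[:, ::-1]
    # Scale each color channel by a random factor (illustrative range).
    scale = rng.uniform(0.8, 1.2, size=3)
    return np.clip(view * scale, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # stand-in for a real RGB image

view_1 = augment(image, rng)
view_2 = augment(image, rng)
print(view_1.shape, view_2.shape)
```

Two calls on the same image give two different views. These form a positive pair, because both still show the same underlying content.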
In this method, the process can be divided into three basic steps:
1. Give the model an image together with a randomly augmented version of it as a
positive pair, and also feed the model negative samples with their augmented
versions.
2. Encode the image pair with any encoder to get a representation of each image,
and use the same encoder to get representations of the negative samples.
3. Apply the InfoNCE loss to check the similarity between the positive pair and
the dissimilarity between the positive and negative samples.
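The three steps can be put together in a small NumPy sketch. Everything here is a simplified assumption for illustration: the images are random arrays, the augmentation is a flip plus noise, the encoder is an untrained random linear map, and the other images in the batch serve as negatives. The temperature value is likewise arbitrary. A real method would use a deep network and minimize this loss by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Illustrative augmentation: random horizontal flip plus small noise."""
    view = image[:, ::-1] if rng.random() < 0.5 else image
    return view + 0.05 * rng.standard_normal(image.shape)

def encode(images, weights):
    """Stand-in encoder: flatten, apply a linear map, L2-normalize each row."""
    z = images.reshape(len(images), -1) @ weights
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE over a batch: z1[i] and z2[i] are a positive pair,
    and z2[j] for j != i act as negatives for z1[i]."""
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Step 1: each image and its augmented versions form a positive pair.
images = rng.random((8, 16, 16))
view_1 = np.stack([augment(img, rng) for img in images])
view_2 = np.stack([augment(img, rng) for img in images])

# Step 2: encode both views with the same (here untrained) encoder.
weights = rng.standard_normal((16 * 16, 32))
z1, z2 = encode(view_1, weights), encode(view_2, weights)

# Step 3: contrast positives against the other images in the batch.
loss = contrastive_loss(z1, z2)
print(f"contrastive loss on untrained encoder: {loss:.4f}")
```

Training would adjust the encoder so that this loss decreases, which pulls the two views of each image together and pushes different images apart.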
Two papers, SimCLR (https://arxiv.org/abs/2002.05709) and Momentum Contrast (MoCo)
(https://arxiv.org/abs/1911.05722), have worked on instance discrimination
methods, and the major difference between them is how they handle the negative
samples. Their techniques can be used to build self-supervised learning models in
the field of computer vision.
Final Words
In this article, we have seen that in computer vision, self-supervised learning is
a representation learning method in which a model learns from unlabeled data by
supervising itself, which can be very helpful in reducing the cost, time, and
effort of labeling data. Various models based on contrastive learning, such as MoCo
and SimCLR, can be used for self-supervised learning in computer vision.