Image Inpainting — Object Removal and Loss Recovery from Images using Deep Learning.

Bharat Ahuja
6 min read · Aug 4, 2020

How many times have you taken a picture only to find something unwanted in the frame? Don’t you wish you could have timed the shot a little better, or could simply remove the intruder seamlessly? Well, here we propose a solution to exactly this problem.

The proposed solution also addresses the recovery of images corrupted or partially lost during transmission, a pressing problem in today’s digital world.

Together with my team members Yashvi Chauhan, Mohammed Ghalib Qureshi, Pawan Abburi, Ritwik Puri, Souvik Mishra and Giridhar, I, Bharat Ahuja, worked on this problem as an intern at LeadingIndia AI under the mentorship and guidance of Dr. Suneet Gupta.

LeadingIndia AI is an initiative aimed at research in Artificial Intelligence that promotes AI skills among the engineering students of India, with the goal of closing the skill gap and making these skills mainstream, because the future is Artificial Intelligence. We are proud to have been interns in this initiative.

We were lucky to be given the opportunity to be part of the program. A special thanks to Dr. Deepak Garg, Dr. Madhushi Verma and our mentor Dr. Suneet Gupta, without whom our experience and project would not have been a success.

Our proposed solution comprises two methods: Image Inpainting using an Autoencoder/Decoder approach, and Image Inpainting using a Bilinear GAN approach.

What is Image Inpainting?

Image inpainting is the task of filling in “patches” (missing or masked regions) in an image. It can be used in many areas of image processing: one can remove unwanted parts of an image while keeping the rest of the image intact. Image inpainting has many real-life applications, such as recovering a lossy image after transmission and various kinds of image or painting restoration.

This can be done in a number of ways, the traditional one being classical image processing, where the most basic idea (sketched in code after this list) is to:

  • Detect a patch/region to be filled in the image
  • Go pixel by pixel from the boundary of the region towards its centre, filling each pixel with colours propagated from the adjacent known pixels
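As a concrete example, OpenCV ships with classical inpainting algorithms that follow exactly this boundary-inward idea. A minimal sketch (the file names here are placeholders):

```python
import cv2

# Load the damaged image and a binary mask: non-zero mask pixels
# mark the region to be filled (file names are placeholders).
image = cv2.imread("damaged.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Telea's method marches from the hole boundary inwards, estimating
# each pixel from a weighted average of its known neighbours.
restored = cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("restored.png", restored)
```

This works well for thin scratches and small holes, but it only copies local colour information, which is exactly the limitation deep learning addresses.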

But a more robust way is to use deep learning, where a neural network does this for us automatically. This generally produces better results than the traditional method, because the “patches” are filled in using predictions learned from the dataset the model was trained on.

There are generally two ways to do this with deep learning: GANs (Generative Adversarial Networks) and the Autoencoder/Decoder approach, and our team was allocated the Autoencoder/Decoder method.

So what is the Autoencoder/Decoder approach?

An autoencoder is basically a neural network with three parts: an input/encoding stage, a middle (hidden, or “bottleneck”) layer, and an output/decoding stage. In essence, we compress the input into a compact representation and then reconstruct the output by decompressing it.

Autoencoder/Decoder Architecture

An autoencoder consists of three parts (a minimal code sketch follows this list):

  • Encoder: In the above architecture, the encoder compresses the input image into a ‘latent’ space representation, i.e. it encodes the input in a reduced dimension.
  • Middle layer: The middle layer passes the compressed representation on to the decoder for decompression.
  • Decoder: This layer decodes the image, i.e. it upscales the encoded representation back to its original dimensions. Reconstruction from the latent space representation is lossy, which is where the reconstruction loss shown in the diagram above comes from.
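To make this concrete, here is a minimal convolutional autoencoder in Keras. The input size, filter counts, optimizer and loss are illustrative choices, not the exact settings of our model:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(128, 128, 3))

# Encoder: compress the image into a smaller latent representation.
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
latent = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

# Decoder: upsample the latent representation back to the input size.
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(latent)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)

autoencoder = Model(inputs, outputs)
# For inpainting, the network is trained to reproduce the ground-truth
# image from a masked input.
autoencoder.compile(optimizer="adam", loss="mae")
```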

The task at hand is to remove the need for expensive post-processing and blending operations, to deal robustly with irregular holes/masks, and to produce a meaningful, semantically correct image. For this purpose, we replaced the ordinary convolutional layers with partial convolution layers and mask updates.

Our approach uses a series of Conv2D layers along with transposed-convolution layers to achieve the desired result, but we replaced the traditional Conv2D layers with partial convolution layers that also update the mask at each layer, creating a PConv2D layer. A partial convolution layer comprises a masked, re-normalized convolution operation followed by a mask-update step. Doing so makes the model ignore the pixels that fall inside the patch/mask. The architecture of the model follows the U-Net structure. The activation function used in the hidden layers is ‘relu’, and the last layer uses the ‘sigmoid’ function.
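For intuition, here is a minimal sketch of such a partial convolution layer in Keras, following the masked-convolution-plus-mask-update idea described above. It is an illustrative simplification (no bias term, fixed ‘same’ padding), not the exact layer from our codebase:

```python
import tensorflow as tf
from tensorflow.keras import layers

class PConv2D(layers.Layer):
    """Sketch of a partial convolution: convolve only the valid
    (unmasked) pixels, re-normalize, then update the mask."""

    def __init__(self, filters, kernel_size, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.kernel_size = kernel_size
        self.conv = layers.Conv2D(filters, kernel_size, strides=strides,
                                  padding="same", use_bias=False)
        # Frozen all-ones kernel that counts valid pixels under each window.
        self.mask_conv = layers.Conv2D(filters, kernel_size, strides=strides,
                                       padding="same", use_bias=False,
                                       kernel_initializer="ones",
                                       trainable=False)

    def call(self, image, mask):
        # mask is 1.0 at known pixels and 0.0 inside the hole.
        features = self.conv(image * mask)   # hole pixels contribute zero
        valid = self.mask_conv(mask)         # valid-pixel count per window
        # Re-normalize: scale up windows that partially overlap the hole.
        window = tf.cast(self.kernel_size ** 2 * mask.shape[-1], features.dtype)
        features = features * tf.math.divide_no_nan(window, valid)
        # Mask update: a position becomes valid if any pixel under it was.
        new_mask = tf.cast(valid > 0, mask.dtype)
        return features, new_mask
```

Stacking such layers shrinks the hole in the mask with depth, so by the bottleneck of the U-Net every feature has been computed from at least some valid pixels.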

The prediction model gave better results than traditional autoencoder-decoder models. The use of partial convolution layers improved the colour correction, kept the edges intact, and saved us the expensive blending methods otherwise needed to create a semantically correct image composition. The result obtained is shown below.

Result

And… what is the GAN approach?

In a Generative Adversarial Network (GAN), a generator network generates candidates while a discriminator network evaluates them. Typically, the generator learns to map from a latent space to a data distribution of interest, while the discriminator distinguishes candidates produced by the generator from the true data distribution.
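As a minimal illustration of this two-network setup in Keras (toy shapes for illustration only, not our inpainting networks, which are described below):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=128):
    # Maps a latent vector to a small RGB image.
    return tf.keras.Sequential([
        layers.Dense(8 * 8 * 64, activation="relu", input_shape=(latent_dim,)),
        layers.Reshape((8, 8, 64)),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="sigmoid"),
    ])

def build_discriminator():
    # Maps an image to a single real/fake logit.
    return tf.keras.Sequential([
        layers.Conv2D(32, 4, strides=2, padding="same", activation="relu",
                      input_shape=(32, 32, 3)),
        layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(1),
    ])
```

During training, the discriminator is optimized to tell real images from generated ones, while the generator is optimized to fool it; the two improve in tandem.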

To obtain fine-grained visual output, we propose a Bilinear GAN architecture.

The above-mentioned architecture is based on global and local attentive image inpainting. The model consists of two stages: a coarse network and a refinement network. For generating the output, the coarse network is designed with two branches, regular and attentive. The coarse network is an encoder-decoder network with a mask-pruning global attentive module.

This module calculates the global dependencies among the features and prunes out the less important ones. For the coarse reconstruction of masked images, we explicitly use a weighted sum of an L1 loss and a structural similarity (SSIM) loss.
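Such a combined loss can be written in a few lines of TensorFlow; the weight alpha below is an illustrative value, not the one from our experiments:

```python
import tensorflow as tf

def coarse_loss(y_true, y_pred, alpha=0.8):
    # Pixel-wise L1 term penalizes absolute colour error.
    l1 = tf.reduce_mean(tf.abs(y_true - y_pred))
    # tf.image.ssim returns a similarity in [-1, 1]; convert it to a loss.
    ssim = tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
    return alpha * (1.0 - ssim) + (1.0 - alpha) * l1
```

The L1 term keeps colours close to the ground truth, while the SSIM term encourages locally consistent structure rather than blurry averages.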

Our proposed Bilinear GAN Architecture

Our proposed method combines the above image inpainting method with a bilinear convolutional architecture, giving a more robust and flexible model that generates realistic, fine-grained images as output and increases the overall accuracy by 2–3%.
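For readers unfamiliar with the “bilinear” part: bilinear CNNs capture fine-grained detail by taking the outer product of two feature maps. How exactly our architecture wires this in is specific to the model, but generic bilinear pooling looks roughly like this:

```python
import tensorflow as tf

def bilinear_pool(feat_a, feat_b):
    """Generic bilinear pooling: outer product of the channel vectors of
    two feature maps at each spatial location, averaged over locations.
    feat_a: (batch, H, W, C1), feat_b: (batch, H, W, C2)."""
    n = tf.cast(tf.shape(feat_a)[1] * tf.shape(feat_a)[2], feat_a.dtype)
    pooled = tf.einsum("bhwc,bhwd->bcd", feat_a, feat_b) / n
    pooled = tf.reshape(pooled, (tf.shape(pooled)[0], -1))
    # Signed square-root and L2 normalization, as is common for
    # bilinear features.
    pooled = tf.sign(pooled) * tf.sqrt(tf.abs(pooled) + 1e-12)
    return tf.math.l2_normalize(pooled, axis=-1)
```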

The result that we obtained using this approach is shown below.

Result

Experience at LeadingIndia AI

Our experience at this internship was really fulfilling and exhilarating.

We had various learning outcomes from this project, including new topics, brushing up on old ones, and integrating the various components of a project into a single unit. On a broader level, we learnt to divide a project into simpler modules, work on them individually, and then assemble the modules back into one working whole.

We learnt about communicating effectively and about teamwork. We also learnt about various technologies, frameworks, deep learning, neural networks and development techniques.

We’ve become even better programmers after this internship, and the experience significantly shortened our learning curve and narrowed our skill gaps.

We thank the team at LeadingIndia AI for providing us with this wonderful opportunity.

You can view our work here: https://github.com/69690/Flask-Webapp
