
Develop an Android application for enhancing low-light images


DOCUMENT INFORMATION

Basic information

Title: Develop An Android Application For Enhancing Lowlight Images
Authors: Tran Thi Ngoc Diep, Luong Thanh Nhan
Supervisor: Dr. Nguyen Ho Man Rang
University: Vietnam National University - HCMC University of Technology
Major: Computer Science
Document type: Undergraduate thesis
Year: 2021
City: Ho Chi Minh City
Format:
Pages: 79
File size: 2.35 MB

Structure

  • 1.1 Motivation
  • 1.2 Purpose
  • 1.3 Thesis structure
  • 2.1 Low-light Image
    • 2.1.1 Definition
    • 2.1.2 Low-light Image Characteristics
    • 2.1.3 Low-light Image Enhancement
  • 2.2 Convolutional Neural Network (CNN)
    • 2.2.1 Layer types of CNN
  • 2.3 Bilateral Grid Slicing Operation
    • 2.3.1 Bilateral Grid
    • 2.3.2 Basic Usage of a Bilateral Grid
  • 3.1 Traditional Methods
    • 3.1.1 Gray Transformation Method
    • 3.1.2 Histogram Equalization Method
    • 3.1.3 Retinex Method
    • 3.1.4 Frequency-domain Method
    • 3.1.5 Image Fusion Method
    • 3.1.6 Defogging Model Method
  • 3.2 Learning Based Methods
    • 3.2.1 Physical-modeling-based Methods
    • 3.2.2 Image-to-image Translation Methods
    • 3.2.3 Reinforcement Learning Methods
  • 3.3 Evaluation Methods
    • 3.3.1 Subjective Evaluation
    • 3.3.2 Full-reference Image Quality Assessment
    • 3.3.3 No-reference Image Quality Assessment
  • 4.1 Low-Light Image Datasets
  • 4.2 Dataset Reviews
    • 4.2.1 Adobe FiveK [1]
  • 4.3 SICE [2]
    • 4.3.2 LOw-light dataset (LOL) [51]
    • 4.3.3 DARK FACE [54]
  • 5.1 Model Architecture
    • 5.1.1 Low-resolution coefficient prediction
    • 5.1.2 Image features as a bilateral grid
    • 5.1.3 Upsampling with the slicing node
    • 5.1.4 Obtaining the full-resolution output
  • 5.2 Loss functions
    • 5.2.1 Mean Squared Error Loss
    • 5.2.2 Exposure Control Loss
    • 5.2.3 Color Constancy Loss
    • 5.2.4 Total Loss
  • 5.3 Dataset
    • 5.3.1 Input Images
    • 5.3.2 Label Images
  • 6.1 Experimental setup
    • 6.1.1 Dataset and configuration
  • 6.2 Results
    • 6.2.1 Full-reference Image Quality Assessment
    • 6.2.2 No-reference Image Quality Assessment
  • 7.1 Application architecture
    • 7.1.1 Camera Engine Module [6]
    • 7.1.2 Lowlight App Module
    • 7.1.3 JNI
    • 7.1.4 Lowlight-SDK
    • 7.1.5 Android NDK
  • 7.2 HDR-Cam Application
    • 7.2.1 User Interface and Functionality
    • 7.2.2 Pros and Cons of HDR-Cam Application
  • 8.1 Achievements
  • 8.2 Limits and Future Developments
  • 1.1 Under-exposed photos from Adobe5K dataset [1]
  • 2.1 Low-light images and their corresponding histograms
  • 2.2 Normal images from Adobe5K dataset [1] and their corresponding histograms
  • 2.3 How a filter is applied to the input
  • 2.4 A simple convolution layer
  • 2.5 Max Pooling
  • 2.6 Average Pooling
  • 2.7 Fully Connected Layer - each node is connected to every other node in the adjacent layer
  • 2.8 A simple convolutional neural network
  • 2.10 Bilateral grid preserves edges easily [5]
  • 3.1 Retinex method, from left to right: original, SSR, MSR, MSRCR [47]
  • 3.2 The spectral imaging lambda stack [11]
  • 3.3 Image fusion based on background highlighting [47]
  • 4.3 Multi-exposure images from SICE dataset [2]
  • 4.4 Sample images from DICM dataset [12]
  • 4.5 Sample under-exposed images (above) and their corresponding normal-exposed …
  • 4.6 Sample images from DARK FACE dataset [54]
  • 5.1 The architecture of HDR-Net [10]
  • 5.2 The low-resolution coefficient prediction [10]
  • 5.3 Bilateral grid of coefficients A obtained from the low-res path
  • 5.4 Samples for normal-exposed images, original under-exposed images (-1.5) and …
  • 5.5 Samples for normal-exposed images, original under-exposed images (-1.5) and …
  • 5.6 Sample under/over-exposed input images (above) and their corresponding normal-exposed …
  • 6.1 There is a significant improvement of image color contrast after our modifications
  • 6.2 Visual comparisons on a typical low-light sample
  • 6.3 HDR-Nets show a stable performance on edge cases
  • 6.4 Example from MIT-Adobe FiveK [1]
  • 6.5 Example from MIT-Adobe FiveK [1]
  • 6.6 Under-exposed case from LOL dataset [51]
  • 6.7 Under-exposed case from SICE dataset [2]
  • 6.8 Over-exposed case from SICE dataset [2]
  • 6.9 Under-exposed case from DICM dataset [12]
  • 7.2 JNI architecture
  • 7.3 Lowlight-SDK Data Flow
  • 7.4 Correlation of Pytorch and LibTorch
  • 7.5 Home screen
  • 7.6 Home screen with enhanced image
  • 7.7 Camera screen
  • 7.8 Select image from gallery screen

Content

Motivation

The advent of smartphones marks a significant milestone in the evolution of technology, revolutionizing how people connect and interact with the world. Two decades ago, it was unimaginable that a compact device could enable users to not only communicate globally but also capture high-quality photographs that were once the exclusive domain of professional photographers.

The rise of smartphone photography has significantly impacted the digital camera industry, leading to a staggering 84 percent decline in camera shipments from 2010 to 2018, according to Statista data from CIPA members.

Sharing photos has become effortless with smartphones, but poor lighting can result in underexposed images that lack detail and visual appeal. As illustrated in Figure 1.1, these dark photos not only hinder viewer enjoyment but also complicate computer vision tasks such as object detection and segmentation, potentially conveying incorrect information to machine learning systems.

The Camera & Imaging Products Association (CIPA) is a global industry organization composed of companies involved in the development, production, and sale of imaging-related devices. Notable members include leading brands such as Kodak, Sony, Nikon, Canon, Olympus, and Casio.

Figure 1.1: Under-exposed photos from Adobe5K dataset [1]

Manual retouching of low-light images demands significant effort, including adjustments to brightness, contrast, and color. This process can be particularly challenging for average users lacking specialized skills, and even professional editors may find it exhausting when handling large datasets.

To advance research in computer vision and assist smartphone manufacturers in improving the camera quality of low-end devices without hardware modifications, we conducted a study on low-light image enhancement.

Purpose

This thesis aims to develop a deep learning algorithm for enhancing low-light images and to create an Android application for low-light image enhancement. The effectiveness of our approach will be assessed by comparing it to existing methods in the field.

Thesis structure

We lay out the work we have done as follows:

• Chapter 1: Gives an overview of our topic, including the motivation, purpose and structure of the thesis.

• Chapter 2: Introduces related knowledge to this topic, including common definitions in image processing and deep learning.

• Chapter 3: Lays out research relevant to the topic.

• Chapter 4: Lists datasets related to the topic and introduces the datasets we experiment on.

• Chapter 5: Introduces our approach including the model architecture, loss functions and training dataset.

• Chapter 6: Presents the experiments and results.

• Chapter 7: Provides the implementation details of our Android application.

• Chapter 8: Sums up what we have done and evaluates it, as well as outlining future developments.

In this chapter, we introduce related concepts and definitions we encountered during the research process.

Low-light Image

Definition

Low-light refers to environmental conditions where the illuminance 1 does not meet the normal standard [26]. Images captured in this environment are usually called low-light images.

There is no universal standard for low-light images due to the difficulty in defining specific theoretical values, leading to varying standards among camera sensor manufacturers. For instance, Hikvision categorizes low-light environments into distinct classifications:

• StarLight Level: Less than 0.001 Lux.

1 Illuminance, or light level, is the amount of light measured on a plane surface [24].

2 The lux is the SI derived unit of illuminance, measuring luminous flux per unit area.

Low-light Image Characteristics

A low-light image, as implied by the name, has characteristics such as [48]:

• Narrow gray range and color distortion

Figure 2.1 shows some examples of low-light images and their corresponding grayscale histograms.

Figure 2.1: Low-light images and their corresponding histograms

The pixel values in these images are sparsely distributed within a low range, specifically from 0 to approximately 70, resulting in minimal visible differences across various sections of the images. This limited range creates a small gap between the maximum and minimum pixel values, especially when compared to the histograms of normal images, as illustrated in Figure 2.2 below.

Figure 2.2: Normal images from Adobe5K dataset [1] and their corresponding histograms
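As a rough illustration of this narrow range, the short sketch below (our own, not part of the thesis; it assumes NumPy and Pillow and uses a hypothetical file name) computes the grayscale histogram of an image and reports how much of its mass sits at the dark end:

```python
import numpy as np
from PIL import Image

# Load an image and convert it to 8-bit grayscale (the path is hypothetical).
img = np.array(Image.open("low_light_sample.jpg").convert("L"))

# 256-bin histogram over the full 8-bit range.
hist, _ = np.histogram(img, bins=256, range=(0, 255))

# For a typical low-light image, almost all of the mass falls below ~70.
print("min/max pixel value:", img.min(), img.max())
print("fraction of pixels below 70:", (img < 70).mean())
```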

Low-light Image Enhancement

This thesis primarily addresses contrast enhancement techniques aimed at improving low-light images, which often suffer from low contrast, resulting in faded objects and information loss. Contrast refers to the disparity in luminance or intensity levels between different areas within an image. A comprehensive literature review of various contrast enhancement methods will be presented in the following chapter.

Convolutional Neural Network (CNN)

Layer types of CNN

A convolution layer performs a 2D convolution on its input. This layer uses filters (also called kernels), which usually have size 3×3 or 5×5 [23].

Filters process input data by computing the dot product between their weights and the input values, producing a feature map. Each filter has the same depth as the input volume, and the output has as many channels as there are filters.

Figure 2.3: How a filter is applied to the input

Given a grayscale input image of size H × W and hyperparameters filter size F, stride S, padding P, and number of filters C_out, the output size is given by equation 2.1 below.

H_out = (H − F + 2P) / S + 1,   W_out = (W − F + 2P) / S + 1   (2.1)

Filter size F: the width × height of the filter mask.

Stride S: the number of pixels the filter shifts over the input.

Padding P: expansion of the input matrix with zeros.

Number of filters C_out: decides the number of output channels.
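As a quick check of equation 2.1, the sketch below (ours, assuming PyTorch is available) builds a convolution layer with these hyperparameters and compares the resulting feature-map size with the formula:

```python
import torch
import torch.nn as nn

H, W = 32, 32                   # input spatial size (grayscale, so 1 channel)
F, S, P, C_out = 3, 1, 1, 16    # filter size, stride, padding, number of filters

conv = nn.Conv2d(in_channels=1, out_channels=C_out, kernel_size=F, stride=S, padding=P)
out = conv(torch.randn(1, 1, H, W))

# Expected spatial size from equation 2.1: (H - F + 2P) / S + 1
expected = (H - F + 2 * P) // S + 1
print(out.shape)                      # torch.Size([1, 16, 32, 32])
print(expected == out.shape[-1])      # True
```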

A CNN also has pooling layers, which are used to decrease the input size; this improves computation speed and allows large features to be learned more efficiently.

There are two types of pooling in common use: max pooling (the more common of the two) and average pooling.
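The difference between the two is easy to see on a tiny feature map; the sketch below (ours, again assuming PyTorch) applies both pooling types with a 2×2 window and stride 2:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 3., 2., 4.],
                    [5., 6., 1., 2.],
                    [7., 2., 9., 0.],
                    [3., 4., 8., 6.]]]])     # shape (1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x))   # each 2x2 block is reduced to its maximum
print(avg_pool(x))   # each 2x2 block is reduced to its mean
```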

The Fully Connected (FC) Layer, a fundamental component of the Multi Layer Perceptron, features an activation function in its output layer. In this architecture, each neuron from the preceding layer is connected to every neuron in the subsequent layer, ensuring comprehensive connectivity throughout the network.

The output from the convolutional and pooling layers is flattened and then processed by the fully connected (FC) layer for classification. An illustration of this classification process using FC layers is provided in the accompanying figure.

Figure 2.7: Fully Connected Layer - each node is connected to every other node in the adjacent layer (Source: ujjwalkarn.me)

Figure 2.8: A simple convolutional neural network

Bilateral Grid Slicing Operation

Bilateral Grid

In 2007, Chen et al. introduced the Bilateral Grid, a novel approach that transforms the standard 2D representation of images, typically organized as a 2D array of pixels with spatial axes x and y, into a 3D array format. This representation changes the way images are processed and analyzed.

Figure 2.9: 2D image to 3D bilateral grid [5]

An additional dimension representing pixel intensity is introduced, so that pixels which are adjacent in the 2D image but differ strongly in intensity end up far apart in the bilateral grid. This structure makes the grid edge-aware: Euclidean distance within the grid follows intensity edges, effectively preserving important details in the image.

An 8-megapixel image can be transferred to a bilateral grid of size only 70 × 70 × 10, which is fewer than 50 thousand samples (one sample in space can cover up to 100 pixels and one sample along the z axis corresponds to between 5 and 50 gray levels).

The bilateral grid also preserves edges during down-sampling, whereas 2D down-sampling tends to blur them.

Figure 2.10: Bilateral grid preserves edges easily [5]

Basic Usage of a Bilateral Grid

Given an image I normalized to [0, 1], with s_s the spatial sampling rate and s_r the range sampling rate, the bilateral grid Γ is constructed as follows:

• Initialization: for all grid nodes (i, j, k), Γ(i, j, k) = (0, 0).

• Filling: for each pixel at position (x, y):

Γ([x/s_s], [y/s_s], [I(x, y)/s_r]) += (I(x, y), 1)   (2.2)

where [·] is the closest-integer operator. This construction is notated as Γ = c(I).

Any function f that takes a 3D function as input can be applied to a bilateral grid Γ to obtain a new bilateral grid Γ̃ = f(Γ).

Slicing is the converse of building a grid from an image. Given a bilateral grid Γ and a reference image E, a 2D value map M is generated through trilinear interpolation at the coordinates (x/s_s, y/s_s, E(x, y)/s_r). This operation is written as M = s_E(Γ).
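To make the construction c(I) and the slicing operation s_E(Γ) concrete, here is a minimal NumPy/SciPy sketch (our own illustration, not code from the thesis; it assumes a single-channel image normalized to [0, 1] and uses map_coordinates for the trilinear lookup):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def build_grid(I, s_s=16, s_r=0.0625):
    """Construction Gamma = c(I) of equation 2.2 for a grayscale image I in [0, 1].

    Each grid cell accumulates (sum of intensities, pixel count), i.e. homogeneous
    coordinates, so the grid can be normalized after any processing step.
    """
    h, w = I.shape
    gi = np.rint(np.arange(h)[:, None] / s_s).astype(int)   # spatial bin (rows)
    gj = np.rint(np.arange(w)[None, :] / s_s).astype(int)   # spatial bin (cols)
    gk = np.rint(I / s_r).astype(int)                        # range (intensity) bin
    grid = np.zeros((gi.max() + 1, gj.max() + 1, int(round(1.0 / s_r)) + 1, 2))
    np.add.at(grid, (gi, gj, gk), np.stack([I, np.ones_like(I)], axis=-1))
    return grid

def slice_grid(grid, E, s_s=16, s_r=0.0625):
    """Slicing M = s_E(Gamma): trilinear lookup at (y/s_s, x/s_s, E(y, x)/s_r)."""
    h, w = E.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys / s_s, xs / s_s, E / s_r])
    num = map_coordinates(grid[..., 0], coords, order=1, mode="nearest")
    den = map_coordinates(grid[..., 1], coords, order=1, mode="nearest")
    return num / np.maximum(den, 1e-8)   # normalize the homogeneous value
```

Dividing the interpolated intensity sum by the interpolated count recovers a smoothed, edge-aware value at every full-resolution pixel, which is the basic usage described above.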

Edge-Aware Scattered Data Interpolation

The 2D map interpolates user-provided constraints while adhering to the edges of the underlying image. It achieves this through a smooth interpolation in the grid domain followed by slicing, rather than a piecewise-smooth interpolation in the image domain. The user constraints are lifted into the 3D domain, where the goal is to minimize the variance of the grid values subject to these constraints.

In this section, we lay out research relevant to the topic of Low-light Image Enhancement.

Low-light image enhancement aims to enhance both global and local contrast, resulting in images that are suitable for human viewing and computer analysis. Additionally, it is essential to minimize noise while ensuring effective real-time performance.

Traditional Methods

Gray Transformation Method

The gray transformation method maps the gray values of individual pixels to other gray values using a mathematical function. The transform is processed in the image's spatial domain [52]. This method stretches the distribution of the pixels' gray values.

In the transformations below, f(x, y) denotes the input image and g(x, y) denotes the output image.

Image enhancement can be achieved by adjusting the coefficients, allowing for varying degrees of improvement. A widely used formula for the linear transformation is:

g(x, y) = (f(x, y) − f_min) / (f_max − f_min) × (g_max − g_min) + g_min

where f_max and f_min represent the maximum and minimum gray values of the input image, while g_max and g_min denote the maximum and minimum gray values of the output image.

The dynamic range of the image is transformed from [f_min, f_max] to [g_min, g_max]; consequently, the brightness and contrast of the image are enhanced.
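A direct implementation of this linear stretch could look like the sketch below (ours, assuming an 8-bit grayscale NumPy image):

```python
import numpy as np

def linear_stretch(f, g_min=0.0, g_max=255.0):
    """Linear gray transformation: map [f_min, f_max] onto [g_min, g_max]."""
    f = f.astype(np.float64)
    f_min, f_max = f.min(), f.max()
    if f_max == f_min:                       # flat image: nothing to stretch
        return np.full_like(f, g_min, dtype=np.uint8)
    g = (f - f_min) / (f_max - f_min) * (g_max - g_min) + g_min
    return g.astype(np.uint8)
```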

There is also a local transformation, called piecewise linear transformation, which adjusts only a specific part of the gray range. Its equation is as follows:

g(x, y) = (c/a) f(x, y)   for 0 ≤ f(x, y) < a
g(x, y) = ((d − c)/(b − a)) (f(x, y) − a) + c   for a ≤ f(x, y) < b
g(x, y) = ((g_max − d)/(f_max − b)) (f(x, y) − b) + d   for b ≤ f(x, y) ≤ f_max

where [a, b] is the input gray range to be stretched and [c, d] is the corresponding output range.

Nonlinear transformation techniques, such as logarithmic and gamma functions, are employed to enhance image quality by adjusting the dynamic range. These methods stretch lower pixel values while compressing higher ones, resulting in improved visual contrast. The standard logarithmic formula used in this process is g(x, y) = log(1 + c × f(x, y)), where c serves as a control parameter.

The gamma function is as follows:

g(x, y) = f(x, y)^γ   (3.5)

where γ, usually a constant, is the correction parameter.
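Both nonlinear transforms are one-liners on a normalized image; the sketch below (ours, assuming NumPy and pixel values already scaled to [0, 1]) shows the logarithmic and gamma variants side by side:

```python
import numpy as np

def log_transform(f, c=1.0):
    """g = log(1 + c * f): stretches dark values and compresses bright ones."""
    g = np.log1p(c * f)
    return g / g.max()                  # rescale the result back to [0, 1]

def gamma_transform(f, gamma=0.5):
    """g = f^gamma: gamma < 1 brightens dark regions, gamma > 1 darkens them."""
    return np.power(f, gamma)
```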

Drago et al. introduced a logarithmic function method for image processing, while Huang et al. developed an adaptive gamma correction algorithm. Although the gray transformation method is straightforward and easy to implement, it lacks adaptability to different image types and low-light conditions.

Histogram Equalization Method

An image with pixel values widely distributed across all gray levels exhibits high contrast and a large dynamic range. The Histogram Equalization (HE) method leverages this characteristic by utilizing the cumulative distribution function (CDF) to effectively redistribute the pixel values.

The gray-level probability density function of an image I is defined as:

p(k) = n_k / N,  k = 0, 1, 2, …, L − 1   (3.6)

where N is the total number of pixels, n_k is the number of pixels at gray level k, and L is the number of gray levels in the image. The cumulative distribution function (CDF) of the gray levels is:

c(k) = Σ_{j=0}^{k} p(j)   (3.7)

The HE method transforms the original image into an image with a nearly uniform gray-level distribution through the mapping:

f(k) = (L − 1) × c(k)   (3.8)
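Putting equations 3.6 to 3.8 together, a basic HE implementation might look like the sketch below (ours, assuming an 8-bit grayscale NumPy image):

```python
import numpy as np

def histogram_equalize(img, L=256):
    """Histogram equalization: remap gray levels with f(k) = (L - 1) * c(k)."""
    hist = np.bincount(img.ravel(), minlength=L)   # n_k for each gray level k
    p = hist / img.size                            # p(k) = n_k / N
    c = np.cumsum(p)                               # c(k) = sum of p(j) for j <= k
    mapping = np.round((L - 1) * c).astype(np.uint8)
    return mapping[img]                            # apply the mapping per pixel
```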

Numerous studies have explored low-light image enhancement techniques utilizing histogram equalization (HE) algorithms, including the equal-area dualistic subimage histogram equalization and weighted histogram equalization methods.

To this point, HE algorithms can be used effectively with other methods to improve image brightness and contrast. However, they can also cause color distortion and noise [47].

Retinex Method

The Retinex method, rooted in retinex theory, posits that human vision relies on three independent cone systems that generate distinct images based on lightness across various wavelengths. Consequently, any image can be represented as the combination of its reflection and illumination components.

I(x, y) = R(x, y) L(x, y)   (3.9)

where R(x, y) is the reflection component and L(x, y) is the illumination component. While reflection depends on the characteristics of the object surface, illumination represents the characteristics of the environment light.

Some effective image enhancement algorithms based on the retinex method are:

• Single-scale Retinex (SSR) [22]: the reflection image is obtained by estimating the ambient brightness:

log R_i(x, y) = log I_i(x, y) − log[G(x, y) ∗ I_i(x, y)]   (3.10)

(x,y): position of a pixel in the image

The formula of the Gaussian surround function is:

G(x, y) = K e^(−(x² + y²)/σ²)   (3.11)

where σ is the scale parameter and K is a normalization factor chosen so that the Gaussian function satisfies:

∬ G(x, y) dx dy = 1   (3.12)

• Multiscale Retinex (MSR) [21]: MSR is proposed to balance dynamic range compression and color constancy; its equation is as follows:

log R_i(x, y) = Σ_{k=1}^{N} ω_k { log I_i(x, y) − log[G_k(x, y) ∗ I_i(x, y)] }   (3.13)

where:

i: one of the three color channels

k: index of the Gaussian surround scale

N: number of scales (generally 3)

ω_k: scale weights

While enhancing image details and contrast, the MSR algorithm also produces better color consistency and improves the visual quality of the image.
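The SSR and MSR formulas translate almost directly into code; the sketch below (our own illustration, not the thesis implementation, assuming NumPy and SciPy and a single-channel input, to be applied per color channel) shows both:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(I, sigma=80):
    """SSR (eq. 3.10): log R = log I - log(G * I), with G a Gaussian surround."""
    I = I.astype(np.float64) + 1.0                 # offset to avoid log(0)
    return np.log(I) - np.log(gaussian_filter(I, sigma))

def multi_scale_retinex(I, sigmas=(15, 80, 250), weights=None):
    """MSR (eq. 3.13): weighted sum of SSR outputs over several Gaussian scales."""
    weights = weights or [1.0 / len(sigmas)] * len(sigmas)
    return sum(w * single_scale_retinex(I, s) for w, s in zip(weights, sigmas))
```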

• Multiscale Retinex with Color Restoration (MSRCR) [42]: to avoid the color distortion that comes from enhancing the color channels separately, Rahman et al. proposed MSRCR, which includes a color recovery factor C expressing the proportional relationship among channels.

The MSRCR output is:

R_MSRCR,i(x, y) = C_i(x, y) · R_MSR,i(x, y)   (3.14)

where f represents the mapping function and C_i(x, y) is the color recovery factor:

C_i(x, y) = f[ I_i(x, y) / Σ_{j=1}^{3} I_j(x, y) ]   (3.15)

The results of the retinex algorithms are shown in the following figure.

Figure 3.1: Retinex method, from left to right: original, SSR, MSR, MSRCR [47]

Frequency-domain Method

The frequency domain reflects the rate of change of the spatial pixel values, where high frequencies correspond to rapid changes such as edges or boundaries. Using a high-pass filter, these high-frequency components can be isolated and enhanced in the frequency domain, a task that proves more challenging in the spatial domain. Key techniques in this frequency-domain approach include Homomorphic Filtering (HF) and the Wavelet Transform (WT).

Homomorphic Filtering (HF) leverages the illumination-reflection model to separate the illumination and reflection components by transforming their product into a sum in the logarithmic domain. This technique enhances the high-frequency reflection components while suppressing the low-frequency illumination components in the Fourier transform domain.

To process the illumination and reflection components separately, the logarithmic transformation is applied to both sides of equation 3.9:

ln I(x, y) = ln L(x, y) + ln R(x, y)   (3.16)

Then the image and its components are transformed into the frequency domain using the Fourier transform:

F[ln I(x, y)] = F[ln L(x, y) + ln R(x, y)]   (3.17)

which can be rewritten as:

I(u, v) = L(u, v) + R(u, v)   (3.18)

An appropriate high-pass filter H(u, v) is selected to enhance contrast:

S(u, v) = H(u, v) I(u, v) = H(u, v) L(u, v) + H(u, v) R(u, v)   (3.19)

The filtered result is transformed back to the spatial domain with the inverse Fourier transform:

s(x, y) = F⁻¹(H(u, v) L(u, v)) + F⁻¹(H(u, v) R(u, v)) = h_L(x, y) + h_R(x, y)   (3.20)

To obtain the final image:

G(x, y) = exp(h_L(x, y)) · exp(h_R(x, y))   (3.21)

The homomorphic filter is:

H(u, v) = (γ_H − γ_L) H_hp(u, v) + γ_L   (3.22)

where γ_L and γ_H control the gain applied to the low-frequency (illumination) and high-frequency (reflection) components respectively, and H_hp(u, v) is a high-pass filter.
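Putting the whole pipeline together (logarithm, Fourier transform, high-emphasis filter, inverse transform, exponential), a minimal homomorphic filtering sketch might look as follows; this is our own illustration with an assumed Gaussian high-pass H_hp and hand-picked parameters, not code from the thesis:

```python
import numpy as np

def homomorphic_filter(I, gamma_l=0.5, gamma_h=2.0, d0=30.0, c=1.0):
    """Homomorphic filtering: log -> FFT -> high-emphasis filter -> IFFT -> exp.

    H(u, v) = (gamma_h - gamma_l) * H_hp(u, v) + gamma_l as in equation 3.22;
    gamma_l < 1 suppresses illumination (low frequencies) and gamma_h > 1
    boosts reflection (high frequencies). I is a grayscale, non-negative array.
    """
    rows, cols = I.shape
    log_I = np.log1p(I.astype(np.float64))             # ln I = ln L + ln R
    F = np.fft.fftshift(np.fft.fft2(log_I))            # to the frequency domain

    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2              # squared distance from the centre
    H_hp = 1.0 - np.exp(-c * D2 / (d0 ** 2))            # Gaussian high-pass
    H = (gamma_h - gamma_l) * H_hp + gamma_l             # homomorphic filter (eq. 3.22)

    filtered = np.fft.ifft2(np.fft.ifftshift(H * F)).real
    return np.clip(np.expm1(filtered), 0, None)          # back from the log domain
```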
