Finger Vein Recognition with Hybrid Deep Learning Approach

Finger vein biometrics is an identification technique based on the vein patterns in fingers, and it has the benefit of being difficult to counterfeit. Owing to its high level of security, durability, and performance history, finger vein recognition has attracted attention as one of the most significant authentication methods available today. Using a hybrid deep learning approach, we investigate the challenge of identifying the finger vein sensor model. We use traditional LSTM architectures for this biometric modality. This work also proposes a new hybrid architecture that stands out for its compactness and for its merging with a trainable LSTM layer. In the experiment, original samples as well as the region of interest data from eight freely available FV-USM datasets are employed. The standard LSTM-based strategy is preferable and produced better outcomes, as seen in the comparison with earlier approaches. Moreover, the results show that hybrid CNN and LSTM networks may be used to improve vein detection performance.


Introduction
Personal identification technology has a huge number of security applications as a result of the evolution of information threats. In this regard, intelligent identification of individual traits through user-friendly and secure identification systems has recently received interest on a global scale. Biometric techniques utilize physiological traits such as fingerprints, faces, irises, voices, and digital signatures, which makes them less vulnerable to theft or replication than traditional identification mechanisms such as secret keys and passwords. Hence, biometrics has recently grown in favor as a very trustworthy authentication technique for identifying people (Tamang et al., 2022).
Biometric approaches can automatically recognize a person using extrinsic or intrinsic biological characteristics. Extrinsic components of the human body, such as fingerprints, the face, the iris, and palm prints, are often employed in recognition. Because they are easy to duplicate and therefore put the identification system at risk, these characteristics raise security issues. Intrinsic modalities, in contrast, identify features hidden beneath the skin, such as vein patterns. These modalities take into account a wide range of factors, such as universality, individuality, and permanence. Since they are not susceptible to religious conflict and are immune to skin illnesses, they provide a great deal of privacy for high-security applications such as banking, forensics, and legal assistance.
Finger vein detection has received a lot of attention because of the following benefits (Ma et al., 2021):
(1) Vein images are obtained in a noncontact, more comfortable, and hygienic manner.
(2) Vein patterns do not alter as people age.
(3) Vein images are produced by passing short bursts of near-infrared light through the finger; the hemoglobin in the blood vessels absorbs the near-infrared light, which forms the vein image.
Neural networks are a prominent approach for finger vein identification systems, inspired by the exceptional performance of CNNs in image classification (Hu et al., 2020). In general, a CNN suffers from an over-fitting issue that might cause low recognition accuracy on testing data while retaining high recognition accuracy on training data. Our work uses data augmentation, dropout, and a hybrid technique to lessen the effects of the over-fitting issue.
Contribution: To circumvent such constraints, we present a hybrid system for finger vein recognition: a CNN-based hybrid long short-term memory (LSTM) network that allows precise feature extraction and enables reliable detection of finger vein pictures independent of the acquisition processes of current datasets.
The detailed vein characteristics of the input images are captured using a hybrid conventional LSTM that has many convolutional and subsampling layers. In contrast to other works, the suggested approach employs CNNs in parallel and integrates their outputs into an LSTM, allowing us to use the information contained in two pictures rather than just one. The rest of this paper is structured as follows. The suggested model is presented in Section 2, which also describes the network design and data preparation. The experiments and their findings are given in Section 3. Lastly, Section 4 offers final observations.

Related Work
A biometric identification system based on finger veins can be decomposed into four steps: image capture, preprocessing, feature extraction, and feature matching. After vein pictures are obtained with NIR optical imaging methods, they are processed in a variety of ways to improve them, including image filtering (Cho et al., 2012), region of interest (ROI) extraction, and image normalization (Sun et al., 2021). The following step, feature extraction, extracts discriminative features from the individual vein pictures in the improved images in order to achieve high recognition performance during feature matching. Building finger vein identification systems has been the subject of many research projects, which have used conventional mathematical models, machine learning (ML) (Aglio-Caballero et al., 2018), and deep learning (Zeng et al., 2018; Zeng, Jalilian et al., 2018).

Deep Learning Based Finger Vein Recognition
Since big data and deep learning (DL) have advanced many image processing tasks, including classification, object recognition, and digital image processing (Salman; Soud, 2018), the application of DL for rapid and automated feature extraction from pictures of finger veins has been investigated. Owing to the flexibility of its feature representations, a DL method can be a promising option for finger vein detection regardless of the shape or orientation of the vein patterns. With a big enough training set, a DL technique such as a convolutional neural network (CNN) might provide excellent feature extraction capabilities for vein pictures and rapidly adapt to learning those feature representations (Qin et al., 2017).
Researchers in the field of deep learning have proposed utilizing finger veins as a form of user identification. Using convolutional neural networks (CNNs), Kim et al. (2018) created a multimodal biometric identification system that combines finger vein and finger shape to verify a person's identity. Several fusion methods, such as weighted sum, weighted product, Bayesian rule, and perceptron rule, were used to merge the two features at the score level using a ResNet (Liu et al., 2017).
Radzi et al. used local dynamic thresholding to produce a rough shape picture of a previously acquired finger-vein image. A CNN was then used to perform feature extraction and classification without a separate feature extraction phase (Radzi et al., 2016).
Traditional DL-based finger vein identification algorithms have demonstrated promising recognition performance, but these results have been hampered by serious faults, notably those related to weak feature extraction techniques. Pictures of finger veins may show not just vein patterns but also variations in noise and intensity brought on by the muscles and shapes of the fingers (Normakristagaluh et al., 2022). Previous methods employ weak feature extraction networks, which prevents the models from learning distinctive characteristics of finger vein patterns. As a result, they are unable to obtain a distinct representation of each subject's vein patterns, which leads to poor accuracy.
Moreover, the infrared imaging databases of finger veins that are now accessible offer incredibly poor visual quality in vein patterns. Low picture quality requires the creation of an effective preprocessing pipeline and a potent feature extractor in order to enhance recognition performance (Noh et al., 2012).

Methods
The objective of this paper is to identify persons using a finger vein-based biometric recognition system.
The complete finger-vein recognition system proposed in this study is shown in Figure 1. In our solution, recognition proceeds in multiple stages and starts with preprocessing of finger vein images read from the public FV-USM dataset. The finger images in this collection were collected in two sessions, with each individual contributing four fingers to produce 492 finger classes. The vein image's Region Of Interest (ROI) is 300 × 100 pixels. The first session's images were utilized for training, while the second session's images were used as test images.
Different acquisition systems were used to acquire the vein images in the databases. Unfortunately, infrared pictures are sensitive to lighting and have extensive backgrounds, so preprocessing is critical. Preprocessing is the initial stage in all modalities of biometric recognition and adapts the input pictures to the subsequent phases. It combines segmentation, scaling, and filtering to improve picture clarity and hence enable feature extraction.
The first step in isolating the finger from the backdrop and deleting undesirable regions is to extract the ROI. ROI pictures can be found in the FV-USM databases.

Contrast limited adaptive histogram equalization (CLAHE)
To improve the quality of the images, we used Contrast Limited Adaptive Histogram Equalization (CLAHE), one of the most often utilized enhancement approaches. By raising the contrast of small regions, this approach normalizes the brightness of the image. All images from the database were resized to 300 × 100 pixels so that they could be subjected to the same technique (Pour et al., 2018).
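The contrast-limiting idea can be illustrated on a single tile. The sketch below is a simplified illustration, not the implementation used in this study: it clips the tile histogram at an assumed `clip_limit` (a count per bin, chosen arbitrarily here), redistributes the excess evenly, and maps gray levels through the resulting cumulative distribution. Full CLAHE applies this per tile and interpolates between neighboring tile mappings, which is omitted here. The function names and the 4 × 4 tile are hypothetical.

```python
def clipped_equalization_map(tile, clip_limit, levels=256):
    """Return a lookup table (old gray level -> new gray level) for one tile."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # Clip every bin at clip_limit and collect the counts that were cut off.
    excess = 0
    for level in range(levels):
        if hist[level] > clip_limit:
            excess += hist[level] - clip_limit
            hist[level] = clip_limit

    # Redistribute the excess evenly over all bins (single pass, simplified).
    share, remainder = divmod(excess, levels)
    for level in range(levels):
        hist[level] += share + (1 if level < remainder else 0)

    # Scale the cumulative distribution to the output gray-level range.
    total = sum(hist)
    lookup, running = [], 0
    for count in hist:
        running += count
        lookup.append(round((levels - 1) * running / total))
    return lookup


def equalize_tile(tile, clip_limit=4):
    """Apply the clipped equalization mapping to every pixel of the tile."""
    lookup = clipped_equalization_map(tile, clip_limit)
    return [[lookup[value] for value in row] for row in tile]


# Hypothetical low-contrast tile: gray levels span only 100..105.
tile = [
    [100, 100, 101, 102],
    [100, 100, 101, 103],
    [100, 100, 102, 104],
    [100, 100, 103, 105],
]
enhanced = equalize_tile(tile, clip_limit=4)
print(enhanced)
```

After the mapping, the narrow band of gray levels in the tile is stretched over a much wider range, while the clip limit keeps the dominant gray level from monopolizing the output range.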

De-Blur Each Region Using Regularized Filter
This strategy is effective when little is known about the noise. A constrained least squares restoration technique with a regularized filter is used to recover the blurred and noisy picture (Sada et al., 2020).

Canny Edge Detection
Edge detection is a contouring method for finding optimal edges in a finger vein image. Sharp edges are detected with the following steps. To bring out the soft edges, the image is first smoothed using a filter, such as a Gaussian filter, that eliminates noise from the original image. The next step is to find the intensity gradient of the image by calculating the gradient magnitude, or edge strength, |G| (Sekehravani et al., 2020) as follows:
$|G| = \sqrt{G_x^2 + G_y^2}$ (1)
where $G_x$ denotes the gradient in the x-direction and $G_y$ denotes the gradient in the y-direction, as shown in Figure 2.

Principal Component Analysis (PCA)
The multicollinearity problem is one of the difficulties that must be addressed. Multicollinearity occurs when the input features of a dataset are substantially correlated with one or more of the dataset's other attributes. This reduces the efficacy of regression and classification models. The key tactic used to mitigate the consequences of multicollinearity is feature reduction. Principal Component Analysis (PCA) is a statistical approach that exploits multicollinearity by operating on highly correlated variables. The primary steps in the PCA method are feature space transformation via the relationships between attributes, mapping to a low-dimensional feature space to achieve dimension reduction, and analysis of the transformed feature space (Tian, 2022). Unsupervised dimensionality reduction techniques such as PCA reduce the size of the data by correlating the input features (Jacob; Darney, 2021).
The optimal transformation matrix is determined by identifying the most important changes in the original space. This is the primary consideration when selecting the principal components (PCs): in general, the directions with the most variability also contain the most class-specific information, so they are the ones prioritized when picking the PCs recovered from the PCA. The most prevalent form of PCA is produced via a basic linear projection that maximizes variance in the projected space. Because our study retains 99 percent of the variance, the dimension p varies for each image. Afterwards, the PCA features are averaged to give the class dimension for each picture. Finally, the PCA average is updated.
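The selection of PCs by retained variance can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes the eigenvalues of the covariance matrix (the variance along each principal direction) are already available, and the function name and the example eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Smallest number of leading PCs whose cumulative variance ratio reaches target."""
    ordered = sorted(eigenvalues, reverse=True)  # most variable directions first
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Hypothetical eigenvalues for one image's feature covariance matrix.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
p = components_for_variance(eigenvalues, target=0.99)
print(p)  # number of PCs kept for this image
```

Because the eigenvalue spectrum differs from image to image, the same 99 percent threshold yields a different dimension p for each image.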

Normalization
After that, all images are normalized to the same scale. Image normalization is a technique for limiting the intensity range of pixels. The general form of normalization is shown in Equation (2) (Borkin et al., 2019):
$F_{norm} = (F - F_{min}) / (F_{max} - F_{min})$ (2)
where $F_{norm}$ is the normalized value, $F$ is the original pixel value, $F_{min}$ is the minimum pixel intensity value, and $F_{max}$ is the maximum pixel intensity value in the image.
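As a worked example of the min-max normalization above, the following sketch rescales a small image to the range [0, 1]. The function name and the 2 × 2 image are hypothetical, and the handling of a constant image (where the maximum equals the minimum) is an assumption added to avoid division by zero.

```python
def min_max_normalize(image):
    """Rescale pixel values to [0, 1] using (F - F_min) / (F_max - F_min)."""
    flat = [pixel for row in image for pixel in row]
    f_min, f_max = min(flat), max(flat)
    if f_max == f_min:
        # Assumed convention: a constant image maps to all zeros.
        return [[0.0 for _ in row] for row in image]
    scale = f_max - f_min
    return [[(pixel - f_min) / scale for pixel in row] for row in image]


image = [[50, 100], [150, 250]]
normalized = min_max_normalize(image)
print(normalized)  # [[0.0, 0.25], [0.5, 1.0]]
```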

Results and Discussion
Table 1 depicts the network structure for the feature extraction procedure in detail. The preprocessed input, with dimensions (5313, 300, 100), is fed into the network.

Conventional LSTM Model
The first three convolutional layers use a kernel size of (3, 3), whereas the final LSTM layer has 64 units. The first layer is the convolutional (Conv2D) layer, which acts as a collection of learnable filters. The model defines 64 filters for the first two Conv2D layers. Each filter uses its kernel to transform a portion of the picture (specified by the kernel size), and the kernel matrix is applied across the entire picture. Filters can be viewed as picture transformations, so the CNN can isolate features that are useful anywhere in the image.
An LSTM layer is added to the network to increase the precision of the model. The flatten layer converts the final feature maps into a single 1D vector. After several convolutional/max-pooling layers, this flattening phase is necessary before fully connected layers can be used. It combines all of the individual features found by the preceding convolutional layers. Ultimately, we employed these features in a single densely connected layer of an artificial neural network (ANN) classifier. The classification block employs a set of three fully connected layers (FCLs) to learn a nonlinear combination of the features extracted during the feature extraction procedure. Its input is the output of the feature extraction network's final lstm1 block. As opposed to a convolutional layer, a fully connected layer has a higher total number of learnable parameters since each input unit has its own weight. To avoid overfitting caused by the large number of parameters and to improve network generalization, dropout is applied after every layer except the final one. The model parameters are shown in Figure 3.
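The difference in parameter counts between layer types can be checked with the standard counting formulas. The sketch below is illustrative only: the 3 × 3 kernels and 64 filters come from the text, while the single-channel input, the LSTM input size, and the fully connected layer sizes are assumptions made for the example and are not the values reported in Figure 3.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per (input, output) pair plus one bias per output unit."""
    return (in_units + 1) * out_units


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


# 3x3 kernels and 64 filters follow the text; a grayscale input is assumed.
conv1 = conv2d_params(3, 3, in_channels=1, filters=64)
conv2 = conv2d_params(3, 3, in_channels=64, filters=64)

# Hypothetical sizes, chosen only to illustrate the comparison.
lstm = lstm_params(input_dim=64, units=64)
dense = dense_params(in_units=300 * 100, out_units=64)

print(conv1, conv2, lstm, dense)  # 640 36928 33024 1920064
```

Under these assumptions, a single fully connected layer applied to a flattened 300 × 100 input holds far more weights than the convolutional layers, which is why dropout is particularly useful around such layers.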

Training and Test Set Generation
The dataset has a total of six samples for each class. Four of the six samples were set aside for training and the other two for testing, in order to increase the generalizability of the model by training with a variety of data. The picture samples for each class were divided into training and testing sets using a ratio of 0.9. The train and test datasets are displayed in Figure 4. After preprocessing, PCA was used to extract the features of the finger vein images, and the features were then passed through each block of the network.

A binary cross entropy loss function was used during training with the intention of reducing the overall loss. The loss was minimized using the Adam optimizer with a learning rate of 0.005 to achieve quicker convergence. Training used a batch size of 34 and 100 epochs in total. It is crucial to stop training the model once the network hyperparameters have been tuned and the training and validation losses have saturated. To achieve this, we used an early stopping strategy, in which training was terminated when the validation loss no longer improved over the training epochs. The generalized learning curves are depicted in Figure 5. Furthermore, as seen in Figure 6, the model showed extremely high convergence in the loss function for vein picture recognition.
Figure 5 compares the recognition accuracy of FVR-Net to that of several traditional systems. For high-quality pictures from the HKPU dataset, the suggested technique outperformed the CNN and KNCNN algorithms by 99.89%. With the FV-USM dataset, the suggested method's recognition accuracy was around 0.2 percent, 30.029 percent, and 0.2 percent greater than that of the compared approaches. The suggested model's excellent recognition performance can be attributed to its strong feature extraction network. Subjecting vein patterns to a robustly built feature extraction network activates the most abstract representation of the input at each layer (Chen et al., 2022; Rosdi et al., 2021).
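The early stopping strategy used during training can be sketched as follows. This is a minimal illustration rather than the code used in the study: the patience value, the `min_delta` threshold, and the validation-loss sequence are assumptions, since the text does not state them.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss      # new best validation loss: reset the counter
            waited = 0
        else:
            waited += 1      # no improvement this epoch
            if waited >= patience:
                return epoch
    return len(val_losses)   # never triggered: train for all epochs


# Hypothetical validation losses; improvement stalls after epoch 4.
losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.50, 0.53]
stop = early_stopping_epoch(losses, patience=3)
print(stop)  # 7
```

With a patience of 3, training halts at epoch 7 because epochs 5 to 7 do not improve on the best loss reached at epoch 4; a larger patience trades longer training for more tolerance to noisy validation curves.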