In-Depth Analysis of Identification Documents Through Vision AI
Last Updated on November 3, 2020 by Editorial Team
Author(s): Sean Chua, Martin Gomez
After many years abroad, Juan finally had a good reason to go back home to Iloilo in the Philippines. His parents and wife are in the vibrant festival city, and staying with them is his only son, Kiko. The COVID-19 pandemic hit his workplace, and the country hosting him for work mandated that all foreigners be sent home and shift to a work-from-home model, a measure its government announced in the hope of alleviating an already overburdened healthcare system.
Three days before his flight back to the Philippines, Juan diligently filled out the Electronic Case Investigation Form, otherwise known as the e-CIF. A collaborative project between the Philippine Red Cross and the Philippine government, the e-CIF system enables all passengers coming into the Philippines to enter their personal and travel details in advance, go through an expedited testing process when they land, track their specimen, get their test result, and receive a clearance certificate from both the Philippine Red Cross and the Department of Health’s Bureau of Quarantine.
This smooth, easy, and quick process, of course, assumes that Juan filled out the form completely, accurately, and properly, following the detailed instructions that accompany every field.
Let’s take the case of his buddy Tomas, a fellow OFW (overseas Filipino worker). Tomas was so rattled about having to go back to the Philippines that he completely forgot to fill out the e-CIF before his flight. Upon arrival at Ninoy Aquino International Airport, Tomas was greeted by the Philippine Coast Guard and asked to present his e-CIF QR code. Having none, he was required to fill out an e-CIF on the spot. Fumbling with his phone, Tomas hurriedly completed the form and, at the last step, took a selfie, uploaded it, and hit “Submit.”
As he was the last passenger to be processed by the Verification Officers on duty, Tomas’ e-CIF QR code was quickly scanned, his data skimmed, and a barcode assigned. Off Tomas went to the swabbers.
Therein lies the problem.
You see, as with any system, it’s garbage in, garbage out. When Tomas filled out his e-CIF in a mad rush, he never got to review it. The mobile number he entered was jumbled; his email address had a typo (gmail.con instead of gmail.com); and he didn’t follow the instructions for his identity photo.
Let’s discuss that last bit. In the e-CIF, everyone is required to upload a photo of their identification document. For passengers coming in from international flights, as in the case of Tomas, this is very easy as everyone should have a passport. By default, then, passports are used for all international passengers.
While some passenger information may be derived from the e-CIF form data, this feed of passport photos is critical: it serves as the trusted record of the people coming into the country during this pandemic.
Specifically, the passports uploaded to the e-CIF go through a form of image processing known as OCR (Optical Character Recognition).
OCR converts images containing written text into machine-readable data; it is how information is extracted from images of passport data pages.
To aid in the OCR process, passports and other forms of identification, such as national ID cards, have standardized the way they present information to machines. One of the most important sections of a passport’s data page is its machine-readable zone (MRZ). MRZs contain a person’s pertinent information, such as name, nationality, identification number, passport country of issue, birthdate, sex, etc. There are multiple types of MRZs for different document types; passport MRZs are what’s called Type 3, consisting of two lines of 44 characters each. The International Civil Aviation Organization (ICAO) defines the purpose of an MRZ as: “it may be used to capture data for registration of arrival and departure or simply to point to an existing record in a database.”
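Because the Type 3 layout is fixed, once OCR has recovered the two 44-character MRZ lines, the fields can be sliced out by character position. Below is a minimal sketch with a fictional sample MRZ (the name, number, and dates are made up); the field offsets follow the ICAO Doc 9303 TD3 layout, and the check-digit routine uses the standard 7-3-1 weighting. The helper names are mine, not part of any e-CIF code.

```python
def check_digit(field):
    """ICAO 7-3-1 check digit: digits keep their value, A-Z map to 10-35, '<' is 0."""
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            val = int(ch)
        elif ch.isalpha():
            val = ord(ch) - ord("A") + 10
        else:  # '<' filler
            val = 0
        total += val * (7, 3, 1)[i % 3]
    return total % 10

def parse_td3(line1, line2):
    """Slice the fields of a Type 3 (passport) MRZ out of its two 44-char lines."""
    assert len(line1) == 44 and len(line2) == 44
    surname, _, given = line1[5:].partition("<<")
    return {
        "issuing_country": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
        "passport_number": line2[0:9].rstrip("<"),
        "passport_number_valid": check_digit(line2[0:9]) == int(line2[9]),
        "nationality": line2[10:13],
        "birth_date": line2[13:19],   # YYMMDD
        "sex": line2[20],
        "expiry_date": line2[21:27],  # YYMMDD
    }

# Fictional sample data page (not a real passport):
line1 = "P<PHLDELACRUZ<<JUAN".ljust(44, "<")
line2 = "XX12345678PHL9001011M2501017" + "<" * 14 + "02"
fields = parse_td3(line1, line2)
```

On a real image, `line1` and `line2` would come from the OCR engine; the check digits then give a quick sanity test that the characters were read correctly.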
I was interested in understanding how this process worked and what insights I could derive from it. With the help of Google-fu and open libraries on GitHub, I was able to create a rudimentary process to satisfy my curiosity.
I decided to test my method with a dataset of 36,704 passport images that were uploaded to the e-CIF system.
For each image in a directory, I ran the following steps:
1. The algorithm first initializes a “transformation matrix,” which contains the set of possible manipulations the image could undergo to detect the presence of an MRZ, assuming there is indeed one.
2. For consistency, we first resize all images to 500 pixels. This will enable us to use specific measurements later on.
3. Since the MRZ is a region of text within the passport, we need a way to locate and process it. To do this, we implement a bounding-boxes algorithm, which looks for regions of text throughout the photo and obtains the coordinates of the corners of the “invisible text box” surrounding each of them. Because the MRZ is consistently found in roughly the same location within a passport, this approach is viable.
4. After finding the bounding boxes, we still need to manipulate the image itself to properly process the regions of text, as not all images are consistent with each other. This step applies transformations to the image, such as converting it to grayscale, blurring it, and making some elements (such as text) more prominent by lightening their outlines and darkening the background. These transformations let the algorithm more accurately determine whether an MRZ is present, and where.
5. If the algorithm cannot detect the presence of an MRZ, generally called the “region of interest,” we won’t be able to obtain our desired outputs. Our code then throws a “no roi found” error.
Let’s do a deep dive into Step 4. For valid images, the algorithm produces multiple versions of each image:
- The first version is the resized image transformed into grayscale.
- The second version is the resized image with an applied Gaussian filter. The Gaussian filter is a convolution (a set of specific manipulations) applied to an image that “blurs” it by reducing sharpness and smoothing out high-frequency “noise.” Since the MRZ is a block of text found at the bottom of a passport, this smoothing makes the block easier to detect as a single region.
- The third version is the resized image with an applied blackhat transformation. This transformation makes darker elements of an image more prominent against a brighter background. Consequently, the transformation makes darker objects (such as text) more prominent.
- The fourth version is the resized image with an applied Scharr filter. The Scharr filter highlights the edges and curves of an image by taking its first derivative in the x direction (which responds to vertical edges) and in the y direction (which responds to horizontal edges). In short, it produces an outline of the image’s different elements.
- The final version is the resized image, with everything cropped out except the MRZ (if detected). This is ideally what we need.
We produce multiple versions of the same image so that we can inspect each stage of the process, and we can also use them for debugging.
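To make the blackhat transformation (the third version above) concrete: it is the difference between a morphological closing of the image and the image itself, so dark details on a bright background survive while everything else drops to zero. A toy one-dimensional example, using SciPy here purely for illustration:

```python
import numpy as np
from scipy import ndimage

# One "scanline" across a bright page (200) crossed by dark text strokes (40).
row = np.array([200, 200, 40, 40, 200, 40, 200, 200])

# Closing (a dilation followed by an erosion) fills in the dark gaps,
# returning the scanline to a flat bright 200.
closed = ndimage.grey_closing(row, size=3)

# Closing minus original leaves only the dark details, now bright on zero:
blackhat = closed - row
print(blackhat.tolist())  # [0, 0, 160, 160, 0, 160, 0, 0]
```

The same subtraction, done in two dimensions with a rectangular kernel, is what lifts the MRZ text off the passport page.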
Running the 36,704 photos through this algorithm, I found that 29,034 of them (79.103%) were deemed readable, or “valid” (such as Juan’s). Most of those the algorithm disregarded were selfies, extremely blurred images (such as Tomas’), images with fingers covering part of the MRZ, and the like.
To verify and extract the data from the images, I then ran the valid set through Google’s Vision AI. This confirmed that the previous process worked! In reality, though, hundreds of thousands of OFWs’ passports have already been processed through the e-CIF form. At the time of this research, about 225,000 forms had been submitted. Considering this, we can hypothesize that system efficiency is achieved if 80% of all arriving passengers submit an e-CIF form with a proper identity-verification picture. We can find the minimum number of people who satisfy this threshold using statistics!
We can use a function known as the negative binomial distribution. It takes in three parameters: the number of trials (i.e., the number of arriving passengers), the probability of success, and the threshold probability, which in this case are 225,000, 79.103%, and 80%, respectively. It then provides the estimated minimum number of successful trials that satisfies the threshold. Plugging in these parameters, I was surprised to find that about 178,000 passengers have to successfully submit a proper e-CIF document. From what we’ve seen, however, getting that many people to do such a simple thing might not be as easy as it seems.
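A figure in that ballpark can be reproduced with SciPy. The sketch below is my own back-of-the-envelope, framed as a binomial quantile: out of 225,000 arrivals, each with a 79.103% chance of submitting a readable photo, what is the smallest success count k that the total number of successes stays at or below with 80% probability? The variable names are mine, not the e-CIF system’s.

```python
from scipy import stats

n = 225_000   # e-CIF forms submitted so far
p = 0.79103   # observed share of readable ("valid") passport photos
q = 0.80      # threshold probability from the article

# Smallest k with P(number of successes <= k) >= q:
k = int(stats.binom.ppf(q, n, p))
print(k)  # about 178,000, in line with the article's estimate
```

Since the mean number of successes is n × p ≈ 177,982 and the distribution is tightly concentrated at this scale (standard deviation under 200), the 80th percentile lands only slightly above the mean.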
Last October 6, the Philippine Red Cross hit 1 million tests. It’s also been some time since Juan and Tomas have gone home to their respective families.
We can only hope that during that time gap, more people have uploaded proper identification documents. But that’s for another article. 🙂
In-Depth Analysis of Identification Documents Through Vision AI was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.