Panography: Magic of Seamless Image Stitching
Last Updated on May 18, 2023 by Editorial Team
Author(s): Abhijith S Babu
Originally published on Towards AI.
Most of us have used the panorama feature in our smartphone cameras. It helps us create a high-resolution image that covers a wide angle. We have also seen Google Street View, where a large number of images are stitched together. But how does it actually work? It relies on a technique known as panography. Let’s look at what panography actually is.
Panography is a computer vision technique in which multiple overlapping photos are combined to create a single high-resolution image. The resulting image can capture more detail and context than a single photo. This technique is used in situations where we need a wide and detailed image of a scene. Let us have a look at how panography is done.
Multiple images of a scene have to be collected to create a panographic output. The images should be captured with consistent exposure, focus, white balance, and so on. It is important to ensure some overlap between adjacent images, because the overlap provides the common features used to join them. Camera movement should be minimized to maintain a consistent perspective among all the images.
Feature detection and matching
Feature detection is the process of identifying distinctive points or regions in the input image, such as edges, curves, corners, and so on. These distinct features can be used to align and stitch the images together. One of the common approaches for feature extraction is to use local descriptors. A local descriptor takes a small portion of the image around a feature point and converts it to a fixed-length vector. This vector represents a property of the region, such as its texture or appearance.
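To make the idea of a local descriptor concrete, here is a minimal sketch that turns the patch around a feature point into a fixed-length vector. It is not one of the named algorithms in this article; the function name `patch_descriptor` and the `half_size` parameter are assumptions made for illustration, and the image is assumed to be a 2-D grayscale NumPy array.

```python
import numpy as np

def patch_descriptor(image, row, col, half_size=4):
    """Describe the square patch centered on (row, col) as a 1-D vector.

    Hypothetical helper for illustration: the patch is flattened, its mean is
    subtracted, and it is scaled to unit length, so a uniform brightness or
    contrast change leaves the descriptor unchanged.
    """
    height, width = image.shape
    if (row - half_size < 0 or col - half_size < 0 or
            row + half_size >= height or col + half_size >= width):
        raise ValueError("feature point is too close to the image border")
    patch = image[row - half_size: row + half_size + 1,
                  col - half_size: col + half_size + 1].astype(np.float64)
    vec = patch.ravel()
    vec = vec - vec.mean()            # remove the brightness offset
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec   # remove the contrast scale
```

With the default `half_size=4`, every feature point yields a vector of length 81 (a 9 x 9 patch), which is what makes descriptors from different images directly comparable.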
Various techniques can be used to extract features. SIFT (Scale-Invariant Feature Transform) is an algorithm that extracts features that are invariant to scaling and rotation and robust to illumination changes. SURF (Speeded-Up Robust Features) is a similar algorithm that is designed to be faster than SIFT while remaining robust. AKAZE (Accelerated-KAZE) is a newer technique that is designed to be faster than SIFT and SURF, and it is robust to blur and noise as well.
After detecting and describing the features in the input images, we have to match the corresponding features across images. This is done by comparing the descriptors of the feature points and finding the closest match based on a distance metric such as Euclidean distance. The Euclidean distance between two descriptors is the square root of the sum of squared differences between their corresponding values. The closest match for each feature point is the one with the smallest distance. The matched features can then be used to estimate the transform that maps one image to the other.
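The matching step can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration assuming the descriptors of each image are stacked as rows of a NumPy array; the function name `match_descriptors` is hypothetical.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """For each descriptor in desc_a, find its nearest descriptor in desc_b.

    desc_a: array of shape (num_a, dim); desc_b: array of shape (num_b, dim).
    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)
    # Pairwise differences via broadcasting: shape (num_a, num_b, dim).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    # Euclidean distance: square root of the sum of squared differences.
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)
    return [(i, int(j), float(dist[i, j])) for i, j in enumerate(nearest)]
```

Brute-force search compares every pair of descriptors, so it is simple but scales with the product of the two feature counts. A nearest match is also not guaranteed to be a correct one, so real pipelines usually filter the matches before using them.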
The images have to be aligned now so that they fall in the same coordinate system. The registration process involves finding a spatial transformation that maps each input image to a common coordinate system. The transformation is estimated by identifying corresponding features. The registration can either be direct or feature-based.
In feature-based registration, the corresponding feature points identified in the previous steps are used to compute a transformation matrix known as a homography. The homography is used to warp one image so that it aligns with the other image. In direct registration, by contrast, the images are aligned based on their pixel values using measures such as cross-correlation or mutual information. It is usually slower but can be more accurate than feature-based registration. Both methods can also be combined: an initial alignment can be estimated from features and then refined by direct registration.
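As a rough sketch of the feature-based route, a homography can be estimated from at least four matched point pairs with the direct linear transform (DLT). The function names below are assumptions made for illustration, and the sketch assumes the matches are all correct. In practice, a robust estimator such as RANSAC is commonly wrapped around this step to reject wrong matches.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H such that dst ~ H @ src (DLT).

    src, dst: arrays of shape (N, 2) holding N >= 4 matched (x, y) points.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 matched point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each match contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]   # assumes H[2, 2] is nonzero (true for typical stitching)

def apply_homography(H, pts):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

A quick way to check such an estimator is to map a few points through a known homography with `apply_homography` and confirm that `estimate_homography` recovers the same matrix.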
The aligned images are now blended using various techniques. Some of the blending techniques are:
Feathering: Feathering is a technique in which the opacity of the overlapping image is adjusted to create a smooth blending between images without any visible seams. This technique is useful if the background is consistent and the lighting of the images matches.
Cross-dissolve: Cross-dissolve blends multiple overlapping images by linear interpolation of pixel values. This method creates a smooth transition, but the seam will be visible if there is a difference in exposure.
Gradient-based blending: This technique can be used to capture complex textures and is useful when the images vary in lighting conditions. Here, a gradient is calculated between overlapping regions and the images are blended together by gradually changing the pixel values based on the direction of the gradient.
Multi-band blending: It is an advanced blending technique based on the frequency bands in images. Here, frequency can be thought of as the rate at which intensity values change from pixel to pixel. A frequency band is a range of frequencies, which corresponds to image structures of a similar size or scale. For example, large shapes and smooth regions fall under the low-frequency band, while small objects and fine details fall under the high-frequency band. In multi-band blending, each image is divided into different frequency bands, each band is blended separately, and the blended bands are finally combined to form a detailed final image.
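The multi-band idea can be illustrated with a simplified two-band version. This is only a sketch under stated assumptions: it uses a box blur as the low-pass filter and just two bands, whereas full implementations are commonly built on multi-level Laplacian pyramids. The names `box_blur` and `two_band_blend` are hypothetical, and the inputs are assumed to be aligned grayscale images of equal size plus a mask that is 1 where the first image should be used.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur with edge padding (a simple low-pass filter)."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; 'valid' restores the input size.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def two_band_blend(img_a, img_b, mask, radius=8):
    """Blend img_a (mask == 1) with img_b (mask == 0) in two frequency bands."""
    img_a = np.asarray(img_a, dtype=np.float64)
    img_b = np.asarray(img_b, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    low_a, low_b = box_blur(img_a, radius), box_blur(img_b, radius)
    high_a, high_b = img_a - low_a, img_b - low_b
    # Low frequencies: wide, gradual transition using a blurred mask.
    soft_mask = box_blur(mask, radius)
    low = soft_mask * low_a + (1.0 - soft_mask) * low_b
    # High frequencies: sharp transition so fine details are not smeared.
    high = np.where(mask > 0.5, high_a, high_b)
    return low + high
```

The design choice to notice is the different transition width per band: broad lighting differences are averaged over a wide region, while fine details switch abruptly at the seam.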
An appropriate blending method can be chosen based on the input images and on whether speed or accuracy matters more.
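The simplest of these methods, a linear blend across the overlap, can be sketched in a few lines. This assumes two already-aligned grayscale images of equal height whose overlap is a known number of columns; the function name `cross_dissolve` and its parameters are chosen for illustration only.

```python
import numpy as np

def cross_dissolve(left, right, overlap):
    """Stitch two aligned images whose last/first `overlap` columns coincide.

    Inside the overlap, the weight of the left image falls linearly from 1
    to 0 while the weight of the right image rises from 0 to 1.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    height, w_left = left.shape
    w_right = right.shape[1]
    if right.shape[0] != height or not 0 < overlap <= min(w_left, w_right):
        raise ValueError("images must share a height and a valid overlap")
    out = np.zeros((height, w_left + w_right - overlap))
    out[:, :w_left - overlap] = left[:, :w_left - overlap]   # left only
    out[:, w_left:] = right[:, overlap:]                     # right only
    alpha = np.linspace(1.0, 0.0, overlap)                   # left weights
    out[:, w_left - overlap:w_left] = (
        alpha * left[:, w_left - overlap:] + (1.0 - alpha) * right[:, :overlap])
    return out
```

For example, blending an all-white strip with an all-black strip produces a linear ramp across the overlap, which is the visible gradient that appears when the exposures of the two images differ.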
Image distortion correction
The final output of blending might have some distortions due to the wide-angle view, which can be corrected by techniques such as homography and perspective correction. Homography correction is done by creating a transformation matrix that maps the distorted image to an undistorted reference image. This transformation matrix can be created using corresponding points in both images. Perspective correction can be done if we know the camera positions and orientations. It can be modeled based on the intrinsic and extrinsic parameters of the camera, such as focal length, image sensor size, the position of the camera in the 3D space, and so on.
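Once a correcting transformation matrix is known, applying it to an image can be sketched with inverse mapping: for every output pixel, look up where it came from in the input. The sketch below uses nearest-neighbor sampling for brevity and assumes a grayscale image and an invertible 3x3 matrix `H` whose projective divisor stays nonzero over the output area; the function name `warp_image` is hypothetical.

```python
import numpy as np

def warp_image(image, H, out_shape):
    """Warp a grayscale image with homography H (input -> output coordinates).

    Uses inverse mapping with nearest-neighbor sampling; output pixels that
    map outside the input image are left at zero.
    """
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous coordinates (x, y, 1) of every output pixel, shape (3, N).
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ dst
    src_x = np.rint(src[0] / src[2]).astype(int)
    src_y = np.rint(src[1] / src[2]).astype(int)
    inside = ((src_x >= 0) & (src_x < image.shape[1]) &
              (src_y >= 0) & (src_y < image.shape[0]))
    out = np.zeros(out_h * out_w, dtype=image.dtype)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out.reshape(out_h, out_w)
```

Inverse mapping is chosen here because it assigns exactly one value to every output pixel, whereas pushing input pixels forward can leave holes. Production code would typically use bilinear or higher-order interpolation instead of nearest-neighbor sampling.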
Thus we have created a panoramic image of the scene. This technique is used in a variety of applications such as landscape photography, virtual tours, and even 3D terrain modeling.