Review: IEF — Iterative Error Feedback (Human Pose Estimation)

Last Updated on July 20, 2023 by Editorial Team

Author(s): Sik-Ho Tsang

Originally published on Towards AI.

A Review on IEF U+007C Towards AI

Outperforms Tompson NIPS’14, and Tompson CVPR’15

**Getting Better and Better From Left to Right With Iterative Error Feedback (IEF)**

In this story, IEF (Iterative Error Feedback), by UC Berkeley, is briefly reviewed. Instead of directly predicting the outputs in one go, a self-correcting model is used to progressively change an initial solution by feeding back error predictions. This is a 2016 CVPR paper with more than 300 citations. (

Sik-Ho Tsang @ Medium)

Outline

IEF Architecture Overview
Training
Ablation Study
Comparison with State-Of-The-Art Approaches

1. IEF Architecture Overview

xt, the concatenation of input image I with a visual representation g, input to the model f. With 3-channel RGB images and K heatmaps for K key points, there are K+3 channels for xt.
Then model f output/predict a “correction” et.
yt+1 = et + yt to obtain a new y.
yt+1 is converted into a visual representation of g. g is 2D Gaussian having a fixed standard deviation and centered on the keypoint location.
This procedure is initialized with a guess of the output (y0) and is repeated until a predetermined termination criterion is met.
y0 is the median of ground truth 2D keypoint locations on training images

2. Training

2.1. Loss Function

et and e(y, yt) are the predicted and target bounded corrections, respectively.
h is a measure of distance, such as a quadratic loss.
T is the number of correction steps taken by the model. T=4.

2.2. Fixed Path Consolidation (FPC)

The above is the iterative process, but we only got the final ground-truth. We do not have the intermediate ground-truth.
The simplest strategy is to predefine yt for every iteration using a set of fixed corrections e(y, yt) starting from y0, obtaining (y0, y1, …, y).
The target bounded corrections for every iteration are computed using a function:

where k is the k-th keypoint. L denotes the maximum displacement for each keypoint location. L = 20 pixels. u^ is the unit vector of u.

2.3. ConvNet

ImageNet pre-trained GoogLeNet is used.
The conv-1 filters are modified to be operated on 20 channel inputs. The weights of the first three conv-1 channels were initialized using the weights learned by pre-training on Imagenet. The weights corresponding to the remaining 17 channels were randomly initialized with the Gaussian noise of variance 0.1.
The last layer of 1000 units that predicted the Imagenet classes is discarded. It is replaced with a layer containing 32 units, encoding the continuous 2D correction expressed in Cartesian coordinates (the 17th ”keypoint” is the location of one point anywhere inside a person used in both training and testing).

3. Ablation Study

**PCKh-0.5 results on the MPII validation set**

3.1. Iterative v/s Direct Prediction

IEF that additively regresses to keypoint locations achieves PCKh-0.5 of 81.0 as compared to PCKh of 74.8 achieved by directly regressing to the keypoints.

3.2. Iterative Error Feedback v/s Iterative Direct Prediction

IEF achieves PCKh-0.5 of 81.0 as compared to PCKh of 73.4 by iterative direct prediction.

3.3. Importance of Fixed Path Consolidation (FPC)

Without FPC, the performance drops by almost 10 PCKh points on the validation set.

3.4. Learning Structure Outputs

As a baseline, regression gets 64.6.
The IEF model with a single additional input channel for the left knee gets PCKh of 69.2.
Furthermore, the IEF model over both the left knee and left hip gets PCKh of 72.8.
Finally, modeling all joints together with the image obtains a PCKh of 73.8.

3.5. Visualization

4. Comparison with State-Of-The-Art Approaches

4.1. MPII

IEF outperforms SOTA such as Tompson CVPR’15.

4.2. LSP

**Person-centric PCP scores on the LSP dataset test set**

There is also no marking point on the torsos, so the 17th keypoints are initialized to be the center of the image.
IEF outperforms SOTA such as Tompson NIPS’14.

Reference

[2016 CVPR] [IEF]
Human Pose Estimation with Iterative Error Feedback

My Previous Reviews

Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+]

Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [DenseVoxNet][UNet++] [H-DenseUNet] [DUNet]

Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SR+STN]

Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM] [FCGN] [IEF]

Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]

Generative Adversarial Network [GAN]

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication