Accelerate your data journey. Join our AI Community!


Computer Vision   Research

ECCV 2020 Best Paper Award | A New Architecture For Optical Flow

Author(s): Louis Bouchard

Computer VisionResearch

Photo by Cris Ovalle on Unsplash

ECCV 2020 Best Paper Award Goes to Princeton Team.
They developed a new end-to-end trainable model for optical flow.
Their method beats state-of-the-art architectures’ accuracy across multiple datasets and is way more efficient.

They even made the code available for everyone on their Github!
Let’s see how they achieved that.

Paper’s introduction

The ECCV2020 conference happened last week.
A ton of new research papers in the field of computer vision was out just for this conference.
Here, I will be covering the “Best Paper Award” that they gave to Princeton Team.

In short, they developed a new end-to-end trainable model for optical flow called “RAFT: Recurrent All-Pairs Field Transforms for Optical Flow.”
Their method achieves state-of-the-art accuracy across multiple datasets and is way more efficient.

What is optical flow?

Gif by:

First, I will quickly explain what optical flow is.
It is defined as the pattern of apparent motions of objects in a video.
Which, in other terms, means the motion of objects between consecutive frames of a sequence.
It calculates the relative motion between the object and the scene.
It does that by using the temporal structure found in a video in addition to the spatial structure found in each frame.

Gif by:

As you can see, you can easily calculate the optical flow of a video using OpenCV’s functions:

import cv2
import numpy as np
cap = cv2.VideoCapture("vtest.avi")

ret, frame1 =
prvs = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(frame1)
hsv[...,1] = 255

ret, frame2 =
next = cv2.cvtColor(frame2,cv2.COLOR_BGR2GRAY)

flow = cv2.calcOpticalFlowFarneback(prvs,next, None, 0.5, 3, 15, 3, 5, 1.2, 0)

mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1])
hsv[...,0] = ang*180/np.pi/2
hsv[...,2] = cv2.normalize(mag,None,0,255,cv2.NORM_MINMAX)
rgb = cv2.cvtColor(hsv,cv2.COLOR_HSV2BGR)

k = cv2.waitKey(30) & 0xff
if k == 27:
elif k == ord('s'):
prvs = next


It just takes a couple of lines of code to generate it in a live feed. Here are the results you get from a normal video frame using this shortcode:

Code & Image by:

It is super cool and really useful for many applications.

Gif by:

Such as traffic analysis, vehicle tracking, object detection and tracking, robot navigation, and much more. The only problem is that it is quite slow and needs a lot of computing resources.

This new paper helps with both of these problems while producing even more accurate results!

What is this paper? What did the researchers do exactly?

Now, let’s dive a bit deeper into what this paper is all about and how it’s an improvement from current state-of-the-art approaches.
They improved the state-of-the-art methods in four ways.

Image by:

First, it can be directly trained on optical flow instead of requiring the network to be trained using an embedding loss between pixels making it much more efficient.
Then, regarding the flow prediction, current methods directly predict it between a pair of frames.
Instead, they optimized their computation time a lot by maintaining and updating a single high-resolution flow field.

With this flow field, they had to use a GRU block, which is similar to an LSTM block, in order to refine their optical flow iteratively, as the best current approaches do.

Image by:

This block allows them to share the weights between these iterations while allowing convergence using their fixed flow field when training.

Image by:

The last distinction between their technique and the other approaches is that instead of explicitly defining a gradient with respect to an optimization objective, using backpropagation, they retrieve features from correlation volumes to propose the descent direction.

How have they done that?

Image by:

All these improvements were made using their new architecture.
It is basically composed of 3 main components.

Image by:

At first, there is an encoder that extracts the per-pixel features from two different frames along with another encoder which extracts features only from the first frame, in order to understand the context of the

Image by:

Then, using all pairs of feature vectors, they generate a 4-dimensional volume using the width and height of both frames.

Image by:

Finally, they use an update operator which recurrently updates the optical flow.
This is where the GRU block is.
It retrieves values from the previous correlation volumes and iteratively updates the flow field.


Just look at how sharp the results are. While being faster than current approaches!

Watch the video showing the results:

They even made the code available for everyone on their Github!
Which I linked in below if you’d like to try it out.

Of course, this was a simple overview of this ECCV2020's best paper award winner.
I strongly recommend reading the paper linked below for more information.

OpenCV Optical Flow tutorial:
The paper:
GitHub with code:

If you like my work and want to support me, I’d greatly appreciate it if you follow me on my social media channels:

  • The best way to support me is by following me on Medium.
  • Subscribe to my YouTube channel.
  • Follow my projects on LinkedIn
  • Learn AI together, join our Discord community, share your projects, papers, best courses, find kaggle teammates and much more!

ECCV 2020 Best Paper Award | A New Architecture For Optical Flow was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓