

Achieving 33rd Rank (of 186) in a NASA Harvest Field Boundary Detection Challenge in 50 Epochs

Last Updated on March 30, 2023 by Editorial Team

Author(s): Ronny Polle

Originally published on Towards AI.

A full description with ablations and code.

Source: Zindi.Africa


1. Problem Statement

2. Approach

3. Key Takeaways

Problem Statement

The NASA Harvest Field Boundary Detection Challenge is a machine learning competition organized by Zindi.Africa in collaboration with NASA Harvest, which is based at the University of Maryland, and the Radiant Earth Foundation.

According to the problem statement from Zindi, small farms produce about 35% of the world’s food and are mostly found in low- and middle-income countries. Mapping these farms allows policy-makers to allocate resources and monitor the impacts of extreme events on food production and food security. Unfortunately, these field-level maps remain mostly unavailable in low- and middle-income countries, where the risk of food insecurity is highest. Combining machine learning with Earth Observation data from satellites like the PlanetScope constellation can help improve agricultural monitoring, cropland mapping, and disaster risk management for these small farms.

In this challenge, the objective is to design machine learning algorithms for classifying crop field boundaries using multispectral observations. In other words, your model is expected to accurately segment out boundary masks delineating areas where there is crop field versus no crop field.


This is one of the trickiest machine learning problems I have encountered, if not the trickiest. Part of the explanation is that this was my first official encounter with satellite data. Even so, I set out to learn everything I could find that was relevant to the task.

The Data

Here, we are dealing with a classic Satellite Image Time Series (SITS) problem. The time series covers six months, but one is at liberty to select a subset of months for modeling.

The data comprises satellite imagery and labels tiled into 256 by 256 chips, 70 tiles in total. A total of 1532 individual field boundaries have been located and annotated in these 70 tiles. Given this, my initial hypothesis was that each of the 70 tiles contains at least one field boundary.

The final train size after split is 57 chips versus a test size of 13 chips. Labels were supplied only for fields that could be completely demarcated inside the chips.

Here is what the file structure of each chip looks like:

  • Satellite imagery containing 4 bands [B01, B02, B03, B04] for each of 6 unique timestamps [2021_03, 2021_04, 2021_08, 2021_10, 2021_11, 2021_12]
  • Rasterized labels mapping to the fields in the train set.
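
To make this layout concrete, here is a minimal sketch of how one chip could be held in memory and how a subset of months could be picked for modeling. The array is random stand-in data, and the names `TIMESTAMPS`, `BANDS` and `select_months` are illustrative assumptions, not part of the competition code.

```python
import numpy as np

# Timestamps and band names as listed above
TIMESTAMPS = ["2021_03", "2021_04", "2021_08", "2021_10", "2021_11", "2021_12"]
BANDS = ["B01", "B02", "B03", "B04"]


def select_months(chip, months):
    """Keep only the requested months from a (T, C, H, W) chip.

    Returns an array of shape (len(months) * C, H, W), so the selected months
    can be passed to a 2D segmentation model as extra input channels.
    """
    idx = [TIMESTAMPS.index(m) for m in months]
    subset = chip[idx]  # shape: (len(months), C, H, W)
    return subset.reshape(-1, *chip.shape[2:])


# Random stand-in for one chip: 6 timestamps x 4 bands x 256 x 256 pixels
rng = np.random.default_rng(0)
chip = rng.random((len(TIMESTAMPS), len(BANDS), 256, 256), dtype=np.float32)

stacked = select_months(chip, ["2021_04", "2021_08"])
print(stacked.shape)  # (8, 256, 256)
```

Flattening months into channels is only one option; a sequence model would instead keep the time axis separate.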

Image Preprocessing

In this section, we will dive into the image preprocessing and scaling pipeline.

The intuition behind my choice of image pre-processing was to make the weakly delineated boundaries in the images more visible, so that the models could perceive the fields better and the supervised learning signal would be cleaner. The end goal is to make the label masks reasonably detectable within their corresponding images.

My final pre-processing pipeline was heavily inspired by the first-place solution to the Crop Detection from Satellite Imagery competition organized by the CV4A workshop at ICLR 2020. I encourage you to check it out here.

The idea can be laid down in steps below:

1. Grab a field

2. Apply a square root function to the field to dampen outliers

3. Compute the per-channel mean and standard deviation of the result from step 2

4. Standardize the result from step 2 per channel: for each channel, subtract the mean and divide by the standard deviation

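import random

import numpy as np
import rasterio as rio  # assumption: the reader calls were lost from this listing; rasterio is one common choice

# Load the 4 bands of a randomly chosen training tile.
# train_tiles, train_source_items and dataset_id come from the full notebook.
tile = random.choice(train_tiles)

bd1 = rio.open(f"{train_source_items}/{dataset_id}_source_train_{tile}/B01.tif")
bd1_array = bd1.read(1)
bd2 = rio.open(f"{train_source_items}/{dataset_id}_source_train_{tile}/B02.tif")
bd2_array = bd2.read(1)
bd3 = rio.open(f"{train_source_items}/{dataset_id}_source_train_{tile}/B03.tif")
bd3_array = bd3.read(1)
bd4 = rio.open(f"{train_source_items}/{dataset_id}_source_train_{tile}/B04.tif")
bd4_array = bd4.read(1)

# Stack the bands into an (H, W, 4) array and cast to float for the math below
field = np.dstack((bd4_array, bd3_array, bd2_array, bd1_array)).astype(np.float32)

# Square root transform to dampen outliers
field = np.sqrt(field)

# Per-channel standardization
for c in range(field.shape[2]):
    mean = field[:, :, c].mean()
    std = field[:, :, c].std()
    field[:, :, c] = (field[:, :, c] - mean) / std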

And that’s all it takes!


Model

The model is a U-Net-based architecture with a pretrained EfficientNet-B7 encoder.

Surprisingly, it offered good consistency over the loss, F1 and Recall metrics across both the train and validation sets.

The data was split based on a custom segmentation-based stratified split method.

Training used the AdamW optimizer with a learning rate of 1e-4 and a weight decay of 1e-5, for only 50 epochs!

The truth is, I had originally planned to train for 1000 epochs, but this run was a last-minute experiment before the competition ended. My compute also did not permit larger batch sizes, and I could not apply additional data augmentation ideas because my notebook kept crashing even with only a custom mixup augmentation.
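
For readers unfamiliar with mixup, the sketch below shows the standard formulation applied to a segmentation batch. It is not the custom variant used in this solution, whose details are not given here; the function name `mixup_batch` and the value `alpha=0.4` are assumptions chosen for illustration.

```python
import numpy as np


def mixup_batch(images, masks, alpha=0.4, rng=None):
    """Standard mixup: blend each sample with a randomly chosen partner.

    images: (N, C, H, W) float array, masks: (N, 1, H, W) float array in [0, 1].
    Returns the blended images, the blended (soft) masks and the mixing weight.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(images))   # partner index for every sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_masks = lam * masks + (1.0 - lam) * masks[perm]
    return mixed_images, mixed_masks, lam


# Random stand-in batch: 4 chips with 4 bands each
rng = np.random.default_rng(0)
images = rng.random((4, 4, 64, 64))
masks = (rng.random((4, 1, 64, 64)) > 0.5).astype(np.float64)

mixed_images, mixed_masks, lam = mixup_batch(images, masks, rng=rng)
```

Because the blended masks are soft, they pair naturally with losses that accept probabilities as targets, such as Dice or binary cross-entropy.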

Key Takeaways

A summary of my experiments (failed + successful)

Problem Formulations: I looked at the problem from several angles and eventually found that binary segmentation gave consistently better results.

  • Multi-class segmentation
  • Multi-label segmentation
  • Regression
  • Binary segmentation
  • Sequence classification

Image Pre-processing Ablations

  • logarithmic/square root normalization of the bands pre-/post-stacking
  • min-max normalization of the bands pre-/ post-stacking
  • combination of logarithmic and min-max normalization
  • combination of square root and min-max normalization
  • standalone min-max, logarithmic and square root normalization pre-/ post-stacking

In the end, I found that applying standalone square root normalization (without min-max scaling) after stacking the 4 bands into a 4-channel field worked best.

Model Ablations

  • Different pretrained model backbones: EfficientNet variants (B1-B7), ResNet34, ResNet50, SE-ResNet34, EfficientNet-B5 + noisy student weights, VGG16 and VGG19-bn.
  • Multi-task Resnet34 with Attention-based pooling
  • Sequence-based custom encoder-decoder setup : Pre-trained image encoder (Efficientnet-b5) + Long short term memory (LSTM) decoder.
  • Residual Unet model trained in a multitask fashion

Loss Function Ablations

  • Binary Crossentropy
  • Categorical Crossentropy
  • Dice loss ( +/- per-image )
  • Tanimoto loss and Tanimoto dual loss
  • Weighted Categorical Crossentropy loss
  • Combo loss — dice + binary crossentropy ( I tried different weighting configs before landing on 0.9*dice + 0.1*bce)
  • Combo loss — multiclass dice loss + focal crossentropy ( 1 : 1 weighting)
  • Lovasz loss
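
As an illustration of the 0.9*dice + 0.1*bce combo loss listed above, here is a minimal NumPy sketch. The exact Dice variant (per-image or per-batch) and the smoothing constant `eps` are assumptions; the training code itself would use a differentiable framework rather than NumPy.

```python
import numpy as np


def dice_loss(probs, target, eps=1e-7):
    """Soft Dice loss computed over all pixels: 1 - 2*|P*T| / (|P| + |T|)."""
    intersection = np.sum(probs * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)


def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def combo_loss(probs, target, w_dice=0.9, w_bce=0.1):
    """Weighted sum of Dice and BCE, using the weights reported in the list above."""
    return w_dice * dice_loss(probs, target) + w_bce * bce_loss(probs, target)


# Toy mask with one square "field" and two candidate predictions
target = np.zeros((256, 256))
target[64:192, 64:192] = 1.0
good = np.where(target == 1.0, 0.9, 0.1)  # confident and mostly right
bad = 1.0 - good                          # confident and mostly wrong

print(combo_loss(good, target), combo_loss(bad, target))
```

The Dice term dominates, which keeps the loss focused on region overlap, while the small BCE term still provides a per-pixel gradient.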


Optimizers

  • Adam, AdamW**, SGD, RMSprop

** AdamW offered the best results, together with the stochastic gradient descent with warm restarts (SGDR) scheduler.


Learning Rate Schedulers

  • Cosine annealing +/- warm restarts
  • Stochastic gradient descent scheduler with warm restarts **.

** This did better with its initial parameter settings; no further tuning was done.
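
For reference, the warm-restart schedule from the SGDR paper [3] can be written in a few lines. In this sketch `lr_max=1e-4` matches the learning rate quoted earlier, while `t_0`, `t_mult` and `lr_min` are assumed values, since the settings actually used are not stated.

```python
import math


def sgdr_lr(epoch, lr_max=1e-4, lr_min=0.0, t_0=10, t_mult=2):
    """Learning rate at a 0-based epoch under cosine annealing with warm restarts.

    The first cycle lasts t_0 epochs; each following cycle is t_mult times longer.
    """
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:  # skip over completed cycles
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Schedule for a 50-epoch run: restarts happen at epochs 10 and 30
schedule = [sgdr_lr(e) for e in range(50)]
```

In PyTorch the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, with `T_0`, `T_mult` and `eta_min` playing the roles of `t_0`, `t_mult` and `lr_min`.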


Metrics

  • F1 +/- threshold search**
  • Recall**
  • Dice metric
  • Accuracy metric

** F1 and Recall were the main metrics evaluated. With F1, the threshold search found that thresholds in [0.1, 0.2] improved only the local CV score and not the public score.
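
A threshold search of this kind can be sketched as a simple grid scan over candidate cut-offs. The grid, the pixel-level F1 definition and the synthetic data below are assumptions for illustration, not the exact setup used in the competition.

```python
import numpy as np


def f1_score(pred, target, eps=1e-7):
    """Pixel-level F1 for binary masks given as 0/1 integer arrays."""
    tp = np.sum((pred == 1) & (target == 1))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def search_threshold(probs, target, thresholds=None):
    """Return the (threshold, F1) pair with the highest F1 on the given data."""
    if thresholds is None:
        thresholds = np.arange(0.1, 1.0, 0.1)
    scores = [f1_score((probs >= t).astype(int), target) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])


# Synthetic validation data: field pixels get somewhat higher probabilities
rng = np.random.default_rng(0)
target = (rng.random((256, 256)) > 0.7).astype(int)
probs = np.clip(0.3 * target + 0.4 * rng.random((256, 256)), 0.0, 1.0)

best_t, best_f1 = search_threshold(probs, target)
print(best_t, best_f1)
```

With only a handful of validation chips, the selected threshold can easily overfit, which is consistent with the gain showing up in local CV but not on the public leaderboard.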

Fold splits

  • Simple stratified train-test split
  • Custom stratified splitter designed for segmentation tasks
  • K-Fold splitter
  • Multilabel Stratified Shuffle splitter (with ‘y’ set to the flattened masks, i.e. 256 * 256 = 65536 pixels)
  • GroupKFold splitter
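
The custom segmentation splitter itself is not reproduced in this article. As one possible reading of the idea, the sketch below stratifies chips by the fraction of field pixels in each mask, so that train and validation sets see a similar range of field coverage. The function name, the binning scheme and all parameters are assumptions, not the method actually used.

```python
import numpy as np


def stratified_segmentation_split(masks, test_fraction=0.2, n_bins=4, seed=0):
    """Split mask indices into train/validation sets, stratified by field coverage.

    masks: array of shape (N, H, W) with 1 for field pixels and 0 elsewhere.
    Returns (train_idx, val_idx) as sorted integer arrays.
    """
    rng = np.random.default_rng(seed)
    coverage = masks.reshape(len(masks), -1).mean(axis=1)
    # Quantile edges give bins holding roughly equal numbers of chips
    edges = np.quantile(coverage, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(coverage, edges)
    train_idx, val_idx = [], []
    for b in np.unique(bins):
        members = rng.permutation(np.flatnonzero(bins == b))
        n_val = max(1, round(test_fraction * len(members)))
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return np.sort(np.array(train_idx, dtype=int)), np.sort(np.array(val_idx, dtype=int))


# Random stand-in masks for 70 chips with varying field coverage
rng = np.random.default_rng(0)
masks = (rng.random((70, 32, 32)) < rng.random((70, 1, 1))).astype(np.uint8)

train_idx, val_idx = stratified_segmentation_split(masks, test_fraction=13 / 70)
print(len(train_idx), len(val_idx))
```

Because rounding happens per bin, the validation size only approximates the requested fraction.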

What did not work?

  • Generally speaking, how far I explored the experimental configurations above was largely dictated by my compute and internet access (it would be really nice to experiment more, offline and on a much more powerful machine!).
  • Deciding to cast the task as multi-class and sequence classification problems was painfully non-trivial. I observed little consistency between local cross validation (and across folds) and public score, even after switching data splitters.
  • With the Multilabel splitter in particular, the local results were quite interesting but scored poorly upon submission.
  • Tanimoto and its dual variant did not offer satisfactory results in terms of both multi-class single-output and multitask scenarios.
  • Snapshot ensembling worsened the performance.

It was a great learning experience, with plenty of uncharted territory left to explore.

A link to the full code is available in the reference list below.

Thank you for reading! Feedback is highly welcome.


References

[1] NASA Harvest Field Boundary Detection Challenge

[2] 2021 NASA Harvest Rwanda Baseline Model

[3] Stochastic Gradient Descent with Warm Restarts (SGDR)

[4] Stratified Split for Semantic Segmentation

[5] A Spatio-Temporal Deep Learning-based Crop Classification Model for Satellite Imagery

[6] Link to Full Code
