LeCropFollow: Latent Space Planning for Navigation in Unstructured Crop Fields

Tommaselli, Felipe Andrade G.; Affonso, Francisco; Pompeu, Arthur; Capezzuto, Gianluca; Sivakumar, Arun Narenthiran; Chowdhary, Girish; Becker, Marcelo

Felipe Tommaselli¹, Francisco Affonso², Arthur Pompeu¹, Gianluca Capezzuto¹, Arun Narenthiran Sivakumar², Girish Chowdhary², Marcelo Becker¹

Corresponding author: f.tommaselli@usp.br

IEEE Robotics & Automation Letters (RA-L), 2026


LeCropFollow is a learning-based navigation framework for under-canopy agricultural robots that plans trajectories in the latent space of a world model learned over the uncompressed heatmap signal, enabling zero-shot navigation of unstructured fields without GNSS.

Abstract

Unstructured navigational features, such as irregular planting or discontinuities, remain the primary failure mode for under-canopy agricultural robots. Existing geometric approaches often fail in these scenarios because they compress high-dimensional visual data into deterministic spatial references, effectively discarding the uncertainty and semantic context required to navigate ambiguous terrain. To address this, we present LeCropFollow, a visual navigation framework that bypasses explicit geometric modeling in favor of a learned latent representation. By integrating a self-supervised semantic heatmap extractor with TD-MPC2, a Model-Based Reinforcement Learning (MBRL) planner, our system optimizes trajectories directly within a latent manifold. The framework operates over the uncompressed heatmap signal, preserving the semantic context that geometric reductions discard. We demonstrate that this representational shift enables zero-shot transfer from simplified simulation to the physical world without fine-tuning. Extensive field experiments in late-stage corn fields show that LeCropFollow matches state-of-the-art baselines in unstructured rows but significantly outperforms them in plantation gaps, achieving a 2.4× reduction in semantic failures compared to keypoint-based methods. These results suggest that latent planning offers a robust alternative to geometric estimation for operations in heterogeneous agricultural environments.

System Overview

LeCropFollow maps high-dimensional visual observations directly to control inputs without explicit state estimation. At inference time, the perception backbone (trained via self-supervised learning) and the control policy (trained via model-based reinforcement learning) are deployed zero-shot in the physical field. Rather than assuming explicit geometric priors, our system performs trajectory planning in a learned latent space with a trained world model.
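To make "planning in a learned latent space" concrete, here is a minimal sketch of an MPPI-style loop of the kind TD-MPC2 uses: encode the heatmap once, roll sampled action sequences forward through a latent dynamics model, score them, and refit the sampling distribution to the best-scoring ones. It is an illustration under stated assumptions, not the LeCropFollow implementation: the encoder, dynamics, reward, and terminal value are hypothetical random linear placeholders for trained networks, the 2-D action (e.g. linear and angular velocity) and every hyperparameter are assumptions, and TD-MPC2's policy-prior samples and learned Q-function are omitted or replaced by a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

HEATMAP_SHAPE = (8, 8)          # hypothetical heatmap resolution
LATENT_DIM, ACTION_DIM = 16, 2  # hypothetical; e.g. (linear, angular) velocity

# Hypothetical stand-ins for learned networks: fixed random linear maps.
W_ENC = rng.normal(scale=0.1, size=(LATENT_DIM, HEATMAP_SHAPE[0] * HEATMAP_SHAPE[1]))
A_DYN = 0.9 * np.eye(LATENT_DIM)
B_DYN = rng.normal(scale=0.3, size=(LATENT_DIM, ACTION_DIM))


def encode(heatmap: np.ndarray) -> np.ndarray:
    """Map the full (uncompressed) heatmap to a latent state z."""
    return np.tanh(W_ENC @ heatmap.ravel())


def dynamics(z: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Latent transition z' = d(z, a), batched over samples."""
    return np.tanh(z @ A_DYN.T + a @ B_DYN.T)


def reward(z: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Placeholder reward: keep z near the origin, penalise large actions."""
    return -np.sum(z**2, axis=-1) - 0.1 * np.sum(a**2, axis=-1)


def plan(z0, horizon=5, samples=256, elites=32, iters=6,
         gamma=0.99, temperature=0.5, min_std=0.05, max_std=2.0):
    """Sample action sequences, roll them out in latent space, and refit a
    Gaussian to the return-weighted elite sequences (MPPI-style)."""
    mean = np.zeros((horizon, ACTION_DIM))
    std = np.full((horizon, ACTION_DIM), max_std)
    for _ in range(iters):
        noise = rng.normal(size=(samples, horizon, ACTION_DIM))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        z = np.repeat(z0[None, :], samples, axis=0)
        returns = np.zeros(samples)
        for t in range(horizon):
            returns += gamma**t * reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
        # Placeholder terminal value; TD-MPC2 uses a learned Q-function here.
        returns -= gamma**horizon * np.sum(z**2, axis=-1)
        idx = np.argsort(returns)[-elites:]
        w = np.exp((returns[idx] - returns[idx].max()) / temperature)
        w /= w.sum()
        elite = actions[idx]
        mean = np.einsum("e,eha->ha", w, elite)
        var = np.einsum("e,eha->ha", w, (elite - mean) ** 2)
        std = np.clip(np.sqrt(var), min_std, max_std)
    return mean[0]  # receding horizon: execute only the first action


if __name__ == "__main__":
    heatmap = rng.random(HEATMAP_SHAPE)  # stand-in for the extractor's output
    print("first planned action:", plan(encode(heatmap)))
```

In this sketch only the first action of the refined mean is executed and the planner re-runs on the next heatmap, so no explicit geometric state (row line, vanishing point) is ever estimated.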

Field Validation

**Field Validation Environments.** Top-down aerial view of the experimental corn plantation during the Flowering Stage (Source: Google Earth, Airbus, Landsat/Copernicus). The figure highlights the three testing environments: the Left Border, Center, and Right Border rows. It also marks the **Plantation Gap**, an unstructured section spanning from 6.1 m to 14.8 m (an 8.7 m discontinuity) with a degraded left row.

Results

Failure Mode Analysis

Our primary hypothesis is that geometric methods suffer from information over-compression in unstructured environments, and the experimental data strongly support it. Within plantation gaps, CropFollow++ fits geometry to noise and diverges with high confidence; LeCropFollow instead treats the heatmap's spatial dispersion as part of the input signal, reducing semantic failures by 2.4× (29 vs. 70).
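The over-compression argument can be illustrated with toy data: two heatmaps can share exactly the same peak (the only thing a keypoint-style reduction keeps) while differing greatly in how spread out the evidence is. The synthetic Gaussian heatmaps, the helper names, and the dispersion measure below are hypothetical illustrations, not CropFollow++'s keypoint head or LeCropFollow's encoder.

```python
import numpy as np


def gaussian_heatmap(size: int, center: tuple, sigma: float) -> np.ndarray:
    """Synthetic, normalised heatmap with a single Gaussian blob (toy data)."""
    ys, xs = np.mgrid[0:size, 0:size]
    h = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma**2))
    return h / h.sum()


def keypoint(h: np.ndarray) -> tuple:
    """Geometric reduction: keep only the (x, y) location of the peak."""
    iy, ix = np.unravel_index(np.argmax(h), h.shape)
    return int(ix), int(iy)


def dispersion(h: np.ndarray) -> float:
    """Spatial spread (radial standard deviation) of the heatmap mass."""
    ys, xs = np.mgrid[0:h.shape[0], 0:h.shape[1]]
    mx, my = (h * xs).sum(), (h * ys).sum()
    return float(np.sqrt((h * ((xs - mx) ** 2 + (ys - my) ** 2)).sum()))


confident = gaussian_heatmap(32, (16, 10), sigma=1.5)  # well-defined row
ambiguous = gaussian_heatmap(32, (16, 10), sigma=6.0)  # e.g. inside a gap

# Same keypoint, very different spread of evidence.
print(keypoint(confident), keypoint(ambiguous))
print(round(dispersion(confident), 2), round(dispersion(ambiguous), 2))
```

Both heatmaps reduce to the same keypoint, yet the second is several times more dispersed. A planner that consumes the full heatmap can, in principle, react to that difference; one that consumes only the keypoint cannot.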

Unstructured Resiliency & Ablation

CropFollow++ (G1) degrades substantially in this regime (6.7% success), as its keypoint estimation produces unreliable references when the vanishing point loses consistency. CROW (G3) retains partial robustness (53.3%) by leveraging a 3×3 m point cloud window to track adjacent rows. Operating on the same visual input as CropFollow++, LeCropFollow (L3) traverses the gap in 14 of 15 runs (93.3%): in high-uncertainty regions, the policy applies small angular corrections rather than tracking the noisy keypoint references, maintaining a smooth heading until the row structure resumes.
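The success percentages follow directly from 15 gap traversals per method. Only LeCropFollow's count (14 of 15) is stated explicitly; the counts of 1 and 8 successes for CropFollow++ and CROW are inferred from the reported 6.7% and 53.3%, and the names in this short sketch are hypothetical.

```python
RUNS = 15  # gap traversals per method

# Inferred counts for G1 and G3 (from 6.7% and 53.3%); 14 is stated for L3.
SUCCESSES = {"CropFollow++ (G1)": 1, "CROW (G3)": 8, "LeCropFollow (L3)": 14}


def success_rate(successes: int, runs: int = RUNS) -> float:
    """Percentage of runs that traversed the gap."""
    return 100.0 * successes / runs


for name, k in SUCCESSES.items():
    print(f"{name}: {k}/{RUNS} = {success_rate(k):.1f}%")
```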

**Gap Traversal Ablation.** Collisions per run over 15 traversals of the 8.7 m gap, comparing the geometric baselines (G1–G3) with the learned variants (L1–L3). The full formulation achieves the lowest median and the tightest spread.

Comparative Field Trials

LeCropFollow performs comparably to established state-of-the-art methods and matches the best baseline on maximum collision-free distance (Flowering Stage, 12 runs each). Notably, both baseline controllers were fine-tuned to the experimental conditions, whereas LeCropFollow was deployed fully zero-shot.

| Method | Collisions ↓ | Max. Collision-Free Distance [m] ↑ |
|---|---|---|
| CropFollow++ | 5.3 ± 1.6 | 38.9 |
| CROW | 6.5 ± 1.4 | 37.9 |
| LeCropFollow | 5.1 ± 1.8 | 38.9 |

Harvested-stage runs yield 4.8 ± 1.5 collisions and a maximum collision-free distance of 28.4 ± 5.7 m, with no significant difference from the Flowering-stage distribution. This indicates that the policy is not implicitly fitted to a single perceptual configuration of the field.
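The page does not state which statistical test supports the "no significant difference" claim. As one hedged way to check such a claim from reported summary statistics alone, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for LeCropFollow's Flowering-stage collisions (5.1 ± 1.8, 12 runs) against the Harvested-stage figure (4.8 ± 1.5). It assumes the ± values are sample standard deviations and assumes 12 Harvested-stage runs; that run count is not given.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (mean, sample std, run count)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Flowering: 12 runs (stated). Harvested: 12 runs is an ASSUMPTION.
t, df = welch_t(5.1, 1.8, 12, 4.8, 1.5, 12)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With these inputs |t| is about 0.44 at roughly 21 degrees of freedom, well below the two-sided 5% critical value of about 2.08. For an exact p-value, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` runs the same test from summary statistics.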

BibTeX

@article{tommaselli2026lecropfollow,
  title         = {LeCropFollow: Latent Space Planning for Navigation in Unstructured Crop Fields},
  author        = {Tommaselli, Felipe and Affonso, Francisco and Pompeu, Arthur and Capezzuto, Gianluca and Sivakumar, Arun Narenthiran and Chowdhary, Girish and Becker, Marcelo},
  journal       = {IEEE Robotics and Automation Letters},
  year          = {2026},
  note          = {Accepted for publication},
  eprint        = {2606.31941},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO}
}

Related Work

CROW

CropFollow++
