
Advancing Autonomous Systems Using Offline Reinforcement Learning

Featured Research

Autonomous systems like racing vehicles require reliable decision-making in dynamic environments. Current online learning models face risks of accidents during training. VRAC researchers Cody Fleming and Prajwal Koirala developed offline reinforcement learning techniques that leverage pre-recorded data to safely and efficiently train autonomous agents without real-time interaction risks. 

Introduction/Problem Description 

In autonomous racing and control systems, traditional online reinforcement learning requires constant interaction with the environment, which poses safety risks and high computational demands. Existing industry methods, such as rule-based and model-free learning, often fail to generalize to unseen environments and struggle with efficiency. VRAC’s research leverages offline reinforcement learning, which uses historical data to train agents safely without real-time interaction, reducing risks while maintaining high performance. This technique provides significant advantages in the field of autonomous driving and robotic control. 

Project Description 

The research introduces two novel frameworks: Return-Conditioned Decision Tree Policy (RCDTP) and Return-Weighted Decision Tree Policy (RWDTP). Both reformulate offline reinforcement learning as a regression problem solved with decision trees, enabling fast training and inference while achieving competitive performance across robotic tasks such as locomotion and racing. The F1tenth hardware platform supports data collection and evaluation of the agents' generalization across multiple racing tracks.


How it works 

In traditional reinforcement learning, agents must interact with the environment to collect training data. However, this interaction can be costly, risky, or impractical for real-world applications like autonomous driving. Offline reinforcement learning circumvents this by training agents on pre-existing datasets. In the F1tenth racing context, a controller gathers race data such as motor thrust and steering rates, which is then used to train offline RL models under the RCDTP and RWDTP frameworks. Because these policies are built on decision trees, they provide fast, explainable results without interacting with the physical environment, mitigating real-time failure risks.
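The sketch below illustrates the data-preparation step described above: logged transitions from the controller are organized into an offline dataset, and a return-to-go value is computed for every timestep by summing the rewards that follow it. This is a minimal Python illustration; the field names, reward definition, and array shapes are placeholders rather than the project's actual data schema.

import numpy as np

def returns_to_go(rewards, gamma=1.0):
    # Suffix sum of rewards: rtg[t] = r[t] + gamma * rtg[t+1].
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# One logged lap from the controller: observations plus the commanded
# thrust and steering at each step, and a per-step reward such as track
# progress. Field names and shapes here are illustrative placeholders.
episode = {
    "obs":     np.random.randn(500, 8),   # e.g. pose, velocity, range readings
    "actions": np.random.randn(500, 2),   # [motor thrust, steering rate]
    "rewards": np.random.rand(500),
}
episode["rtg"] = returns_to_go(episode["rewards"])

With return-to-go attached to every transition, the dataset can be fed directly to a supervised tree regressor, which is what allows training to happen entirely offline.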

The RCDTP and RWDTP frameworks improve upon many existing approaches, simplifying the learning problem to achieve faster agent training and inference while demonstrating performance at least on par with established methods. RCDTP models the agent’s actions based on the state, the return-to-go (the sum of future rewards), and the timestep, conditioning each action on the trajectory’s goal; this helps maintain consistent long-term objectives. In contrast, RWDTP uses only the current state but weights each training sample by its return, so that high-reward behavior has the greatest influence on the learned actions. This simplifies decision-making, especially with limited data, and is less computationally demanding. RCDTP’s return-to-go conditioning makes it better suited to tasks requiring precise control over longer horizons, whereas RWDTP’s return weighting favors scenarios where quick, interpretable responses are the priority.
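To make the contrast concrete, the sketch below fits both policy styles with an off-the-shelf decision tree regressor from scikit-learn. This is an assumed, simplified rendering of the two formulations described above: the actual tree model, hyperparameters, and return-weighting scheme used in the research are not specified in this article, so the DecisionTreeRegressor stand-in and the exponential weighting are illustrative choices only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor  # generic stand-in for the tree model

# Toy offline batch with the same shapes as the dataset sketched earlier.
obs = np.random.randn(500, 8)
actions = np.random.randn(500, 2)                # [motor thrust, steering rate]
rtg = np.linspace(100.0, 0.0, 500)               # returns-to-go shrink along a lap
timesteps = np.arange(500).reshape(-1, 1)

# RCDTP: regress the action on (state, return-to-go, timestep), so every
# prediction is conditioned on the remaining goal of the trajectory.
X_rcdtp = np.hstack([obs, rtg.reshape(-1, 1), timesteps])
rcdtp = DecisionTreeRegressor(max_depth=8).fit(X_rcdtp, actions)

# RWDTP: regress the action on the state alone, but weight each sample by a
# function of its return so high-return behavior dominates the fit.
weights = np.exp((rtg - rtg.mean()) / (rtg.std() + 1e-8))   # assumed weighting
rwdtp = DecisionTreeRegressor(max_depth=8).fit(obs, actions, sample_weight=weights)

# At deployment, RCDTP must be given a target return to condition on,
# while RWDTP acts from the current observation alone.
query = np.hstack([obs[:1], [[rtg.max(), 0]]])
print(rcdtp.predict(query), rwdtp.predict(obs[:1]))

The practical consequence mirrors the trade-off above: RCDTP carries extra conditioning inputs that give finer control over long-horizon behavior, while RWDTP keeps the input to the bare state, yielding a smaller, faster, and more interpretable policy.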

Highlight-worthy point 

“With the prospect of leveraging extensive datasets of human driving behaviors to train agents, the offline learning of driving policies stands as a pivotal research direction within intelligent transportation.”

