Learning To Core Thermals: A Scientific Angle

Here I want to summarize some of the take-homes of the theoretical paper ‘Learning to soar in turbulent environments’ by Gautam Reddy, Antonio Celani, Terrence J. Sejnowski, and Massimo Vergassola, published in PNAS in 2016.

In this research, gliders are trained through reinforcement learning in numerical models of turbulent convective flow, uncovering the key sensorimotor cues that permit effective control over soaring in turbulent environments. They are trained in both low-to-moderate and strong turbulence to see if different soaring strategies (‘policies’) are preferable for the different conditions. The learning algorithm used is the state–action–reward–state–action (SARSA) algorithm, modeled on animal learning.
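For readers who haven't met SARSA before, the update at its heart fits in a few lines. This is only a minimal tabular sketch: the learning rate `alpha`, the discount factor `gamma`, and the state and action labels are hypothetical placeholders of mine, not the values or the discretisation used in the paper.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Q-table in which every unseen (state, action) pair starts at 0.0
Q = defaultdict(float)

# Hypothetical transition: lift is fading, the glider banks left, lift improves
new_value = sarsa_update(
    Q,
    s="accel_negative", a="bank_left",
    r=1.0,
    s_next="accel_positive", a_next="hold_bank",
)
```

The name comes from the five quantities in the update: the state and action just taken, the reward received, and the next state and next action actually chosen.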

Performance criterion

Height ascended per trial (averaged over different flow simulations). Each trial lasts for 2.5 min.

Sensorimotor cues investigated

Vertical acceleration, vertical velocity, torque (difference of vertical velocities at the left and the right wing), and temperature.

Glider inputs

There are 6 basic inputs the glider can take to improve thermalling: increasing, decreasing or maintaining either bank angle or angle of attack.
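To make the counting concrete, here is a small sketch that enumerates these inputs. The labels are my own, and the second list rests on an assumption: that one change is chosen for each control at every step, which would give 3 x 3 combined actions.

```python
from itertools import product

CONTROLS = ("bank_angle", "angle_of_attack")
CHANGES = ("increase", "decrease", "maintain")

# The six basic inputs: one change applied to one control
basic_inputs = [(control, change) for control in CONTROLS for change in CHANGES]

# Assumption: one change is picked for each control at every step,
# giving 3 x 3 joint (bank change, angle-of-attack change) actions
joint_actions = list(product(CHANGES, repeat=len(CONTROLS)))

print(len(basic_inputs), len(joint_actions))  # prints: 6 9
```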

Reinforcers

Both the sensed vertical wind velocity and the vertical wind acceleration, evaluated at the next moment after the action (in 1-second intervals). These count as ‘rewards’ when positive and as ‘punishments’ when negative, or when the glider hits the ground!
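A reinforcer of this kind could be sketched as follows. The function is illustrative only: the relative weight on acceleration (`accel_weight`) and the size of the ground penalty (`ground_penalty`) are hypothetical values I have picked for the example, not figures from the paper.

```python
def reward(vertical_velocity, vertical_acceleration, hit_ground=False,
           accel_weight=1.0, ground_penalty=-100.0):
    """Illustrative reinforcer: positive in rising, strengthening air."""
    if hit_ground:
        # Hitting the ground is punished whatever the air was doing
        return ground_penalty
    # Weighted sum of vertical wind velocity (m/s) and acceleration (m/s^2)
    return vertical_velocity + accel_weight * vertical_acceleration

print(reward(2.0, 0.5))    # rising and strengthening air: a reward
print(reward(-1.0, -0.5))  # sinking and weakening air: a punishment
```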

Low to moderate strength thermic conditions

The average (absolute) up/down velocities are lower than the average glider speed, e.g. thermals < 4-5 m/s on paragliders (taking into account the fact that a paraglider sinks at 1 m/s).

Strong thermic conditions

The average (absolute) up/down velocities are higher than the average glider speed, e.g. thermals > 4-5 m/s on paragliders.

Article Summary

Thermals are understood not as simple (discrete) columns or bubbles of rising air, but as complex fluctuating, erratic wind velocities (turbulence) at different scales around thermal cores. “Birds or gliders attempting to find and maintain a thermal face the challenge of identifying the potentially long-lived and large-scale wind fluctuations amid a noisy turbulent background.”
Here are the results of learning in different ‘flow regimes’ (how strong the thermals are).
A. Strength of conditions and learning curves. Learning curves are similar for weaker and stronger conditions, although learning is steeper and linear in weaker conditions in the earlier part of the curve. The learning curves flatten after about 10 hours of thermalling practice.
B. Strength of conditions and thermalling efficiency. For weak to moderate instability (up to e.g. 5 m/s thermals), there’s rapid, linearly increasing height gain with strength of thermals. But as instability increases, there are still gains but they occur more gradually. There is also increasing thermalling efficiency up to 3-4 m/s, but then a downward trend that reflects the increasing difficulty in control as fluctuations increase.
C. What information should we be sensitive to? A comparison of the average gain in height for different combinations of sensorimotor cues. The best cues are the combination of vertical acceleration and torque (difference of vertical velocities at the left and the right wing). Velocity on its own tells you that you’re going up but not where the cores are. “The pair acceleration and torque allows the glider to climb the thermal toward the core and also detect the edge of a thermal so that the glider can stay within the core.” Note that angle of attack does not significantly influence performance when climbing an individual thermal.
D. Search space for the learning process. The ‘effective horizon’ is the amount of time into the anticipated future over which reinforcers are factored into the choice of action. As the effective horizon increases, later rewards contribute significantly to the decision-making, and more exploratory strategies are preferred over exploitative, ‘greedy’ short-term strategies to maximize rewards. So in the case of good thermalling, thinking ahead around 100 seconds to maximize the rewards of lift and vertical acceleration works far better than just trying to improve lift and acceleration within the next few seconds. This suggests that the ‘search space’ for the learning process should be a window of around 1 to 1.5 min (about 4-6 circles).
Controlling angle of attack on glides between thermals. While changing angle of attack doesn’t improve performance while thermalling, it does improve performance when gliding between thermals where there are ‘lift lines’ and areas of turbulence. The glider in this situation learns to increase its pace during phases of descent while slowing down during periods of ascending currents.
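The effective horizon in item D can be made concrete with a toy calculation. I'm assuming the common reinforcement-learning convention that a discount factor `gamma` corresponds to a horizon of roughly one time step divided by (1 - gamma); the paper's exact definition may differ, and the two reward sequences below are invented purely for illustration.

```python
def effective_horizon(gamma, step_seconds=1.0):
    """Approximate time span over which discounted rewards still matter."""
    return step_seconds / (1.0 - gamma)

def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma ** (steps into the future)."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Hypothetical 100-second reward sequences (one value per second)
stay_put = [0.5] * 100             # weak but steady lift
search = [-1.0] * 10 + [2.0] * 90  # sink first, then a stronger core

for gamma in (0.5, 0.99):
    stay_value = discounted_return(stay_put, gamma)
    search_value = discounted_return(search, gamma)
    better = "search" if search_value > stay_value else "stay_put"
    print(f"horizon ~{effective_horizon(gamma):.0f} s -> prefers {better}")
```

With a horizon of a couple of seconds the short-term sink dominates and staying put looks better; with a horizon of about 100 seconds the later, stronger climb outweighs it and searching wins.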

Thermalling Strategies
‘Don’t Lose The Core’ Strategy. “When the glider experiences a negative wind acceleration, the optimal action is to sharply bank toward the side of the wing that experiences larger lift.”
‘Straight if Getting Stronger’ Strategy. “When the glider experiences a large positive acceleration and no torque, the glider continues flying along its current path.”
Moderate Banking In Strong Conditions (7-8 m/s), Steeper Banking in Moderate (2-3 m/s) Conditions. “The preferred bank angles of gliders trained in a strong flow are relatively moderate, and the policy in general is more conservative.” For instance, “in the case of zero torque and zero acceleration … the optimal bank action in the weak flow regime is to turn as much as possible, in contrast to the policy in the strong flow regime, which is to not turn. … A policy becoming more conservative and risk averse as fluctuations increase is consistent with the balance of exploration and exploitation.” In a more turbulent environment, where a wrong decision can lead to highly negative consequences, we expect the active pilot to play safe and gather more information before taking action, being less reactive and smoothing over smaller variations in acceleration and torque.
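The three strategies above can be written down as a small lookup, which I find a handy way to remember them. This is a sketch of the quoted rules only, not the learned policy from the paper. It assumes acceleration and torque have already been reduced to -1, 0 or +1, and that positive torque means more lift on the right wing (my own sign convention).

```python
def bank_action(acceleration, torque, strong_flow=False):
    """Rule-of-thumb bank choice from the signs of acceleration and torque.

    acceleration: vertical wind acceleration (-1, 0 or +1; +1 = lift increasing)
    torque: +1 if the right wing feels more lift, -1 if the left does (assumed)
    """
    if acceleration < 0 and torque != 0:
        # 'Don't lose the core': bank sharply toward the wing with more lift
        return "bank sharply right" if torque > 0 else "bank sharply left"
    if acceleration > 0 and torque == 0:
        # 'Straight if getting stronger'
        return "hold current path"
    if acceleration == 0 and torque == 0:
        # Quoted contrast between the weak and strong flow regimes
        return "do not turn" if strong_flow else "turn as much as possible"
    # Combinations that the quoted rules do not cover
    return "not specified by the quoted rules"

print(bank_action(-1, +1))                  # lift fading, right wing lifting
print(bank_action(0, 0, strong_flow=True))  # neutral cues in strong conditions
```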

Take Homes

Coring. Thermalling well is best understood as continuously active coring of thermals, not simply getting ‘in’ the thermal, maintaining bank, and going up.
Acceleration/Torque. Learn to focus on vertical acceleration/deceleration and torque (left/right wing pressure) as the important information to be aware of while thermalling. Whenever these are sensed, a decision needs to be made.
Angle of Bank for Thermalling. Bank angle (weightshift/brake) is the critical pilot input for coring well.
Angle of Attack for Gliding. Speed-bar/rear risers are the critical pilot inputs for gliding well on a single heading.
The learning process is based on reinforcement (through a sense of pleasure/displeasure) and trial and error.
Reinforcers. Learn to feel rewarded (punished) by the sense of vertical acceleration (deceleration), and climb (sink) rate.
Learning should involve exploration and planning over 60-100 second time windows (about 4-6 circles), particularly in low to medium strength days. The ‘rewards’ of better vertical acceleration and climb rate should be maximized over these time chunks, not just immediately.
A lot of learning can occur in 10 hours of thermalling.
Different thermalling strategies may be needed for weaker (<4 m/s) vs stronger (>5 m/s) conditions, with more conservative flying (e.g. less reactive, more information gathering) and shallower bank angles in stronger conditions.