Multi-objective deep reinforcement learning for wireless networks

Roure, Babacar
Thesis

The scale, heterogeneity, and complexity of next-generation communication networks, such as 6G, are expected to increase significantly compared to current systems. This evolution renders traditional model-based optimization approaches increasingly inadequate to meet performance requirements. These challenges are further exacerbated by the integration of novel network components and architectures, notably communication systems based on unmanned aerial vehicles (UAVs), which introduce additional degrees of freedom and complexity. In this context, Reinforcement Learning (RL) has emerged as an effective paradigm for learning long-term decision-making strategies through direct interaction with the environment, without requiring explicit models or prior knowledge. However, existing RL approaches remain difficult to deploy in real-world telecommunication environments, which are inherently multi-objective and characterized by conflicting performance criteria.

This thesis investigates Multi-Objective Reinforcement Learning (MORL) as a unifying framework to explicitly handle variable trade-offs among conflicting objectives. The goal is to design algorithms that adapt to dynamic environments and to changing operator-defined preferences without costly retraining. The thesis focuses on two main problems, downlink scheduling in wireless networks and UAV-based wireless data collection, and demonstrates that both require a modeling framework that simultaneously accounts for multiple performance objectives.

Conventional solutions to the downlink scheduling problem predominantly rely on simple mathematical heuristics that optimize a single objective. This thesis proposes the use of multi-objective Q-learning to train a scheduler capable of dynamically optimizing throughput, fairness, and packet loss rate according to operator preferences. This approach systematizes the modeling of multi-objective communication problems.
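As an illustration of the idea, the following minimal sketch shows tabular multi-objective Q-learning with linear scalarization, one common MORL formulation. It is not taken from the thesis: the function names, the preference weights, the toy reward vectors (throughput, fairness, negative packet loss rate), and the single-state setup are all hypothetical choices made for the example.

```python
import numpy as np

def scalarized_greedy_action(q_vec, state, weights):
    """Pick the action maximizing the preference-weighted sum of objective values."""
    # q_vec has shape (n_states, n_actions, n_objectives).
    return int(np.argmax(q_vec[state] @ weights))

def mo_q_update(q_vec, state, action, reward_vec, next_state, weights,
                alpha=0.1, gamma=0.9):
    """One tabular update of a vector-valued Q-table under linear scalarization."""
    # Bootstrap from the action that is greedy for the current preference.
    next_action = scalarized_greedy_action(q_vec, next_state, weights)
    target = reward_vec + gamma * q_vec[next_state, next_action]
    q_vec[state, action] += alpha * (target - q_vec[state, action])
    return q_vec

# Hypothetical toy problem: one state, three scheduling actions, and reward
# vectors ordered as (throughput, fairness, -packet_loss_rate).
rewards = np.array([
    [1.0, 0.2, -0.5],   # action 0 favors throughput
    [0.4, 1.0, -0.3],   # action 1 favors fairness
    [0.3, 0.3,  0.0],   # action 2 avoids packet loss
])

q = np.zeros((1, 3, 3))
train_weights = np.array([1 / 3, 1 / 3, 1 / 3])
for _ in range(200):
    for a in range(3):
        # gamma=0 makes this a one-step problem, so the learned vector values
        # do not depend on the preference used during training.
        mo_q_update(q, 0, a, rewards[a], 0, train_weights, alpha=0.5, gamma=0.0)

# The same table is then queried under different operator preferences.
throughput_choice = scalarized_greedy_action(q, 0, np.array([0.8, 0.1, 0.1]))
fairness_choice = scalarized_greedy_action(q, 0, np.array([0.1, 0.8, 0.1]))
loss_choice = scalarized_greedy_action(q, 0, np.array([0.1, 0.1, 0.8]))
```

Because the Q-table stores one value per objective instead of a single scalar, the preference vector can be changed at decision time. In this toy one-step case that requires no retraining; with a nonzero discount factor the bootstrapped values generally depend on the preference, and handling that dependence is where a full method would differ from this sketch.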

Due to their adaptability and mobility, UAVs are playing an increasingly important role in communication networks, particularly for data collection tasks. In this context, AI-based approaches have attracted significant interest for UAV trajectory planning in large-scale and complex environments. Since energy consumption constitutes a major constraint for small UAVs, this thesis introduces a stable and efficient algorithm capable of adapting UAV navigation in urban environments according to the desired trade-off between the data collection and energy objectives. This adaptation is achieved without requiring additional retraining or fine-tuning.

Another major limitation of RL-based approaches for UAV-based data collection is their poor generalization to changes in mission parameters, which typically requires costly retraining for each new configuration. To address this issue, radio signals have been replaced as learning inputs by a map-based representation processed using convolutional neural networks. This thesis proposes permutation-invariant architectures based on attention mechanisms, which account for the multi-objective nature of the problem while improving generalization across diverse scenarios. The proposed models outperform convolutional networks in multi-objective performance, generalization capability, and memory efficiency.
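To make the notion of permutation invariance concrete, the sketch below pools a variable-size set of feature vectors (for example, one per ground device) into a single fixed-size vector using single-query attention. It is only an illustrative assumption about what such a layer can look like, not the architecture used in the thesis; the projection matrices, the feature dimensions, and the idea of deriving the query from a preference vector are hypothetical.

```python
import numpy as np

def attention_pool(items, preference, w_query, w_key, w_value):
    """Pool a set of item features into one vector with single-query attention.

    items:      (n_items, d_in) feature vectors, in any order
    preference: (n_objectives,) trade-off weights used to build the query
    """
    query = preference @ w_query                        # (d,)
    keys = items @ w_key                                # (n_items, d)
    values = items @ w_value                            # (n_items, d)
    scores = keys @ query / np.sqrt(query.shape[0])     # (n_items,)
    scores = scores - scores.max()                      # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum()                            # softmax over the set
    # A weighted sum over items does not depend on the order of the items.
    return attn @ values                                # (d,)

rng = np.random.default_rng(0)
items = rng.normal(size=(5, 4))          # 5 hypothetical devices, 4 features each
w_query = rng.normal(size=(2, 8))        # 2 objectives mapped to a query of size 8
w_key = rng.normal(size=(4, 8))
w_value = rng.normal(size=(4, 8))
preference = np.array([0.7, 0.3])        # e.g. data collection vs. energy

pooled = attention_pool(items, preference, w_query, w_key, w_value)
shuffled = attention_pool(items[rng.permutation(5)], preference,
                          w_query, w_key, w_value)
fewer = attention_pool(items[:3], preference, w_query, w_key, w_value)
```

Here `pooled` and `shuffled` agree up to floating-point error, and `fewer` has the same shape even though it was computed from three items instead of five. The layer's parameter count also does not depend on the number of items or on any map size, a property often cited for set-based models.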

Finally, the last part of this thesis focuses on data collection using a swarm of UAVs aiming to achieve global trade-offs over shared objectives. The multi-UAV approach improves the overall performance by distributing the task among multiple agents with limited flying time, each exploring different regions of the city. The thesis proposes a method that improves sample efficiency and generalization and promotes cooperation, while maintaining decentralized execution of navigation policies. The proposed approach remains effective even in an urban environment distinct from the one used during training.


Type:
Thesis
Date:
2026-03-23
Department:
Communication systems
Eurecom Ref:
8562
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :

PERMALINK : https://www.eurecom.fr/publication/8562