
The Deep Synergy Between Telecommunications and Machine Learning
Author(s): Mahmoud Abdelaziz, PhD
Originally published on Towards AI.
Where Signals Meet Intelligence

Telecommunications and machine learning may seem like distant cousins — one rooted in signals, antennas, and physical channels, the other in data, prediction, and high-dimensional inference. But after working in both fields for over a decade, I’ve come to see them not as separate disciplines, but as two dialects of the same language: the language of uncertainty, estimation, and information.
This insight isn’t abstract for me — it’s personal. My PhD journey began at Tampere University of Technology in Finland, home to one of Europe’s leading Centers of Excellence in Signal Processing. There, I completed a PhD followed by a postdoc at the intersection of telecommunications, signal processing, and machine learning — not as separate tools, but as components of a single unified framework. That same fusion continues in my current role at the Communications and Information Engineering program at the University of Science and Technology in Zewail City. Our curriculum and research sit right at the crossroads between wireless systems, machine learning, and intelligent signal processing — because that’s where the future of both fields lies.

This article is the result of years of working at that intersection. It’s not just a catalog of use cases or research trends — it’s a technical deep dive into the synergy between telecommunications and AI:
- Part 1 explores how machine learning is revolutionizing telecommunications, from predicting user mobility and optimizing beamforming to training fully neural receivers, enabling self-healing networks, and designing cognitive radio systems. These are not hypothetical scenarios — they are happening now, using real systems and datasets.
- Part 2 flips the lens: it shows how core telecom principles shaped modern machine learning. Ideas like maximum likelihood, Bayesian inference, cross-entropy, mutual information, and belief propagation were all developed within the telecom community, long before they became buzzwords in AI. Tools like the Cramér-Rao bound, adaptive filters, and ROC curves were telecom staples before migrating to ML frameworks. Even the very structure of neural networks owes much to signal processing — from convolution (or more accurately, cross-correlation) to frequency-domain representations via FFT. At a deeper level, both fields are driven by the same philosophical quest: extracting reliable information from noisy, incomplete, or uncertain data. Whether decoding a modulated signal or training a deep network, we are ultimately seeking to uncover hidden patterns that enable accurate decisions.
- Part 3 goes one step further, exploring a space where machine learning and telecommunications do not merely enhance one another, but co-evolve into something new: systems that are not only connected and intelligent, but increasingly aware. In domains like smart agriculture, wireless sensors give voice to the environment — transmitting signals that carry the hidden needs of soil and crops. Machine learning interprets these signals, turning data into decisions. What emerges is not just an optimized system, but a sensing, learning, and responding ecosystem — a distributed intelligence shaped by the synergy of communication and learning.
Together, the three parts paint a comprehensive picture of how telecommunications and machine learning don’t merely borrow from each other — they co-evolve. They represent a single scientific arc that stretches from Shannon to Bayes, from MIMO to GANs, from stochastic processes to transformers, and now toward systems that are not only intelligent but also contextually aware — systems that sense, communicate, and decide as one.
What makes this work truly stand out is its multi-directional and layered perspective — something rarely seen in existing literature. Most discussions tend to focus on just one direction: either how AI is transforming telecommunications, or how telecom techniques laid the foundation for AI. This article does both — and adds a third, often overlooked dimension. It shows how the synergy of the two fields is now giving rise to a new class of systems altogether: systems like intelligent farms and smart grids, where sensing, communication, and learning merge to form responsive, adaptive infrastructures. By bringing these three perspectives together into a single narrative, this work offers not only a deeper understanding of the present, but also a more integrated vision of the future — one in which communication and intelligence are no longer separate functions, but expressions of the same system-level awareness.
Part 1: How AI and Machine Learning Are Transforming Telecommunications
This section examines how artificial intelligence (AI) and machine learning (ML) are being integrated across all layers of the telecommunications stack — not as isolated tools, but as core components that reshape how networks are planned, deployed, optimized, and operated.
Telecommunications today faces enormous complexity: highly dynamic environments, growing user demands, spectrum scarcity, and the need for real-time responsiveness. These challenges make traditional rule-based algorithms increasingly insufficient. AI and ML provide an alternative — models that can learn from data, adapt over time, and infer hidden patterns to improve performance under uncertainty.
We present these AI applications in six tightly connected layers, ordered to reflect the real-world logic of wireless system development — from planning to execution, from optimization to intelligence at the signal level.
1.1 AI-Driven Radio Planning and Propagation Modeling
Before any bit is transmitted, the wireless network must be carefully planned. AI is now redefining this process using data-driven intelligence:
- Neural propagation models, trained on ray tracing simulations and drive test data, predict signal strength with higher spatial resolution and accuracy than traditional path loss formulas.
- These models enable AI-assisted planning, helping engineers evaluate candidate base station locations and spectrum reuse strategies across complex terrains like dense urban areas, tunnels, and indoor environments.
By learning from real measurements and simulation data, these AI tools accelerate deployment, improve model realism, and adapt more effectively to diverse environments. The high accuracy of coverage predictions also enables using such models as digital twins for mobile networks, with useful applications not only in network planning but also in network optimization.
To add some hands-on fun to the material, let's explore a practical library, Sionna, which lets us generate realistic data from accurate ray-tracing simulations that can then be used to train ML models.
To start, we first import (or install) the required Python libraries as follows.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# Import or install Sionna
try:
    import sionna.rt
except ImportError as e:
    import os
    os.system("pip install sionna-rt")
    import sionna.rt

# Import or install Mitsuba (for rendering)
try:
    import mitsuba as mi
except ImportError as e:
    import os
    os.system("pip install mitsuba")
    import mitsuba as mi

from sionna.rt import load_scene, Camera, Transmitter, Receiver, PlanarArray,\
    PathSolver, RadioMapSolver, load_mesh, watt_to_dbm, transform_mesh,\
    cpx_abs_square

rm_solver = RadioMapSolver()

To simulate realistic wireless propagation in our virtual environment, we begin by configuring the scene. This setup loads the built-in Etoile scenario from Sionna RT, representing a richly detailed 3D environment with complex geometry and material properties as shown in the figure above. Adding more material details is possible, but we aim for simplicity in this article. We then define the RF frequency at 28 GHz (mmWave band in 5G), and the system bandwidth as 100 MHz, which determines the frequency selectivity of the channel. Next, we configure the antenna arrays: the transmitters are equipped with a 3GPP-compliant planar array, allowing for directional beamforming, while the receivers use simple isotropic single-element antennas. We manually place two transmitters at specific 3D coordinates, each with a defined orientation that steers the main beam toward a target area in the scene. Their transmit power is set to 46 dBm — typical for base stations. By structuring the transmitters this way, we can explore how antenna configuration and placement influence signal propagation through complex environments. The transmitters are labeled in red.
The code for simulating this scene can be found below.
def config_scene(num_rows, num_cols, fc, BW):
    # Load the Etoile scene
    scene = load_scene(sionna.rt.scene.etoile)
    scene.frequency = fc
    # Set system bandwidth
    scene.bandwidth = BW

    # Transmitter positions (in meters)
    positions_m = np.array(
        [[-150, 20, 40],
         [-125, 10, 40]])
    # Main beam directions (beam target points in meters)
    look_ats_m = np.array(
        [[-216, -21, 0],
         [-90, -80, 0]])
    positions = positions_m
    look_ats = look_ats_m

    # Define planar antenna arrays
    scene.tx_array = PlanarArray(num_rows=num_rows,
                                 num_cols=num_cols,
                                 pattern="tr38901",
                                 polarization="V")
    scene.rx_array = PlanarArray(num_rows=1,
                                 num_cols=1,
                                 pattern="iso",
                                 polarization="V")

    # Add transmitters
    power_dbms = [46, 46]  # Tx power in dBm
    for i in range(len(positions)):
        scene.add(Transmitter(name=f"tx{i}",
                              position=positions[i],
                              look_at=look_ats[i],
                              power_dbm=power_dbms[i]))
    return scene
Once the scene is configured, we compute the received signal strength (RSS) map using Sionna’s ray-tracing solver. This step simulates how electromagnetic waves propagate through the 3D environment, capturing multiple reflections, diffractions, and scattering effects. We define a resolution of 1 m × 1 m per grid cell and allow up to five propagation bounces. The solver traces 10 million rays per transmitter, enabling high spatial accuracy in capturing signal behavior across the entire environment. The Python code can be found below.
# Load and configure the scene
# Parameters of the antenna array (spacing between elements = 0.5 wavelength by default)
num_rows = 8   # number of rows of antenna elements in the planar array
num_cols = 2   # number of columns of antenna elements in the planar array
fc = 28e9      # 28 GHz (or 3.5 GHz, for example)
BW = 100e6     # system bandwidth in Hz
scene_etoile = config_scene(num_rows, num_cols, fc, BW)

# Compute the radio map
rm_etoile = rm_solver(scene_etoile,
                      max_depth=5,
                      samples_per_tx=10**7,
                      cell_size=(1, 1))
Finally, the Python code for rendering the RSS radio map can be found below.
cam = Camera(position=[0, 0, 1000], orientation=np.array([0, np.pi/2, -np.pi/2]))
# We can render the received signal strength (RSS) map as shown below
# Another option is the SINR map, indicating interference levels
scene_etoile.render(camera=cam,
                    radio_map=rm_etoile,
                    rm_metric="rss",
                    rm_show_color_bar=True,
                    rm_vmin=-100,
                    rm_vmax=-20);
The rendered RSS radio map at 28 GHz is shown below, with the transmitters (TX power = 46 dBm) marked in red. Notice the distribution of the RSS across the scene: the highest levels occur close to the transmitters and fall off with distance.

Repeating the experiment at 3.5 GHz (a very common 5G band), we get the following radio map. Notice the enhanced coverage at the lower frequency, a direct consequence of the physics of radio-wave propagation.

We can use the above code to generate multiple radio maps for different configurations of the transmitters, environment, antennas, RF frequencies, and so on. We can then train a machine learning model that predicts the radio coverage from those parameters together with the geographical map of the area. This becomes a typical image regression problem in which the target is the image of the RSS radio map and the inputs are the scenario features. Such a model is very useful for radio planning, for example in mobile communications, where we want to place base stations at optimal locations with optimal configurations.
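As a minimal illustration (not a full pipeline), the sketch below trains a small encoder-decoder CNN in Keras for image-to-image regression. The input layout, the random placeholder arrays, and the function name build_radio_map_cnn are assumptions made for this sketch; in practice the inputs would be feature maps derived from the scene and transmitter configuration, and the targets would be Sionna-generated RSS maps.

# A minimal sketch, not the article's pipeline: an image-to-image CNN that maps
# environment/configuration feature maps to an RSS radio map.
import numpy as np
from tensorflow.keras import layers, models

def build_radio_map_cnn(input_shape=(128, 128, 3)):
    """Simple encoder-decoder for RSS map regression (values in dBm)."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D()(x)                       # 128 -> 64
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)                       # 64 -> 32
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)                       # 32 -> 64
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)                       # 64 -> 128
    out = layers.Conv2D(1, 1, activation="linear")(x)  # predicted RSS map (one channel)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mae")        # dBm-scale regression
    return model

# Placeholder data: X stacks input maps (e.g., building heights, distance to TX,
# frequency plane); Y holds the corresponding Sionna-generated RSS maps.
X = np.random.rand(8, 128, 128, 3).astype("float32")
Y = np.random.uniform(-100, -20, size=(8, 128, 128, 1)).astype("float32")

model = build_radio_map_cnn()
model.fit(X, Y, epochs=2, batch_size=4, verbose=0)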
Very cool stuff!
Further details can be discussed in a future article. For more information about this topic you may check the following references.
[1] M. Vasudevan and M. Yuksel, “Machine Learning for Radio Propagation Modeling: A Comprehensive Survey,” IEEE Open Journal of the Communications Society, vol. 5, pp. 5123–5153, 2024.
[2] A. Marey, M. Bal, H. F. Ates, and B. K. Gunturk, “PL-GAN: Path Loss Prediction Using Generative Adversarial Networks,” IEEE Access, vol. 10, pp. 90474–90480, 2022.
[3] O. Ozyegen, S. Mohammadjafari, K. E. Mokhtari, M. Cevik, J. Ethier, and A. Basar, “An Empirical Study on Using CNNs for Fast Radio Signal Prediction,” SN COMPUT. SCI. 3, 131, 2022.
[4] J. Hoydis, S. Cammerer, F. A. Aoudia, A. Vem, N. Binder, G. Marcus, and A. Keller, “Sionna: An Open-Source Library for Next-Generation Physical Layer Research,” arXiv preprint arXiv:2203.11854, 2022.
[5] R. Levie, C. Yapar, G. Kutyniok, and G. Caire, “RadioUNet: Fast Radio Map Estimation with Convolutional Neural Networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 4001–4015, June 2021.
[6] Xin Zhang, Xiujun Shu, Bingwen Zhang, Jie Ren, Lizhou Zhou, and Xin Chen, “Cellular Network Radio Propagation Modeling with Deep Convolutional Neural Networks,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’20), Association for Computing Machinery, New York, NY, USA, pp. 2378–2386, 2020.
[7] S. Bakirtzis, K. Qiu, J. Zhang, and I. Wassell, “DeepRay: Deep Learning Meets Ray-Tracing,” in Proceedings of the 16th European Conference on Antennas and Propagation (EuCAP), Madrid, Spain, 2022, pp. 1–5.
1.2. AI for Network Optimization and Automation
Once a network is deployed, the real work begins. Dynamic environments, fluctuating user behavior, mobility, and interference make manual optimization impractical. Here, AI steps in to deliver continuous learning, predictive adaptation, and autonomous decision-making.
Traffic Prediction and Steering
AI models can predict traffic variations across time and space, enabling proactive load balancing and traffic steering. Deep learning models — including LSTMs and spatiotemporal convolutional networks — are trained on historical usage data to forecast traffic densities, detect bottlenecks, and inform resource reallocation. This allows networks to anticipate demand, rather than simply react to it.
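As a rough illustration of the forecasting idea (not an operator-grade model), the sketch below trains a small Keras LSTM to predict the next traffic sample of a single cell from a one-day sliding window. The synthetic sinusoidal "traffic" series, the 15-minute sampling interval, and the window length are placeholders chosen for this sketch.

# A minimal sketch: next-step traffic forecasting for one cell with an LSTM.
import numpy as np
from tensorflow.keras import layers, models

# Synthetic daily-periodic traffic (e.g., normalized load, 96 samples per day)
t = np.arange(0, 96 * 30)                       # 30 days of 15-minute samples
traffic = 0.5 + 0.4 * np.sin(2 * np.pi * t / 96) + 0.05 * np.random.randn(len(t))

window = 96                                     # use one day of history per prediction
X = np.stack([traffic[i:i + window] for i in range(len(traffic) - window)])
y = traffic[window:]
X = X[..., None]                                # shape (samples, window, 1)

model = models.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),
    layers.Dense(1)                             # next-step traffic forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

next_load = model.predict(X[-1:], verbose=0)    # forecast for the upcoming interval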
Handover Optimization and Mobility Management
Classical handover rules — based on thresholds or hysteresis margins — are limited in dense, fast-changing environments. Reinforcement learning (RL) agents have shown promise in learning optimal handover policies by interacting with the environment, balancing dropped-call probability with load distribution. Context-aware models can also incorporate mobility predictions, speed, and radio conditions to enhance decision-making.
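To make the RL framing concrete, here is a toy sketch under heavy simplifying assumptions: a tabular Q-learning agent chooses between "stay" and "hand over" in a two-cell scenario, with the state given by the serving cell and a quantized RSRP difference. The random-walk channel model and the reward weights are invented purely for illustration.

# A toy sketch of RL-based handover: tabular Q-learning over (serving cell, RSRP-difference bin).
import numpy as np

rng = np.random.default_rng(0)
n_diff_bins, n_cells, n_actions = 11, 2, 2      # actions: 0 = stay, 1 = hand over
Q = np.zeros((n_cells, n_diff_bins, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(serving, diff_bin, action):
    """Apply the action, then let the RSRP difference drift; return (serving, diff_bin, reward)."""
    serving = 1 - serving if action == 1 else serving
    # Reward favors camping on the stronger cell, with a small cost per handover.
    advantage = (diff_bin - n_diff_bins // 2) * (1 if serving == 1 else -1)
    reward = advantage - (2 if action == 1 else 0)
    diff_bin = int(np.clip(diff_bin + rng.integers(-1, 2), 0, n_diff_bins - 1))
    return serving, diff_bin, reward

serving, diff_bin = 0, n_diff_bins // 2
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[serving, diff_bin]))
    s2, d2, r = step(serving, diff_bin, a)
    Q[serving, diff_bin, a] += alpha * (r + gamma * np.max(Q[s2, d2]) - Q[serving, diff_bin, a])
    serving, diff_bin = s2, d2

print(Q.argmax(axis=-1))    # learned stay/hand-over policy per (serving cell, RSRP-difference bin)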
Anomaly Detection and Fault Management
Telecom systems generate high-dimensional operational data. AI models — especially unsupervised approaches like autoencoders or variational inference — can learn normal behavior patterns and flag anomalies ranging from signal degradation to misconfigured parameters. By identifying issues early, operators can avoid cascading failures and reduce downtime.
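A minimal sketch of the autoencoder approach, assuming a matrix of normalized KPI vectors is already available (the synthetic Gaussian data below is only a stand-in): the model learns to reconstruct normal behavior, and samples with unusually large reconstruction error are flagged as anomalies.

# A minimal sketch: autoencoder-based anomaly detection on KPI vectors.
import numpy as np
from tensorflow.keras import layers, models

n_kpis = 12
normal = np.random.randn(5000, n_kpis).astype("float32")        # "normal operation" (synthetic)

ae = models.Sequential([
    layers.Input(shape=(n_kpis,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(3, activation="relu"),        # compressed code of normal behavior
    layers.Dense(8, activation="relu"),
    layers.Dense(n_kpis)                       # reconstruction
])
ae.compile(optimizer="adam", loss="mse")
ae.fit(normal, normal, epochs=5, batch_size=128, verbose=0)

def anomaly_scores(x):
    """Reconstruction error per sample; large values indicate unusual behavior."""
    rec = ae.predict(x, verbose=0)
    return np.mean((x - rec) ** 2, axis=1)

threshold = np.percentile(anomaly_scores(normal), 99)
suspect = np.random.randn(10, n_kpis).astype("float32") + 4.0   # strongly shifted KPIs
print(anomaly_scores(suspect) > threshold)                      # mostly True for the shifted samples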
Self-Organizing Networks (SON)
SON frameworks were introduced to automate configuration, optimization, and healing tasks. AI extends this vision: clustering techniques can optimize cell boundaries; deep learning models can tune power levels and antenna tilts; RL agents can learn repair strategies post-failure. Combined, these enable truly self-configuring and self-healing networks.
O-RAN: Architecture for Embedded Intelligence
To operationalize these AI functions, modern networks increasingly adopt Open RAN (O-RAN) — a disaggregated architecture that exposes standardized interfaces and allows intelligent agents to be embedded throughout the system.
- The Near-Real-Time RAN Intelligent Controller (near-RT RIC) supports sub-second decision-making, hosting xApps for tasks like real-time beam management or dynamic handover control.
- The Non-RT RIC, in contrast, handles slower processes like model training, performance diagnostics, and policy optimization through rApps.
- These agents ingest measurements such as CSI, CQI, UE positions, and power levels — enabling cross-layer AI coordination.
- O-RAN also encourages openness: models can be retrained offline, updated securely, and shared across vendors, accelerating innovation.
Through the O-RAN architecture, AI moves from a side feature to a native, integral part of the telecom control loop, enabling operators to adapt their networks in real time, at scale, and with minimal human intervention.
For more information about the above topics, you may check the following references.
[1] N. Chatzistefanidis, N. Makris, V. Passas, and T. Korakis, “ML-based Traffic Steering for Heterogeneous Ultra-dense beyond-5G Networks,” in Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC), Glasgow, United Kingdom, 2023, pp. 1–6.
[2] S. Hämäläinen, H. Sanneck, and C. Sartori (Eds.), LTE Self‐Organising Networks (SON): Network Management Automation for Operational Efficiency, John Wiley & Sons, 2011. Print ISBN: 9781119970675.
[3] A. Robinson and T. Kunz, “Downlink Scheduling in LTE with Deep Reinforcement Learning, LSTMs and Pointers,” in Proceedings of MILCOM 2021 — IEEE Military Communications Conference, San Diego, CA, USA, 2021, pp. 763–770.
[4] Hoang Duy Trinh, “Data Analytics for Mobile Traffic in 5G Networks Using Machine Learning Techniques,” Doctoral Thesis, Universitat Politècnica de Catalunya, Department of Telematics Engineering, June 10, 2020.
1.3. AI for Positioning and Context-Aware Communications
In modern wireless systems, location is no longer an optional service — it’s an enabler of intelligent communication. From emergency response and resource allocation to beam steering and session continuity, accurate positioning directly impacts both user experience and system performance.
Traditional techniques such as Time of Arrival (ToA), Angle of Arrival (AoA), and Received Signal Strength (RSS) rely on explicit geometric models and assumptions about the propagation environment. However, these methods often suffer in non-line-of-sight or multipath-heavy conditions — such as dense urban areas or indoor spaces.
AI reframes the positioning task as a learning problem, enabling more robust and flexible solutions:
- Neural networks, trained on fingerprinted signal measurements (such as CSI, RSS, or raw IQ samples), can infer user position by identifying complex spatial and multipath patterns that would be inaccessible to analytical models.
- Instead of relying on geometric triangulation, AI models treat the signal space as a latent representation of location, achieving sub-meter accuracy in challenging environments without needing GPS.
- By learning from temporal sequences of CSI or other wireless features, recurrent models can also predict short-term movement trajectories, such as estimating the user’s velocity vector or next likely position — valuable for anticipatory handovers and beam steering.
- In indoor environments like malls, airports, or factories, AI models can segment spatial zones, recognize transitions between rooms or corridors, and support context-aware services such as access control or location-based content delivery.
This paradigm shift from explicit estimation to statistical inference over signal embeddings allows positioning systems to work even when traditional models fail — including indoor, underground, and dense multipath conditions.
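The fingerprinting idea can be sketched in a few lines. The example below is purely illustrative: RSS fingerprints are generated with a made-up log-distance path-loss model for four access points, and a small Keras network regresses the (x, y) position from them. A real system would replace the generator with measured CSI or RSS fingerprints.

# A minimal sketch: fingerprinting-based positioning with synthetic RSS data.
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(1)
aps = np.array([[0, 0], [20, 0], [0, 20], [20, 20]], dtype=float)   # access point positions (m)

def rss_fingerprint(xy):
    """Toy log-distance path loss plus shadowing, in dBm, for each access point."""
    d = np.linalg.norm(xy[:, None, :] - aps[None, :, :], axis=-1) + 0.1
    return -40 - 30 * np.log10(d) + rng.normal(0, 2, d.shape)

xy_train = rng.uniform(0, 20, size=(5000, 2))
X_train = rss_fingerprint(xy_train)

model = models.Sequential([
    layers.Input(shape=(len(aps),)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(2)                              # predicted (x, y) in meters
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, xy_train, epochs=10, batch_size=128, verbose=0)

xy_test = rng.uniform(0, 20, size=(200, 2))
err = np.linalg.norm(model.predict(rss_fingerprint(xy_test), verbose=0) - xy_test, axis=1)
print("median positioning error (m):", np.median(err))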
MathWorks provides two highly practical examples that demonstrate how AI can be applied to real-world wireless positioning problems:
- Three-Dimensional Indoor Positioning with 802.11az Fingerprinting and Deep Learning — A hands-on demo showing how to train and evaluate a deep learning model for indoor positioning using simulated signal fingerprints.
- AI for Positioning Accuracy Enhancement — A practical example of using machine learning to refine 5G positioning estimates based on simulated radio measurements.
These examples are especially useful for understanding how deep learning can be integrated into wireless localization systems in realistic settings.
1.4. AI in Physical Layer Signal Processing: Channel Estimation, Equalization, and Beyond
The physical layer is where raw bits confront real-world distortion — fading, noise, interference, and hardware imperfections. Traditional solutions rely on mathematical models: pilot-based channel estimation, MMSE equalizers, and carefully designed filters. But in complex or fast-varying environments, these models can fall short.
AI offers a different strategy: learn directly from data. Neural networks, particularly CNNs and autoencoders, are trained to estimate and equalize channels based on received signals, learning patterns that conventional models miss.
Unlike classical systems that treat estimation and equalization as separate stages, learned receivers can jointly perform both — sometimes without explicit channel models or even pilots. This opens the door to:
- MIMO detection and beamforming using data-driven inference
- ISI cancellation in dispersive or nonlinear channels
- Pilot reduction, freeing up bandwidth
- End-to-end neural receivers trained directly to recover data without manual design of each block
While these models don’t necessarily reduce computational complexity — and often increase it — they offer robustness, adaptability, and the ability to generalize from limited data. Tools like NVIDIA Sionna and MATLAB’s AI-native PHY libraries make it possible to simulate and train such systems on real or synthetic datasets.
The following hands-on examples and tools show how these ideas, and more, are being applied in practice.
Neural Receivers and Learned Decoding
Deep learning is being used to replace or augment traditional blocks like demodulators and LLR estimators — making receivers more robust to channel distortions and hardware imperfections.
- AI-Native Fully Convolutional Receiver: Demonstrates an end-to-end convolutional neural network receiver trained on 5G waveform data.
- Training and Testing a Neural Network for LLR Estimation: Shows how to train a DNN to estimate log-likelihood ratios from soft demodulated inputs.
- Autoencoders for Wireless Communications: A compact transmitter-receiver pair trained jointly as a deep autoencoder over a simulated channel.
- NeuralRx: An open-source project from NVIDIA for learning-based demodulation and decoding.
AI for Beam Selection and Directional Transmission
Learning-based beam management helps networks adapt faster to user mobility, blockage, and environmental changes — especially in mmWave and massive MIMO setups.
- Neural Network for Beam Selection: Trains a classifier to predict the best beam from received signal features.
- Train DQN Agent for Beam Selection: Uses deep reinforcement learning (DQN) to learn optimal beam selection through interaction.
- ViWi Dataset: A public dataset combining visual and wireless features for learning beam predictions in realistic settings.
Learning Compact Channel Representations
Channel state information (CSI) can be high-dimensional and expensive to feed back. Deep learning offers a way to compress, denoise, and reconstruct CSI efficiently.
- CSI Compression with Autoencoder: Trains a deep autoencoder to reduce CSI feedback overhead while preserving accuracy.
Hardware-Aware Modeling with AI
AI can model and compensate for hardware non-idealities — such as nonlinearities in RF components — to enhance signal integrity and improve energy efficiency.
- Neural Network for Digital Predistortion: Uses a DNN to learn a digital predistortion function that linearizes the power amplifier's response.
Tools, Demos, and Open Platforms
For rapid prototyping and research, several platforms now provide examples, datasets, and benchmarks focused on deep learning at the physical layer.
- AI for Wireless Communication Systems with MATLAB: A central portal with curated examples for AI applications in signal processing, beamforming, and modulation.
- DeepVerse6G: A community-driven initiative for benchmarking and sharing deep learning models and datasets in next-generation wireless research.
These innovations don’t replace classical methods — they build on them. As we’ll see in Part 2, many of these AI approaches echo ideas from estimation theory, Bayesian inference, and mutual information — concepts born in the heart of telecommunications.
1.5. AI in Cognitive Radio and Adaptive Spectrum Access
Efficient use of the radio spectrum is critical as demand grows. Cognitive radio aims to address this by enabling devices to detect unused frequencies and adapt their transmissions accordingly. Traditional spectrum sensing methods rely on statistical detection or hand-crafted features — but AI offers a more powerful approach.
Deep learning models can learn directly from raw RF data to perform:
- Spectrum sensing — identifying signals under low SNR or in crowded bands
- Modulation classification — using CNNs trained on IQ samples to identify schemes like QPSK, 16-QAM, etc.
- Interference detection — learning to detect patterns that indicate jamming, overlapping signals, or equipment faults
- Occupancy prediction — using time series models to anticipate which frequencies will be available
Real-world platforms like DeepSig’s OmniSIG and MATLAB’s deep learning modules demonstrate how such models outperform classical detectors, especially in challenging wireless environments.
Beyond sensing, reinforcement learning (RL) is being actively researched to enable decision-making — for example, selecting which band to access or which power level to transmit. While RL agents show strong performance in simulated environments, practical deployment remains experimental, due to issues like safety, convergence, and unpredictable environments.
AI thus moves cognitive radio toward a more predictive and adaptive framework, helping networks make better use of scarce spectrum — especially in shared and unlicensed bands.
Here are some practical tools and demonstrations that showcase how AI can make wireless systems more spectrum-aware:
Modulation Recognition and Signal Classification
Accurate classification of signal types is a foundational task in RF sensing and spectrum intelligence.
- Modulation Classification with Deep Learning: A hands-on MATLAB example that uses a convolutional neural network (CNN) to classify modulation types based on IQ samples.
- OmniSIG® by DeepSig: A commercial AI-native software tool developed by DeepSig, designed for real-time RF signal detection and classification. It demonstrates the use of deep learning models trained on large RF datasets for spectrum monitoring, threat detection, and waveform intelligence.
Deep Learning for Spectrum Sensing
Identifying which technologies are occupying the spectrum is critical for coexistence, interference management, and regulation.
- Spectrum Sensing with Deep Learning to Identify 5G and LTE Signals: Demonstrates how to train and evaluate a deep neural network that detects the presence of LTE or 5G signals based on frequency-domain features in MATLAB.
1.6. AI at the Edge: Sustainable Intelligence and the Optical Backbone
As machine learning moves closer to the source of data generation, edge computing becomes essential for supporting real-time telecommunications services. Applications such as connected vehicles, smart factories, and intelligent power grids require ultra-low-latency responses and local decision-making — tasks increasingly handled by AI models deployed at the network edge. These edge systems must not only process data rapidly but also meet constraints on energy efficiency, privacy, and regulatory compliance.
Artificial intelligence enables this transformation in two key ways. First, it powers edge-based inference and control, enabling systems to detect patterns, act autonomously, and adapt to environmental changes. Second, AI itself benefits from edge deployment by reducing communication overhead and enhancing sustainability, especially when inference workloads are distributed intelligently across underutilized resources.
Supporting this architecture demands a robust telecom backbone, particularly high-capacity optical networks linking centralized and edge computing resources. Coherent optical transceivers — capable of modulating the amplitude, phase, and polarization of the carrier using formats like dual-polarization 16-QAM (DP-16QAM) — are central to this infrastructure. These higher-order modulation schemes improve spectral efficiency but also impose stricter requirements on signal integrity and power efficiency, which in turn depend on advanced digital signal processing (DSP).
One important technique in this domain is digital predistortion (DPD), which compensates for the nonlinear behavior of optoelectronic components such as driver amplifiers and Mach-Zehnder modulators. By accurately modeling these nonlinearities and applying inverse transformations to the input signal, DPD reduces distortion and allows the transmitter to operate more efficiently. Machine learning can further enhance DPD by adapting its models to real-time conditions, improving robustness across varying operating points.
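As a simplified sketch of the DPD idea (memoryless, polynomial, and far from a production optical DSP chain), the code below fits a post-inverse of a toy cubic nonlinearity by least squares and then reuses the same coefficients as a predistorter, in the spirit of indirect learning. The amplifier model, drive level, and polynomial order are all assumptions.

# A toy sketch of memoryless polynomial digital predistortion via indirect learning.
import numpy as np

rng = np.random.default_rng(2)
x = (rng.standard_normal(10000) + 1j * rng.standard_normal(10000)) / np.sqrt(2)
x *= 0.4                                           # back off to a moderate drive level

def pa(u):
    """Toy nonlinearity standing in for the amplifier/modulator chain (AM/AM compression)."""
    return u - 0.2 * u * np.abs(u) ** 2

def basis(u, order=7):
    """Odd-order polynomial basis: u, u|u|^2, u|u|^4, ..."""
    return np.stack([u * np.abs(u) ** (2 * k) for k in range((order + 1) // 2)], axis=1)

# Indirect learning: fit a post-inverse that maps the gain-normalized output back to the
# input, then copy its coefficients into the predistorter.
y = pa(x)
G = np.vdot(x, y) / np.vdot(x, x)                  # estimated linear gain
coeffs, *_ = np.linalg.lstsq(basis(y / G), x, rcond=None)

x_dpd = basis(x) @ coeffs                          # predistorted input signal
nmse = lambda a, b: np.linalg.norm(a - b) / np.linalg.norm(b)
print("error vs. ideal, without DPD:", nmse(pa(x) / G, x))
print("error vs. ideal, with DPD:   ", nmse(pa(x_dpd) / G, x))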
Recent work — such as a PhD thesis from the Technical University of Munich (link) — demonstrates how learning-based algorithms can optimize DSP for coherent optical systems. These advances directly impact energy consumption in optical modules, a key concern as AI workloads continue to expand and the environmental cost of data centers becomes a global issue.
A promising strategy involves relocating edge data centers to regions with spare power capacity, ideally near renewable energy sources. This decentralization not only helps balance electrical loads but also reduces reliance on large centralized cloud facilities. However, such a vision depends entirely on the underlying communication network — particularly optical links — to maintain high throughput and reliability.
Once again, we observe a mutual reinforcement: telecom infrastructure enables scalable AI, and AI improves the efficiency and intelligence of that infrastructure. When both are designed with sustainability and performance in mind, they unlock a path toward greener, more responsive digital systems.

Part 2 — From Communications to Learning: Foundations That Shaped AI
While Part 1 explored how AI and machine learning are transforming telecommunications systems across all layers, from planning and optimization to channel processing and intelligent control, this part flips the lens.
Here, we examine the reverse direction — how the foundational ideas and tools of modern machine learning were deeply rooted in the disciplines of telecommunications and signal processing.
This is not just a historical coincidence. Both fields are fundamentally concerned with the same philosophical challenge: extracting information from noisy, uncertain data. Whether the goal is to decode a weak signal or to recognize handwritten digits, the task boils down to interpreting randomness, modeling structure, and inferring hidden variables.
Many of today’s machine learning concepts — from maximum likelihood estimation, Bayesian inference, and mutual information, to belief propagation, cross-entropy loss, and ROC curves — were developed, studied, and extensively applied in the communications domain long before becoming cornerstones in AI.
This part tells that story. It traces the lineage of key machine learning principles back to their origins in telecom, highlighting how ideas like estimation, detection, signal transforms, and information measures first emerged to solve communication problems — and later migrated into ML, where they continue to evolve.
We begin with estimation theory — the core idea that underlies both learning and communication.

2.1. Estimation Theory: The Shared Foundation of Learning and Communication
At their core, both telecommunications and machine learning face the same challenge: how to extract meaningful information from noisy data. Whether it’s a mobile device trying to decode a weak radio signal, or an AI model trying to predict outcomes from examples, both tasks involve interpreting incomplete, uncertain, and error-prone inputs.
Estimation theory provides the mathematical backbone for doing this. It’s the science of making the best possible guess about something we can’t observe directly — based on indirect, noisy evidence. This applies equally well to both fields.
In telecommunications, estimation theory has long been used to recover unknown quantities such as signal strength, channel distortion, time delays, and frequency offsets. Engineers design estimators to track these parameters in real time, ensuring reliable decoding and synchronization.
In machine learning, the same principles help us infer the parameters of models — the weights of a neural network, the coefficients of a regression model, or the structure of a probabilistic graphical model. Training a model, in many cases, is an estimation problem: we’re trying to learn the best description of a system from the data we have.

The key unifying idea is this: both domains treat the data as realizations of random variables. In communications, the randomness comes from noise, fading, and interference. In machine learning, it’s from variability in real-world data, hidden variables, and imperfect labels. But in both, we assume that there’s some structure underneath — and we want to uncover it.
This connection is not accidental. Estimation theory was developed and applied extensively in communication systems long before machine learning became popular. As a result, many of the methods now standard in AI — like maximum likelihood estimation, Bayesian inference, or even regularization — were already well known in telecom decades ago.
Moreover, concepts like model uncertainty, adaptivity, and the trade-off between bias and variance were critical in adaptive filters and equalizers, for example. These same ideas now form the basis of many modern learning algorithms, especially in online learning and reinforcement learning.
In short, estimation theory is not just a tool shared between machine learning and communications — it’s a deep conceptual bridge. It teaches us that learning, whether by a modem or a model, is fundamentally about making sense of an uncertain world.
2.2. Bayesian Inference and Estimation: Learning from Uncertainty
While estimation theory provides the general mathematical tools to extract unknown parameters from observed data, Bayesian inference stands out as a particularly powerful and conceptually rich subclass. It does not merely seek to minimize error or find point estimates — it offers a complete probabilistic treatment of uncertainty. In both communications and machine learning, where noise, ambiguity, and incomplete data are the norm, this probabilistic lens has proven indispensable.
Bayesian inference is one of the most elegant frameworks for learning from uncertain or incomplete information. It provides a principled way to update our beliefs as new data becomes available, making it central to both communications and modern machine learning. While Bayesian methods are now widely known due to their success in probabilistic AI, their foundational roots lie deep in signal processing and estimation theory, particularly within telecommunications.
In a Bayesian setting, all unknowns — whether they are signal amplitudes, channel states, or neural network weights — are treated as random variables with associated prior distributions. The goal is not just to find a single “best” estimate, but to describe and update the full distribution of our uncertainty. This allows systems to balance prior knowledge with noisy observations, making decisions that are robust and data-efficient.
For example, in turbo decoding — a major breakthrough in modern error correction — Bayesian inference is central. At each iteration, soft estimates of bit probabilities are exchanged between decoders, updated using the Bayes rule to refine the belief about the transmitted message. This mirrors belief updating in probabilistic graphical models, which are widely used in AI for reasoning under uncertainty.
In deep learning, Bayesian methods now underpin many crucial approaches. Bayesian neural networks, for instance, place distributions over weights rather than point estimates, enabling principled uncertainty quantification — especially important in safety-critical tasks such as medical diagnosis or autonomous driving.
Perhaps more subtly, many classical techniques in machine learning can be reframed as Bayesian procedures. For instance, L2 regularization in neural networks corresponds to imposing a Gaussian prior on the weights. The loss function then becomes the negative log posterior, turning training into a form of maximum a posteriori (MAP) estimation.
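This equivalence is easy to check numerically in the simplest setting, linear regression. The sketch below uses synthetic data and an arbitrary prior variance: the MAP estimate of the weights under a zero-mean Gaussian prior coincides with ridge (L2-regularized) least squares with λ = σ²/τ².

# A small numerical check: MAP with a Gaussian prior equals L2-regularized least squares.
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
sigma, tau = 0.5, 1.0                        # noise std and prior std (assumed values)
y = X @ w_true + sigma * rng.normal(size=n)

lam = sigma**2 / tau**2                      # equivalent L2 weight
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# MAP: maximize log N(y | Xw, sigma^2 I) + log N(w | 0, tau^2 I),
# i.e., minimize ||y - Xw||^2 / (2 sigma^2) + ||w||^2 / (2 tau^2).
w_map = np.linalg.solve(X.T @ X / sigma**2 + np.eye(d) / tau**2, X.T @ y / sigma**2)

print(np.allclose(w_ridge, w_map))           # True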
In summary, Bayesian inference and estimation offer a unifying language for reasoning in the face of uncertainty — whether it’s about a transmitted signal corrupted by noise or a complex pattern hidden in data. The long-standing use of these techniques in communications has given rise to highly efficient and robust algorithms that have been naturally repurposed and expanded within modern AI. As ML applications increasingly require quantifying confidence and dealing with scarce data, this Bayesian foundation becomes even more critical.

2.3. Detection and Decision Theory: From Bits to Beliefs
After exploring estimation and Bayesian inference, a natural progression leads us to detection and decision theory. While estimation focuses on uncovering the values of hidden parameters from noisy observations, detection is concerned with deciding which hypothesis — among several possibilities — best explains the data. In both machine learning and communications, this distinction between estimating parameters and making decisions based on them is fundamental.
In classical communications, detection theory provides the mathematical foundation for interpreting received signals under uncertainty. For example, in a binary communication system, the receiver must decide whether the transmitted bit was 0 or 1, given a noisy observation corrupted by the channel. This is a classic hypothesis testing problem: the receiver compares two statistical models — one for each bit — and chooses the most likely one.
This same formalism now underpins modern binary classification in machine learning. Given input features and a labeled training set, an ML classifier effectively learns how to assign new inputs to one of multiple hypotheses (classes) — a direct extension of statistical detection theory.
There are two dominant approaches to detection:
- Classical detection theory, often using likelihood ratios and Neyman–Pearson criteria, aims to minimize specific error probabilities under controlled constraints. It assumes a probabilistic model of the data under each hypothesis and compares them directly. This gives rise to Receiver Operating Characteristic (ROC) curves, which visualize the tradeoff between false positives and true positives — a concept borrowed directly from radar and signal detection in communications.
- Bayesian detection theory, by contrast, incorporates prior beliefs about the likelihood of each hypothesis. It minimizes the overall expected cost (or risk) by accounting for both the posterior probability of each hypothesis and a loss function that quantifies the cost of incorrect decisions. This framework is now central to probabilistic ML, especially in Bayesian classifiers, naive Bayes models, and MAP (maximum a posteriori) detectors.
These detection tools were refined early in the context of radar systems, digital communication receivers, and signal intercept problems, long before becoming ML staples.
What’s remarkable is how concepts born partly from the need to decode noisy signals now support machines deciding between spam and not spam, tumor and no tumor, or fraudulent vs legitimate transactions.

2.4. Information Theory — The Common Language of Communication and Learning
While estimation and detection address how we infer hidden variables from data, information theory tells us what can be known in the first place. It quantifies uncertainty, compressibility, and feature importance — concepts at the heart of both telecommunications and artificial intelligence.
Originally developed by Claude Shannon to study the capacity of communication systems, information theory introduced ideas like entropy, mutual information, and channel capacity — all of which later found deep resonance in the world of machine learning.
Entropy and Uncertainty
Entropy is used to quantify uncertainty. For a discrete random variable X with probability distribution P(x), the entropy is defined as:
H(X) = −∑ₓ P(x) log P(x)
In communications, this represents the average number of bits needed to encode messages drawn from X.
Mutual Information as a Learning Signal
One of the most powerful and unifying concepts is mutual information (MI) — a measure of how much knowing one variable reduces uncertainty about another. It is defined as:
I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
In communications, MI defines the maximum rate at which information can be reliably transmitted over a noisy channel — the channel capacity.
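Both quantities are straightforward to compute for discrete distributions. The sketch below uses a small made-up joint distribution of two binary variables and evaluates the definitions directly, using the equivalent form I(X; Y) = H(X) + H(Y) − H(X, Y).

# A small numerical illustration of entropy and mutual information.
import numpy as np

p_xy = np.array([[0.4, 0.1],     # P(X=0, Y=0), P(X=0, Y=1)
                 [0.1, 0.4]])    # P(X=1, Y=0), P(X=1, Y=1)
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = entropy(p_x)                           # uncertainty of X in bits
H_XY = entropy(p_xy.ravel())                 # joint entropy H(X, Y)
I_XY = H_X + entropy(p_y) - H_XY             # I(X; Y) = H(X) + H(Y) - H(X, Y)

print(f"H(X) = {H_X:.3f} bits, I(X;Y) = {I_XY:.3f} bits")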
In machine learning, MI is increasingly used for:
- Feature selection: Identifying input variables X that share the most information with labels Y
- Neural network pruning: Retaining only neurons or layers whose activations carry relevant information
- Representation learning: Finding compressed representations that retain essential predictive information
This brings us to a key modern concept: the Information Bottleneck Principle.
The Information Bottleneck Principle
The Information Bottleneck (IB) Principle proposes that a good representation T of input X should be both informative about the target output Y (by maximizing I(T; Y)) and compressed (by minimizing I(T; X)). Essentially, it suggests finding a representation that retains only the essential information needed to predict Y, while discarding irrelevant details about X. The IB framework thus provides a principled way to formalize learning representations, especially in deep learning, by balancing compression and prediction.
Given input X and target Y, we aim to learn a representation T that:
- Retains predictive power: Maximizes I(T; Y)
- Forgets irrelevant details: Minimizes I(T; X)
This leads to the objective:
Minimize: L = I(T; X) − β · I(T; Y)
where β controls the trade-off between compression and prediction. This principle has been proposed as an explanation of generalization in deep learning, and it also provides a tool for architecture design and training. For example, variational autoencoders (VAEs) approximate this trade-off through their objective function, balancing reconstruction accuracy and latent-space regularity.
Applications in Compression and Robustness
In modern deep learning research, MI also provides tools for understanding overfitting, generalization, and compression. Redundant parts of a model — weights or activations with low MI relative to the output — can be safely removed. This leads to information-theoretic pruning and model compression strategies that maintain accuracy while reducing complexity.
For example, studies have shown that during training, deep layers tend to compress information about the input while preserving what is relevant for the output — aligning closely with the information bottleneck theory. This dynamic helps explain why deeper networks sometimes generalize better, despite being more complex.
Cross-Entropy: A Bridge Between Coding and Learning
Another core concept from information theory that permeates machine learning is cross-entropy. In coding theory, it measures the inefficiency of assuming a wrong distribution Q(x) instead of the true distribution P(x):
H(P, Q) = −∑ₓ P(x) log Q(x)
In classification problems, cross-entropy loss is widely used to compare predicted probabilities Q(x) with ground-truth labels P(x). What began as a coding inefficiency metric is now central to training neural networks and many other machine learning models.
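The sketch below, with made-up distributions, shows both readings of the same formula: the extra bits paid for coding with the wrong distribution, and the familiar classification loss obtained when P is a one-hot label.

# Cross-entropy as coding inefficiency and as a classification loss.
import numpy as np

P = np.array([0.7, 0.2, 0.1])          # "true" symbol distribution
Q = np.array([0.5, 0.3, 0.2])          # assumed (mismatched) distribution

H_P = -np.sum(P * np.log2(P))          # optimal average code length (bits/symbol)
H_PQ = -np.sum(P * np.log2(Q))         # average length when coding as if Q were true
print(f"H(P) = {H_P:.3f}, H(P,Q) = {H_PQ:.3f}, penalty = {H_PQ - H_P:.3f} bits")

# Classification view: a one-hot label is a degenerate P, so the loss reduces
# to minus the log-probability assigned to the correct class.
label = np.array([0.0, 1.0, 0.0])      # ground truth: class 1
pred = np.array([0.2, 0.7, 0.1])       # model's predicted probabilities
print("cross-entropy loss:", -np.sum(label * np.log(pred)))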
This section shows that information theory is not merely a historical ancestor of ML — it is a living framework still guiding today’s AI. Whether we are transmitting a signal or training a model, we are always manipulating uncertainty, relevance, and noise. The tools we use to do so — entropy, MI, cross-entropy — remain the same.
Next, in Section 2.5, we explore how belief propagation, graphical models, and message passing — all of which are foundational in telecommunications — underpin today’s most powerful probabilistic models.
2.5. Bayesian Networks and Belief Propagation: From Error Correction to Probabilistic Graphical Models
In modern AI, Bayesian networks (also known as belief networks) are graphical models that represent the joint probability distribution over a set of variables through directed acyclic graphs. They provide a structured way to represent conditional dependencies and are widely used for inference, decision-making, and learning under uncertainty.
But before they were common in ML toolkits, Bayesian inference over graphs was already thriving in telecommunications — particularly in the domain of error correction coding. Two major examples highlight this influence: LDPC codes and Turbo codes.
LDPC Codes and Belief Propagation
Low-Density Parity-Check (LDPC) codes are powerful error-correcting codes developed in the 1960s by Robert Gallager and rediscovered in the 1990s as computational resources improved. These codes are defined by a sparse bipartite graph (a Tanner graph), which connects variable nodes (bits) and check nodes (parity constraints).
To decode an LDPC code, the receiver performs iterative probabilistic inference over this graph using a technique known as belief propagation or the sum-product algorithm. This algorithm estimates the marginal probabilities of each bit being 0 or 1 given the received (possibly corrupted) sequence.
At its core, belief propagation is a form of message passing — a way to compute marginal distributions in graphical models. Its success in LDPC decoding directly inspired later work on general graphical models, which form the backbone of many probabilistic machine learning approaches.
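The sketch below illustrates the sum-product idea on a deliberately tiny toy parity-check matrix (not a real LDPC code): channel LLRs from a BPSK transmission over AWGN are refined by a few iterations of message passing between variable and check nodes.

# A toy sum-product (belief propagation) decoder in the LLR domain.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],      # each row is a parity check over a few bits
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])

rng = np.random.default_rng(4)
n = H.shape[1]
snr_db = 4.0
sigma = np.sqrt(1.0 / (2 * 10 ** (snr_db / 10)))

codeword = np.zeros(n)                                # the all-zeros word satisfies every check
y = (1 - 2 * codeword) + sigma * rng.normal(size=n)   # BPSK (+1 for bit 0) over AWGN
llr_ch = 2 * y / sigma**2                             # channel LLRs: positive favors bit 0

M = np.where(H == 1, llr_ch, 0.0)                     # variable-to-check messages on each edge

for _ in range(10):
    # Check-to-variable messages (tanh rule), leave-one-out product via division
    T = np.where(H == 1, np.tanh(M / 2), 1.0)
    leave_one_out = np.clip(T.prod(axis=1, keepdims=True) / T, -0.999999, 0.999999)
    E = np.where(H == 1, 2 * np.arctanh(leave_one_out), 0.0)
    # Posterior LLRs, then updated variable-to-check messages (excluding each check's own input)
    posterior = llr_ch + E.sum(axis=0)
    M = np.where(H == 1, posterior - E, 0.0)

bits = (posterior < 0).astype(int)
print("decoded bits:", bits)
print("all parity checks satisfied:", not np.any((H @ bits) % 2))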
Turbo Codes and MAP Inference
Turbo codes, introduced in the 1990s, marked another breakthrough in channel coding. Their decoding strategy was based on Bayesian estimation, particularly maximum a posteriori (MAP) inference. Each decoder estimates the posterior probability of each transmitted bit given the observed sequence and the prior information passed from the other decoder.
Again, the iterative nature of Turbo decoding mirrors belief propagation in a graph-structured model. The encoder structure of Turbo codes can be viewed as a loopy graph, and the decoding process is essentially probabilistic inference on this structure. This connection paved the way for generalizing such inference methods to more complex models used in AI.
From Communications to AI
These examples underscore a broader pattern: core AI methods of inference in graphical models owe much to telecommunications, where the need to infer the true message from noisy signals led to innovative and efficient algorithms under strict constraints of time and energy.
Today, belief propagation is used in Bayesian networks, factor graphs, variational inference, and even deep probabilistic models. But its roots in LDPC and Turbo decoding provide a compelling historical bridge between telecommunications and AI.
This synergy is not coincidental — both domains are built around the challenge of recovering hidden information from noisy observations, whether those are bits over a fading channel or latent causes behind a dataset.

2.6. Signal Processing Foundations of Deep Learning
Modern machine learning — especially deep learning — borrows heavily from signal processing. This is not just historical coincidence, but a deep structural influence rooted in how both fields represent and process data. Signals in time or space, images, audio, and even sequences of symbolic data all require structured representations, efficient transformations, and robust learning from noise.
One of the clearest bridges between the two worlds is the convolutional neural network (CNN). Originally developed for image processing, CNNs apply learned filters to input data in a manner directly inspired by signal filtering techniques. But despite the name, what CNNs actually compute in most frameworks is cross-correlation, not true convolution.
In cross-correlation, the filter is slid over the input as-is, without flipping:
(CNN operation) → Σₖ f[k] ⋅ g[n + k]
By contrast, in true convolution (as used in classical signal processing), the filter is flipped before application:
(True convolution) → Σₖ f[k] ⋅ g[n − k]
This subtle difference is often glossed over in ML literature, yet it’s a perfect example of how concepts from telecommunications and signal processing are adapted — and sometimes approximated — within modern AI architectures.
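The difference is easy to see numerically. With an arbitrary signal and an asymmetric kernel, NumPy's convolve (which flips the kernel) and correlate (which does not) give different results; the latter matches what "convolutional" layers in most deep learning frameworks actually compute.

# Convolution flips the kernel; cross-correlation (what CNN layers compute) does not.
import numpy as np

g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input signal
f = np.array([1.0, 0.0, -1.0])            # asymmetric filter / kernel

true_conv = np.convolve(g, f, mode="valid")      # classic signal-processing convolution
cross_corr = np.correlate(g, f, mode="valid")    # sliding product without flipping

print(true_conv)    # [ 2.  2.  2.]  -> effectively applies the flipped kernel [-1, 0, 1]
print(cross_corr)   # [-2. -2. -2.]  -> applies [1, 0, -1] as-is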
Beyond CNNs, several core operations in neural networks are rooted in signal processing:
- Fourier transforms and their discrete counterparts (computed efficiently via the FFT) are frequently used in vision transformers, audio generation, and physics-informed models, where frequency-domain representations are more informative than time- or spatial-domain ones.
- Filtering and time-frequency analysis, central to digital communication systems, are now used in attention mechanisms, spectrogram processing, and multi-resolution learning.
- Stochastic process models (e.g. autoregressive, Markovian, or Gaussian processes) underpin time-series forecasting architectures and Bayesian deep learning frameworks.
Even the concept of residual learning — the basis of ResNets — has analogs in recursive filtering and adaptive equalizers. These similarities are more than analogies; they represent direct inheritance. For decades, the signal processing community has developed and deployed models for pattern extraction, denoising, prediction, and control — all of which now live inside today’s AI pipelines, often under different names.
Thus, deep learning is not a departure from signal processing — it is, in many ways, its nonlinear, learned evolution. Understanding this lineage not only grounds AI in well-established theory but also opens the door to new architectures informed by decades of insights from communication and systems engineering.

Part 3 — When the System Becomes Aware: Emergent Synergy Between AI and Telecommunications
In Parts 1 and 2, we explored a bidirectional relationship between telecommunications and machine learning. AI techniques have become embedded in telecom systems, and telecom theory has helped shape the foundations of modern AI. But in Part 3, we move into a space where something deeper happens — where the combination of AI and telecommunications doesn’t just enhance existing capabilities, but enables entirely new kinds of systems. These are systems that go beyond connection and computation. They exhibit properties of awareness.
By awareness, we do not mean consciousness in the human sense. Rather, we refer to the ability of a system to sense, communicate, and respond — with coherence, context, and continuity. These systems are distributed across space, but unified in function. They are embedded in the physical world, but shaped by inference and control. And they would not be possible without the deep integration of telecommunications and machine learning.

One of the clearest examples of this is smart agriculture. Consider a modern farm outfitted with a network of wireless sensors. These sensors are distributed across soil, air, and water systems. They measure temperature, humidity, pH levels, nutrient content, and the health of plants. On their own, these sensors produce raw data. But through telecommunications, they begin to speak — transmitting their readings across the field, through the cloud, into edge servers, or to a central model. The farm becomes an environment that communicates.
Yet communication is only the first step. The data that travels through this wireless infrastructure becomes input for machine learning models trained to detect disease, predict drought, or optimize irrigation. These models recognize patterns across time and space, correct for noise and sensor drift, and produce decisions that are both localized and globally informed. They can prioritize water usage, alert farmers to early signs of stress, or even coordinate actions across regions. What was once a set of disconnected devices becomes a responsive agricultural ecosystem.
This is not simply automation — it is distributed cognition. The intelligence of the system does not reside in any single sensor or server. It emerges from the synergy of data, communication, and learning. Without wireless connectivity, the sensors would remain silent. Without learning, the data would remain uninterpreted. It is the combination that gives rise to something new: a system that listens, learns, and acts.
This same pattern appears in other domains: energy grids that adapt in real time, cities that manage themselves through sensor networks, transportation systems that coordinate across vehicles and infrastructure. But agriculture is perhaps the most intimate example — a domain where life, growth, and sustainability are directly shaped by the responsiveness of the system.
In these systems, telecommunications gives the world a voice, and AI gives it a brain. Together, they form the nervous system of intelligent environments — not centralized, not monolithic, but distributed, adaptive, and grounded in the realities of the physical world. What emerges is not just a smarter network, but a more responsive one. Not just more data, but awareness.
As we move further into an era shaped by climate variability, resource constraints, and the need for resilient infrastructure, these systems will not be a luxury — they will be essential. And they will stand as proof of what becomes possible when learning and communication do more than cooperate — when they co-evolve into something greater than the sum of their parts.
Conclusion — From Signals to Systems That Sense
This article has traced the deep and evolving relationship between telecommunications and machine learning across three interconnected layers. We began with the applied frontiers, where AI enhances the performance and adaptability of modern communication systems — from channel estimation to network planning. We then moved to the theoretical roots, uncovering how foundational tools in modern machine learning were originally forged within the mathematical and signal processing frameworks of telecom. Finally, we arrived at a third space: one where the union of these two fields gives rise to intelligent, responsive, and aware systems — with smart agriculture as a clear and timely example.
Together, these three dimensions reveal that the relationship between AI and telecommunications is not just technological, but systemic. It touches theory, engineering, and application — and opens the door to a future where communication networks do more than transmit bits. They sense, learn, and respond. They become infrastructure with purpose.
This article is the beginning of a broader exploration. In the coming months, I plan to publish more focused pieces in each of these directions:
— On the foundational principles that connect estimation, information, and inference across ML and communications.
— On the practical use of AI in telecom systems, especially at the physical and network layers.
— And on the design of intelligent, IoT-based ecosystems, where awareness emerges from the synergy of sensing, connectivity, and learning.
The path ahead is rich with questions — theoretical, architectural, and ethical. But one thing is already clear: we are no longer building just networks or models. We are building systems that understand.