Accepted Tutorials – 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing

Phase Retrieval for Synthetic Aperture Systems: From Theory to Field

Samuel Pinilla (STFC)*; Kumar Vijay Mishra (United States DEVCOM Army Research Laboratory)

April 07, 2025 (Monday) | 0930 – 1300

Chairperson: Arunkumar K.P. – NPOL, Cochin

Abstract

Synthetic aperture (SA) systems create an effectively larger aperture, offering higher spatial and temporal resolution than what is achievable with a single sensor’s physical dimensions. These systems are utilized in various signal processing applications, including optics, radar, remote sensing, microscopy, acoustics, and tomography. A key challenge in SA processing is phase retrieval (PR), where the goal is to reconstruct a complex signal from phaseless measurements. While both convex and nonconvex methods have been proposed for the general PR problem, they are not directly applicable to many SA-specific scenarios. This tutorial offers an in-depth exploration of recent advancements in PR techniques for modern synthetic aperture imaging and sensing applications. Based on the linear propagation model, we categorize diverse applications into four main types: Fourier PR, coded illumination, coded detection, and random. We will discuss the underlying theories, algorithms, and practical examples, including the integration of machine learning approaches. Additionally, this tutorial aims to promote cross-disciplinary interaction among different SA fields, enhancing the overall understanding of PR challenges.

Speech Synthesis with Discrete Speech Tokens

Kai Yu (Shanghai Jiao Tong University)*; Shujie Liu (Microsoft Research Asia); Xie Chen (Shanghai Jiao Tong University); Yiwei Guo (Shanghai Jiao Tong University)

April 07, 2025 (Monday) | 0930 – 1300

Chairperson: Hemant A. Patil – Dhirubhai Ambani University (DAU) (formerly, DA-IICT)

Abstract

In this tutorial, we will systematically introduce and taxonomize tokenization methods for speech. We will perform fair experimental comparisons between these tokens to show their distinct properties. Then, various types of discrete token-based speech synthesis systems, especially TTS models, will be elaborated in detail. Following this, we will focus on the combination of discrete speech tokens and large language models (LLMs) that pioneers cross-modality interactive voice agents with understanding and reasoning abilities. Finally, we will share insights on the existing problems and challenges.

Generative AI and Model Optimization

Swagatam Das (Indian Statistical Institute); Arijit Ukil (Tata Consultancy Services)*; Angshul Majumdar (IIIT Delhi)

April 07, 2025 (Monday) | 0930 – 1300

Chairperson: Peter Gerstoft – University of California, San Diego

Abstract

This tutorial will provide a comprehensive overview of the fundamentals of and recent breakthroughs in Generative AI (GenAI), and of the challenges GenAI faces in solving real-world problems in areas including Industry 4.0, the Internet of Things (IoT), and digital healthcare. It focuses in particular on research ideas and concepts for developing optimized GenAI models that can be deployed on resource-constrained devices and used in practical applications. Through self-supervised learning, GenAI models, and large language models (LLMs) specifically, are pre-trained on vast datasets. Traditional data-hungry LLMs consume huge amounts of energy and computational power, and they achieve (close to) human-level performance on downstream tasks including question answering, information retrieval, code generation, and content generation. While the performance of GenAI is impressive, these models often have a few hundred billion parameters (sometimes crossing the trillion mark), so they not only require substantial energy to run but are also too computationally expensive for practical applications in domains where computational capability and energy budget are limited. Hence, GenAI models need to be parameterized more economically to reduce their overparameterization. The tutorial aims to present the fundamentals of generative AI and state-of-the-art model compression techniques for accelerating large-model training and inference, improving efficiency and deployability.

Signal Processing for Integrated and Intelligent 6G Connectivity

Aryan Kaushik (Manchester Met)*; Marco Di Renzo (Paris-Saclay University); Yonina Eldar (Weizmann Institute of Science)

April 07, 2025 (Monday) | 1400 – 1730

Chairperson: Bhogavalli Satwika – IISc Bangalore

Abstract

In this tutorial, we will discuss the timeliness of integrated sensing and communications (ISAC) and intelligent metasurface (IM) technologies in light of 6G standardization, such as 3GPP Rel. 19 and Rel. 20+, and the latest industry activities, such as ETSI ISGs and EU industry-oriented projects. Next, we will present signal processing fundamentals, the state of the art, trends, and technical advancements in IM and ISAC, such as wave-domain signal processing, energy-efficient hybrid beamforming using convex optimization, index modulation, a stacked IM prototype for sensing and communications, and reconfigurable intelligent surface (RIS)-aided ISAC for non-terrestrial networks (NTN). We will also discuss signal processing synergies, challenges, and use cases such as public safety and 3D localization. Following this, we will look at opportunities with cmWave and sub-THz frequencies in the evolution towards 6G and beyond. Finally, we will discuss future prospects of signal processing synergies between RIS/stacked/flexible IM and ISAC technologies.

CANCELLED

Next-Generation Remote Sensing: Challenges, Innovations and Applications

Mingyi He (Northwestern Polytechnical University); Jocelyn Chanussot (Grenoble Institute of Technology); Pourya Shamsolmoali (Queen’s University Belfast)*; Masoumeh Zareapoor (Shanghai Jiao Tong University)

A Deep Dive into Recent Advances in Stochastic Approximation

Gersende Fort (CNRS); Eric Moulines (Ecole Polytechnique); Hoi-To Wai (Chinese University of Hong Kong)*

April 07, 2025 (Monday) | 0930 – 1300

Chairperson: Marius Pesavento – Technische Universitaet, Darmstadt

Abstract

This tutorial provides a comprehensive introduction to the stochastic approximation (SA) scheme. The first part of the tutorial serves as an essential foundation, acquainting participants with the background and rationale of the SA scheme. It establishes a connection between SA and the challenge of locating roots of mean field functions. By demonstrating the derivation of the SA scheme as a stochastic perturbation of an Euler Ordinary Differential Equation (ODE) discretization, participants are granted insight into the scheme’s conceptual origins. The second segment shifts focus towards practical applications of the SA scheme. Through illustrative examples drawn from the realms of signal processing and machine learning, the versatility of SA in analyzing and understanding classical algorithms will be showcased. The third part of the tutorial delves deeper into the presented application examples, inspiring the introduction of a general stochastic algorithm. This broadens the scope of SA, accommodating scenarios with non-gradient updates and biased oracles. This section will emphasize modern convergence analyses encompassing sample complexity and high-probability bounds. The final section of the tutorial introduces participants to contemporary developments within the realm of SA. Prominent among these advances is the reduction of computational complexity through applying variance reduction to general SA, federated extensions of SA methods, and Markovian noise. By incorporating these latest innovations, participants will be well equipped to explore emerging frontiers in applying SA in their research.
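
To make the first part of this abstract concrete, the sketch below reads the SA recursion as an Euler step on the ODE dθ/dt = h(θ), with the mean field h replaced by a noisy evaluation. It is an illustration only and is not taken from the tutorial: the mean field h(θ) = -(θ - 2), the Gaussian noise model, the step sizes γ_k = 1/k, and all function names are assumptions chosen for simplicity.

```python
import random


def mean_field(theta):
    # Hypothetical mean field h(theta) = -(theta - 2); its unique root is theta* = 2.
    return -(theta - 2.0)


def noisy_oracle(theta, rng):
    # Unbiased noisy evaluation of the mean field: h(theta) plus zero-mean Gaussian noise.
    return mean_field(theta) + rng.gauss(0.0, 1.0)


def euler_ode(theta0, num_iters):
    # Noise-free Euler discretization of d(theta)/dt = h(theta) with steps gamma_k = 1/k.
    theta = theta0
    for k in range(1, num_iters + 1):
        theta = theta + (1.0 / k) * mean_field(theta)
    return theta


def stochastic_approximation(theta0, num_iters, seed=0):
    # Same recursion, but each step only sees a noisy evaluation of the mean field.
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, num_iters + 1):
        gamma = 1.0 / k  # decreasing steps: the sum diverges, the sum of squares converges
        theta = theta + gamma * noisy_oracle(theta, rng)
    return theta


if __name__ == "__main__":
    print("Euler (noise-free):", euler_ode(10.0, 10_000))
    print("SA (noisy oracle): ", stochastic_approximation(10.0, 10_000))
```

With this particular choice of h and γ_k, the SA iterate reduces to a running average of noisy observations of the root, which is why both recursions end up near θ = 2.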

Topological Signal Processing and Learning

Elvin Isufi (TU Delft)*; Paolo Di Lorenzo (Sapienza University of Rome); Sergio Barbarossa (Sapienza University of Rome)

April 07, 2025 (Monday) | 1400 – 1730

Chairperson: Andrea Cavallo – TU Delft

Abstract

We begin the tutorial by motivating the need for topological data processing techniques and highlighting the limitations of graph signal processing (GSP) and graph machine learning. We will provide an overview of simplicial and cell complexes as mathematical objects to represent topological structures, and introduce the corresponding Hodge Laplacian theory and the new perspective it offers on proximity between topological signals. We will then use the Hodge Laplacian and Hodge decomposition to introduce the two main pillars of this tutorial: topological signal processing (TSP) and topological machine learning (TML). We will draw parallels between GSP and TSP to motivate the techniques in the latter, then introduce the topological convolutional filter along with its frequency analysis and design strategies. After this, we will introduce the basics of topological machine learning, focusing particularly on convolutional and attentional neural networks built from the Hodge Laplacian and the topological convolutional filter. By using the topological Fourier transform, we will tie these architectures to spectral analysis, providing insights into how we can use these techniques for self-supervised learning and for solving inference tasks on topological spaces. After discussing techniques with a fixed topology, we will explore how Hodge theory and topological structure can be used to learn the topology from the data. We will discuss both the signal processing and machine learning perspectives, which essentially involve using linear vs. non-linear models to infer the direction of the latent topology of the data. Finally, inspired by challenges in infrastructure networks such as communication, water, or power networks, where time-varying flow data are present, we will discuss spatiotemporal topological models as a way to capture the joint topological and temporal dependencies in the data.
We will conclude the tutorial with some open issues that require advances in both fundamental signal processing concepts and their application to challenges in semantic communication and infrastructure networks.
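
As a rough illustration of the Hodge Laplacian and Hodge decomposition mentioned in this abstract, the sketch below builds the incidence matrices of a small simplicial complex and splits an edge flow into gradient, curl, and harmonic parts. It is not the presenters' code: the toy complex, the edge orientations, the example flow, and the helper name `hodge_decompose` are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy complex: 4 nodes, 5 oriented edges, 1 filled triangle.
# The cycle 1-2-3 is left unfilled, so the complex has exactly one "hole".
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
num_nodes = 4

# Node-to-edge incidence B1: edge (i, j) leaves node i (-1) and enters node j (+1).
B1 = np.zeros((num_nodes, len(edges)))
for e, (i, j) in enumerate(edges):
    B1[i, e] = -1.0
    B1[j, e] = 1.0

# Edge-to-triangle incidence B2: the boundary of (i, j, k) is (i, j) + (j, k) - (i, k).
B2 = np.zeros((len(edges), len(triangles)))
for t, (i, j, k) in enumerate(triangles):
    B2[edges.index((i, j)), t] = 1.0
    B2[edges.index((j, k)), t] = 1.0
    B2[edges.index((i, k)), t] = -1.0

# First-order Hodge Laplacian acting on edge flows: lower part plus upper part.
L1 = B1.T @ B1 + B2 @ B2.T


def hodge_decompose(flow):
    # Project onto the image of B1^T (gradient part) and the image of B2 (curl part);
    # what remains lies in the kernel of L1 (harmonic part).
    grad = B1.T @ np.linalg.lstsq(B1.T, flow, rcond=None)[0]
    curl = B2 @ np.linalg.lstsq(B2, flow, rcond=None)[0]
    harm = flow - grad - curl
    return grad, curl, harm


flow = np.array([1.0, 2.0, 0.5, -1.0, 3.0])  # arbitrary example edge flow
grad, curl, harm = hodge_decompose(flow)

if __name__ == "__main__":
    print("gradient part:", np.round(grad, 3))
    print("curl part:    ", np.round(curl, 3))
    print("harmonic part:", np.round(harm, 3))
```

Because the boundary of a boundary is zero (B1 B2 = 0), the gradient and curl subspaces are orthogonal, which is what lets the two projections be computed independently here.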

End-to-End Learning from Crowdsourced Labels: A Signal Processing Perspective

Shahana Ibrahim (University of Central Florida)*; Panagiotis Traganitis (Michigan State University); Xiao Fu (Oregon State University); Georgios B. Giannakis (University of Minnesota)

April 07, 2025 (Monday) | 1400 – 1730

Chairperson:

Abstract

This tutorial explores the critical role of crowdsourcing in advancing artificial intelligence (AI) and machine learning (ML) tools to cope with noisy and possibly massive annotations. Crowdsourcing involves distributing data to multiple annotators, whose labels are then aggregated for downstream learning and inference tasks. However, this process has to deal with noisy labels due to factors such as limited annotator expertise and reliability. The tutorial will offer an in-depth account, from a signal processing vantage point, of reliable and robust crowdsourcing systems that serve data annotation in machine learning. These include key crowdsourcing models, ranging from the Dawid-Skene model to recent deep learning-based end-to-end (E2E) models. The tutorial will further emphasize the theoretical insights and algorithmic developments behind the success of these models. To this end, well-developed signal processing theory and methods, such as matrix/tensor decomposition and estimation theory, will be leveraged to develop novel solutions to fundamental and long-standing challenges in crowdsourcing. The tutorial will showcase how the unique strengths of the signal processing society (SPS) in optimization, statistical analysis, and advanced (multi-)linear algebra, among others, have propelled crowdsourcing research and remain highly relevant to addressing modern-day challenges.

Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing

Chieh-Hsin Lai (Sony AI); Koichi Saito (Sony AI); Bac Nguyen (Sony Europe B.V.); Yuki Mitsufuji (Sony AI); Stefano Ermon (Stanford University)*

April 07, 2025 (Monday) | 1400 – 1730

Chairperson: Satish Mulleti – Indian Institute of Technology Bombay (IIT Bombay)

Abstract

In this tutorial, we aim to expand diffusion model research and applications within the audio signal processing community. We will explore the diffusion model, providing intuitive insights and tracing its development history in an accessible manner. Additionally, we will introduce applications of diffusion models in audio signal processing, such as audio media restoration and controllable audio generation. We will share key insights into techniques originally developed for computer vision tasks to encourage research in the audio domain. The tutorial consists of four sessions: (1) Foundations of diffusion models; (2) Applications of diffusion models to audio signal processing; (3) Live demonstration of diffusion model training and application to audio restoration and editing; (4) Future trends. Our aim is to spur additional research and applications of diffusion models in audio signal processing, as well as promote the democratization of diffusion models beyond the computer vision community.

Computational Methods in Radar Imaging

Hassan Mansour (Mitsubishi Electric Research Laboratories (MERL))*; Petros Boufounos (Mitsubishi Electric Research Laboratories)

April 07, 2025 (Monday) | 1400 – 1730

Chairperson: Edoardo Focante – TU Delft

Abstract

Recent advances in inverse problems and learning have shifted the design paradigm for sensing systems. Computational methods are now an integral part of the design toolbox, using algorithms to address hardware limitations. A very promising application is radar imaging, which is becoming increasingly important in areas including robotics, autonomous driving, and medical imaging, among others. This tutorial will present a general inverse problem and learning framework for array processing systems, which describes both the acquisition hardware and the scene being acquired. Under this framework, we can exploit knowledge, learned or designed, about the scene, the system, and the nature of a variety of errors that might occur. The result is a significant improvement in reconstruction accuracy. Furthermore, we consider the design of the system itself in the context of the inverse problem, leading to designs that are more efficient, more accurate, or less expensive, depending on the application. We conclude the tutorial by discussing open problems and new research directions.

Graph-based Machine Learning for Wireless Communications

Santiago Segarra (Rice University)*; Ananthram Swami (ARL); Zhongyuan Zhao (Rice University)

April 07, 2025 (Monday) | 0930 – 1300

Chairperson: Adrish Banerjee – Indian Institute of Technology Kanpur

Abstract

As communication networks continue to grow in scale and complexity, traditional approaches to network design and operation are becoming inadequate. Machine learning (ML) has garnered significant attention for its potential to complement conventional mathematical models in describing complex wireless systems and deriving computationally efficient solutions. However, standard ML methods, such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), struggle to effectively leverage the underlying topology of communication networks, causing significant performance degradation as network size increases. Graph neural networks (GNNs) have emerged as a promising ML approach that has attracted surging interest within the ML community. GNNs excel when dealing with large network scales as well as dynamic network membership and topologies, outperforming MLPs and CNNs in such scenarios. This timely tutorial provides a gentle introduction to GNNs and explores recent approaches for applying them to solve classical problems in wireless communications and networking, including power allocation, beamforming, link scheduling, and routing. The emphasis will be on how GNNs can augment, rather than replace, existing solutions to these problems. The goal of the tutorial is to foster further research and exchange between the communications, ML, and signal processing communities and to inspire applications of ML in fields that will further advance wireless communications.

Unlimited Sensing: From Theoretical Foundations to Practical Impact

Ayush Bhandari (Imperial College London)*; Ruiming Guo (Imperial College London)

April 07, 2025 (Monday) | 1400 – 1730

Chairperson: Christoph Mecklenbraeuker – TU Wien

Abstract

The tutorial covers logistics and an introduction, followed by an exploration of Shannon’s impact on the digital revolution, including theory, practice, limitations, and implications. It then delves into computational sensing philosophy, examining various sensing approaches like pointwise, average sampling, low-resolution methods (e.g., Sigma-Delta, noise-shaping, time-encoding, signum), compressive sensing, and unlimited sensing. Following this, we will introduce the Unlimited Sensing Framework (USF), from theory to practice. This section will cover both time-domain and Fourier-domain recovery, signal processing with folding non-linearities, and sparse sampling. We will also discuss spectral estimation and sub-Nyquist sampling, focusing on band-pass and multi-band signals. We will explore USF’s application across different architectures, including one-bit sampling, event-driven or time-encoded sampling, and the Radon transform. The tutorial will showcase performance breakthroughs achieved through USF in areas such as high dynamic range (HDR) imaging, tomography, massive MIMO communications, and radar systems. Finally, we will conclude with a forward-looking discussion on the future of the field and closing remarks.
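
The folding non-linearity and time-domain recovery mentioned in this abstract can be illustrated with a deliberately simplified sketch. It is not the USF recovery algorithm presented in the tutorial: it assumes the signal is sampled densely enough that consecutive samples differ by less than the folding threshold, and that the first unfolded sample is known. The function names, the threshold `lam`, and the test signal are hypothetical.

```python
import math


def fold(x, lam):
    # Centered modulo non-linearity: maps any real value into [-lam, lam).
    return (x + lam) % (2.0 * lam) - lam


def unfold_first_order(y, lam, x0):
    # Simplified recovery under two assumptions: |x[k] - x[k-1]| < lam for all k,
    # and the first unfolded sample x0 is known. Under these assumptions, folding
    # the first difference of y gives the first difference of x, and a running
    # sum restores the unfolded samples.
    x = [x0]
    for k in range(1, len(y)):
        dx = fold(y[k] - y[k - 1], lam)
        x.append(x[-1] + dx)
    return x


# Hypothetical test signal: a sinusoid whose amplitude is 5x the folding threshold.
lam = 1.0
x_true = [5.0 * math.sin(2.0 * math.pi * k / 200.0) for k in range(400)]
y_folded = [fold(v, lam) for v in x_true]
x_rec = unfold_first_order(y_folded, lam, x0=x_true[0])

if __name__ == "__main__":
    err = max(abs(a - b) for a, b in zip(x_true, x_rec))
    print("max |x|:", max(abs(v) for v in x_true))
    print("max |y|:", max(abs(v) for v in y_folded))
    print("max reconstruction error:", err)
```

The folded samples never leave the range [-lam, lam), so a converter with that limited range would not clip, yet the high-dynamic-range signal is still recoverable from them under the stated assumptions.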