Education Short Courses
in Collaboration with SPS Education Board
The IEEE Signal Processing Society (IEEE-SPS) Education Board is planning an inaugural education activity in the form of short courses at ICASSP 2025. The introduction of education-oriented short courses will offer Professional Development Hours (PDHs) and Continuing Education Units (CEUs) certificates to those who complete each course. Given that students, researchers, and practitioners in academia and industry worldwide have a broad diversity of interests and areas of experience, the IEEE-SPS goal is to develop meaningful methods of offering beneficial and relevant courses in support of our members' educational needs.
Four courses have been selected by the SPS Education Board and the ICASSP committee. The courses will be conducted in-person during the main ICASSP conference. The total duration of each course is 6 or 9 hours.
Participants can only attend live.
(SC-1) Title: A Signal Processing Tour of High-Dimensional Estimation and Learning: Results, Techniques, and Applications
Dates and Time: April 06, 2025 (Sunday): Full Day/6 hours
Yue M. Lu is a Harvard College Professor and Gordon McKay Professor of Electrical Engineering and Applied Mathematics at Harvard University. He received his M.Sc. in Mathematics and Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2007. He has held a postdoctoral position at the Audiovisual Communications Laboratory at École Polytechnique Fédérale de Lausanne (EPFL), and has held visiting appointments at Duke University (2016) and École Normale Supérieure (ENS) (2019).
He currently serves as an Associate Editor for IEEE Transactions on Information Theory (2022–present) and has previously served as Associate Editor for IEEE Transactions on Signal Processing (2018–2022) and IEEE Transactions on Image Processing (2014–2018). He has been a member of various technical committees of the IEEE Signal Processing Society, including Signal Processing Theory and Methods (SPTM) (2016–2021), Sensor Array and Multichannel (SAM) (2022–present), Machine Learning for Signal Processing (MLSP) (2019–2021), and Image, Video, and Multidimensional Signal Processing (IVMSP) (2015–2017).
His research focuses on the theoretical and algorithmic aspects of high-dimensional statistical signal processing and learning. He has been recognized with the ECE Illinois Young Alumni Achievement Award (2015), an IEEE Signal Processing Society Distinguished Lectureship (2022-2023), and has received paper awards at several IEEE conferences, including ICIP (2006, 2007), ICASSP (2011), GlobalSIP (2014), and CAMSAP (2017). He is an IEEE Fellow (Class of 2024).
This short course assumes participants have a solid foundation in undergraduate probability theory and linear algebra. It is designed to be largely self-contained, with key concepts, tools, algorithms, and rigorous theory introduced progressively through detailed lectures. Participants will be actively engaged and will have the opportunity to deepen their understanding through Python-based labs completed in self-study after the lectures.
Module 1: Introduction to High-Dimensional Phenomena
We will begin by motivating this short course with an introduction to the “blessings of dimensionality,” which refer to unique and useful phenomena that emerge in high-dimensional settings, such as concentration of measure, scaling limits, and universality. Next, we will delve into the historical development of the field and explore several key examples, including spiked random matrices and high-dimensional empirical risk minimization, to demonstrate the types of results that can be achieved through high-dimensional analysis. These examples will serve as recurring themes throughout the course.
Module 2: Basic Concepts and Probability Tools
We will cover several fundamental results and probabilistic tools that serve as the foundation for all subsequent discussions in this short course. Specific topics include:
- Multivariate normal distributions, and the rotation trick.
- Example 1: a simple proof of the Johnson-Lindenstrauss lemma for dimension reduction via the rotation trick.
- Concentration inequalities, and stochastic order notation.
- Example 2: a high-dimensional geometric interpretation of the James-Stein estimator.
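As a rough illustration of the Johnson-Lindenstrauss example listed above, the following minimal sketch projects random points with an i.i.d. Gaussian matrix and checks how well pairwise squared distances are preserved. The function names, dimensions, and seed are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

def random_projection(points, target_dim, rng):
    """Project the rows of `points` with an i.i.d. Gaussian matrix (hypothetical helper)."""
    d = points.shape[1]
    # Scaling by 1/sqrt(target_dim) keeps squared norms unbiased after projection.
    G = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return points @ G

def pairwise_sq_dists(points):
    """Squared Euclidean distances between all distinct pairs of rows."""
    sq = np.sum(points**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    iu = np.triu_indices(points.shape[0], k=1)
    return D[iu]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5000))   # 50 points in 5000 dimensions (assumed sizes)
Y = random_projection(X, 1000, rng)   # reduced to 1000 dimensions

# Ratio of projected to original squared distances; values near 1 mean low distortion.
ratios = pairwise_sq_dists(Y) / pairwise_sq_dists(X)
print("distortion range:", ratios.min(), ratios.max())
```

Each ratio concentrates around 1 with fluctuations that shrink as the target dimension grows, which is the concentration-of-measure effect behind the lemma.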
Module 3: Random Matrices and Spectral Methods for High-Dimensional Estimation
In this module, we will introduce fundamental results related to the spectral properties of large random matrices. As a key application in signal processing, we will explore spectral methods for estimating signals in nonconvex settings and analyze their performance, with particular emphasis on the phase transition phenomenon. Specific topics include:
- Wigner matrices and the semicircle law.
- Spiked (i.e. signal plus noise) matrices and the BBP phase transition.
- Single-index models, spectral methods for signal estimation, and performance analysis.
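The phase transition mentioned in this module can be previewed numerically. The sketch below assumes a rank-one spiked Wigner model, theta * v v^T + W, with W normalized so that its spectrum fills [-2, 2]; under that assumption the top eigenvalue should approach theta + 1/theta and the squared overlap with v should approach 1 - 1/theta^2 when theta > 1, while for theta < 1 the top eigenvalue stays near 2. The helper names and parameter values are illustrative only.

```python
import numpy as np

def spiked_wigner(n, theta, rng):
    """Return theta * v v^T + W and the planted unit vector v (hypothetical helper)."""
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)      # symmetric, off-diagonal variance 1/n
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    return theta * np.outer(v, v) + W, v

def top_eigpair(M):
    """Largest eigenvalue and its eigenvector of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)      # eigenvalues are returned in ascending order
    return vals[-1], vecs[:, -1]

rng = np.random.default_rng(0)
n = 800                                  # assumed matrix size
results = {}
for theta in (0.5, 2.0):                 # below and above the threshold theta = 1
    M, v = spiked_wigner(n, theta, rng)
    lam, u = top_eigpair(M)
    results[theta] = (lam, float(np.dot(u, v)) ** 2)
    print(f"theta={theta}: top eigenvalue={lam:.3f}, squared overlap={results[theta][1]:.3f}")
```

Below the threshold the spectral estimate carries essentially no information about v; above it, the leading eigenvector becomes correlated with the planted signal.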
Module 4: Analyzing High-Dimensional Empirical Risk Minimization
In this module, we cover the analysis of high-dimensional empirical risk minimization using random matrix theory, and Gaussian comparison inequalities (the CGMT framework). Specific topics include:
- High-dimensional ridge regression and its analysis.
- Random feature models, and the double-descent phenomenon in learning curves.
- Gaussian comparison inequalities and the CGMT framework.
- Analyzing binary classification with two-layer neural networks via CGMT.
Module 5: Advanced Topics, Recent Progress, and Open Problems
We will explore the limitations of existing theories and techniques, highlight active areas of ongoing research, and discuss open problems in the field. Specific topics that will be covered include:
- High-dimensional estimation with structured and nearly deterministic sensing matrices.
- Gaussian equivalence theorems for random feature models and multilayer random neural networks.
- Asymptotic characterizations of in-context learning via linear transformers.
- A precise end-to-end analysis of transport-based generative models.
- Open research questions and future directions.
Conclusions
We will summarize key takeaways from the short course and provide final remarks and suggestions for further reading.
Hands-On Components
The lecture materials are supplemented by four Python-based laboratory assignments, each designed to be completed in one hour of self-study. These labs are designed to reinforce both the theoretical insights and practical techniques covered in the lectures, helping participants bridge the gap between abstract concepts and applications.
- Lab 1: High-dimensional geometry, concentration of measure, and a high-dimensional geometric interpretation of the James-Stein estimator.
- Lab 2: Random matrices, semicircle law, BBP phase transition for spiked models.
- Lab 3: High-dimensional ridge regression, random feature models, and the double-descent behavior in learning curves.
- Lab 4: Structured sensing matrices in high-dimensional estimation and universality.
Audience, Pre-Requisites, and Course Materials
The target audience for this short course includes graduate students, researchers, and signal processing practitioners in industry. The only prerequisites for participants are a solid foundation in undergraduate probability theory and linear algebra, and familiarity with Python (or Matlab). The course is designed to be accessible, with detailed lectures that progressively introduce each concept, ensuring that participants can fully grasp both the theoretical and practical aspects of high-dimensional analysis. Hands-on components and exercises will also be incorporated to enhance understanding and reinforce the material.
(SC-2) Title: Estimation Performance Bounds in Signal Processing
Dates and Time:
- April 06, 2025 (Sunday): Full Day/6 hours
- April 07, 2025 (Monday): Forenoon/3 hours
Joseph Tabrikian is a Professor of Electrical and Computer Engineering at Ben-Gurion University of the Negev. He is a Fellow of IEEE for Contributions to Estimation Theory and MIMO Radars, and a Fellow of AAIA. He served on the editorial boards of the IEEE Transactions on Signal Processing and of the IEEE Signal Processing Letters as associate editor and senior area editor. He is a co-author of seven award-winning papers at IEEE conferences and workshops. He received outstanding lecturer awards for the courses Estimation Theory and Introduction to Stochastic Processes. He is a plenary speaker at IEEE International Conference on Signal, Information and Data Processing 2024 and is expected to give a plenary talk at IEEE Radar Conference 2025. His research interests include estimation and detection theory, model selection for learning systems, radar signal processing, cognitive radar, DNN for radar signal processing, and automotive radar.
Tirza Routtenberg is an Associate Professor of Electrical and Computer Engineering at Ben-Gurion University of the Negev. She served as the William R. Kenan, Jr. Visiting Professor for Distinguished Teaching at Princeton University in 2022-2023 and is a Senior Member of IEEE. She has delivered tutorials on Graph Signal Processing at international conferences and held editorial roles for IEEE Transactions on Signal and Information Processing over Networks and IEEE Signal Processing Letters. She also serves on the IEEE Signal Processing Society’s Education Board, contributing to educational advancements in the field. She has received Excellence in Teaching awards from the Ben-Gurion University of the Negev (2024) and the ECE department (multiple times). She has won four Best Paper Awards at international conferences for papers on performance bounds. Her research interests include signal processing, graph signal processing, optimization for power systems, and statistical signal processing.
Koby Todros is an Associate Professor of Electrical and Computer Engineering at Ben-Gurion University of the Negev. He spent nearly a decade working in the high-tech industry, focusing on algorithm development for biomedical signal processing. Prof. Todros has been honored with an Excellence in Teaching award by the ECE department. As an IEEE Senior Member, he holds editorial positions with both the Signal Processing journal (Elsevier) and IEEE Signal Processing Letters. Additionally, he is a member of the Signal Processing Theory and Methods committee. His research encompasses statistical signal processing and machine learning, with a focus on semi-parametric detection and estimation, adaptive filtering, sensor array and multichannel signal processing, blind source separation, and biomedical signal processing.
Estimation theory is a fundamental building block of statistical signal processing and machine learning. A strong foundation in estimation theory is essential for successful signal processing engineers and researchers. One of the key topics in this field is performance bounds, which provide valuable tools for signal processing theory and applications. These bounds serve as benchmarks for evaluating the performance of estimators and are frequently used in system design and optimization, especially in cognitive and adaptive systems. The most widely used performance bound for parameter estimation is the Cramér-Rao Bound (CRB) in both Bayesian and non-Bayesian frameworks, mainly due to its simplicity. However, the CRB is based on several assumptions that are often not met in practice. These limitations include strict regularity conditions, unattainability, the need for perfect model specification, and unbiasedness conditions that are not satisfied by common estimators. Other classical tight bounds, such as the Barankin and Weiss-Weinstein bounds, partially address the limitations of the CRB in the non-Bayesian and Bayesian frameworks, but they are computationally complex and often intractable. Over the past 15 years, there has been significant progress in the study of performance bounds, encompassing bounds for periodic and constrained parameter estimation and reparameterization, misspecified bounds [9, 10], tight and insightful Bayesian bounds, and efficient computation of the bounds, including learning-based approaches.
In today’s era, where machine learning methods are often applied in an ad-hoc manner, there is a growing demand for rigorous performance bounds to ensure the reliability and robustness of models, particularly in complex systems. Furthermore, with the rise of big data applications, such as sensor networks and large-scale communication systems, the need for efficient tools to guide system design and resource allocation has become increasingly critical. Performance bounds offer the essential theoretical framework enabling researchers to identify optimal strategies for estimation and detection, even in challenging environments involving nonlinearities or model mismatches. By introducing the latest advancements in Bayesian and non-Bayesian performance bounds, this short course will enhance participants’ understanding and expose them to new ideas and tools that are essential for the signal processing community. These concepts are especially timely, given the increasing complexity of modern signal processing tasks involving large datasets and machine learning-driven systems. The goal of this course is to introduce the theory of performance bounds in both Bayesian and non-Bayesian frameworks and help the attendees to understand their limitations and relevance to various signal processing applications. Additionally, the course will equip participants with practical techniques for efficient computation of these bounds.
Introduction and motivation [0.25 hour]
In this part, we will introduce basic concepts in estimation theory (e.g. observation space, parameter space, Bayesian and non-Bayesian parameter estimation, mean-unbiasedness) as a foundational tool in signal processing. We will highlight performance bounds as essential benchmarks for performance evaluation of estimators, system design, and optimization. This introduction will set the stage for exploring classical and advanced bounds throughout the course.
Classical MSE bounds [2.75 hours]
– Bayesian framework
- Covariance inequality bounds – We will demonstrate how the Cauchy-Schwarz inequality is applied to derive various performance bounds on the mean-squared-error (MSE) of estimators. This inequality establishes a relationship between estimation error and the statistical properties of the probability density function. We will introduce the necessary conditions on auxiliary functions to obtain estimator-independent bounds. Using different auxiliary functions, several existing bounds, such as the Bayesian CRB (BCRB), Bobrovsky-Zakai bound, and Weiss-Weinstein bound, will be derived. We will prove that this family of bounds can be derived through constrained MSE minimization. Additionally, we will highlight situations where these bounds may not be attainable and present alternative simple bounds that have recently been proposed.
- Ziv-Zakai family bounds – We will explain how MSE lower bounds in this family can be derived using an M-ary hypothesis testing problem. Additionally, we will discuss the valley-filling function and extension to the vector parameter case. The performance of Ziv-Zakai lower bound will be discussed in both high-noise and low-noise regimes, and we will compare it with covariance inequality bounds via simple signal processing examples.
– Non-Bayesian framework
Similar to the Bayesian framework, we will demonstrate how non-Bayesian MSE bounds can be obtained using the Cauchy-Schwarz inequality and derive conditions on estimators’ bias that lead to estimator-independent bounds. Additionally, we will prove that non-Bayesian bounds can be derived through MSE minimization under bias constraints. Specifically, we will derive the CRB and Barankin-type bounds. Finally, the trade-off between estimation bias and covariance will be discussed, followed by techniques for optimal bias selection.
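To make the role of the CRB as a benchmark concrete, here is a minimal sketch for the textbook case of estimating the mean of i.i.d. Gaussian samples with known variance, where the Fisher information is N / sigma^2 and the sample mean attains the bound. The parameter values and variable names are illustrative assumptions and are not drawn from the course exercises.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, sigma, N, trials = 1.5, 2.0, 25, 20000   # assumed illustrative values

# Each row is one experiment: N noisy observations of the unknown mean.
x = theta_true + sigma * rng.standard_normal((trials, N))

# Score function: derivative of the Gaussian log-likelihood with respect to the mean.
score = np.sum(x - theta_true, axis=1) / sigma**2
fisher_empirical = np.mean(score**2)    # Monte Carlo estimate of the Fisher information
fisher_exact = N / sigma**2             # closed form for this model
crb = 1.0 / fisher_exact                # Cramér-Rao bound for unbiased estimators

# The sample mean is unbiased and efficient here, so its MSE should match the CRB.
theta_hat = x.mean(axis=1)
mse = np.mean((theta_hat - theta_true) ** 2)
print(f"CRB={crb:.4f}, empirical MSE={mse:.4f}, empirical Fisher={fisher_empirical:.3f}")
```

In less benign problems (nonlinear models, low SNR, constraints) the gap between the empirical MSE and the CRB is exactly what motivates the tighter bounds covered later in the course.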
Non-Bayesian bounds under Lehmann unbiasedness theory [2 hours]
Lehmann unbiasedness – We will introduce a general non-Bayesian theory based on Lehmann unbiasedness, which extends the concept of unbiasedness to arbitrary risks and parameter spaces. We will demonstrate that the mean-unbiasedness condition is obtained under the MSE risk. This concept will be applied to the following two classes of estimation problems frequently encountered in signal processing.
- Periodic parameter estimation – This section will address the problem of estimating parameters with cyclic or periodic behavior, such as phase, direction of arrival (DOA), and frequency estimation. We will define periodic and cyclic unbiasedness conditions based on the concept of Lehmann-unbiasedness. Using this framework, we will develop performance bounds for periodic problems, including the periodic CRB and the cyclic Barankin bound, which are based on the periodic and cyclic unbiasedness, respectively. A brief overview of the Bayesian versions of these bounds will also be presented.
- Constrained parameter estimation – Constrained parameter estimation focuses on problems where the unknown parameters are subject to certain limitations or restrictions, which are common in fields like array processing and image reconstruction. This lecture will explore performance bounds for constrained parameter estimation, emphasizing the limitations of traditional bounds such as the constrained CRB (CCRB) and their applicability in practical scenarios. We will introduce the Lehmann-unbiased CCRB (LU-CCRB) and the Lehmann-unbiased constrained Barankin-type bound (LU-CBTB), which provide tighter and less restrictive bounds for constrained estimators, particularly when classical bounds fail due to non-differentiability or nonlinear constraints. Simulations will illustrate the effectiveness of these bounds, especially in cases where the commonly used constrained maximum likelihood (ML) estimator is Lehmann-unbiased but not mean-unbiased.
Misspecified bounds [1 hour]
In many signal processing applications, the true data model is not perfectly known, and the estimator is derived based on an assumed model that may be inaccurate. Even when the model is perfectly known, the corresponding estimator may become intractable and computationally complex, motivating the use of approximate (misspecified) models. In these cases, model misspecification contributes to estimation errors in addition to conventional errors arising from the random nature of the observed data. A significant breakthrough in this field within the signal processing community was the establishment of the theory of misspecified non-Bayesian estimation performance bounds. Another important contribution was the introduction of misspecified Bayesian bounds. Misspecified performance bounds predict the expected performance limits resulting from both sources of errors. In this lecture, we will derive Bayesian and non-Bayesian misspecified bounds, primarily in the Cramér-Rao family. The theory will be illustrated with signal processing examples that involve various forms of model misspecification.
A unified framework of MSE lower bounds [1.5 hours]
In this lecture, we will present a unified framework for deriving general classes of non-Bayesian, Bayesian, and hybrid lower bounds on the MSE of estimators. These classes are constructed by projecting each component of the estimation error vector onto specific Hilbert subspaces of L2. In the non-Bayesian case, the focus is on uniformly unbiased estimators, with the relevant Hilbert subspaces formed via integral transforms applied to the likelihood-ratio (LR) function. For the Bayesian case, integral transforms are applied to functions used in the Weiss-Weinstein family of bounds, while in the hybrid case, they are applied to the centered LR function. These integral transforms extend the traditional derivative and sampling operators commonly used to derive well-known bounds such as the Cramér-Rao, Weiss-Weinstein, and hybrid Barankin bounds. By adjusting the kernel function of the integral transform, we can tailor the Hilbert subspaces to derive both new and existing bounds. This approach provides flexibility and computational efficiency, with a particular focus on Fourier transform-based bounds. These bounds effectively condense crucial information from the input function (the LR function in non-Bayesian case) into a few key frequency components, yielding computationally efficient bounds that accurately predict the signal-to-noise ratio (SNR) threshold. These bounds are particularly useful for nonlinear estimation problems involving ML, minimum MSE, maximum a-posteriori probability (MAP), and joint MAP-ML estimators. The proposed framework unifies existing methodologies and offers tighter and computationally efficient bounds for a wide variety of estimation problems. This lecture will cover the mathematical foundations of these bounds, their derivation through integral transforms, and practical applications in signal processing.
Through examples and case studies, participants will gain an in-depth understanding of the theoretical aspects and their implementation in real-world estimation problems, making it a valuable session for researchers and practitioners working in statistical signal processing.
Signal processing applications [1.5 hours]
In this lecture, we will present several signal processing examples that show how estimation performance bounds can be used for both system performance evaluation and system design.
- System design for sensing applications – We will address the problem of array geometry design for DOA estimation (or spectrum design for time-delay/range estimation). The limitations of the CRB and BCRB for this problem will be demonstrated, motivating the need for tighter Bayesian and non-Bayesian bounds. In addition, the performance bounds presented in the course will be utilized as optimization criteria for waveform design in the space-frequency domain for multiple-input multiple-output (MIMO) radar and integrated sensing and communications (ISAC) systems.
- Graph signal processing (GSP) – We will demonstrate the application of various performance bounds in GSP, specifically for signal recovery over graphs. We will show how the bounds can guide the design of sampling schemes for graph bandlimited signals and the development of graph filters in complex networked systems.
- Power system state estimation – Power system state estimation is a fundamental problem in electrical networks, crucial for monitoring and ensuring the stability of the grid. Although it is essential for real-time grid operations, the problem is NP-hard, making performance bounds critical for evaluating the efficiency of state estimators and guiding the design of robust algorithms that can handle the computational challenges while maintaining reliability. We will also discuss lower bounds on the admittance matrix estimators.
Supporting Course Resources
Comprehensive lecture notes, slides, and computer simulations of the signal processing examples will be provided to all attendees to enhance their learning experience.
Hands-on Components
To complement the theoretical content, each lecture will be accompanied by a MATLAB exercise focused on computing lower bounds for real-world estimation problems. Illustrative examples include DOA estimation in array signal processing, auto-regressive parameter estimation in time-series modeling, and parameter estimation in neural networks.
Previous Related Versions of the Course
Some of the topics of this course have been taught at Ben-Gurion University. This course and its variations are not available publicly and have not been given at any conference or workshop.
Target Attendees and Prerequisites
The lectures are designed at the graduate level in Electrical Engineering. Basic knowledge of random signals, linear algebra, and constrained optimization is mandatory. Acquaintance with simple signal processing applications and basic estimation theory is beneficial.
(SC-3) Title: Phase Processing of Speech Signals
Dates and Time:
- April 06, 2025 (Sunday): Full Day/6 hours
- April 07, 2025 (Monday): Afternoon/3 hours
Bayya Yegnanarayana is currently Emeritus Professor and INSA Honorary Scientist at the International Institute of Information Technology Hyderabad (IIIT-H). He was Professor Emeritus at BITS-Pilani Hyderabad Campus during 2016. He was an Institute Professor from 2012 to 2016 and Professor & Microsoft Chair from 2006 to 2012 at IIIT-H. He was a Professor (1980 to 2006) and Head of the CSE Dept (1985 to 1989) at IIT Madras, a visiting Associate Professor at Carnegie-Mellon University (CMU), Pittsburgh, USA (1977 to 1980), and a member of the faculty at the Indian Institute of Science (IISc) Bangalore (1966 to 1978). He received BSc from Andhra University Visakhapatnam in 1961 and BE, ME and PhD from IISc Bangalore in 1964, 1966, and 1974, respectively.
His research interests are in signal processing, speech, image processing and neural networks. He has published over 420 papers in these areas. He is the author of the book “Artificial Neural Networks”, published by Prentice-Hall of India in 1999. He has supervised 39 PhD and 42 MS theses at IISc, IITM and IIIT-H. He is a Fellow of the Indian National Academy of Engineering (INAE), a Fellow of the Indian National Science Academy (INSA), a Fellow of the Indian Academy of Sciences (IASc), a Fellow of the IEEE (USA) and a Fellow of the International Speech Communications Association (ISCA). He was the recipient of the 3rd IETE Prof.S.V.C.Aiya Memorial Award in 1996. He received the Prof. S.N. Mitra Memorial Award for the year 2006 from INAE. He was awarded the 2013 Distinguished Alumnus Award from IISc Bangalore. He was awarded “The Sayed Husain Zaheer Medal (2014)” of INSA. He received Prof. Rais Ahmed Memorial Lecture Award from the Acoustical Society of India in 2016. He was awarded Life Fellow of IIT Kharagpur in 2022.
He was an Associate Editor for the IEEE Transactions on Audio, Speech, and Language Processing during 2003-2006. He is currently an Associate Editor for the Journal of the Acoustical Society of America. He received Doctor of Science (Honoris Causa) from Jawaharlal Nehru Technological University Anantapur in February 2019. He was the General Chair for Interspeech2018, held in Hyderabad, India, during September 2018. He was a visiting Professor at IIT Dharwad and at CMU Africa in Rwanda during 2019. Currently, he is an Adjunct Faculty at IIT Tirupati, a Distinguished Professor at IIT Hyderabad, and a Distinguished Adjunct Professor at IIIT Naya Raipur.
Prof. Yegnanarayana has been teaching courses at UG and PG levels at various institutes in India and abroad over the past 57 years. He has taught several courses, including digital signal processing, speech signal processing, artificial neural networks, artificial intelligence, information theory, acoustics, electromagnetic theory, and many others. He has also given several tutorials at various national and international conferences, including a tutorial on Nonspectral Features in Speech Processing at Interspeech 2006 in Pittsburgh, USA.
Sri Rama Murty received the B.Tech degree in Electronics & Communications Engineering from JNTU Hyderabad in 2002, and the PhD in Computer Science & Engineering from IIT Madras in 2009. He joined the Department of Electrical Engineering at IIT Hyderabad as an Assistant Professor in 2009, and he is currently a Professor in the same department. He is also an associate faculty member in the Department of Artificial Intelligence at IIT Hyderabad. He was Head of the Department of Electrical Engineering at IIT Hyderabad from 2017 to 2020. He has been an Associate Editor of IEEE Signal Processing Letters (SPL) since 2021. He was an organizing committee member & Exhibits chair of Interspeech 2018, held in Hyderabad. He was a technical area chair for “Analysis of speech and audio signals” at Interspeech 2018 and 2020. His research interests include signal processing, speech analysis, machine learning, and deep learning.
The objective of this course is to present an overview of the phase processing of speech signals by discussing the meaning and context of the use of the term phase spectrum. The course also presents, starting from the basics of digital signal processing, how the understanding and utility of phase has evolved over the years, ending up with the current state of obtaining the phase without wrapping. The course concludes by highlighting some possible new applications in speech processing and, at the same time, the challenges in realizing those applications.
Signals and signal processing basics
- Phase in signals: Sinusoids, Modulated signals, Multicomponent signals, Phase shift of multicomponent signals, Instantaneous spectrum for a given signal.
- Basics of signals and systems: Eigenfunctions and eigenvalues of a linear system, Spectrum of a signal and its approximations, Projection of real-time signal onto complex exponential signals, Projection of real frequency function onto complex exponential signals
- Equivalent representations of signals and systems: Linear difference equation representation of discrete-time systems, System function, Poles and zeros of the system function, Minimum, maximum, and mixed-phase systems
- Signal decomposition methods for multicomponent signals: Analytic signal, AM-FM components, STFT at each instant through DFT (block processing), Narrowband filtering at each frequency, Single-frequency filtering (filtering at fs/2)
- Characteristics of speech signals: Sequence of impulses exciting vocal tract system, Time-varying excitation and vocal tract system, Wideband (WB) and narrowband (NB) spectrograms, Difficulty in defining magnitude and phase
- Scope of the present course (only signal processing perspective): STFT phase involving block processing, SFF phase involving filtering
Phase processing of speech signals using mostly STFT Analysis
- Studies highlighting the importance of phase: History and recent efforts, Effect of size of analysis window on phase spectrum, Use of short window for magnitude spectrum
- Phase representations for different contexts: Fourier analysis of phase – Effect of windowing, Phase derivatives with respect to time and frequency, Phase representation in the harmonic domain – sinusoidal modeling, Relative phase shift
- Group delay (GD) processing: GD functions and their properties, Formant extraction using GD functions, LP phase spectrum, Minimum phase GD function, Modified GD function, Chirp GD function, Robustness of GD for additive noise, Applications of GD functions
- Instantaneous frequency (IF) processing: Computation and properties of IF, Features in IF spectrum
- Reassigned spectrograms: Reassignment – History and definitions of CIF and LGD, Reassigning the spectrogram, Nelson’s algorithm, Reassigned power spectrum, Pruning the reassigned spectrogram, Cross spectral method, Separation of formants and glottal pulses, Applications – Analysing phonation types
- Issues in processing spectral phase: Effects of windowing, Methods for phase unwrapping
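As a small illustration of the group delay item in the list above, the sketch below computes the group delay function directly from two DFTs, using the standard identity that the negative phase derivative equals Re{Y/X}, where X is the transform of x[n] and Y is the transform of n·x[n]. This avoids explicit phase unwrapping. The function name, FFT size, and the small floor `eps` are illustrative assumptions.

```python
import numpy as np

def group_delay(x, nfft=512, eps=1e-12):
    """Group delay in samples on the rfft grid, computed without phase unwrapping."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)            # spectrum of x[n]
    Y = np.fft.rfft(n * x, nfft)        # spectrum of n * x[n]
    # tau = Re{Y / X} = (X_R Y_R + X_I Y_I) / |X|^2; eps guards against spectral zeros.
    power = np.maximum(np.abs(X) ** 2, eps)
    return (X.real * Y.real + X.imag * Y.imag) / power

# A unit impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(16)
impulse[5] = 1.0
tau_impulse = group_delay(impulse, nfft=64)

# A two-tap minimum-phase sequence; at zero frequency the delay is 0.5 / 1.5.
tau_minphase = group_delay([1.0, 0.5], nfft=64)
print(tau_impulse[:4], tau_minphase[0])
```

Near spectral zeros the denominator becomes small and the function shows large spikes, which is one reason modified group delay functions are studied.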
Group delay processing of speech signals without phase wrapping
- SFF analysis: Filtering frequency-shifted signal at Fs/2, Instantaneous complex spectra, Wideband (WB) equivalent SFF spectrogram, Narrowband (NB) equivalent SFF spectrogram, Signal reconstruction from SFF spectra, Speech enhancement using SFF analysis
- Modified SFF analysis: Modified SFF spectrograms, Enhancement of features of speech production
- GD spectrograms without phase wrapping: GD spectrograms for different types of voices, Enhancement of formant and harmonics in spectrographic display
- Analysis of phase derivatives using SFF analysis: Features in GD and IF, Magnitude and phase spectrograms of different phase derivatives
- Extraction of formant and harmonic frequencies: Analysis of different voices
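The SFF item in the outline above ("Filtering frequency-shifted signal at Fs/2") can be sketched as follows, assuming the formulation commonly described for single frequency filtering: the signal is multiplied by a complex exponential so that the frequency of interest moves to Fs/2, and the result is passed through a single-pole filter whose pole lies on the negative real axis close to the unit circle. The function name `sff_component` and the pole radius `r = 0.99` are assumptions made for illustration; the course may use different choices.

```python
import cmath
import math


def sff_component(x, freq_hz, fs_hz, r=0.99):
    """Complex SFF output at one analysis frequency, one value per input sample.

    r is an assumed pole radius; values close to 1 give a narrower filter.
    """
    # Shift the spectrum so that the component at freq_hz lands at fs/2 (omega = pi).
    w_shift = math.pi - 2.0 * math.pi * freq_hz / fs_hz
    y_prev = 0j
    out = []
    for n, v in enumerate(x):
        shifted = v * cmath.exp(1j * w_shift * n)
        # Single-pole filter H(z) = 1 / (1 + r z^-1), i.e. a pole at z = -r.
        y_prev = shifted - r * y_prev
        out.append(y_prev)
    return out
```

The magnitude of each output sample gives the instantaneous envelope at the chosen frequency, and `cmath.phase` gives the instantaneous phase. Repeating the call over a grid of frequencies yields a spectrogram-like display at every sample instant, with no block processing involved.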
Possible new applications of phase processing and challenges
Some applications of GD spectrograms, Possible new applications, Computational issues, Signal processing artifacts, Time-frequency resolutions in practical signals, Phase processing of degraded signals, Phase processing of mixed signals, Phase processing of nonspeech signals
Tentative List of Experiments and Demonstrations
STFT analysis of speech, Linear prediction analysis of speech, SFF analysis of speech, Modified SFF analysis of speech, GD spectrograms without phase wrapping, Enhancement of production features in GD spectrograms, Extraction of formants and harmonics from GD spectrograms, Speech synthesis using SFF magnitude and phase spectra, Speech intelligibility improvement using SFF analysis and synthesis
(SC-4) Title: Spherical Visual Signal Processing: A Primer
Dates and Time:
- April 06, 2025 (Sunday): Full Day/6 hours
- April 07, 2025 (Monday): Forenoon/3 hours
Cláudio R. Jung received B.S. and M.S. degrees in Applied Mathematics, and a Ph.D. in Computer Science, from Universidade Federal do Rio Grande do Sul (UFRGS), Brazil, in 1993, 1995, and 2002, respectively. He is an Associate Professor at UFRGS in the Computer Science department and was a visiting faculty member at the University of Pennsylvania from July 2015 to July 2016. His research interests include image processing, computer vision, and pattern recognition. He has been a TPC member or reviewer for several image processing and computer vision conferences and journals. He has been an Associate Editor for the IEEE Transactions on Image Processing since December 2021. Prof. Jung is the author of more than 10 articles addressing different computing tasks with spherical visual signals.
Thiago Lopes Trugillo da Silveira is an Assistant Professor (“Adjunct Professor”) at the Institute of Informatics of the Federal University of Rio Grande do Sul (UFRGS), Brazil. He holds a Ph.D. degree in Computer Science (2019) from UFRGS, an M.Sc. degree in Computer Science (2016), and B.Sc. degrees in Information Systems (2015) and Computer Science (2013) from the Federal University of Santa Maria (UFSM), Brazil. Dr. Silveira is a Brazilian Computer Society (SBC) member and a researcher for the Signal Processing Group @ Stats and the Computer Graphics, Image Processing, and Interaction Group. His interests include signal, image, and video processing, pattern recognition, and computer vision. Dr. Silveira is the author of more than 15 articles addressing different computing tasks with spherical visual signals.
Nowadays, capturing omnidirectional images – spherical, panoramic, or 360° images – has become more affordable and portable due to the release of new devices. These types of images approximate the ideal imaging model called the plenoptic image model, which captures all visual information of a scene from all possible points of view over time. Spherical media provides immersive user experiences in augmented, mixed, and virtual reality (AR/MR/VR) applications when viewed on head-mounted display devices (HMDs). Processing spherical visual signals might involve techniques from signal and image processing, computer vision, VR, and machine learning.
This Short Course on spherical visual signal processing aims to provide a brief yet solid introduction to the field for researchers. The course provides an in-depth discussion of the spherical camera model; pipelines for spherical image acquisition; and planar or multi-planar representations. Furthermore, this short course is broadly helpful since it reviews new tools for processing spherical signals including recent learning-based strategies and selected applications, which can inspire new research directions.
A detailed outline is listed below.
- Introduction to spherical visual signals (Thiago L. T. da Silveira): This first session will introduce fundamental knowledge of spherical visual signals, distinguishing them from regular visual signals (images and videos). This session will also motivate and explore the typical applications and potential usage of spherical visual signals.
- Foundations of the Spherical Imaging Model (Thiago L. T. da Silveira): This second session will review the plenoptic imaging function and present some relevant approximations. Then, we will delve into the nuances of the Spherical Imaging Model, showing how 3D world points map to a spherical camera. We will present the mathematics relating 3D world points to their projections on different spherical camera models. Finally, we will explain how to retrieve the 3D world points using epipolar geometry constraints in a multi-view camera setup.
- Standard acquisition systems (Thiago L. T. da Silveira): This third session will present the standard acquisition systems, which include catadioptric and polydioptric capturing systems and the recent low-cost dual-fisheye 360-degree cameras. We will then discuss the pros and cons of each approach.
- Standard planar/multiplanar representation formats (Thiago L. T. da Silveira): This fourth session will present different sphere-to-plane mapping functions, such as equirectangular projection, cube mapping, and tangent plane projection. We will then discuss the main pros and cons of each representation.
- Main challenges of spherical visual signal processing (Thiago L. T. da Silveira): This fifth session will present the main challenges of processing spherical visual signals, such as topology when the signals are represented on the sphere. We will also relate challenges to the adopted planar/multiplanar representation, such as irregular sampling and the underlying distortions, as well as connectivity.
- Strategies for spherical visual signal processing (Thiago L. T. da Silveira): This sixth session will explore different strategies for tackling the challenges discussed in the previous session.
- Selected applications (Thiago L. T. da Silveira): This seventh session will present an overview of three selected applications involving spherical visual signals. This session will explore applications such as gravity alignment, layout estimation, and depth estimation. The applications will be contextualized and characterized regarding requirements and fundamental assumptions. Then, we will show real-world examples.
- Final remarks (Thiago L. T. da Silveira): This eighth session will conclude this Short Course and point out probable future directions of spherical visual signal applications.
Hands-on/Experimental Components
The course will offer participants a comprehensive experience encompassing practical coding exercises in Python. We will make notebooks available for the concepts presented in different sessions, such as the Spherical Imaging Model and standard planar/multiplanar representations, and explore some applications, such as gravity alignment.
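To give a flavor of the kind of exercise involved, here is a minimal sketch of how a 3D point in the camera frame can be mapped to equirectangular pixel coordinates and back to a unit viewing direction. It is not taken from the course notebooks. The axis convention (y up, z forward, longitude measured from the z axis) and the function names `point_to_equirect` and `equirect_to_direction` are assumptions chosen for illustration; other conventions are equally common.

```python
import math


def point_to_equirect(p, width, height):
    """Project a 3D point (camera frame, assumed y up and z forward) to (u, v) pixels."""
    x, y, z = p
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("the camera centre has no defined viewing direction")
    lon = math.atan2(x, z)                            # longitude in (-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y / norm)))    # latitude in [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width         # column, 0 at lon = -pi
    v = (0.5 - lat / math.pi) * height                # row, 0 at the north pole
    return u, v


def equirect_to_direction(u, v, width, height):
    """Back-project pixel coordinates to a unit direction on the sphere."""
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )
```

Under this convention, a point straight ahead of the camera lands at the image centre. Depth is lost in the projection, so the back-projection recovers only the viewing direction, which is why multi-view constraints are needed to retrieve 3D points.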
Prerequisite & Intended Audience
The primary audience comprises senior undergraduate students, early-stage graduate students, and researchers across various disciplines interested in spherical visual signal processing, including Electronic Engineering, Computer Science, Mathematics, etc. Previous knowledge of spherical imaging is not required. However, attendees should know the basics of signal/image processing tools for regular perspective images.
Short courses are 6-9 hour courses held over two days, designed to offer an in-depth, multi-faceted understanding of a topic, including hands-on experience. Participants will receive course materials and a professional development certificate. The fee is per course, and registration for the main conference is not required to attend the short courses.