Grand Challenges

GC-1: First Indoor Path Loss Prediction Challenge

Organized by: Stefanos Bakirtzis, Ranplan Wireless | Çağkan Yapar, TU-Berlin | Kehai Qiu, University of Cambridge | Ian Wassell, University of Cambridge | Jie Zhang, University of Sheffield Challenge Website

Efficient and realistic tools capable of modeling radio signal propagation are an indispensable component for the effective operation of wireless communication networks. The advent of artificial intelligence (AI) has propelled the evolution of a new generation of signal modeling tools, leveraging deep learning (DL) models that learn to infer signal characteristics. This Grand Challenge will probe the potential of DL algorithms to infer wireless signal attenuation in indoor propagation environments, where modeling signal propagation is more challenging due to a substantially larger number of reflected, refracted, scattered, or diffracted electromagnetic field components. To this end, we release a large dataset comprising radio maps from intelligent ray-tracing simulations conducted in indoor environments of varying complexity, multiple frequency bands, and assuming different antenna radiation patterns. Exploiting these data enables the development of full-fledged data-driven propagation models that can generalize simultaneously over new building layouts, frequency bands, and antenna radiation patterns, thus paving the way for the replacement of legacy radio signal propagation modeling techniques.
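To make the task concrete, here is a minimal sketch of the kind of data-driven propagation model the challenge calls for: a small convolutional encoder-decoder that maps rasterized scene inputs to a per-pixel path loss map. The input channel layout, tensor sizes, and loss are illustrative assumptions, not the challenge's official data format.

    # A minimal sketch (PyTorch) of a data-driven indoor propagation model.
    # Channel layout and sizes are illustrative assumptions, not the official format.
    import torch
    import torch.nn as nn

    class PathLossNet(nn.Module):
        """Tiny encoder-decoder: rasterized scene -> per-pixel path loss (dB).

        Assumed input channels: 0 wall/material mask, 1 transmitter location,
        2 normalized carrier frequency, 3 antenna-gain raster.
        """
        def __init__(self, in_ch=4):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

        def forward(self, x):
            return self.dec(self.enc(x))

    model = PathLossNet()
    x = torch.randn(2, 4, 128, 128)          # two rasterized indoor scenes
    pred = model(x)                          # (2, 1, 128, 128) path loss maps
    loss = nn.functional.mse_loss(pred, torch.randn_like(pred))  # vs. ray-traced targets

Conditioning the network explicitly on frequency and antenna pattern, as in the assumed input channels above, is what would let a single model generalize across bands and radiators rather than being retrained per configuration.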

GC-2: Gas source localization from real-world spatial in-situ concentration and wind measurements

Organized by: Patrick Hinsen, German Aerospace Center (DLR) | Thomas Wiedemann, German Aerospace Center (DLR) | Han Fan, Technical University of Munich | Claudia Munoz Martos, German Aerospace Center (DLR) | Victor Scott Prieto Ruiz, German Aerospace Center (DLR) | Siwei Zhang, German Aerospace Center (DLR) | Dmitriy Shutin, German Aerospace Center (DLR) | Achim Lilienthal, Technical University of Munich Challenge Website

The scope of this challenge is the development of signal processing methods for localizing a gas source using in-situ wind speed and gas concentration measurements. The methods are designed to advance robotic olfaction and the associated autonomous robotic exploration techniques -- highly relevant yet challenging problems in the context of gas source localization, environmental monitoring, and civil security, to name only a few.

Development of the corresponding techniques, however, requires experimental data to benchmark different methodologies and to better understand the observed phenomena. Yet accurate, realistic gas observations with robotic platforms remain challenging due to a number of factors, such as variability of the environment (e.g., temperature, wind, or propagation geometry), sensor limitations, as well as interference from various sources. As a consequence, advanced data analysis and careful experimental design are essential to address these challenges and achieve reliable gas observations. To address these issues, data have been collected under controlled conditions in the Low-Speed Wind Tunnel (LST) in Marknesse, Netherlands. The wind tunnel setting allows studying gas propagation under stable wind conditions, thus providing quasi-stationary measurements. By using a synthetic gas source, realized with a commercial fog machine and custom fog fluid, along with a specially constructed sampling device, high-resolution sampling and sensor characterization can be realized. For gas sensing, commercially available and compact sensors such as MOX (metal oxide) and PID (photo-ionization) sensors are used; in addition, anemometers are employed to measure wind at the sampling locations. In this way, comprehensive, accurately localized ground-truth gas observations are collected.

The goal of this challenge is to utilize the collected data to benchmark and advance corresponding signal processing methods for gas sensing and olfaction. Specifically, using the collected data, the participants are tasked with solving the following Gas Source Localization (GSL) problem:

Given samples of measured gas concentration values and wind at different spatial locations of the exploration area, determine the location of the gas source and provide a corresponding uncertainty estimate that quantifies the localization precision.

The dataset provides sufficient material for developing and testing signal processing tools. Two measurement settings are used: one for training and one for validation of the developed methods. We expect successful contributions to enhance existing data-driven or model-based techniques, or to propose original, novel solutions to this GSL challenge. The proposed methods should be efficient, since robotic measurements are often "expensive": the designed solutions should utilize as few measurements as possible to achieve precise and accurate source localization. The achieved accuracy and precision of a GSL solution versus the number of required samples will thus be used as the key metric to compare different algorithms.
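As a rough illustration of a model-based approach, the sketch below fits a simplified steady-state plume model to noisy concentration samples by nonlinear least squares and reads a localization uncertainty off the parameter covariance. The plume model, the assumption that the mean wind is aligned with the +x axis, and all constants are illustrative simplifications, not part of the challenge data.

    # A minimal model-based GSL sketch: fit a 2-D plume to concentration samples
    # by least squares; the covariance of the fit gives a rough uncertainty.
    import numpy as np
    from scipy.optimize import curve_fit

    def plume(xy, x0, y0, q, d):
        """Steady-state plume from a point source at (x0, y0), wind along +x.

        q: source strength, d: effective spread rate; purely illustrative.
        """
        x, y = xy
        dx = np.maximum(x - x0, 1e-3)              # only downwind of the source
        return q / dx * np.exp(-((y - y0) ** 2) / (2.0 * (d * dx) ** 2))

    rng = np.random.default_rng(0)
    xs = rng.uniform(0.5, 5.0, 200)                # sampling locations (m)
    ys = rng.uniform(-1.0, 1.0, 200)
    c = plume((xs, ys), 0.0, 0.2, 1.0, 0.3)        # synthetic concentrations
    c = c + 0.02 * rng.standard_normal(200)        # sensor noise

    p, cov = curve_fit(plume, (xs, ys), c, p0=[-0.5, 0.0, 0.5, 0.5])
    sx, sy = np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1])
    print(f"source at ({p[0]:.2f}, {p[1]:.2f}) m, 1-sigma ({sx:.2f}, {sy:.2f}) m")

On the real wind-tunnel data, the measured wind vectors would replace the fixed +x assumption, and the number of samples fed to the fit maps directly onto the challenge's accuracy-versus-sample-count metric.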

GC-3: MEIJU: The 1st Multimodal Emotion and Intent Joint Understanding Challenge

Organized by: Rui Liu, Inner Mongolia University | Xiaofen Xing, South China University of Technology | Zheng Lian, Institute of Automation, Chinese Academy of Sciences (CASIA) | Haizhou Li, The Chinese University of Hong Kong | Björn W. Schuller, Imperial College London | Haolin Zuo, Inner Mongolia University Challenge Website

The Multimodal Emotion and Intent Joint Understanding (MEIJU) task aims to decode the semantic information expressed in multimodal dialogue while inferring the speakers' emotions and intents, enhancing human-machine interaction. Unlike traditional emotion recognition and intent recognition tasks, MEIJU faces unique challenges. Multimodal dialogues encompass diverse data types such as speech, text, and images, necessitating effective integration and modeling for comprehensive user understanding and emotional insight. Moreover, the intricate interplay between emotions and intents poses complexities: the emotions expressed by speakers convey specific intents, which are in turn responded to in an empathetic manner. Capturing and modeling these complex relationships between emotions and intents is an urgent problem to be addressed. We have designed two tracks to address the major challenges faced in real life. We warmly welcome researchers from academia and industry to participate and jointly explore reliable solutions for these challenging scenarios.
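As one possible starting point for such joint modeling, the sketch below couples the two predictions with a shared multimodal encoder and interacting task heads, so that each task conditions on the other's soft prediction. All feature dimensionalities and label counts are invented for illustration and do not reflect the challenge tracks.

    # A minimal sketch (PyTorch) of joint emotion-intent modeling: a shared
    # fusion encoder with two heads that see each other's first-pass predictions.
    # All dimensionalities and label counts are illustrative assumptions.
    import torch
    import torch.nn as nn

    class JointEmotionIntent(nn.Module):
        def __init__(self, d_audio=128, d_text=256, d_video=128, d=256,
                     n_emotions=7, n_intents=8):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Linear(d_audio + d_text + d_video, d), nn.ReLU())
            self.emo_pre = nn.Linear(d, n_emotions)   # first-pass logits
            self.int_pre = nn.Linear(d, n_intents)
            self.emo_head = nn.Linear(d + n_intents, n_emotions)
            self.int_head = nn.Linear(d + n_emotions, n_intents)

        def forward(self, a, t, v):
            h = self.fuse(torch.cat([a, t, v], dim=-1))
            emo0, int0 = self.emo_pre(h), self.int_pre(h)
            # Second pass: each head conditions on the other's soft prediction.
            emo = self.emo_head(torch.cat([h, int0.softmax(-1)], dim=-1))
            intent = self.int_head(torch.cat([h, emo0.softmax(-1)], dim=-1))
            return emo, intent

    m = JointEmotionIntent()
    emo, intent = m(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 128))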

GC-4: Low-field MR Image Quality Challenge

Organized by: Sairam Geethanath, Ph.D. | Mathews Jacob, Ph.D. | Hayit Greenspan, Ph.D. Challenge Website

The recent resurgence of portable, very-low-field MRI systems to improve accessibility is constrained by image quality challenges compared to clinical scanners. The advent of machine learning methods provides a unique opportunity to address gaps in image resolution and signal-to-noise ratio (SNR). As very-low-field MRI gains the attention of the wider research community, this timely grand challenge invites participants from complementary backgrounds in signal processing, MRI analysis, machine learning, and related fields to break ground on this important topic of image quality. The challenge starts in September 2024 and will last until June 2025.

GC-5: EEG-Music Emotion Recognition Grand Challenge

Organized by: Salvatore Calcagno, University of Catania | Simone Carnemolla, University of Catania | Isaak Kavasidis, University of Catania | Simone Palazzo, University of Catania | Daniela Giordano, University of Catania | Concetto Spampinato, University of Catania Challenge Website

Emotions are central to human decision-making and interpersonal relationships, yet much remains to be understood about their underlying mechanisms. Music, with its profound impact on human emotions, provides a unique context to explore the correlation between neural signals and emotional responses. The EEG-Music Emotion Recognition Challenge aims to leverage electroencephalography (EEG) to decode emotional states from brain signals while subjects listen to music. This initiative seeks to uncover the intricate relationship between neural activity and emotional responses, offering insights for detecting and treating affective disorders, as well as advancing adaptive user interfaces. To this end, we provide a comprehensive EEG dataset containing data from 34 subjects who each listened to an average of 24 minutes of musical stimuli, resulting in approximately 12 hours of data. We propose two tasks (a minimal baseline sketch follows the list):
Task 1: Person Identification. Given a segment of EEG, identify the subject from whom the EEG was recorded.
Task 2: Emotion Recognition. Given a segment of EEG, classify the emotional state of the subject while listening to the musical stimulus.
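Both tasks reduce to classifying a multichannel EEG segment, so a single backbone can serve as a baseline for either; the sketch below assumes 32-channel segments and an invented number of emotion classes, with only the 34-subject count taken from the dataset description.

    # A minimal sketch (PyTorch) of a shared segment-classification backbone.
    # Channel count, segment length, and emotion classes are assumptions.
    import torch
    import torch.nn as nn

    def eeg_classifier(n_channels=32, n_classes=34):
        return nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, n_classes))

    subject_id = eeg_classifier(n_classes=34)    # Task 1: 34 subjects in the dataset
    emotion = eeg_classifier(n_classes=4)        # Task 2: 4 emotion classes (assumed)
    x = torch.randn(8, 32, 1024)                 # (batch, EEG channels, time samples)
    print(subject_id(x).shape, emotion(x).shape)  # (8, 34) and (8, 4) logits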

GC-6: The Prediction and Recognition Of Cognitive declinE through Spontaneous Speech (PROCESS) Signal Processing Grand Challenge

Organized by: Heidi Christensen, University of Sheffield | Simon Bell, University of Sheffield | Daniel Blackburn, Sheffield Institute for Translational Neuroscience | Bahman Mirheidari, University of Sheffield | Madhurananda Pahar, University of Sheffield | Fuxiang Tao, University of Sheffield | Dorota Braun, CognoSpeak™ | Hend ElGhazaly, University of Sheffield | Caitlin Illingworth, University of Sheffield | Ronan O'Malley | Fritz Peters, University of Sheffield | Saturnino Luz, University of Edinburgh's Medical School | Fasih Haider, University of Edinburgh Challenge Website

The Prediction and Recognition Of Cognitive Decline through Spontaneous Speech (PROCESS) Signal Processing Grand Challenge, initiated by the University of Sheffield and the University of Edinburgh, focuses on the detection of dementia through speech analysis. This challenge continues the trajectory of past challenges like ADReSS and ADReSSo but introduces a previously unreleased corpus specifically designed for the early detection of cognitive impairments, including Alzheimer's disease and Mild Cognitive Impairment (MCI). The PROCESS challenge is structured around two primary tasks: a classification task aimed at diagnosing healthy, MCI, and dementia states from speech signals, and a regression task focused on predicting cognitive decline through Mini-Mental State Examination (MMSE) scores. Participants are provided with training and development sets, encouraging the development of innovative signal processing and machine learning techniques to address these tasks. The released PROCESS corpus will contain speech collected in response to three types of elicitation tasks: "Cookie Theft" picture description, semantic fluency, and phonemic fluency. The challenge aims to align closely with real-world diagnostic criteria, providing a robust dataset for advancing dementia detection research.
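As a toy illustration of the two tasks, the sketch below pools MFCC statistics per recording and feeds them to an off-the-shelf classifier and regressor; the feature choice, data shapes, and labels are placeholders, not the challenge baseline.

    # A minimal sketch of the PROCESS task structure on toy data: pooled MFCC
    # statistics -> 3-way diagnosis classifier and MMSE score regressor.
    import numpy as np
    import librosa
    from sklearn.svm import SVC
    from sklearn.linear_model import Ridge

    def features(y, sr=16000):
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # (13, frames)
        return np.concatenate([m.mean(axis=1), m.std(axis=1)])   # 26-dim summary

    rng = np.random.default_rng(0)
    waves = [rng.standard_normal(16000 * 5) for _ in range(30)]  # 30 toy 5-s recordings
    X = np.stack([features(w) for w in waves])
    y_cls = rng.integers(0, 3, 30)     # 0=healthy, 1=MCI, 2=dementia (toy labels)
    y_mmse = rng.uniform(10, 30, 30)   # toy MMSE scores

    SVC().fit(X, y_cls)                # classification task
    Ridge().fit(X, y_mmse)             # MMSE regression task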

GC-7: The First VoicePrivacy Attacker Challenge

Organized by: Natalia Tomashenko - Inria, France | Xiaoxiao Miao - Singapore Institute of Technology, Singapore | Emmanuel Vincent - Inria, France | Junichi Yamagishi - NII, Japan Challenge Website

The First VoicePrivacy Attacker Challenge is a new kind of challenge organized as part of the VoicePrivacy initiative. It focuses on developing attacker systems against voice anonymization, which will be evaluated against state-of-the-art anonymization systems, including some submitted to the VoicePrivacy 2024 Challenge. Training, development, and evaluation datasets are provided along with baseline attacker systems. To develop attacker systems, participants can use any additional training data and models, provided that they are openly available and declared before the specified deadline. Participants should develop their attacker systems in the form of automatic speaker verification (ASV) systems and submit their scores on the development and test data to the organizers. The evaluation metric is the equal error rate (EER).
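For reference, the metric can be computed directly from an attacker's ASV scores; the sketch below is a standard EER computation run on toy score distributions.

    # EER: the operating point where false-rejection and false-acceptance
    # rates cross, computed from target and non-target ASV trial scores.
    import numpy as np

    def eer(target_scores, nontarget_scores):
        scores = np.concatenate([target_scores, nontarget_scores])
        labels = np.concatenate([np.ones_like(target_scores),
                                 np.zeros_like(nontarget_scores)])
        order = np.argsort(scores)           # sweep the decision threshold
        labels = labels[order]
        frr = np.cumsum(labels) / labels.sum()                    # targets rejected
        far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()    # non-targets accepted
        idx = np.argmin(np.abs(frr - far))
        return 0.5 * (frr[idx] + far[idx])

    rng = np.random.default_rng(0)
    tgt = rng.normal(1.0, 1.0, 1000)         # toy same-speaker scores
    non = rng.normal(-1.0, 1.0, 1000)        # toy different-speaker scores
    print(f"EER = {eer(tgt, non):.1%}")      # ~16% for these toy distributions

Since a stronger attacker separates target from non-target trials more cleanly, a lower EER indicates a more successful attack on the anonymization system.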

GC-8: Accelerometer-Based Person-in-Bed Detection

Organized by: Lauren Mentzer, Analog Devices | Ravi Kiran Raman, Analog Devices Challenge Website

Modern smart beds are equipped with sensors and tools to estimate user vitals, such as heart rate and respiration rate, and to provide insights into sleep quality. One way to monitor the user in bed is to measure movements of the bed using an accelerometer. When a subject lies on a smart bed with an accelerometer integrated within the mattress, the expansion and contraction of their chest from breathing induces a tilt in the mattress, which is measured by the accelerometer. The recoil forces generated by each heartbeat can also be detected.

An integral challenge of accelerometer-only smart beds is determining when to estimate user vitals; we only want to do so when someone is in the bed. It is important to distinguish ambient noise arising from other vibrations of the bed from the motions arising when a user is lying on it. We have collected a dataset using the ADXL355 (an ultra-low-noise, 3-axis accelerometer) placed between a mattress and a mattress topper. The data were collected both with various users lying in bed and with the bed empty, in the presence of various kinds of disturbances on and off the bed. We invite participants to tackle two tasks (a rough streaming sketch follows the list):
1) Classification of pre-chunked accelerometer signals as "in_bed" or "not_in_bed" and
2) Streaming-based classification that minimizes latency while maximizing accuracy.
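As a rough illustration of the streaming task, the sketch below runs a causal variance detector with hysteresis over the accelerometer magnitude; the sample rate, window length, and thresholds are invented placeholders, not values tuned to the ADXL355 recordings.

    # A causal streaming detector: breathing/heartbeat motion raises the
    # short-term variance of the acceleration magnitude above the ambient floor.
    import numpy as np
    from collections import deque

    class StreamingBedDetector:
        def __init__(self, window=250, on_thr=2e-3, off_thr=1e-3):
            self.buf = deque(maxlen=window)   # ~1 s at an assumed 250 Hz rate
            self.on_thr, self.off_thr = on_thr, off_thr
            self.in_bed = False

        def update(self, sample_xyz):
            """Consume one 3-axis sample (in g); return the current decision."""
            self.buf.append(float(np.linalg.norm(sample_xyz)))
            if len(self.buf) == self.buf.maxlen:
                activity = float(np.std(self.buf))
                # Hysteresis: separate on/off thresholds prevent chattering.
                if not self.in_bed and activity > self.on_thr:
                    self.in_bed = True
                elif self.in_bed and activity < self.off_thr:
                    self.in_bed = False
            return self.in_bed

    det = StreamingBedDetector()
    for t in np.arange(0, 4, 1 / 250):
        z = 1.0 + 1e-2 * np.sin(2 * np.pi * 0.25 * t)  # toy breathing-induced tilt
        state = det.update([0.0, 0.0, z])              # per-sample decision

Because the decision is emitted per sample, the latency-versus-accuracy trade-off of the second task reduces here to choosing the window length and thresholds.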

GC-9: Multilingual streaming TTS with neural codecs for Indian languages

Organized by: Philipp Olbrich, GIZ | Mark Hasegawa-Johnson, University of Illinois | Hema A Murthy, IITM | Pranaw Kumar, CDAC | Shinji Watanabe, Carnegie Mellon University | Sheng Zhao, Microsoft Azure Challenge Website

This challenge aims to accelerate progress in multilingual text-to-speech synthesis (TTS), specifically by focusing on streaming TTS models and the use of audio codecs. It is the continuation of the LIMMITS challenges, with LIMMITS 23 and LIMMITS 24 being part of the ICASSP SPGC in 2023 and 2024, respectively. As part of this year's challenge, participants will build TTS systems for four Indian languages - Gujarati, Indian English, Bhojpuri, and Kannada - with two speakers per language. TTS corpora in these languages are being built as part of the SYSPIN project. We present an opportunity for researchers to contribute towards the development of streaming and neural codec-based TTS systems. Recent developments in conversational AI create demand for real-time, multilingual, and adaptable speech generation. For applications such as large language models (LLMs), low-latency, streaming TTS systems are required. Further, recent neural codec-based TTS systems have obtained state-of-the-art performance. These codecs offer compact representations of speech that enable efficient transmission and storage. Additionally, various speech attributes can be encoded in neural codec representations, allowing high-quality and controllable speech synthesis.
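To make the codec idea concrete, the sketch below implements the core mechanism behind neural codecs such as SoundStream and EnCodec: residual vector quantization (RVQ), which turns each latent speech frame into a small stack of discrete codebook indices. The codebooks here are random and purely illustrative; in a trained codec they are learned jointly with an encoder and decoder.

    # A toy RVQ sketch (PyTorch): each stage quantizes the residual left by the
    # previous one, yielding a stack of discrete tokens per latent frame.
    import torch

    def rvq_encode(z, codebooks):
        """Quantize z (B, D) with a stack of codebooks; return token indices."""
        idxs, residual = [], z
        for cb in codebooks:                   # each cb: (K, D)
            d = torch.cdist(residual, cb)      # distance to all K codewords
            i = d.argmin(dim=1)                # nearest codeword per vector
            idxs.append(i)
            residual = residual - cb[i]        # next stage codes what is left
        return torch.stack(idxs, dim=1)

    def rvq_decode(idxs, codebooks):
        """Sum the selected codewords across stages to reconstruct the latent."""
        return sum(cb[idxs[:, s]] for s, cb in enumerate(codebooks))

    torch.manual_seed(0)
    codebooks = [torch.randn(256, 64) for _ in range(4)]  # 4 stages, 256 entries each
    z = torch.randn(10, 64)                               # 10 latent speech frames
    tokens = rvq_encode(z, codebooks)                     # (10, 4) discrete tokens
    z_hat = rvq_decode(tokens, codebooks)                 # coarse reconstruction

A codec-based TTS model then predicts such token stacks frame by frame, and a streaming system can decode them incrementally as they arrive, which is what keeps latency low.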