FUN-Media will enable next-generation immersive networked media communications, ensuring the expected quality of experience (QoE), supporting empathic communication, providing a real sense of presence, and guaranteeing content and user authenticity. This is achieved through technology advances in digital twins, multimodal and multisense communications, audio/acoustic user interaction, QoE-aware distribution of trustworthy content, and media generation and representation for humans and machines.

FUN-Media is part of Spoke 4 – Programmable Networks for Future Services and Media 

Project PI: Enrico Magli

Technical advances have been made in several areas, including:
  • project management and purchases for the Spoke Lab
  • adaptive metronome algorithms and packet loss concealment for mitigating the impact of latency (a simple concealment sketch follows this list)
  • methods for detecting audio manipulation
  • a study of the impact of compression and transmission artifacts on dynamic, dense point clouds, with subjective tests exploring users’ QoE under varying combinations of degradations (compression and packet loss)
  • QoE-aware motion control of a swarm of drones for video surveillance
  • a study of the effect of adopting augmented and virtual reality on user-perceived quality
  • learning-based viewport prediction
  • learning-based compression schemes based on diffusion models
  • methods for network sparsification and quantization
  • compression of point clouds and light fields
  • an approach to asynchronous federated continual learning.
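
To make the latency-mitigation item above more concrete, the following snippet sketches a very simple packet-loss concealment strategy: when a packet is missing, the previous frame is repeated with a decaying gain. This is only an illustrative Python/NumPy baseline under assumed parameters (48 kHz audio, 10 ms packets), not the adaptive-metronome or concealment algorithms developed in the project.

    import numpy as np

    FRAME = 480  # samples per 10 ms packet at 48 kHz (assumed values)

    def conceal_stream(packets):
        """packets: list of NumPy frames; None marks a lost packet."""
        out, last, gain = [], np.zeros(FRAME), 1.0
        for pkt in packets:
            if pkt is None:              # lost packet: repeat the previous frame,
                gain *= 0.5              # attenuating on consecutive losses
                frame = last * gain
            else:
                frame = np.asarray(pkt, dtype=float)
                last, gain = frame, 1.0
            out.append(frame)
        return np.concatenate(out)

    # Toy usage: a 440 Hz tone in which the third 10 ms packet is lost in transit.
    t = np.arange(5 * FRAME) / 48000.0
    sig = np.sin(2 * np.pi * 440.0 * t)
    frames = [sig[i * FRAME:(i + 1) * FRAME] for i in range(5)]
    frames[2] = None
    reconstructed = conceal_stream(frames)  # same length as the original signal
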
The project has already generated several practical outcomes, many of which have been consolidated in scientific publications.

These include:
  • a content-aware compression and transmission method for automotive Lidar data
  • a continual learning method for semantic image segmentation
  • methods for detection of synthetic and manipulated speech
  • a method for deepfake detection
  • a method for viewport prediction (a simple baseline sketch follows this list)
  • a federated continual learning method
  • a study on the impact of VR on user attention
  • stress assessment for AR based on head movements
  • identification of the leading sensory cue in mulsemedia VR
  • a VR dataset for network and QoE studies
  • an aerial multimodal dataset with network measurements and perception data.
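
As background for the viewport-prediction outcome listed above, the snippet below shows a deliberately simple baseline: the recent head-orientation trajectory is extrapolated linearly over a short prediction horizon. The project's predictors are learning-based (and can exploit saliency information), so this NumPy sketch, with assumed angle conventions and a hypothetical predict_viewport helper, only illustrates the task.

    import numpy as np

    def predict_viewport(timestamps, yaw_deg, pitch_deg, horizon_s):
        """Least-squares linear extrapolation of the viewport centre."""
        t = np.asarray(timestamps, dtype=float)
        t_future = t[-1] + horizon_s
        preds = []
        for series in (yaw_deg, pitch_deg):
            slope, intercept = np.polyfit(t, np.asarray(series, dtype=float), 1)
            preds.append(slope * t_future + intercept)
        yaw = (preds[0] + 180.0) % 360.0 - 180.0       # wrap yaw to [-180, 180)
        pitch = float(np.clip(preds[1], -90.0, 90.0))  # clamp pitch to the valid range
        return yaw, pitch

    # Toy usage: the user pans right at about 30 deg/s; predict the viewport 0.5 s ahead.
    print(predict_viewport([0.0, 0.1, 0.2, 0.3], [0, 3, 6, 9], [0, 0, -1, -1], 0.5))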

Several of these methods are expected to lead to technologies that industry can exploit during the course of the project, as the related use cases were chosen to be relevant to the market.

Papers:
A. Ferrarotti, S. Baldoni, M. Carli, F. Battisti, "Stress Assessment for Augmented Reality Applications based on Head Movement Features", IEEE Transactions on Visualization and Computer Graphics, 2024

F. Miotello, M. Pezzoli, L. Comanducci, F. Antonacci, A. Sarti, "Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

D. U. Leonzio, L. Cuccovillo, P. Bestagini, M. Marcon, P. Aichroth, S. Tubaro, "Audio Splicing Detection and Localization Based on Acquisition Device Traces", IEEE Transactions on Information Forensics and Security, 2023
The project develops several technologies that can be the subject of industry collaboration and exploitation. We currently have two active lines of collaboration with industry. One of the project partners, Wind3, provides the business point of view on WP3 activities and the WP4 scope, and highlights synergies between different network elements. Moreover, through the cascade calls we have gained a new industry partner, Xenia Progetti, which will help define and demonstrate a use case employing digital twins for networked music performance.
Recent research addresses the restoration of audio signals with missing parts. We have developed methods based on artificial intelligence that recover those parts and provide near-perfect playback quality. We have also addressed the problem of audio splicing, where a malicious user replaces portions of a recording with other segments from the same speaker, altering the meaning of what is being said; AI can tell us whether such manipulations have been applied.
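
As an illustration of the restoration task (not of the deep-prior inpainting method cited among the publications), the sketch below fills a missing segment using a classical autoregressive extrapolation from the samples preceding the gap; the function name, model order and context length are assumptions chosen for the example.

    import numpy as np

    def ar_fill(signal, gap_start, gap_len, order=32, context=2048):
        """Fit an AR model on the samples before the gap and extrapolate into it."""
        x = np.asarray(signal[max(0, gap_start - context):gap_start], dtype=float)
        # Least-squares system: x[n] is approximated by sum_k a[k] * x[n - 1 - k]
        rows = np.stack([x[i:i + order][::-1] for i in range(len(x) - order)])
        targets = x[order:]
        a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
        filled = np.array(signal, dtype=float)
        history = list(x[-order:])                  # most recent context samples
        for n in range(gap_len):
            pred = float(np.dot(a, history[::-1]))  # predict the next sample
            filled[gap_start + n] = pred
            history = history[1:] + [pred]
        return filled

    # Toy usage: zero out 200 samples of a 440 Hz tone and reconstruct them.
    t = np.arange(8000) / 8000.0
    tone = np.sin(2 * np.pi * 440.0 * t)
    damaged = tone.copy()
    damaged[4000:4200] = 0.0
    restored = ar_fill(damaged, gap_start=4000, gap_len=200)
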
  • Publications
    Total number of publications (including journals and conference papers):
    Expected: 36
    Accomplished: 15
    Readiness: 42%

  • Joint publications
    (at least 30% of total number of publications)

    Expected: 12
    Accomplished: 2
    Readiness: 17%

  • Talks, dissemination and outreach activities
    (does not include conference presentations)

    Expected: 9
    Accomplished: 4
    Readiness: 44%

  • Innovations
    Expected: 10 items
    Accomplished: 2 items submitted to mission 7
    Readiness: 20%

  • Demo/PoC
    Expected: 5 PoCs by the end of the project
    Accomplished: 0
    Readiness: 0% (work is proceeding according to plan, as demos/PoCs are expected starting from the second year of the project).

  • M1.1 First release of exploitation, dissemination and impact monitoring
    Expected M12
    Accomplished M12
    Readiness 100%

  • M1.2 Second release of exploitation, dissemination and impact monitoring
    Expected M24
    Accomplished M12
    Readiness 50%

  • M1.3 Third release of exploitation, dissemination and impact monitoring
    Expected M36
    Accomplished M12
    Readiness 33%

  • M3.1 First release of audio and acoustic signal processing system
    Expected M12
    Accomplished M12
    Readiness 100%

  • M3.2 Advanced release of audio and acoustic signal processing system
    Expected M24
    Accomplished M12
    Readiness 50%

  • M3.3 Release of proof-of-concept of audio and acoustic signal processing system
    Expected M36
    Accomplished M12
    Readiness 33%

  • M4.1 First release of experience-aware distribution system for authentic contents
    Expected M12
    Accomplished M12
    Readiness 100%

  • M4.2 Advanced release of experience-aware distribution system for authentic contents
    Expected M24
    Accomplished M12
    Readiness 50%

  • M4.3 Release of proof-of-concept of experience-aware distribution system for authentic contents
    Expected M36
    Accomplished M12
    Readiness 33%

  • M6.1 First release of innovative media generation and representation system
    Expected M12
    Accomplished M12
    Readiness 100%

  • M6.2 Advanced release of innovative media generation and representation system
    Expected M24
    Accomplished M12
    Readiness 50%

  • M6.3 Release of proof-of-concept of innovative media generation and representation system
    Expected M36
    Accomplished M12
    Readiness 33%

Researchers involved: The project has an estimated effort of roughly 144 person-months per year, corresponding to 5 RTD-A researchers, 5 PhD students and 2 full-time-equivalent faculty staff (12 full-time equivalents over 12 months). This does not include partners from the cascade calls.

Collaboration proposals:

Provisional list (contact project PI for more info):

  • a collaboration on networked music performance, which allows musicians to collaborate and perform together in real time, transcending geographical boundaries. The objective is to develop a more seamless and engaging collaborative musical experience;
  • a collaboration on efficient viewport-based algorithms for omnidirectional video streaming systems, employing machine learning methods and taking advantage of saliency information;
  • a collaboration on deepfake detection models for visual information employing deep neural networks;
  • a collaboration on neural radiance fields and Gaussian splatting for scene rendering;
  • a collaboration on low-complexity (e.g. binary) neural networks for inference and compression on embedded devices.

For any proposal of collaboration within the project, please contact the project PI.

