FUN-Media will enable next-generation immersive networked media communications, ensuring the expected quality of experience (QoE), supporting empathic communication, providing a real sense of presence, and guaranteeing content and user authenticity. This is achieved through technology advances in digital twins, multimodal and multisense communications, audio/acoustic user interaction, QoE-aware distribution of trustworthy content, and media generation and representation for humans and machines.

FUN-Media is part of Spoke 4 – Programmable Networks for Future Services and Media 

Project PI: Enrico Magli

Technical advances have been made in several areas, including:
  • project management and purchases for the Spoke Lab
  • adaptive metronome algorithms and packet loss concealment for mitigating the impact of latency (a simple concealment sketch follows this list)
  • methods for detecting audio manipulation
  • a study of the impact of compression and transmission artifacts on dynamic, dense point clouds, with subjective tests exploring users’ QoE under varying combinations of degradations (compression and packet loss)
  • QoE-aware motion control of a swarm of drones for video surveillance
  • a study of the effect of adopting augmented and virtual reality on user-perceived quality
  • learning-based viewport prediction
  • learning-based compression schemes based on diffusion models
  • methods for network sparsification and quantization
  • compression of point clouds and light fields
  • an approach to asynchronous federated continual learning.
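
To make the latency-mitigation item above more concrete, the following snippet sketches a very simple packet-loss concealment strategy: when a packet is missing, the previous frame is repeated with a decaying gain. This is only an illustrative Python/NumPy baseline under assumed parameters (48 kHz audio, 10 ms packets), not the adaptive-metronome or concealment algorithms developed in the project.

    import numpy as np

    FRAME = 480  # samples per 10 ms packet at 48 kHz (assumed values)

    def conceal_stream(packets):
        """packets: list of NumPy frames; None marks a lost packet."""
        out, last, gain = [], np.zeros(FRAME), 1.0
        for pkt in packets:
            if pkt is None:              # lost packet: repeat the previous frame,
                gain *= 0.5              # attenuating on consecutive losses
                frame = last * gain
            else:
                frame = np.asarray(pkt, dtype=float)
                last, gain = frame, 1.0
            out.append(frame)
        return np.concatenate(out)

    # Toy usage: a 440 Hz tone in which the third 10 ms packet is lost in transit.
    t = np.arange(5 * FRAME) / 48000.0
    sig = np.sin(2 * np.pi * 440.0 * t)
    frames = [sig[i * FRAME:(i + 1) * FRAME] for i in range(5)]
    frames[2] = None
    reconstructed = conceal_stream(frames)  # same length as the original signal
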
The project has already generated several practical outcomes, many of which have been consolidated in scientific publications.

These include:
  • a content-aware compression and transmission method for automotive Lidar data
  • a continual learning method for semantic image segmentation
  • methods for detection of synthetic and manipulated speech
  • a method for deepfake detection
  • a method for viewport prediction (a simple baseline sketch follows this list)
  • a federated continual learning method
  • a study on the impact of VR on user attention
  • stress assessment for AR based on head movements
  • identification of the leading sensory cue in mulsemedia VR
  • a VR dataset for network and QoE studies
  • an aerial multimodal dataset with network measurements and perception data.
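
As background for the viewport-prediction outcome listed above, the snippet below shows a deliberately simple baseline: the recent head-orientation trajectory is extrapolated linearly over a short prediction horizon. The project's predictors are learning-based (and can exploit saliency information), so this NumPy sketch, with assumed angle conventions and a hypothetical predict_viewport helper, only illustrates the task.

    import numpy as np

    def predict_viewport(timestamps, yaw_deg, pitch_deg, horizon_s):
        """Least-squares linear extrapolation of the viewport centre."""
        t = np.asarray(timestamps, dtype=float)
        t_future = t[-1] + horizon_s
        preds = []
        for series in (yaw_deg, pitch_deg):
            slope, intercept = np.polyfit(t, np.asarray(series, dtype=float), 1)
            preds.append(slope * t_future + intercept)
        yaw = (preds[0] + 180.0) % 360.0 - 180.0       # wrap yaw to [-180, 180)
        pitch = float(np.clip(preds[1], -90.0, 90.0))  # clamp pitch to the valid range
        return yaw, pitch

    # Toy usage: the user pans right at about 30 deg/s; predict the viewport 0.5 s ahead.
    print(predict_viewport([0.0, 0.1, 0.2, 0.3], [0, 3, 6, 9], [0, 0, -1, -1], 0.5))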

Several of these methods are expected to lead to technologies that industry can exploit during the course of the project, as the related use cases were chosen to be relevant to the market.

Papers:
A. Ferrarotti, S. Baldoni, M. Carli, F. Battisti, "Stress Assessment for Augmented Reality Applications based on Head Movement Features", IEEE Transactions on Visualization and Computer Graphics, 2024

F. Miotello, M. Pezzoli, L. Comanducci, F. Antonacci, A. Sarti, "Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

D. U. Leonzio, L. Cuccovillo, P. Bestagini, M. Marcon, P. Aichroth, S. Tubaro, "Audio Splicing Detection and Localization Based on Acquisition Device Traces", IEEE Transactions on Information Forensics and Security, 2023
The project develops several technologies that can be the subject of industry collaboration and exploitation. We currently have two active lines of collaboration with industry. One of the project partners, Wind3, provides the business point of view on WP3 activities and the WP4 scope, and highlights synergies between different network elements. Moreover, through the cascade calls we have gained a new industry partner, Xenia Progetti, which will help define and demonstrate a use case employing digital twins for networked music performance.
Recent research addresses the restoration of audio signals with missing parts. We have developed methods based on artificial intelligence that recover those parts and provide near-perfect playback quality. We have also addressed the problem of audio splicing, where a malicious user replaces portions of a recording with other segments from the same speaker, altering the meaning of what is being said; AI can tell us whether such manipulations have been applied.
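
As an illustration of the restoration task (not of the deep-prior inpainting method cited among the publications), the sketch below fills a missing segment using a classical autoregressive extrapolation from the samples preceding the gap; the function name, model order and context length are assumptions chosen for the example.

    import numpy as np

    def ar_fill(signal, gap_start, gap_len, order=32, context=2048):
        """Fit an AR model on the samples before the gap and extrapolate into it."""
        x = np.asarray(signal[max(0, gap_start - context):gap_start], dtype=float)
        # Least-squares system: x[n] is approximated by sum_k a[k] * x[n - 1 - k]
        rows = np.stack([x[i:i + order][::-1] for i in range(len(x) - order)])
        targets = x[order:]
        a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
        filled = np.array(signal, dtype=float)
        history = list(x[-order:])                  # most recent context samples
        for n in range(gap_len):
            pred = float(np.dot(a, history[::-1]))  # predict the next sample
            filled[gap_start + n] = pred
            history = history[1:] + [pred]
        return filled

    # Toy usage: zero out 200 samples of a 440 Hz tone and reconstruct them.
    t = np.arange(8000) / 8000.0
    tone = np.sin(2 * np.pi * 440.0 * t)
    damaged = tone.copy()
    damaged[4000:4200] = 0.0
    restored = ar_fill(damaged, gap_start=4000, gap_len=200)
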
  • Publications
    Total number of publications (including journals and conference papers):
    Expected: 36
    Accomplished: 15
    Readiness: 42%

  • Joint publications
    (at least 30% of total number of publications)

    Expected: 12
    Accomplished: 2
    Readiness: 17%

  • Talks, dissemination and outreach activities
    (does not include conference presentations)

    Expected: 9
    Accomplished: 4
    Readiness: 44%

  • Innovations
    Expected: 10 items
    Accomplished: 2 items submitted to mission 7
    Readiness: 20%

  • Demo/PoC
    Expected: 5 PoCs by the end of the project
    Accomplished: 0
    Readiness: 0% (work is proceeding according to plan, as demos/PoCs are expected starting from the second year of the project).

  • M1.1 First release of exploitation, dissemination and impact monitoring
    Expected M12
    Accomplished M12
    Readiness 100%

  • M1.2 Second release of exploitation, dissemination and impact monitoring
    Expected M24
    Accomplished M12
    Readiness 50%

  • M1.3 Third release of exploitation, dissemination and impact monitoring
    Expected M36
    Accomplished M12
    Readiness 33%

  • M3.1 First release of audio and acoustic signal processing system
    Expected M12
    Accomplished M12
    Readiness 100%

  • M3.2 Advanced release of audio and acoustic signal processing system
    Expected M24
    Accomplished M12
    Readiness 50%

  • M3.3 Release of proof-of-concept of audio and acoustic signal processing system
    Expected M36
    Accomplished M12
    Readiness 33%

  • M4.1 First release of experience-aware distribution system for authentic contents
    Expected M12
    Accomplished M12
    Readiness 100%

  • M4.2 Advanced release of experience-aware distribution system for authentic contents
    Expected M24
    Accomplished M12
    Readiness 50%

  • M4.3 Release of proof-of-concept of experience-aware distribution system for authentic contents
    Expected M36
    Accomplished M12
    Readiness 33%

  • M6.1 First release of innovative media generation and representation system
    Expected M12
    Accomplished M12
    Readiness 100%

  • M6.2 Advanced release of innovative media generation and representation system
    Expected M24
    Accomplished M12
    Readiness 50%

  • M6.3 Release of proof-of-concept of innovative media generation and representation system
    Expected M36
    Accomplished M12
    Readiness 33%

Researchers involved: The project has an estimated effort of roughly 144 person-months per year, corresponding to 5 RTD-A researchers, 5 PhD students and 2 full-time-equivalent faculty staff (12 full-time equivalents over 12 months). This does not include partners from the cascade calls.

Collaboration proposals:

Provisional list (contact project PI for more info):

  • a collaboration on networked music performance, which allows musicians to collaborate and perform together in real time, transcending geographical boundaries. The objective is to develop a more seamless and engaging collaborative musical experience;
  • a collaboration on efficient viewport-based algorithms for omnidirectional video streaming systems, employing machine learning methods and taking advantage of saliency information;
  • a collaboration on deepfake detection models for visual information employing deep neural networks;
  • a collaboration on neural radiance fields and Gaussian splatting for scene rendering;
  • a collaboration on low-complexity (e.g. binary) neural networks for inference and compression on embedded devices.

For any proposal of collaboration within the project, please contact the project PI.

