Essentia TensorFlow Models for Audio and Music Processing on the Web

Albin Correya, Pablo Alonso-Jiménez, Jorge Marcos-Fernández, Xavier Serra, Dmitry Bogdanov

Recent advances in web-based machine learning (ML) tools empower a wide range of application developers in both industrial and creative contexts. The availability of pre-trained ML models and JavaScript (JS) APIs in frameworks like TensorFlow.js has enabled developers to use AI technologies without requiring domain expertise. Nevertheless, there is a lack of pre-trained models in web audio compared to other domains, such as text and image analysis. Motivated by this, we present a collection of open pre-trained TensorFlow.js models for music-related tasks on the Web. Our models currently allow for different types of music classification (e.g., genres, moods, danceability, voice or instrumentation), tempo estimation, and music feature embeddings. To facilitate their use, we provide a dedicated JS add-on module, essentia.js-model, within the Essentia.js library for audio and music analysis. It has a simple API, enabling end-to-end analysis from audio input to prediction results in web browsers and Node.js. Along with the Web Audio API and web workers, it can also be used to build real-time applications. We provide usage examples, discuss possible use cases, and report benchmarking results.

            
@inproceedings{2021_36,
  abstract = {Recent advances in web-based machine learning (ML) tools empower a wide range of application developers in both industrial and creative contexts. The availability of pre-trained ML models and JavaScript (JS) APIs in frameworks like TensorFlow.js has enabled developers to use AI technologies without requiring domain expertise. Nevertheless, there is a lack of pre-trained models in web audio compared to other domains, such as text and image analysis. Motivated by this, we present a collection of open pre-trained TensorFlow.js models for music-related tasks on the Web. Our models currently allow for different types of music classification (e.g., genres, moods, danceability, voice or instrumentation), tempo estimation, and music feature embeddings. To facilitate their use, we provide a dedicated JS add-on module, essentia.js-model, within the Essentia.js library for audio and music analysis. It has a simple API, enabling end-to-end analysis from audio input to prediction results in web browsers and Node.js. Along with the Web Audio API and web workers, it can also be used to build real-time applications. We provide usage examples, discuss possible use cases, and report benchmarking results.},
  address = {Barcelona, Spain},
  author = {Correya, Albin and Alonso-Jiménez, Pablo and Marcos-Fernández, Jorge and Serra, Xavier and Bogdanov, Dmitry},
  booktitle = {Proceedings of the International Web Audio Conference},
  editor = {Joglar-Ongay, Luis and Serra, Xavier and Font, Frederic and Tovstogan, Philip and Stolfi, Ariane and Correya, Albin and Ramires, Antonio and Bogdanov, Dmitry and Faraldo, Angel and Favory, Xavier},
  month = {July},
  pages = {},
  publisher = {UPF},
  series = {WAC '21},
  title = {Essentia TensorFlow Models for Audio and Music Processing on the Web},
  year = {2021},
  ISSN = {2663-5844}
}
