TL;DR
Generating string instrument performances with intricate movements and complex interactions poses significant challenges. To address these, we present ELGAR—the first framework for whole-body instrument performance motion generation solely from audio. We further contribute novel losses, metrics, and a dataset, marking a first attempt at this emerging task with promising results.
Abstract
The art of instrument performance stands as a vivid manifestation of human creativity and emotion.
Nonetheless, generating instrument performance motions is a highly challenging task, as it requires
not only capturing intricate movements but also reconstructing the complex dynamics of the
performer-instrument interaction.
While existing works primarily focus on modeling partial-body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body, fine-grained instrument performance motion generation solely from audio.
To emphasize the interactive nature of instrument performance, we introduce the Hand Interactive Contact Loss (HICL) and the Bow Interactive Contact Loss (BICL), which enforce authentic contact in the performer-instrument interplay.
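For intuition, a minimal sketch of such an interactive contact loss is given below: it penalizes the distance between predicted fingertip positions and the string at frames annotated as in contact (a bow-side term can be formed analogously). The tensor names, shapes, and single-string simplification are illustrative assumptions for this sketch, not the authors' implementation.

import torch

def point_to_segment_distance(p, a, b):
    # Distance from points p to line segments (a, b); all shapes end in 3.
    ab = b - a
    t = ((p - a) * ab).sum(-1, keepdim=True) / ab.pow(2).sum(-1, keepdim=True).clamp(min=1e-8)
    t = t.clamp(0.0, 1.0)                      # project onto the segment
    closest = a + t * ab
    return (p - closest).norm(dim=-1)

def interactive_contact_loss(fingertips, string_a, string_b, contact_mask):
    # fingertips:   (T, F, 3) predicted fingertip positions over T frames
    # string_a/b:   (T, 3) endpoints of the relevant string per frame
    # contact_mask: (T, F) float mask, 1 where a fingertip should touch the string
    d = point_to_segment_distance(fingertips, string_a[:, None], string_b[:, None])
    return (d * contact_mask).sum() / contact_mask.sum().clamp(min=1.0)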
Moreover, to better evaluate whether the generated motions align with the semantic context of the
music audio, we design novel metrics specifically for string instrument performance motion
generation, including finger-contact distance, bow-string distance, and bowing score.
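As a rough illustration, finger-contact distance can be measured as the mean distance from each annotated contacting fingertip to the nearest point on its target string; bow-string distance follows the same idea with the bow-hair segment in place of the fingertip. The keypoint layout, shapes, and contact annotations below are assumptions made for this sketch, not the metric definitions from the paper.

import numpy as np

def finger_contact_distance(fingertips, strings, contact_mask):
    # fingertips:   (T, F, 3) generated fingertip positions
    # strings:      (T, S, 2, 3) per-frame string endpoints (nut end, bridge end)
    # contact_mask: (T, F, S) 1 where finger f should press string s at frame t
    # Returns the mean fingertip-to-string distance over annotated contacts.
    a, b = strings[..., 0, :], strings[..., 1, :]              # (T, S, 3)
    ab = b - a
    p = fingertips[:, :, None, :]                              # (T, F, 1, 3)
    t = np.clip(((p - a[:, None]) * ab[:, None]).sum(-1)
                / np.maximum((ab[:, None] ** 2).sum(-1), 1e-8), 0.0, 1.0)
    closest = a[:, None] + t[..., None] * ab[:, None]          # (T, F, S, 3)
    d = np.linalg.norm(p - closest, axis=-1)                   # (T, F, S)
    return float((d * contact_mask).sum() / max(contact_mask.sum(), 1))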
Extensive evaluations and ablation studies are conducted to validate the efficacy of the proposed methods. In addition, we put forward SPD-GEN, a motion generation dataset collated and normalized from the MoCap dataset SPD.
As demonstrated, ELGAR shows great potential for generating instrument performance motions with complex and fast interactions, which will promote further development in areas such as animation, music education, and interactive art creation.
Video
Play in Different Tempos
The model generalizes well across tempo variations, producing plausible motions for the same musical passage played at different speeds.
Test Set Sample
The model generates plausible performance motions from test audio clips in the SPD-GEN dataset.
In-the-wild Sample
The model is capable of generating plausible performance motions from in-the-wild audio beyond the curated dataset.
Retargeting
In this work, we leverage Unreal Engine to retarget motions from the SMPL-X model to alternative avatars, aiming to promote the broader applicability of motion retargeting methods to complex interactive motions.
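For readers unfamiliar with retargeting, the core idea is to transfer per-joint rotations from the source skeleton to a target skeleton through a joint-name mapping; the sketch below illustrates this conceptually. It is not the Unreal Engine pipeline used in this work, and the joint names and data layout are hypothetical.

import numpy as np

# Hypothetical mapping from SMPL-X joint names to target-avatar bone names.
JOINT_MAP = {
    "left_wrist": "hand_l",
    "right_wrist": "hand_r",
    "left_index3": "index_03_l",
    # ... remaining joints omitted for brevity
}

def retarget_frame(source_local_rotations, target_rest_pose):
    # source_local_rotations: dict joint name -> 3x3 local rotation (SMPL-X)
    # target_rest_pose:       dict bone name  -> 3x3 rest-pose rotation (avatar)
    # Returns local rotations for the target avatar for one frame.
    target = dict(target_rest_pose)            # start from the rest pose
    for src, dst in JOINT_MAP.items():
        if src in source_local_rotations:
            # layer the source joint's rotation on top of the target rest pose
            target[dst] = target_rest_pose[dst] @ source_local_rotations[src]
    return target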
BibTeX
@article{qiu2025elgar,
  title   = {ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition},
  author  = {Qiu, Zhiping and Jin, Yitong and Wang, Yuan and Shi, Yi and Wang, Chongwu and Tan, Chao and Li, Xiaobing and Yu, Feng and Yu, Tao and Dai, Qionghai},
  journal = {arXiv e-prints},
  pages   = {arXiv--2505},
  year    = {2025}
}

@inproceedings{10.1145/3721238.3730756,
  author    = {Qiu, Zhiping and Jin, Yitong and Wang, Yuan and Shi, Yi and Tan, Chao and Wang, Chongwu and Li, Xiaobing and Yu, Feng and Yu, Tao and Dai, Qionghai},
  title     = {ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition},
  year      = {2025},
  isbn      = {9798400715402},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3721238.3730756},
  doi       = {10.1145/3721238.3730756},
  booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  articleno = {54},
  numpages  = {9},
  keywords  = {Motion Generation, Musical Instrument Performance},
  series    = {SIGGRAPH Conference Papers '25}
}