2024 S.T. Yau High School Science Award
School: Bancroft School
City, Country: Worcester, USA
Name of supervising teacher: Yuxuan Zhou
AI Choreographer is a deep learning model that generates dance motion from music. However, several weaknesses still make it difficult to use: the generated motions, while realistic, are sometimes repetitive or poorly synchronized with the audio, and the model lacks a usable renderer that can directly animate user-provided models with the generated data. We improved the base model to generate more realistic and diverse dance motions, and we built an automated rendering pipeline that turns the generated motions directly into an animation of a human model supplied by the user. Generation quality was improved by feeding additional audio features into the model, giving it richer musical information to condition on. We also overcame several difficulties in the rendering process, including applying the AI-generated NumPy motion data to the provided SMPL models and converting the animated SMPL models into usable FBX models. Overall, the improved model produces dance motions that are more diverse and realistic than those of the base model.
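To make the audio conditioning concrete, below is a minimal sketch of per-frame feature extraction with librosa (McFee et al. 2015). MFCC, chroma, and the onset envelope follow the base model's feature set; the spectral centroid, spectral flatness, and beat indicator stand in for the additional descriptors, and these particular choices, along with the sample rate and hop length, are illustrative assumptions rather than our exact configuration.

```python
import numpy as np
import librosa

def extract_audio_features(path, sr=15360, hop=512):
    # Load at a fixed sample rate so the audio frame grid can be aligned
    # with the motion frame rate (the exact values here are assumptions).
    y, _ = librosa.load(path, sr=sr)

    # Features in the spirit of the base model (MFCC, chroma, onset envelope).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=hop)
    chroma = librosa.feature.chroma_cens(y=y, sr=sr, hop_length=hop)
    envelope = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)

    # Extra descriptors appended to the input (illustrative choices).
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)
    flatness = librosa.feature.spectral_flatness(y=y, hop_length=hop)

    # One-hot beat indicator on the same frame grid.
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop)

    # Trim all feature tracks to a common number of frames.
    n = min(mfcc.shape[1], chroma.shape[1], len(envelope),
            centroid.shape[1], flatness.shape[1])
    beats = np.zeros(n)
    beats[beat_frames[beat_frames < n]] = 1.0

    return np.concatenate(
        [mfcc[:, :n], chroma[:, :n], envelope[None, :n],
         centroid[:, :n], flatness[:, :n], beats[None, :]],
        axis=0,
    ).T  # shape: (n_frames, feature_dim)
```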
Transformer Model
SMPL Model
Music Features
A visualization of the Transformer model architecture
Joints used by the converter to incorporate the generated motions into FBX models. The corresponding joints and their numbering are listed on the left, and the position of each joint is shown on the right.
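For reference, SMPL defines 24 body joints in a fixed canonical order, and the converter resolves each joint number to a bone in the target FBX rig. A sketch of that lookup table is below, used by the converter sketch later in this section; the "m_avg_" bone-name prefix follows the official SMPL FBX rigs and is an assumption that may not match other rigs.

```python
# The 24 SMPL joints in their canonical order; the list index is the
# joint number shown in the figure.
SMPL_JOINT_NAMES = [
    "Pelvis", "L_Hip", "R_Hip", "Spine1", "L_Knee", "R_Knee",           # 0-5
    "Spine2", "L_Ankle", "R_Ankle", "Spine3", "L_Foot", "R_Foot",       # 6-11
    "Neck", "L_Collar", "R_Collar", "Head", "L_Shoulder", "R_Shoulder", # 12-17
    "L_Elbow", "R_Elbow", "L_Wrist", "R_Wrist", "L_Hand", "R_Hand",     # 18-23
]

# Joint number -> FBX bone name ("m_avg_" prefix assumed from the
# official SMPL FBX rigs).
JOINT_TO_BONE = {i: "m_avg_" + name for i, name in enumerate(SMPL_JOINT_NAMES)}
```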
Music Feature Extraction
Dance Pose Generation
3D Animation Rendering
The main stages of dance pose generation. Rotation angles and extracted audio features are fed into the model to generate new dance motions; the generated motion, stored as an NPY file, is then passed to the automated rendering pipeline, which applies it to an FBX model and renders the result inside Blender.
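A minimal sketch of the rendering stage as a Blender Python script, assuming the NPY file holds per-frame axis-angle joint rotations of shape (T, 24, 3) and that the rig uses the bone names from the joint table above (repeated here so the script runs standalone). File paths are placeholders, and the sketch deliberately ignores rest-pose offsets between SMPL joint frames and the rig's bone frames, which the full pipeline has to compensate for.

```python
# Run headless inside Blender, e.g.:
#   blender --background --python render_dance.py
import bpy
import numpy as np

# Same 24-name joint table as in the joint-mapping sketch above.
SMPL_JOINT_NAMES = [
    "Pelvis", "L_Hip", "R_Hip", "Spine1", "L_Knee", "R_Knee",
    "Spine2", "L_Ankle", "R_Ankle", "Spine3", "L_Foot", "R_Foot",
    "Neck", "L_Collar", "R_Collar", "Head", "L_Shoulder", "R_Shoulder",
    "L_Elbow", "R_Elbow", "L_Wrist", "R_Wrist", "L_Hand", "R_Hand",
]

motion = np.load("generated_motion.npy")  # assumed shape: (T, 24, 3), axis-angle

bpy.ops.import_scene.fbx(filepath="smpl_model.fbx")
armature = next(o for o in bpy.context.scene.objects if o.type == "ARMATURE")

for frame, pose in enumerate(motion):
    for idx, joint in enumerate(SMPL_JOINT_NAMES):
        bone = armature.pose.bones["m_avg_" + joint]
        aa = pose[idx]
        angle = float(np.linalg.norm(aa))
        axis = aa / angle if angle > 1e-8 else np.array([0.0, 0.0, 1.0])
        bone.rotation_mode = "AXIS_ANGLE"
        bone.rotation_axis_angle = (angle, axis[0], axis[1], axis[2])
        bone.keyframe_insert(data_path="rotation_axis_angle", frame=frame)

scene = bpy.context.scene
scene.frame_start, scene.frame_end = 0, len(motion) - 1
scene.render.filepath = "//render/"  # output directory, placeholder
bpy.ops.render.render(animation=True)
```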
Motion comparison between the base model (right) and our improved model (left) on the same frame.
Motion comparison between the base model (right) and our improved model (left) on the same frame.
Motion comparison between the base model (left) and our improved model (right) on the same frame.
Note: Realism and generation quality are measured with FID scores; lower FID indicates more realistic motion. The FID values for the base model and the other compared models are taken from the base model's paper.
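Concretely, the FID between real and generated motion is the Fréchet distance between Gaussian fits of their feature distributions, FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A standard implementation sketch follows; the motion features themselves (e.g. the kinetic and geometric features used in the base model's evaluation) are assumed to be computed separately.

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Frechet distance between Gaussian fits of two feature sets.

    Each input is a (num_samples, feature_dim) array of motion features.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Matrix square root of the covariance product; small imaginary
    # residue from numerical error is discarded.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```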
Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017).
Li, Ruilong, et al. “AI Choreographer: Music Conditioned 3D Dance Generation with AIST++.” Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).
McFee, Brian, et al. “librosa: Audio and Music Signal Analysis in Python.” Proceedings of the 14th Python in Science Conference (2015).
Loper, Matthew, et al. “SMPL: A Skinned Multi-Person Linear Model.” Seminal Graphics Papers: Pushing the Boundaries, Volume 2 (2023): 851–866.
Li, Jiaman, et al. “Learning to Generate Diverse Dance Motions with Transformer.” arXiv preprint arXiv:2008.08171 (2020).
Zhuang, Wenlin, et al. “Music2Dance: Music-Driven Dance Generation Using WaveNet.” arXiv preprint arXiv:2002.03761 (2020).
Huang, Ruozi, et al. “Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning.” International Conference on Learning Representations (2021).