
Microsoft Muse Can Design Video Games Based on Your Playing Style
Last Updated on February 28, 2025 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
Games have played a monumental role in the evolution of AI. From providing training environments to simulating real-world conditions, games have been remarkable catalysts for AI learning. A new field known as world action models is rapidly emerging to combine games and AI. Microsoft just released exciting research in this area: a model that can create games after watching human players.
Sounds crazy? Let's discuss.
Muse, a generative AI model, marks a pivotal advancement in the convergence of artificial intelligence and video games. This model, developed by the Microsoft Research Game Intelligence and Teachable AI Experiences teams in collaboration with Xbox Game Studios' Ninja Theory, introduces the first World and Human Action Model (WHAM), designed to generate game visuals and controller actions. Muse aims to support human creativity by generating complex gameplay sequences. This essay provides a technical overview of Muse, emphasizing its architecture, capabilities, and key innovations.
Architectural Overview
Muse employs a transformer-based generative model trained on extensive human gameplay data. The current model instances are trained on visuals and controller actions from the Xbox game Bleeding Edge at a resolution of 300×180 pixels. The WHAM-1.6B instance of Muse has been trained on over 1 billion images and controller actions, which corresponds to more than 7 years of continuous human gameplay. Muse relies on ethically sourced and responsibly used data, in compliance with user agreements and privacy standards.
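As a rough sanity check on the scale quoted above, the short calculation below converts seven years of continuous play into a frame count. It assumes that the rate of 10 frames per second implied by the 10-frame, 1-second prompt mentioned in the capabilities list also applies to the training data. That rate and the helper name `frames_for_years` are assumptions made for illustration, not figures confirmed for the training set.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def frames_for_years(years: float, frames_per_second: int = 10) -> int:
    """Number of frames in `years` of continuous gameplay at a fixed frame rate.

    The default of 10 frames per second is an assumption taken from the
    10-frame, 1-second prompt, not a confirmed property of the training data.
    """
    return int(years * SECONDS_PER_YEAR * frames_per_second)


frames = frames_for_years(7)
print(f"{frames:,} frames")  # 2,207,520,000 frames
```

Under that assumption, seven years of play comes to roughly 2.2 billion frames, which is consistent with the "over 1 billion images" figure.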
Capabilities of Muse
- Gameplay Generation: Muse generates complex gameplay sequences that maintain consistency for several minutes. By prompting the model with an initial 10 frames (1 second) of human gameplay and corresponding controller actions, Muse predicts subsequent game evolution in "world model mode".
- Consistency: Muse ensures that generated gameplay sequences respect the inherent dynamics of the game. The generated sequences align character movements with controller actions, prevent characters from traversing walls, and generally adhere to the game's physics. Evaluation of consistency involves prompting the model with ground truth gameplay sequences and controller actions. The generated game visuals are then compared to the ground truth visuals using Fréchet Video Distance (FVD), a metric established in the video generation community.
- Diversity: Muse generates a range of gameplay variants from identical initial prompts, covering a spectrum of potential gameplay evolutions. This includes both behavioral diversity, such as varied camera movements and path navigation, and visual diversity, including different character hoverboards. Diversity is quantitatively assessed using the Wasserstein distance, which compares model-generated sequences to the diversity found in human gameplay recordings.
- Persistency: Muse integrates user modifications into the generated gameplay sequences. For example, if a character is added to an original game visual, Muse can "persist" the added character and generate plausible scenarios showing how the gameplay sequence evolves from that modified starting point.
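To make the prompting loop concrete, here is a minimal sketch of an autoregressive rollout: the model is given 10 prompt frames with their actions, then each predicted frame is appended to the context before the next prediction. Everything in it is a toy stand-in. `toy_step`, the grid "frames", the `ARENA` size, and the assumption that future controller actions are supplied by the caller are all hypothetical and do not reflect Muse's actual interface.

```python
import random
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[int, int]   # toy "frame": the character's (x, y) grid position
Action = Tuple[int, int]  # toy controller action: (dx, dy) stick direction
ARENA = 20                # hypothetical square arena with walls at 0 and ARENA - 1

StepFn = Callable[[Sequence[Frame], Sequence[Action], random.Random], Frame]


def toy_step(frames: Sequence[Frame], actions: Sequence[Action],
             rng: random.Random) -> Frame:
    """Hypothetical stand-in for the model's next-frame prediction."""
    (x, y), (dx, dy) = frames[-1], actions[-1]
    dy += rng.choice((-1, 0, 1))  # sampling noise: the source of diverse rollouts
    # Clamp so the character never passes through the arena walls.
    return (min(max(x + dx, 0), ARENA - 1), min(max(y + dy, 0), ARENA - 1))


def rollout(prompt_frames: Sequence[Frame], prompt_actions: Sequence[Action],
            future_actions: Sequence[Action], step_fn: StepFn = toy_step,
            seed: int = 0) -> List[Frame]:
    """Generate one frame per future action, feeding each prediction back in."""
    rng = random.Random(seed)
    frames, actions = list(prompt_frames), list(prompt_actions)
    for action in future_actions:
        actions.append(action)
        frames.append(step_fn(frames, actions, rng))
    return frames[len(prompt_frames):]


# 10 prompt frames (1 second) of the character walking right, then 30 more steps.
prompt_frames = [(i, 5) for i in range(10)]
prompt_actions = [(1, 0)] * 10
future_actions = [(1, 0)] * 30

# Same prompt, different seeds: several candidate continuations.
variants = [rollout(prompt_frames, prompt_actions, future_actions, seed=s)
            for s in range(3)]
for variant in variants:
    print(variant[-1])
```

The toy mirrors two of the properties listed above only in spirit: clamping keeps the character inside the walls (consistency), and different seeds give different continuations from an identical prompt (diversity).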
Key Innovations of Muse
- World and Human Action Model (WHAM): Muse introduces WHAM, a generative AI model capable of generating both game visuals and controller actions, representing a novel approach to modeling video game environments and human interactions.
- Data-Driven Approach: Muse uses a substantial dataset of human gameplay data from Bleeding Edge, enabling the model to learn complex game dynamics and generate realistic gameplay sequences. The model was trained on more than 1 billion images and controller actions, corresponding to over 7 years of continuous human gameplay.
- Multidisciplinary Collaboration: The development of Muse involved machine learning researchers, game developers, and creatives, ensuring the model's capabilities align with the needs of game creatives and ethical, responsible technology development. Input from game creators early in the process helped shape model capabilities.
- WHAM Demonstrator: The WHAM Demonstrator offers a visual interface for interacting with Muse, allowing users to load visuals as initial prompts and generate multiple potential continuations. Users can also adjust generated sequences using game controllers, facilitating iterative creative processes. The WHAM Demonstrator enables users to directly interface with the model, explore its creative potential, and test ideas.
- Evaluation Protocols: Muse's development includes evaluation protocols for consistency, diversity, and persistency, facilitating systematic performance evaluation and providing insights for enhancing capabilities. Muse's evaluation framework and user study insights allowed for the identification of key capabilities required by game creatives.
Evaluation of Muse
Muse's evaluation focuses on consistency, diversity, and persistency.
- Consistency: The model is prompted with ground truth gameplay sequences and controller actions, and the generated game visuals are compared to the ground truth visuals using Fréchet Video Distance (FVD).
- Diversity: Assessed quantitatively using the Wasserstein distance, comparing model-generated sequences to human gameplay recordings.
- Persistency: Demonstrated through modified gameplay sequences and observation of the model's integration of newly introduced elements.
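The two quantitative metrics above can be illustrated with a small, dependency-free sketch. It is a simplification under stated assumptions, not the evaluation code used for Muse: `wasserstein_1d` handles only one-dimensional, equal-sized samples of some per-sequence statistic (which statistic to compare is left open here), and `frechet_diag` computes a Fréchet distance between Gaussians with diagonal covariance, whereas FVD as commonly defined fits full-covariance Gaussians to features from a pretrained video network. Both function names and all sample values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal-sized samples with uniform weights, the optimal transport plan
    pairs the i-th smallest value of `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return fmean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def frechet_diag(feats_a: Sequence[Sequence[float]],
                 feats_b: Sequence[Sequence[float]]) -> float:
    """Squared Fréchet distance between diagonal-covariance Gaussian fits.

    With diagonal covariances the trace term reduces to the sum over
    dimensions of (sigma_a - sigma_b) ** 2.
    """
    total = 0.0
    for col_a, col_b in zip(zip(*feats_a), zip(*feats_b)):
        total += (fmean(col_a) - fmean(col_b)) ** 2
        total += (pstdev(col_a) - pstdev(col_b)) ** 2
    return total


# Hypothetical per-sequence statistic (e.g. distance travelled) for each source.
human_stat = [10.0, 12.0, 15.0]
model_stat = [11.0, 13.0, 16.0]
print(wasserstein_1d(human_stat, model_stat))  # 1.0

# Hypothetical 2-D feature vectors for ground truth and generated clips.
real_feats = [(0.0, 1.0), (2.0, 3.0)]
fake_feats = [(3.0, 1.0), (5.0, 3.0)]
print(frechet_diag(real_feats, fake_feats))  # 9.0
```

In both cases a lower value means the generated sequences are statistically closer to the human reference data.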
Impact and Future Directions
Muse marks a significant advance in using AI for gameplay ideation. By open-sourcing the weights and sample data and offering the WHAM Demonstrator executable, Microsoft promotes further exploration and development in this domain.
Conclusion
Muse, the first WHAM, showcases generative AI models' potential in supporting gameplay ideation. Muse's architecture, grounded in transformer networks and trained on extensive human gameplay data, enables the generation of consistent, diverse, and persistent gameplay sequences. The project's multidisciplinary approach and rigorous evaluation protocols underscore its importance. By making Muse accessible to the community, Microsoft fosters innovation and enhances the understanding of generative AI in creating novel, AI-driven game experiences.
Published via Towards AI