Project Status

Our project’s goal is to build a speech-to-command application for Minecraft that lets the user play the game using only voice commands. The current iteration accomplishes three things. First, speech recognition converts recorded speech into text. Next, natural language processing, which is the focus of our project, parses that text into a form that the Malmo command framework can interpret. Finally, Malmo takes the parsed commands and moves the agent according to the voice command.
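To make the text-to-command step concrete, here is a minimal sketch of how a transcript might be mapped to Malmo command strings. It is an illustration under stated assumptions, not our actual parser: the function name `parse_transcript` and the keyword rules are hypothetical, and we assume Malmo's continuous movement commands (`move`, `turn`, `jump`) with a negative `turn` value meaning left.

```python
import re
from typing import List


def parse_transcript(text: str) -> List[str]:
    """Map a recognized sentence to a list of Malmo-style command strings.

    Hypothetical keyword rules for illustration only; a real parser would
    use proper NLP (tokenization, tagging, dependency parsing).
    """
    # Lowercase and keep alphabetic tokens only, which drops punctuation.
    words = re.findall(r"[a-z]+", text.lower())

    # "stop" cancels any ongoing continuous movement.
    if "stop" in words:
        return ["move 0", "turn 0"]

    commands: List[str] = []
    if any(w in words for w in ("move", "walk", "go")):
        backward = "back" in words or "backward" in words
        commands.append("move -1" if backward else "move 1")
    if "turn" in words:
        # Assumed sign convention: negative is left, positive is right.
        if "left" in words:
            commands.append("turn -1")
        elif "right" in words:
            commands.append("turn 1")
    if "jump" in words:
        commands.append("jump 1")
    return commands
```

Each returned string would then be passed to the agent one at a time, for example with `agent_host.sendCommand(command)` in a Malmo mission loop.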

Approach

Speech Recognition

Natural Language Processing

Malmo Commands

Evaluation

Our project is currently meeting the expectations set for this stage of development. We can issue basic speech commands and have the agent carry them out accurately. In addition, we completed the ‘go to (object)’ command in time for this stage. We still have many more ideas and commands that we would like to implement to make the application cleaner and more user-friendly, but we are confident in our progress so far. Some features are currently in testing but are not yet ready for a demo.

Usage examples:

Chase a target with obstacle avoidance:

[Three images omitted]

Use a tool and smoothly dig the ground:

[Three images omitted]

Remaining Goals and Challenges

Our prototype is currently limited by the number of commands it can interpret and by how closely the agent’s behavior resembles that of a human player. Among the existing features, obstacle avoidance works relatively well in our customized testing environment, but it can be improved and needs further testing.

For the final report, we would like to implement many more commands, such as “move until hitting a wall” or “attack the zombie”. We would also like to implement more AI algorithms to make the agent’s movement and actions more fluid in a “real-world” Minecraft environment. Finally, we would like to add word-vector similarity scoring so that the application can handle a wider range of user phrasings.
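As a rough illustration of the word-vector idea, the sketch below maps an unfamiliar verb to the closest supported verb by cosine similarity. Everything in it is an assumption for demonstration: the three-dimensional vectors in `TOY_VECTORS` are made up (real embeddings would come from a pretrained model), and `KNOWN_VERBS`, `closest_known_verb`, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Made-up toy embeddings, for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "move": [0.9, 0.1, 0.0],
    "walk": [0.8, 0.2, 0.1],
    "attack": [0.0, 0.9, 0.2],
    "hit": [0.1, 0.8, 0.3],
    "dig": [0.1, 0.2, 0.9],
}

# Hypothetical set of verbs the command parser already understands.
KNOWN_VERBS = ("move", "attack", "dig")


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_known_verb(word: str, threshold: float = 0.8) -> Optional[str]:
    """Return the most similar known verb, or None if nothing is close enough."""
    vector = TOY_VECTORS.get(word)
    if vector is None:
        return None  # no embedding available for this word
    best = max(KNOWN_VERBS, key=lambda v: cosine(vector, TOY_VECTORS[v]))
    return best if cosine(vector, TOY_VECTORS[best]) >= threshold else None
```

With these toy vectors, "walk" resolves to "move" and "hit" resolves to "attack", while a word with no embedding is rejected rather than guessed.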

Given our experience so far, the challenges we face in completing our goals for the final report include: adequately parsing more advanced sentences, accurately reflecting the corresponding advanced actions in Malmo, giving the agent the AI behavior needed for more advanced movements (“attack the zombie and come back”), and properly executing user-friendly functions (queueing up commands or overriding current commands).
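One possible shape for the queueing and overriding behavior is sketched below. This is not an implemented feature: the `CommandQueue` class and its method names are hypothetical, and the sketch assumes commands are plain strings that are executed one at a time in arrival order.

```python
from collections import deque
from typing import Deque, Optional


class CommandQueue:
    """Hypothetical FIFO queue of pending commands with an override option."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def enqueue(self, command: str) -> None:
        """Add a command to run after everything already queued."""
        self._pending.append(command)

    def override(self, command: str) -> None:
        """Discard all pending commands and run this one next."""
        self._pending.clear()
        self._pending.append(command)

    def next_command(self) -> Optional[str]:
        """Pop the next command to execute, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)
```

In this design, "attack the zombie and come back" would enqueue two commands in order, while an urgent "stop" would call `override` so that nothing queued earlier runs after it.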