Microsoft Patent | Real-time remodeling of user voice in an immersive visualization system

Patent: Real-time remodeling of user voice in an immersive visualization system

Publication Number: 10176820

Publication Date: 2019-01-08

Applicants: Microsoft


A visualization system with audio capability includes one or more display devices, one or more microphones, one or more speakers, and audio processing circuitry. While a display device displays an image to a user, a microphone captures an utterance of the user, or a sound from the user’s environment, and provides it to the audio processing circuitry. The audio processing circuitry processes the utterance (or other sound) in real time to add an audio effect associated with the image, which increases realism, and outputs the processed utterance (or other sound) to the user via the speaker in real time, with very low latency.


As virtual reality (VR) and augmented reality (AR) technology matures, VR and AR visualization systems are starting to enter the mainstream consumer electronics marketplace. AR head-mounted display (HMD) devices (“AR-HMD devices”) are one promising application of such technology. These devices may include transparent display elements that let a user see both the real world around them and virtual content generated and displayed by the device at the same time. Virtual content that appears to be superimposed over a real-world view is commonly referred to as AR content.

VR and AR visualization systems can provide users with entertaining, immersive virtual environments in which they can visually and audibly experience things they might not normally experience in real life. In such environments, however, the perceived realism of the environment may be degraded if a user speaks or issues voice commands and the user’s voice does not sound consistent with what the user sees, including the displayed virtual content.


The technique introduced here includes an audio processing method by which an AR or VR visualization system can produce sound that is more consistent with the displayed imagery that the user sees, and which is therefore more realistic. In certain embodiments of the method, an HMD device displays an image of a physical thing to a user of the device, over a real-world view of the user’s environment. While the image is being displayed, the HMD device captures the user’s voice via a microphone, dynamically processes the user voice data in real time to incorporate an audio effect corresponding to the physical thing, and outputs via a speaker, also in real time, sound representing the voice of the user as affected by the physical thing, based on the dynamically modified user voice data. Other aspects of the technique will be apparent from the accompanying figures and detailed description.
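The flow described above (capture a block of voice samples, apply an effect tied to the displayed object, play the result) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the patent text here does not specify any particular effect or data structure, so the feedback comb filter (a simple echo), the `EFFECTS` table, its object names and parameter values, and the helpers `VoiceEffect`, `effect_for`, and `run` are all hypothetical.

```python
from collections import deque

# Hypothetical table: displayed object -> (delay in samples, feedback, wet mix).
# All names and values are illustrative assumptions, not taken from the patent.
EFFECTS = {
    "cave": (4800, 0.5, 0.6),         # long echo, e.g. 100 ms at 48 kHz
    "metal_helmet": (96, 0.7, 0.4),   # short, ringing resonance
}


class VoiceEffect:
    """Feedback comb filter that keeps its state between audio blocks."""

    def __init__(self, delay, feedback, wet):
        self.feedback = feedback
        self.wet = wet
        # Delay line pre-filled with silence; its length sets the echo time.
        self.line = deque([0.0] * delay, maxlen=delay)

    def process_block(self, block):
        out = []
        for x in block:
            delayed = self.line[0]  # value written `delay` samples ago
            # Appending to a full deque drops the oldest entry automatically.
            self.line.append(x + self.feedback * delayed)
            out.append((1.0 - self.wet) * x + self.wet * delayed)
        return out


def effect_for(displayed_object):
    """Return a fresh effect for the displayed object, or None if unknown."""
    params = EFFECTS.get(displayed_object)
    return VoiceEffect(*params) if params else None


def run(mic_blocks, displayed_object):
    """Yield one processed block per captured block, in arrival order."""
    effect = effect_for(displayed_object)
    for block in mic_blocks:
        # Unknown objects leave the voice unmodified.
        yield effect.process_block(block) if effect else list(block)
```

Processing small blocks as they arrive, rather than a whole recording, is what keeps added delay short in a sketch like this: the output for a block is available as soon as that block has been captured, and the delay line carries the echo tail across block boundaries.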

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

