Microsoft Patent | Directional Propagation

Patent: Directional Propagation

Publication Number: 10602298

Publication Date: 20200324

Applicants: Microsoft

Abstract

The description relates to parametric directional propagation for sound modeling and rendering. One implementation includes receiving virtual reality space data corresponding to a virtual reality space. The implementation can include using the virtual reality space data to simulate directional impulse responses for initial sounds emanating from multiple moving sound sources and arriving at multiple moving listeners. The implementation can include using the virtual reality space data to simulate directional impulse responses for sound reflections in the virtual reality space. The directional impulse responses can be encoded and used to render sound that accounts for a geometry of the virtual reality space.

BACKGROUND

Practical modeling and rendering of real-time directional acoustic effects (e.g., sound, audio) for video games and/or virtual reality applications can be prohibitively complex. Conventional methods constrained by reasonable computational budgets have been unable to render authentic, convincing sound with true-to-life directionality of initial sounds and/or multiply-scattered sound reflections, particularly in cases with occluders (e.g., sound obstructions). Room acoustic modeling (e.g., concert hall acoustics) does not account for free movement of either sound sources or listeners. Further, sound-to-listener line of sight is usually unobstructed in such applications. Conventional real-time path tracing methods demand enormous sampling to produce smooth results, greatly exceeding reasonable computational budgets. Other methods are limited to oversimplified scenes with few occlusions, such as an outdoor space that contains only 10-20 explicitly separated objects (e.g., building facades, boulders). Some methods have attempted to account for sound directionality with moving sound sources and/or listeners, but are unable to also account for scene acoustics while working within a reasonable computational budget. Still other methods neglect sound directionality entirely. In contrast, the parametric directional propagation concepts described herein can generate convincing audio for complex video gaming and/or virtual reality scenarios while meeting a reasonable computational budget.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate implementations of the concepts conveyed in the present document. Features of the illustrated implementations can be more readily understood by reference to the following description taken in conjunction with the accompanying drawings. Like reference numbers in the various drawings are used wherever feasible to indicate like elements. In some cases, parentheticals are utilized after a reference number to distinguish like elements. Use of the reference number without the associated parenthetical is generic to the element. Further, the left-most numeral of each reference number conveys the FIG. and associated discussion where the reference number is first introduced.

FIGS. 1A-4 and 7A illustrate example parametric directional propagation environments that are consistent with some implementations of the present concepts.

FIGS. 5 and 7B-11 show example parametric directional propagation graphs and/or diagrams that are consistent with some implementations of the present concepts.

FIGS. 6 and 12 illustrate example parametric directional propagation systems that are consistent with some implementations of the present concepts.

FIGS. 13-16 are flowcharts of example parametric directional propagation methods in accordance with some implementations of the present concepts.

DETAILED DESCRIPTION

Overview

This description relates to generating convincing sound for video games, animations, and/or virtual reality scenarios. Hearing can be thought of as directional, complementing vision by detecting where (potentially unseen) sound events occur in an environment of a person. For example, standing outside a meeting hall, the person is able to locate an open door by listening for the chatter of a crowd in the meeting hall streaming through the door. By listening, the person may be able to locate the crowd (via the door) even when sight of the crowd is obscured to the person. As the person walks through the door, entering the meeting hall, the auditory scene smoothly wraps around them. Inside the door, the person is now able to resolve sound from individual members of the crowd, as their individual voices arrive at the person’s location. The directionality of the arrival of an individual voice can help the person face and/or navigate to a chosen individual.

Aside from the initial sound arrival, reflections and/or reverberations of sound are another important part of an auditory scene. For example, while reflections can envelop a listener indoors, partly open spaces may yield anisotropic reflections, which can sound different based on a direction a listener is facing. In either situation, the sound of reflections can reinforce the visual location of nearby scene geometry. For example, when a sound source and listener are close (e.g., within footsteps), a delay between arrival of the initial sound and corresponding first reflections can become audible. The delay between the initial sound and the reflections can strengthen the perception of distance to walls. The generation of convincing sound can include accurate and efficient simulation of sound diffracting around obstacles, through portals, and scattering many times. Stated another way, directionality of an initial arrival of a sound can determine a perceived direction of the sound, while the directional distribution of later arriving reflections of the sound can convey additional information about the surroundings of a listener.

Parametric directional propagation concepts can provide practical modeling and/or rendering of such complex directional acoustic effects, including movement of sound sources and/or listeners within complex scene geometries. Proper rendering of directionality of an initial sound and reflections can greatly improve the authenticity of the sound in general, and can even help the listener orient and/or navigate in a scene. Parametric directional propagation concepts can generate convincing sound for complex scenes in real-time, such as while a user is playing a video game, or while a colleague is participating in a teleconference. Additionally, parametric directional propagation concepts can generate convincing sound while staying within a practical computational budget.

Example Introductory Concepts

FIGS. 1A-5 are provided to introduce the reader to parametric directional propagation concepts. FIGS. 1A-3 collectively illustrate parametric directional propagation concepts relative to a first example parametric directional propagation environment 100. FIGS. 1A, 1B, and 3 provide views of example scenarios 102 that can occur in environment 100. FIGS. 4 and 5 illustrate further parametric directional propagation concepts.

As shown in FIGS. 1A and 1B, example environment 100 can include a sound source 104 and a listener 106. The sound source 104 can emit a pulse 108 (e.g., sound, sound event). The pulse 108 can travel along an initial sound wavefront 110 (e.g., path). Environment 100 can also have a geometry 111, which can include structures 112. In this case, the structures 112 can be walls 113, which can generally form a room 114 with a portal 116 (e.g., doorway), an area outside 118 the room 114, and at least one exterior corner 120. A location of the sound source 104 in environment 100 can be generally indicated at 122, while a location of the listener 106 is indicated at 124.

As used herein, the term geometry 111 can refer to an arrangement of structures 112 (e.g., physical objects) and/or open spaces in an environment. In some implementations, the structures 112 can cause occlusion, reflection, diffraction, and/or scattering of sound, etc. For instance, in the example of FIG. 1A, the structures 112, such as walls 113, can act as occluders that occlude (e.g., obstruct) sound. Additionally, the structures, such as walls 113 (e.g., wall surfaces), can act as reflectors that reflect sound. Some additional examples of structures that can affect sound are furniture, floors, ceilings, vegetation, rocks, hills, ground, tunnels, fences, crowds, buildings, animals, stairs, etc. Additionally, shapes (e.g., edges, uneven surfaces), materials, and/or textures of structures can affect sound. Note that structures do not have to be solid objects. For instance, structures can include water, other liquids, and/or types of air quality that might affect sound and/or sound travel.

In the example illustrated in FIG. 1A, two potential initial sound wavefronts 110A of pulse 108 are shown leaving the sound source 104 and propagating to the listener 106 at listener location 124. For instance, initial sound wavefront 110A(1) travels straight through the wall 113 toward the listener 106, while initial sound wavefront 110A(2) passes through the portal 116 before reaching the listener 106. As such, initial sound wavefronts 110A(1) and 110A(2) arrive at the listener from different directions. Initial sound wavefronts 110A(1) and 110A(2) can also be viewed as two different ways to model an initial sound arriving at listener 106. However, in environment 100, where the walls 113 act as an occluder, an initial sound arrival modeled according to the example of initial sound wavefront 110A(1) might produce less convincing sound because the sound dampening effects of the wall may diminish the sound at the listener to below that of initial sound wavefront 110A(2). Thus, a more realistic initial sound arrival might be modeled according to the example of initial sound wavefront 110A(2), arriving toward the right side of listener 106. For instance, in a virtual reality world based on scenario 102A, a person (e.g., listener) looking at a wall with a doorway to their right would likely expect to hear a sound coming from their right side, rather than through the wall. Note that this phenomenon is affected by wall composition (e.g., a wall made out of a sheet of paper would not have the same sound dampening effect as a wall made out of 12 inches of concrete). Parametric directional propagation concepts can be used to ensure a listener hears any given initial sound with realistic directionality, such as coming from the doorway in this instance.

In some cases, the sound source 104 can be mobile. For example, scenario 102A depicts the sound source 104 at location 122A, and scenario 102B depicts the sound source 104 at location 122B. In scenario 102B both the sound source 104 and the listener 106 are outside 118, but the sound source 104 is around the exterior corner 120 from the listener 106. Once again, the walls 113 obstruct a line of sight (and/or wavefront travel) between the listener 106 and the sound source 104. Here again a first potential initial sound wavefront 110B(1) can be a less realistic model for an initial sound arrival at listener 106, since it would pass through walls 113. Meanwhile, a second potential initial sound wavefront 110B(2) can be a more realistic model for an initial sound arrival at listener 106.

Environment 100 is shown again in FIG. 2, including the listener 106 and the walls 113. FIG. 2 depicts an example encoded directional impulse response field 200 for environment 100. The encoded directional impulse response field 200 can be composed of multiple individual encoded directional impulse responses 202, depicted as arrows in FIG. 2. Only three individual encoded directional impulse responses 202 are designated with specificity in FIG. 2 to avoid clutter on the drawing page. In this example, the encoded directional impulse response 202(1) can be related to initial sound wavefront 110A(2) from scenario 102A (FIG. 1A). Similarly, the encoded directional impulse response 202(2) can be related to initial sound wavefront 110B(2) from scenario 102B (FIG. 1B). For instance, notice that the arrow depicting encoded directional impulse response 202(1) is angled similarly to the arrival direction of initial sound wavefront 110A(2) at the listener 106 in FIG. 1A. Similarly, the arrow depicting encoded directional impulse response 202(2) is angled similarly to the arrival direction of initial sound wavefront 110B(2) at the listener 106 in FIG. 1B. In contrast, encoded directional impulse response 202(3) is located to the left of and slightly lower than listener 106 on the drawing page in FIG. 2. Accordingly, the arrow depicting encoded directional impulse response 202(3) is pointing in roughly an opposite direction from either of encoded directional impulse responses 202(1) or 202(2), indicating that a sound emanating from a respective location to encoded directional impulse response 202(3) would arrive at listener 106 from roughly the opposite direction as in either of scenarios 102A or 102B (FIGS. 1A and 1B).

The encoded directional impulse response field 200, as shown in FIG. 2, can be a visual representation of realistic arrival directions of initial sounds at listener 106 for a sound source 104 at virtually any location in environment 100. Note that in other scenarios, listener 106 could be moving as well. As such, additional encoded directional impulse response fields could be produced for any location of the listener 106 in environment 100. Parametric directional propagation concepts can include producing encoded directional impulse response fields for virtual reality worlds and/or using the encoded directional impulse response fields to render realistic sound for the virtual reality worlds. The production and/or use of encoded directional impulse response fields will be discussed further relative to FIG. 6, below.

FIGS. 1A-2 have been used to discuss parametric directional propagation concepts related to an initial sound emanating from a sound source 104 and arriving at a listener 106. FIG. 3 will now be used to introduce concepts relating to reflections and/or reverberations of sound relative to environment 100. FIG. 3 again shows scenario 102A, with the sound source 104 at location 122A (as in FIG. 1A). For sake of brevity, not all elements from FIG. 1A will be reintroduced in FIG. 3. In this case, FIG. 3 also includes reflection wavefronts 300. In FIG. 3, only a few reflection wavefronts 300 are designated to avoid clutter on the drawing page.

Here again, less realistic and more realistic models of reflections can be considered. For instance, as shown in the example in FIG. 3, reflections originating from pulse 108 can be modeled as simply arriving at listener 106 from all directions, indicated with potential reflection wavefronts 300(1). For instance, reflection wavefronts 300(1) can represent simple copies of sound associated with pulse 108 surrounding listener 106. However, reflection wavefronts 300(1) might create an incorrect sense of sound envelopment of the listener 106, as if the sound source and listener were in a shared room.

In some implementations, reflection wavefronts 300(2) can represent a more realistic model of sound reflections. Reflection wavefronts 300(2) are shown in FIG. 3 emanating from sound source 104 and reflecting off walls 113 inside the room 114. In FIG. 3, some of the reflection wavefronts 300(2) pass out of room 114, through the portal 116, and toward listener 106. Reflection wavefronts 300(2) account for the complexity of the room geometry. As such, the directionality of the sound at listener 106 has been preserved with reflection wavefronts 300(2), in contrast to reflection wavefronts 300(1), which simply surround listener 106. Stated another way, a model of sound reflections that accounts for reflections off of and/or around structures of scene geometry can be more realistic than simply surrounding a listener with non-directional incoming sound.

In FIG. 3, only a few reflection wavefronts 300(2) are depicted to avoid clutter on the drawing page. Note that true sound propagation may be thought of as similar to ripples in a pond emanating from a point source, rather than individual rays of light, for instance. In FIG. 2, encoded directional impulse response field 200 was provided as a representation of realistic arrival directions of initial sounds at listener 106. A reflection response field can be generated to model the directionality of arrivals of sound reflections. However, it is difficult to provide a similar visual representation for a reflection response field due to the inherent complexity of the rippling sound. In some cases, the term "perceptual parameter field" can be used to refer to encoded directional impulse response fields related to initial sounds and/or to reflection response fields related to sound reflections. Perceptual parameter fields will be discussed further relative to FIG. 6, below.

Taken together, realistic directionality of both initial sound arrivals and sound reflections can improve sensory immersion in virtual environments. For instance, proper sound directionality can complement visual perception, such that hearing and vision are coordinated, as one would expect in reality. Further introductory parametric directional propagation concepts will now be provided relative to FIGS. 4 and 5. The examples shown in FIGS. 4 and 5 include aspects of both initial sound arrival(s) and sound reflections for a given sound event.

FIG. 4 illustrates an example environment 400 and scenario 402. Similar to FIG. 1A, FIG. 4 includes a sound source 404 and a listener 406. The sound source 404 can emit a pulse 408. The pulse 408 can travel along initial sound wavefronts 410 (solid lines in FIG. 4). Environment 400 can also include walls 412, a room 414, two portals 416, and an area outside 418. Sound reflections bouncing off walls 412 are shown in FIG. 4 as reflection wavefronts 420 (dashed lines in FIG. 4). A listener location is generally indicated at 422.

In this example, the two portals 416 add complexity to the scenario. For instance, each portal presents an opportunity for a respective initial sound arrival to arrive at listener location 422. As such, this example includes two initial sound wavefronts 410(1) and 410(2). Similarly, sound reflections can pass through both portals 416, indicated by the multiple reflection wavefronts 420. Detail regarding the timing of these arrivals will now be discussed relative to FIG. 5.

FIG. 5 includes an impulse response graph 500. The x-axis of graph 500 can represent time and the y-axis can represent pressure deviation (e.g., loudness). Portions of graph 500 can generally correspond to initial sound(s), reflections, and reverberations, generally indicated at 502, 504, and 506, respectively. Graph 500 can include initial sound impulse responses (IR) 508, reflection impulse responses 510, decay time 512, an initial sound delay 514, and a reflection delay 516.

In this case, initial sound impulse response 508(1) can correspond to initial sound wavefront 410(1) of scenario 402 (FIG. 4), while initial sound impulse response 508(2) can correspond to initial sound wavefront 410(2). Note that in the example shown in FIG. 4, a path length of initial sound wavefront 410(1) from the sound source 404 to the listener 406 is slightly shorter than a path length of initial sound wavefront 410(2). Accordingly, initial sound wavefront 410(1) would be expected to arrive earlier at listener 406 and sound slightly louder than initial sound wavefront 410(2). (Initial sound wavefront 410(2) could sound relatively quieter since the longer path length might allow more dissipation of sound, for instance.) Therefore, in graph 500, initial sound impulse response 508(1) is further left along the x-axis and also has a higher peak on the y-axis than initial sound impulse response 508(2).

Graph 500 also depicts the multiple reflection impulse responses 510 in section 504 of graph 500. Only the first reflection impulse response 510 is designated to avoid clutter on the drawing page. The reflection impulse responses 510 can attenuate over time, with peaks generally lowering on the y-axis of graph 500, which can represent diminishing loudness. The attenuation of the reflection impulse responses 510 over time can be represented and/or modeled as decay time 512. Eventually the reflections can be considered reverberations, indicated in section 506.
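One way to picture decay time 512 is as the time the reflection envelope takes to fall by a fixed number of decibels. The following is a minimal sketch under stated assumptions, not the method of the present description: it assumes the envelope decays roughly exponentially, fits a straight line to the envelope level in decibels, and reports the time needed to drop by 60 dB. The function name `decay_time_s`, the 60 dB default, and the least-squares fit are all illustrative choices.

```python
import math


def decay_time_s(envelope, sample_rate_hz, drop_db=60.0):
    """Estimate how long the envelope takes to fall by `drop_db` decibels.

    `envelope` holds positive peak amplitudes sampled at `sample_rate_hz`.
    The estimate assumes an approximately exponential decay (hypothetical model).
    """
    # Convert each positive amplitude to (time in seconds, level in dB).
    points = [(i / sample_rate_hz, 20.0 * math.log10(a))
              for i, a in enumerate(envelope) if a > 0.0]
    if len(points) < 2:
        raise ValueError("need at least two positive envelope samples")

    # Ordinary least-squares slope of level (dB) against time (s).
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_db = sum(db for _, db in points) / n
    num = sum((t - mean_t) * (db - mean_db) for t, db in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    slope_db_per_s = num / den
    if slope_db_per_s >= 0.0:
        raise ValueError("envelope does not decay")

    return -drop_db / slope_db_per_s


# Synthetic envelope that loses 60 dB over 0.5 s, sampled at 1 kHz.
envelope = [10.0 ** (-3.0 * (i / 1000.0) / 0.5) for i in range(500)]
estimated = decay_time_s(envelope, 1000.0)
```

A single scalar of this kind is far cheaper to store than the full train of reflection impulse responses 510, which is the appeal of summarizing the attenuation as a decay time.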

Graph 500 also depicts the initial sound delay 514. Initial sound delay 514 can represent an amount of time between the initiation of the sound event, in this case at the origin of graph 500, and the initial sound impulse response 508(1). The initial sound delay 514 can be related to the path length of initial sound wavefront 410(1) from the sound source 404 to the listener 406 (FIG. 4). Therefore, proper modeling of initial sound wavefront 410(1), propagating around walls 412 and through portal 416(1), can greatly improve the realness of rendered sound by more accurately timing the initial sound delay 514. Following the initial sound delay 514, graph 500 also depicts the reflection delay 516. Reflection delay 516 can represent an amount of time between the initial sound impulse response 508(1) and arrival of the first reflection impulse response 510. Here again, proper timing of the reflection delay 516 can greatly improve the realness of rendered sound.
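The relationship between path length and delay can be illustrated with a short calculation. The sketch below assumes sound travels at a constant speed (343 m/s, a common figure for air at about 20 degrees Celsius) along a path of known length; the function names and the example path lengths are hypothetical and are not taken from the figures.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def propagation_delay_s(path_length_m, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Time for a wavefront to travel `path_length_m` at a constant speed."""
    if path_length_m < 0.0:
        raise ValueError("path length must be non-negative")
    return path_length_m / speed_m_per_s


def reflection_delay_s(initial_path_m, first_reflection_path_m):
    """Gap between the initial arrival and the first reflected arrival."""
    return (propagation_delay_s(first_reflection_path_m)
            - propagation_delay_s(initial_path_m))


# Hypothetical paths: 10 m around a wall and through a portal,
# versus 17 m for the first reflection to reach the same listener.
initial_delay = propagation_delay_s(10.0)
gap = reflection_delay_s(10.0, 17.0)
```

Note that the path length used here is the length of the route the wavefront actually takes around occluders, not the straight-line distance between source and listener, which is why modeling the propagation path matters for the timing.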

Additional aspects related to timing of the initial sound impulse responses 508 and/or the reflection impulse responses 510 can also help model realistic sound. For example, timing can be considered when modeling directionality of the sound and/or loudness of the sound. In FIG. 5, arrival directions 518 of the initial sound impulse responses 508 are indicated as arrows corresponding to the 2D directionality of the initial sound wavefronts 410 in FIG. 4. (Directional impulse responses will be described in more detail relative to FIGS. 7A-7C, below.) In some cases, the directionality of initial sound impulse response 508(1), corresponding to the first sound to arrive at listener 406, can be more helpful in modeling realistic sound than the directionality of the second-arriving initial sound impulse response 508(2). Stated another way, in some implementations, the directionality of any initial sound impulse response 508 arriving within the first 1 ms (for example) after the initial sound delay 514 can be used to model realistic sound. In FIG. 5, a time window for capturing the directionality of initial sound impulse responses 508 is shown at initial direction time gap 520. In some cases, the directionality of the initial sound impulse responses 508 from the initial direction time gap 520 can be used to produce an encoded directional impulse response, such as in the examples described above relative to FIG. 2.

Similarly, initial sound loudness time gap 522 can be used to model how loud the initial sound impulse responses 508 will seem to a listener. In this case, the initial sound loudness time gap 522 can be 10 ms. For instance, the height of peaks of initial sound impulse responses 508 on graph 500 occurring within 10 ms after the initial sound delay 514 can be used to model the loudness of initial sound arriving at a listener. Furthermore, a reflection loudness time gap 524 can be a length of time, after the reflection delay 516, used to model how loud the reflection impulse responses 510 will seem to a listener. In this case, the reflection loudness time gap 524 can be 80 ms. The lengths of the time gaps 520, 522, and 524 provided here are for illustration purposes and not meant to be limiting.
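The windowing described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the impulse response is available as a list of pressure samples with a 2D arrival-direction unit vector per sample, detects the first arrival with a simple amplitude threshold, averages direction (weighted by energy) over the 1 ms window, and sums energy over the 10 ms window. The function name, the threshold, the energy weighting, and the data layout are assumptions rather than details from the present description.

```python
import math


def encode_initial_sound(pressure, directions, sample_rate_hz,
                         threshold=1e-3,
                         direction_window_s=0.001,
                         loudness_window_s=0.010):
    """Summarize the initial sound as delay, direction, and loudness.

    `pressure` is a list of samples; `directions` holds one (dx, dy) unit
    vector per sample (hypothetical layout). Returns None if nothing arrives.
    """
    # Initial sound delay: first sample whose magnitude reaches the threshold.
    onset = next((i for i, p in enumerate(pressure) if abs(p) >= threshold), None)
    if onset is None:
        return None

    n_dir = max(1, round(direction_window_s * sample_rate_hz))
    n_loud = max(1, round(loudness_window_s * sample_rate_hz))

    # Energy-weighted mean arrival direction inside the short direction window.
    sx = sy = 0.0
    for p, (dx, dy) in zip(pressure[onset:onset + n_dir],
                           directions[onset:onset + n_dir]):
        weight = p * p
        sx += weight * dx
        sy += weight * dy
    norm = math.hypot(sx, sy)
    direction = (sx / norm, sy / norm) if norm > 0.0 else None

    # Loudness: total energy inside the longer loudness window, in decibels.
    energy = sum(p * p for p in pressure[onset:onset + n_loud])
    loudness_db = 10.0 * math.log10(energy) if energy > 0.0 else float("-inf")

    return {"delay_s": onset / sample_rate_hz,
            "direction": direction,
            "loudness_db": loudness_db}


# Two arrivals at 3 ms and 5 ms, sampled at 1 kHz for readability.
pressure = [0.0, 0.0, 0.0, 1.0, 0.0, 0.5] + [0.0] * 20
directions = [(1.0, 0.0)] * 3 + [(0.0, 1.0)] + [(1.0, 0.0)] * 22
encoded = encode_initial_sound(pressure, directions, 1000.0)
```

In this toy input the second arrival falls outside the 1 ms direction window but inside the 10 ms loudness window, so it contributes to loudness without changing the encoded direction. A reflection loudness window (e.g., 80 ms after the reflection delay) could be summed in the same manner.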

Any given virtual reality scene can have multiple sound sources and/or multiple listeners. The multiple sound sources (or a single sound source) can emit overlapping sound. For example, a first sound source may emit a first sound for which reflections are arriving at a listener while the initial sound of a second sound source is arriving at the same listener. Each of these sounds can warrant a separate sound wave propagation field (FIG. 2). The scenario can be further complicated when considering that sound sources and listeners can move about a virtual reality scene. Each new location of sound sources and listeners can also warrant a new sound wave propagation field.

To summarize, proper modeling of the initial sounds and the multiply-scattered reflections and/or reverberations propagating around a complex scene can greatly improve the realness of rendered sound. In some cases, modeling of complex sound can include accurately presenting the timing, directionality, and/or loudness of the sound as it arrives at a listener. Realistic timing, directionality, and/or loudness of sound, based on scene geometry, can be used to build the richness and/or fullness that can help convince a listener that they are immersed in a virtual reality world. Modeling and/or rendering the ensuing acoustic complexity can present a voluminous technical problem. A system for accomplishing modeling and/or rendering of the acoustic complexity is described below relative to FIG. 6.

First Example System

A first example system 600 of parametric directional propagation concepts is illustrated in FIG. 6. System 600 is provided as a logical organization scheme in order to aid the reader in understanding the detailed material in the following sections.

In this example, system 600 can include a parametric directional propagation component 602. The parametric directional propagation component 602 can operate on a virtual reality (VR) space 604. In system 600, the parametric directional propagation component 602 can be used to produce realistic rendered sound 606 for the virtual reality space 604. In the example shown in FIG. 6, functions of the parametric directional propagation component 602 can be organized into three Stages. For instance, Stage One can relate to simulation 608, Stage Two can relate to perceptual encoding 610, and Stage Three can relate to rendering 612. Also shown in FIG. 6, the virtual reality space 604 can have associated virtual reality space data 614. The parametric directional propagation component 602 can also operate on and/or produce directional impulse responses 616, perceptual parameter fields 618, and sound event input 620, which can include sound source data 622 and/or listener data 624 associated with a sound event in the virtual reality space 604. In this example, the rendered sound 606 can include rendered initial sound(s) 626 and/or rendered sound reflections 628.

As illustrated in the example in FIG. 6, at simulation 608 (Stage One), parametric directional propagation component 602 can receive virtual reality space data 614. The virtual reality space data 614 can include geometry (e.g., structures, materials of objects, etc.) in the virtual reality space 604, such as geometry 111 indicated in FIG. 1A. For instance, the virtual reality space data 614 can include a voxel map for the virtual reality space 604 that maps the geometry, including structures and/or other aspects of the virtual reality space 604. In some cases, simulation 608 can include directional acoustic simulations of the virtual reality space 604 to precompute sound wave propagation fields. More specifically, in this example simulation 608 can include generation of directional impulse responses 616 using the virtual reality space data 614. Directional impulse responses 616 can be generated for initial sounds and/or sound reflections. (Directional impulse responses will be described in more detail relative to FIGS. 7A-7C, below.) Stated another way, simulation 608 can include using a precomputed wave-based approach (e.g., pre-computed wave technique) to capture the complexity of the directionality of sound in a complex scene.

In some cases, the simulation 608 of Stage One can include producing relatively large volumes of data. For instance, the directional impulse responses 616 can be nine-dimensional (9D) directional response functions associated with the virtual reality space 604. For instance, referring to the example in FIG. 1A, the 9 dimensions can be 3 dimensions relating to the position of sound source 104 in environment 100, 3 dimensions relating to the position of listener 106, a time dimension (see the x-axis in the example shown in FIG. 5), and 2 dimensions relating to directionality of the incoming initial sound wavefront 110A(2) to the listener 106. In some cases, capturing the complexity of a virtual reality space in this manner can lead to generation of petabyte-scale wave fields. This can create a technical problem related to data processing and/or data storage. Parametric directional propagation concepts can include solutions for reducing data processing and/or data storage, examples of which are provided below.
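A back-of-the-envelope calculation shows how a 9D field can reach petabyte scale. The sketch below simply multiplies the number of samples along each group of dimensions by the bytes per sample; every count in the example is a hypothetical figure chosen for illustration and does not come from the present description.

```python
def raw_field_bytes(n_source_positions, n_listener_positions,
                    n_time_samples, n_directions, bytes_per_sample=4):
    """Size of an uncompressed directional impulse response field.

    Source and listener positions each cover 3 spatial dimensions, time
    covers 1, and arrival direction covers 2, for 9 dimensions in total.
    """
    return (n_source_positions * n_listener_positions
            * n_time_samples * n_directions * bytes_per_sample)


# Hypothetical scene: 100,000 source positions, 100,000 listener positions,
# 10,000 time samples per response, 100 direction bins, 4-byte samples.
total_bytes = raw_field_bytes(100_000, 100_000, 10_000, 100)
total_petabytes = total_bytes / 10**15
```

With these assumed counts the raw field comes to 4 x 10^16 bytes, or 40 petabytes, which is why reducing the number of sampled locations and encoding responses as a handful of parameters both matter.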

In some implementations, a number of locations within the virtual reality space 604 for which the directional impulse responses 616 are generated can be reduced. For example, directional impulse responses 616 can be generated based on potential listener locations (e.g., listener probes, player probes) scattered at particular locations within virtual reality space 604, rather than at every location (e.g., every voxel). The potential listener locations can be viewed as similar to listener location 124 in FIG. 1A and/or listener location 422 in FIG. 4. The potential listener locations can be automatically laid out within the virtual reality space 604 and/or can be adaptively-sampled. For instance, potential listener locations can be located more densely in spaces where scene geometry is locally complex (e.g., inside a narrow corridor with multiple portals), and located more sparsely in a wide-open space (e.g., outdoor field or meadow). Similarly, potential sound source locations (such as 122A and 122B in FIGS. 1A and 1B) for which directional impulse responses 616 are generated can be located more densely or sparsely as scene geometry permits. Reducing the number of locations within the virtual reality space 604 for which the directional impulse responses 616 are generated can significantly reduce data processing and/or data storage expenses in Stage One.
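The idea of denser probes in tight spaces and sparser probes in open spaces can be sketched along a single walking path. The following is a simplified, hypothetical illustration: it assumes a function that reports local clearance (distance to the nearest geometry) at each point and uses that clearance, clamped between a minimum and maximum, as the spacing to the next probe. The names `probe_spacing_m` and `place_probes_1d`, and the clamp limits, are assumptions rather than the layout procedure of the present description.

```python
def probe_spacing_m(clearance_m, min_spacing_m=0.5, max_spacing_m=4.0):
    """Spacing grows with local clearance, clamped to a fixed range."""
    return max(min_spacing_m, min(max_spacing_m, clearance_m))


def place_probes_1d(clearance_at, length_m):
    """Place listener probes along a path of `length_m` meters.

    `clearance_at(x)` returns the distance to the nearest geometry at
    position x (hypothetical callback supplied by the caller).
    """
    positions = []
    x = 0.0
    while x <= length_m:
        positions.append(x)
        x += probe_spacing_m(clearance_at(x))
    return positions


def example_clearance(x):
    # Narrow corridor for the first 4 m, then a wide-open field.
    return 0.5 if x < 4.0 else 10.0


probes = place_probes_1d(example_clearance, 20.0)
```

In this example the first 4 m of corridor receive eight probes, while the remaining 16 m of open field receive only five, so most of the sampling budget goes where the geometry is locally complex.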

In some cases, a geometry of virtual reality space 604 can be dynamic. For example, a door in virtual reality space 604 might be opened or closed, or a wall might be blown up, changing the geometry of virtual reality space 604. In such examples, simulation 608 can receive updated virtual reality space data 614. Solutions for reducing data processing and/or data storage in situations with updated virtual reality space data 614 can include precomputing directional impulse responses 616 for some situations. For instance, opening and/or closing a door can be viewed as an expected and/or regular occurrence in a virtual reality space 604, and therefore representative of a situation that warrants modeling of both the opened and closed cases. However, blowing up a wall can be an unexpected and/or irregular occurrence. In this situation, data processing and/or data storage can be reduced by re-computing directional impulse responses 616 for a limited portion of virtual reality space 604, such as the vicinity of the blast. A weighted cost benefit analysis can be considered when deciding to cover such environmental scenarios. For instance, door opening and closing may be relatively likely to happen in a game scenario and so a simulation could be run for each condition in a given implementation. In contrast, a likelihood of a particular section of wall being exploded may be relatively low, so simulations for such scenarios may not be deemed worthwhile for a given implementation.
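The weighted cost benefit analysis mentioned above could take many forms. As one hypothetical sketch, each geometry variant can be given an estimated probability of occurring and a cost of simulating and storing it, and a variant is precomputed only when its expected benefit exceeds its cost and it fits within a budget. The class name, the scoring rule, and all numbers below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeometryVariant:
    name: str
    probability: float  # assumed chance the variant occurs during use
    bake_cost: float    # assumed cost to simulate and store the variant


def variants_to_precompute(variants, benefit_per_occurrence, budget):
    """Pick variants whose expected benefit exceeds cost, within a budget."""
    def net_value(v):
        return v.probability * benefit_per_occurrence - v.bake_cost

    # Keep only worthwhile variants, most valuable first (stable sort).
    worthwhile = sorted((v for v in variants if net_value(v) > 0.0),
                        key=net_value, reverse=True)

    chosen, spent = [], 0.0
    for v in worthwhile:
        if spent + v.bake_cost <= budget:
            chosen.append(v.name)
            spent += v.bake_cost
    return chosen


candidates = [
    GeometryVariant("door_open", probability=0.9, bake_cost=1.0),
    GeometryVariant("door_closed", probability=0.9, bake_cost=1.0),
    GeometryVariant("wall_destroyed", probability=0.01, bake_cost=1.0),
]
selected = variants_to_precompute(candidates, benefit_per_occurrence=5.0, budget=10.0)
```

Under these assumed numbers both door states are selected for precomputation and the destroyed-wall case is left out, mirroring the door and wall examples in the paragraph above.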