Facebook Patent | Audio Augmentation Using Environmental Data
Publication Number: 10595149
Publication Date: 20200317
The disclosed computer-implemented method for performing directional beamforming according to an anticipated position may include accessing environment data indicating a sound source within an environment. The device may include various audio hardware components configured to generate steerable audio beams. The method may also include identifying the location of the sound source within the environment based on the accessed environment data, and then steering the audio beams of the device to the identified location of the sound source within the environment. Various other methods, systems, and computer-readable media are also disclosed.
Augmented reality (AR) and virtual reality (VR) devices are becoming increasingly common. AR devices typically have two main components including a display and a sound source, while VR devices typically include a display, a sound source and haptics components that provide haptic feedback to the user. The display may be a full headset in the case of VR, or may be a pair of glasses in the case of AR. The sound source may include speakers built into the AR/VR device itself or may include separate earphones.
Current speakers in such AR and VR systems are typically designed to reproduce audio for the user without a great deal of customization. In some cases, the audio may be processed using surround sound decoding. And, in such cases, the output audio may be spatialized to sound like it is coming from a certain direction (e.g., in front of, to the side of or behind the user). However, the audio processing does not take into account whether the AR/VR device itself is moving, or where the device is moving, or whether other AR/VR devices are present in the immediate area.
As will be described in greater detail below, this disclosure describes methods and systems that access environment data indicating the location of a sound source within the environment and then beamform in that direction in order to improve audio reception. In one example, a computer-implemented method for performing directional beamforming based on environment data may include accessing, at a device, environment data that includes an indication of at least one sound source within the environment. The process of “beamforming” or targeting an audio beam at a given person or location may increase a playback headset’s ability to provide a clear and intelligible audio signal to the user. The audio beam may be a focused region to which a microphone is directed in order to capture audio signals. The device may include audio hardware components that are configured to generate such steerable audio beams. The method may also include identifying the location of the sound source within the environment based on the accessed environment data, and then steering the audio beams of the device to the identified location of the sound source within the environment.
In some examples, the device may be an augmented reality (AR) or virtual reality (VR) device. The environment may include multiple AR or VR devices, where each AR or VR device records its own location. In some examples, the environment may include multiple AR devices, where each AR device may record the location of other AR devices using sensor data captured by the AR devices. In some examples, the AR device may track the location of multiple other AR devices using the environment data.
In some examples, historical device movement data may be implemented to identify a future sound source location where the sound source (e.g., a person) is likely to move. Future sound source locations may be determined on a continually updated basis. In this manner, the audio beams of the device may be continually steered to the updated future sound source location.
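As a rough illustration of how historical movement data might yield a future sound source location, the following Python sketch extrapolates from the two most recent position samples under a constant-velocity assumption. The function names, the 2D coordinate convention, and the constant-velocity model are illustrative assumptions; the disclosure does not specify a particular prediction technique.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]          # (x, y) in metres, hypothetical room coordinates
Sample = Tuple[float, Point]         # (timestamp in seconds, position)

def predict_future_location(history: List[Sample], lookahead_s: float) -> Point:
    """Extrapolate a position, assuming constant velocity over the last two samples."""
    if not history:
        raise ValueError("history must contain at least one sample")
    if len(history) == 1:
        return history[0][1]         # no motion information yet: assume stationary
    (t0, (x0, y0)), (t1, (x1, y1)) = history[-2], history[-1]
    dt = t1 - t0
    if dt <= 0:
        return (x1, y1)              # unusable timestamps: fall back to last position
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return (x1 + vx * lookahead_s, y1 + vy * lookahead_s)

def steering_angle_deg(listener: Point, target: Point) -> float:
    """Azimuth from the listener to the target, measured counterclockwise from +x."""
    return math.degrees(math.atan2(target[1] - listener[1], target[0] - listener[0]))

# Example: a speaker walking along +x at 1 m/s, predicted 0.5 s ahead.
history = [(0.0, (0.0, 0.0)), (1.0, (1.0, 0.0))]
future = predict_future_location(history, lookahead_s=0.5)
angle = steering_angle_deg((0.0, -1.0), future)
```

Calling the predictor each time a new sample arrives would correspond to the continually updated steering described above.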
In some examples, the method for directionally beamforming based on an anticipated location may include detecting that a reverberated signal was received at a device at a higher signal level than a direct-path signal. The method may further include identifying a potential path traveled by the reverberated signal, and then steering the audio beams to travel along the identified path traveled by the reverberated signal. The method may also include transitioning the audio beam steering back to a direct path as the device moves between the current device location and the future sound source location.
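The path selection and the transition back to the direct path might be sketched as follows in Python. The `PathEstimate` type, the decibel comparison, and the linear shortest-arc interpolation are illustrative assumptions; the disclosure does not state how the levels are compared or how the transition is scheduled.

```python
from dataclasses import dataclass

@dataclass
class PathEstimate:
    angle_deg: float   # arrival angle of this path at the device
    level_db: float    # measured signal level along this path

def select_beam_angle(direct: PathEstimate, reverberated: PathEstimate) -> float:
    """Steer along the reverberated path only when it is stronger than the direct path."""
    if reverberated.level_db > direct.level_db:
        return reverberated.angle_deg
    return direct.angle_deg

def interpolate_angle_deg(start_deg: float, end_deg: float, fraction: float) -> float:
    """Move from start to end along the shortest arc; fraction is clamped to [0, 1]."""
    fraction = min(1.0, max(0.0, fraction))
    delta = (end_deg - start_deg + 180.0) % 360.0 - 180.0   # signed shortest difference
    return (start_deg + delta * fraction) % 360.0

# Example: the reflection is 10 dB stronger, so the beam starts on the reflected path
# and is moved back toward the direct path as the device covers the distance.
direct = PathEstimate(angle_deg=10.0, level_db=-30.0)
reflected = PathEstimate(angle_deg=80.0, level_db=-20.0)
start = select_beam_angle(direct, reflected)
halfway = interpolate_angle_deg(start, direct.angle_deg, fraction=0.5)
```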
In some examples, the audio beams may be steered based on a specific beamforming policy. Some embodiments may include accessing an audio signal that is to be reproduced using the audio beams, identifying the location of another device, and modifying the accessed audio signal to spatially re-render the audio signal to sound as if coming from the other device.
In some examples, the device may receive pre-generated environment data or historical environmental data from a remote source and may implement the received environment data or historical environmental data to identify the future sound source location. In some examples, other devices in the environment may provide environment data to a server or to another local or remote device. The server may augment the environment information to account for delay and constraints of a target device.
In some examples, steering control signals are generated upon determining that beamforming is needed to raise a signal level to a specified minimum level. In some examples, the accessed portions of environment data may be used to perform selective active noise cancellation in a specified direction. In some examples, various active noise cancellation parameters may be adjusted to selectively remove sounds from a specified direction, or to selectively allow sounds from a specified direction. In further examples, a dry audio signal may be combined with various effects so that the modified dry audio signal sounds as if it originated in the user’s current environment.
In addition, a corresponding device for directionally beamforming based on environment data may include several modules stored in memory, including a data accessing module configured to access environment data that includes an indication of a sound source within the environment. The device may include audio hardware components configured to generate steerable audio beams. The device may further include a location identifying module configured to identify the location of the sound source within the environment based on the accessed environment data. The device may also include a beam steering module configured to steer the audio beams of the device to the identified location of the sound source within the environment.
In some examples, the above-described method may be encoded as computer-readable instructions on a computer-readable medium. For example, a computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, may cause the computing device to access environment data that includes an indication of a sound source within the environment, identify the location of the sound source within the environment based on the accessed environment data, and steer the audio beams of the device to the identified location of the sound source within the environment.
Features from any of the above-mentioned embodiments may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the instant disclosure.
FIG. 1 illustrates an embodiment of an artificial reality headset.
FIG. 2 illustrates an embodiment of an augmented reality headset and corresponding neckband.
FIG. 3 illustrates an embodiment of a virtual reality headset.
FIG. 4 illustrates an embodiment in which the embodiments described herein may be performed including directionally beamforming based on environment data.
FIG. 5 illustrates a flow diagram of an exemplary method for directionally beamforming based on environment data.
FIG. 6 illustrates an alternative embodiment in which the embodiments described herein may operate, including directionally beamforming based on environment data.
FIG. 7 illustrates an alternative embodiment in which the embodiments described herein may operate, including directionally beamforming based on environment data.
FIG. 8 illustrates an alternative embodiment in which the embodiments described herein may operate, including directionally beamforming based on environment data.
FIG. 9 illustrates an alternative embodiment in which the embodiments described herein may operate, including directionally beamforming based on environment data.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the instant disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
The present disclosure is generally directed to methods and systems for performing directional beamforming based on environment data indicating a sound source that may be of interest to a listening user. As will be explained in greater detail below, embodiments of the instant disclosure may allow a user to more easily hear other users when using an artificial reality (AR) headset. For example, if a large number of users are in a room, or if the room has poor acoustics, users may have a hard time hearing each other. In the embodiments herein, AR headsets may be configured to perform beamforming to better focus in on a given sound source (e.g., a user who is speaking). The beamforming may not only form a beam toward a current location of a speaking user but may also direct beams to new locations in anticipation of the speaking user moving there.
Indeed, in at least some of the embodiments herein, the AR headset (or a computer system to which the AR headset is communicatively connected) may implement logic to determine where a speaking user is likely to move. The listening user’s AR headset may make this determination based on knowledge of the current environment, knowledge of the speaking user’s past movements, as well as current location and/or movement information for the speaking user. Using some or all of this information, the listening user’s AR headset may determine where the speaking user is likely to move and, in advance of the movement, may beamform in the expected direction of movement. Then, if the speaking user moves in that direction, the listening user’s AR headset will already be beamforming in that direction, thereby enhancing the listening user’s ability to hear the speaking user. The process of “beamforming” or targeting an audio beam at a given person or location may increase the AR headset’s ability to provide a clear and intelligible audio signal to the user.
Embodiments of the instant disclosure may include or be implemented in conjunction with various types of artificial reality systems. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivative thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., to perform activities in) an artificial reality.
Artificial reality systems may be implemented in a variety of different form factors and configurations. Some artificial reality systems may be designed to work without near-eye displays (NEDs), an example of which is AR system 100 in FIG. 1. Other artificial reality systems may include an NED that also provides visibility into the real world (e.g., AR system 200 in FIG. 2) or that visually immerses a user in an artificial reality (e.g., VR system 300 in FIG. 3). While some artificial reality devices may be self-contained systems, other artificial reality devices may communicate and/or coordinate with external devices to provide an artificial reality experience to a user. Examples of such external devices include handheld controllers, mobile devices, desktop computers, devices worn by a user, devices worn by one or more other users, and/or any other suitable external system.
Turning to FIG. 1, AR system 100 generally represents a wearable device dimensioned to fit about a body part (e.g., a head) of a user. As shown in FIG. 1, system 100 may include a frame 102 and a camera assembly 104 that is coupled to frame 102 and configured to gather information about a local environment by observing the local environment. AR system 100 may also include one or more audio devices, such as output audio transducers 108(A) and 108(B) and input audio transducers 110. Output audio transducers 108(A) and 108(B) may provide audio feedback and/or content to a user, and input audio transducers 110 may capture audio in a user’s environment.
As shown, AR system 100 may not necessarily include an NED positioned in front of a user’s eyes. AR systems without NEDs may take a variety of forms, such as head bands, hats, hair bands, belts, watches, wrist bands, ankle bands, rings, neckbands, necklaces, chest bands, eyewear frames, and/or any other suitable type or form of apparatus. While AR system 100 may not include an NED, AR system 100 may include other types of screens or visual feedback devices (e.g., a display screen integrated into a side of frame 102).
The embodiments discussed in this disclosure may also be implemented in AR systems that include one or more NEDs. For example, as shown in FIG. 2, AR system 200 may include an eyewear device 202 with a frame 210 configured to hold a left display device 215(A) and a right display device 215(B) in front of a user’s eyes. Display devices 215(A) and 215(B) may act together or independently to present an image or series of images to a user. While AR system 200 includes two displays, embodiments of this disclosure may be implemented in AR systems with a single NED or more than two NEDs.
In some embodiments, AR system 200 may include one or more sensors, such as sensor 240. Sensor 240 may generate measurement signals in response to motion of AR system 200 and may be located on substantially any portion of frame 210. Sensor 240 may include a position sensor, an inertial measurement unit (IMU), a depth camera assembly, or any combination thereof. In some embodiments, AR system 200 may or may not include sensor 240 or may include more than one sensor. In embodiments in which sensor 240 includes an IMU, the IMU may generate calibration data based on measurement signals from sensor 240. Examples of sensor 240 may include, without limitation, accelerometers, gyroscopes, magnetometers, other suitable types of sensors that detect motion, sensors used for error correction of the IMU, or some combination thereof.
AR system 200 may also include a microphone array with a plurality of acoustic sensors 220(A)-220(J), referred to collectively as acoustic sensors 220. Acoustic sensors 220 may be transducers that detect air pressure variations induced by sound waves. Each acoustic sensor 220 may be configured to detect sound and convert the detected sound into an electronic format (e.g., an analog or digital format). The microphone array in FIG. 2 may include, for example, ten acoustic sensors: 220(A) and 220(B), which may be designed to be placed inside a corresponding ear of the user, acoustic sensors 220(C), 220(D), 220(E), 220(F), 220(G), and 220(H), which may be positioned at various locations on frame 210, and/or acoustic sensors 220(I) and 220(J), which may be positioned on a corresponding neckband 205.
The configuration of acoustic sensors 220 of the microphone array may vary. While AR system 200 is shown in FIG. 2 as having ten acoustic sensors 220, the number of acoustic sensors 220 may be greater or less than ten. In some embodiments, using higher numbers of acoustic sensors 220 may increase the amount of audio information collected and/or the sensitivity and accuracy of the audio information. In contrast, using a lower number of acoustic sensors 220 may decrease the computing power required by the controller 225 to process the collected audio information. In addition, the position of each acoustic sensor 220 of the microphone array may vary. For example, the position of an acoustic sensor 220 may include a defined position on the user, a defined coordinate on the frame 210, an orientation associated with each acoustic sensor, or some combination thereof.
Acoustic sensors 220(A) and 220(B) may be positioned on different parts of the user’s ear, such as behind the pinna or within the auricle or fossa. Or, there may be additional acoustic sensors on or surrounding the ear in addition to acoustic sensors 220 inside the ear canal. Having an acoustic sensor positioned next to an ear canal of a user may enable the microphone array to collect information on how sounds arrive at the ear canal. By positioning at least two of acoustic sensors 220 on either side of a user’s head (e.g., as binaural microphones), AR system 200 may simulate binaural hearing and capture a 3D stereo sound field around a user’s head. In some embodiments, the acoustic sensors 220(A) and 220(B) may be connected to the AR system 200 via a wired connection, and in other embodiments, the acoustic sensors 220(A) and 220(B) may be connected to the AR system 200 via a wireless connection (e.g., a Bluetooth connection). In still other embodiments, the acoustic sensors 220(A) and 220(B) may not be used at all in conjunction with the AR system 200.
Acoustic sensors 220 on frame 210 may be positioned along the length of the temples, across the bridge, above or below display devices 215(A) and 215(B), or some combination thereof. Acoustic sensors 220 may be oriented such that the microphone array is able to detect sounds in a wide range of directions surrounding the user wearing the AR system 200. In some embodiments, an optimization process may be performed during manufacturing of AR system 200 to determine relative positioning of each acoustic sensor 220 in the microphone array.
AR system 200 may further include or be connected to an external device (e.g., a paired device), such as neckband 205. As shown, neckband 205 may be coupled to eyewear device 202 via one or more connectors 230. The connectors 230 may be wired or wireless connectors and may include electrical and/or non-electrical (e.g., structural) components. In some cases, the eyewear device 202 and the neckband 205 may operate independently without any wired or wireless connection between them. While FIG. 2 illustrates the components of eyewear device 202 and neckband 205 in example locations on eyewear device 202 and neckband 205, the components may be located elsewhere and/or distributed differently on eyewear device 202 and/or neckband 205. In some embodiments, the components of the eyewear device 202 and neckband 205 may be located on one or more additional peripheral devices paired with eyewear device 202, neckband 205, or some combination thereof. Furthermore, neckband 205 generally represents any type or form of paired device. Thus, the following discussion of neckband 205 may also apply to various other paired devices, such as smart watches, smart phones, wrist bands, other wearable devices, hand-held controllers, tablet computers, laptop computers, etc.
Pairing external devices, such as neckband 205, with AR eyewear devices may enable the eyewear devices to achieve the form factor of a pair of glasses while still providing sufficient battery and computation power for expanded capabilities. Some or all of the battery power, computational resources, and/or additional features of AR system 200 may be provided by a paired device or shared between a paired device and an eyewear device, thus reducing the weight, heat profile, and form factor of the eyewear device overall while still retaining desired functionality. For example, neckband 205 may allow components that would otherwise be included on an eyewear device to be included in neckband 205 since users may tolerate a heavier weight load on their shoulders than they would tolerate on their heads. Neckband 205 may also have a larger surface area over which to diffuse and disperse heat to the ambient environment. Thus, neckband 205 may allow for greater battery and computation capacity than might otherwise have been possible on a stand-alone eyewear device. Since weight carried in neckband 205 may be less invasive to a user than weight carried in eyewear device 202, a user may tolerate wearing a lighter eyewear device and carrying or wearing the paired device for greater lengths of time than the user would tolerate wearing a heavy standalone eyewear device, thereby enabling an artificial reality environment to be incorporated more fully into a user’s day-to-day activities.
Neckband 205 may be communicatively coupled with eyewear device 202 and/or to other devices. The other devices may provide certain functions (e.g., tracking, localizing, depth mapping, processing, storage, etc.) to the AR system 200. In the embodiment of FIG. 2, neckband 205 may include two acoustic sensors (e.g., 220(I) and 220(J)) that are part of the microphone array (or potentially form their own microphone subarray). Neckband 205 may also include a controller 225 and a power source 235.
Acoustic sensors 220(I) and 220(J) of neckband 205 may be configured to detect sound and convert the detected sound into an electronic format (analog or digital). In the embodiment of FIG. 2, acoustic sensors 220(I) and 220(J) may be positioned on neckband 205, thereby increasing the distance between the neckband acoustic sensors 220(I) and 220(J) and other acoustic sensors 220 positioned on eyewear device 202. In some cases, increasing the distance between acoustic sensors 220 of the microphone array may improve the accuracy of beamforming performed via the microphone array. For example, if a sound is detected by acoustic sensors 220(C) and 220(D) and the distance between acoustic sensors 220(C) and 220(D) is greater than, e.g., the distance between acoustic sensors 220(D) and 220(E), the determined source location of the detected sound may be more accurate than if the sound had been detected by acoustic sensors 220(D) and 220(E).
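The effect of sensor spacing can be illustrated with the standard far-field relation for a two-sensor pair: the arrival-time difference is d·sin(θ)/c, where d is the spacing, θ is the arrival angle measured from broadside, and c is the speed of sound. For a small timing error δτ, the resulting angular error is approximately c·δτ/(d·cos θ), which shrinks as d grows. The Python sketch below evaluates that approximation; the spacings, the one-sample timing error at 48 kHz, and the function name are illustrative assumptions and are not taken from the disclosure.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C

def angular_error_deg(spacing_m: float, timing_error_s: float,
                      angle_deg: float = 0.0) -> float:
    """Approximate direction error for a two-sensor pair (far-field, small-error model)."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    if abs(angle_deg) >= 90.0:
        raise ValueError("the approximation breaks down at or beyond endfire")
    cos_theta = math.cos(math.radians(angle_deg))
    error_rad = SPEED_OF_SOUND_M_S * timing_error_s / (spacing_m * cos_theta)
    return math.degrees(error_rad)

one_sample_s = 1.0 / 48_000                          # timing error of one sample at 48 kHz
close_pair = angular_error_deg(0.02, one_sample_s)   # hypothetical 2 cm spacing
wide_pair = angular_error_deg(0.15, one_sample_s)    # hypothetical 15 cm spacing
```

Under these assumptions the wider pair has a 7.5 times smaller angular error. Wider spacing is not free, since spacings above half a wavelength can introduce spatial aliasing at higher frequencies, which is consistent with the qualifier "in some cases" above.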
Controller 225 of neckband 205 may process information generated by the sensors on neckband 205 and/or AR system 200. For example, controller 225 may process information from the microphone array that describes sounds detected by the microphone array. For each detected sound, controller 225 may perform a direction of arrival (DOA) estimation to estimate a direction from which the detected sound arrived at the microphone array. As the microphone array detects sounds, controller 225 may populate an audio data set with the information. In embodiments in which AR system 200 includes an inertial measurement unit, controller 225 may compute all inertial and spatial calculations from the IMU located on eyewear device 202. Connector 230 may convey information between AR system 200 and neckband 205 and between AR system 200 and controller 225. The information may be in the form of optical data, electrical data, wireless data, or any other transmittable data form. Moving the processing of information generated by AR system 200 to neckband 205 may reduce weight and heat in eyewear device 202, making it more comfortable to the user.
Power source 235 in neckband 205 may provide power to eyewear device 202 and/or to neckband 205. Power source 235 may include, without limitation, lithium ion batteries, lithium-polymer batteries, primary lithium batteries, alkaline batteries, or any other form of power storage. In some cases, power source 235 may be a wired power source. Including power source 235 on neckband 205 instead of on eyewear device 202 may help better distribute the weight and heat generated by power source 235.
As noted, some artificial reality systems may, instead of blending an artificial reality with actual reality, substantially replace one or more of a user’s sensory perceptions of the real world with a virtual experience. One example of this type of system is a head-worn display system, such as VR system 300 in FIG. 3, that mostly or completely covers a user’s field of view. VR system 300 may include a front rigid body 302 and a band 304 shaped to fit around a user’s head. VR system 300 may also include output audio transducers 306(A) and 306(B). Furthermore, while not shown in FIG. 3, front rigid body 302 may include one or more electronic elements, including one or more electronic displays, one or more inertial measurement units (IMUs), one or more tracking emitters or detectors, and/or any other suitable device or system for creating an artificial reality experience.
Artificial reality systems may include a variety of types of visual feedback mechanisms. For example, display devices in AR system 200 and/or VR system 300 may include one or more liquid crystal displays (LCDs), light emitting diode (LED) displays, organic LED (OLED) displays, and/or any other suitable type of display screen. Artificial reality systems may include a single display screen for both eyes or may provide a display screen for each eye, which may allow for additional flexibility for varifocal adjustments or for correcting a user’s refractive error. Some artificial reality systems may also include optical subsystems having one or more lenses (e.g., conventional concave or convex lenses, Fresnel lenses, adjustable liquid lenses, etc.) through which a user may view a display screen.
In addition to or instead of using display screens, some artificial reality systems may include one or more projection systems. For example, display devices in AR system 200 and/or VR system 300 may include micro-LED projectors that project light (using, e.g., a waveguide) into display devices, such as clear combiner lenses that allow ambient light to pass through. The display devices may refract the projected light toward a user’s pupil and may enable a user to simultaneously view both artificial reality content and the real world. Artificial reality systems may also be configured with any other suitable type or form of image projection system.
Artificial reality systems may also include various types of computer vision components and subsystems. For example, AR system 100, AR system 200, and/or VR system 300 may include one or more optical sensors such as two-dimensional (2D) or three-dimensional (3D) cameras, time-of-flight depth sensors, single-beam or sweeping laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor. An artificial reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions.
Artificial reality systems may also include one or more input and/or output audio transducers. In the examples shown in FIGS. 1 and 3, output audio transducers 108(A), 108(B), 306(A), and 306(B) may include voice coil speakers, ribbon speakers, electrostatic speakers, piezoelectric speakers, bone conduction transducers, cartilage conduction transducers, and/or any other suitable type or form of audio transducer. Similarly, input audio transducers 110 may include condenser microphones, dynamic microphones, ribbon microphones, and/or any other type or form of input transducer. In some embodiments, a single transducer may be used for both audio input and audio output.
While not shown in FIGS. 1-3, artificial reality systems may include tactile (i.e., haptic) feedback systems, which may be incorporated into headwear, gloves, body suits, handheld controllers, environmental devices (e.g., chairs, floormats, etc.), and/or any other type of device or system. Haptic feedback systems may provide various types of cutaneous feedback, including vibration, force, traction, texture, and/or temperature. Haptic feedback systems may also provide various types of kinesthetic feedback, such as motion and compliance. Haptic feedback may be implemented using motors, piezoelectric actuators, fluidic systems, and/or a variety of other types of feedback mechanisms. Haptic feedback systems may be implemented independent of other artificial reality devices, within other artificial reality devices, and/or in conjunction with other artificial reality devices.
By providing haptic sensations, audible content, and/or visual content, artificial reality systems may create an entire virtual experience or enhance a user’s real-world experience in a variety of contexts and environments. For instance, artificial reality systems may assist or extend a user’s perception, memory, or cognition within a particular environment. Some systems may enhance a user’s interactions with other people in the real world or may enable more immersive interactions with other people in a virtual world. Artificial reality systems may also be used for educational purposes (e.g., for teaching or training in schools, hospitals, government organizations, military organizations, business enterprises, etc.), entertainment purposes (e.g., for playing video games, listening to music, watching video content, etc.), and/or for accessibility purposes (e.g., as hearing aids, visual aids, etc.). The embodiments disclosed herein may enable or enhance a user’s artificial reality experience in one or more of these contexts and environments and/or in other contexts and environments.
Some AR systems may map a user’s environment using techniques referred to as “simultaneous localization and mapping” (SLAM). SLAM mapping and location identifying techniques may involve a variety of hardware and software tools that can create or update a map of an environment while simultaneously keeping track of a user’s location within the mapped environment. SLAM may use many different types of sensors to create a map and determine a user’s position within the map.
SLAM techniques may, for example, implement optical sensors to determine a user’s location. Radios including WiFi, Bluetooth, global positioning system (GPS), cellular or other communication devices may also be used to determine a user’s location relative to a radio transceiver or group of transceivers (e.g., a WiFi router or group of GPS satellites). Acoustic sensors such as microphone arrays or 2D or 3D sonar sensors may also be used to determine a user’s location within an environment. AR and VR devices (such as systems 100, 200, and 300 of FIGS. 1, 2, and 3, respectively) may incorporate any or all of these types of sensors to perform SLAM operations such as creating and continually updating maps of the user’s current environment. In at least some of the embodiments described herein, SLAM data generated by these sensors may be referred to as “environmental data” and may indicate a user’s current environment. This data may be stored in a local or remote data store (e.g., a cloud data store) and may be provided to a user’s AR/VR device on demand.
When the user is wearing an AR headset or VR headset in a given environment, the user may be interacting with other users or other electronic devices that serve as audio sources. In some cases, it may be desirable to determine where the audio sources are located relative to the user and then present the audio sources to the user as if they were coming from the location of the audio source. The process of determining where the audio sources are located relative to the user may be referred to herein as “localization,” and the process of rendering playback of the audio source signal to appear as if it is coming from a specific direction may be referred to herein as “spatialization.”
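As a minimal sketch of what "located relative to the user" can mean in practice, the following Python function converts a source position and a user pose, both expressed in shared map coordinates, into a bearing relative to the direction the user is facing. The function name, the 2D coordinate layout, and the sign convention are assumptions made for illustration and do not come from the disclosure.

```python
import math


def relative_azimuth_deg(user_xy, user_heading_deg, source_xy):
    """Bearing of a source relative to the direction the user is facing.

    Assumed convention (not from the disclosure): 2D map coordinates in
    meters, heading measured counter-clockwise from the +x axis, result in
    [-180, 180) with positive values to the user's left.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    world_bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the difference so that, e.g., 350 degrees becomes -10 degrees.
    return (world_bearing - user_heading_deg + 180.0) % 360.0 - 180.0
```

A relative bearing of this kind is one possible output of localization and one possible input to spatialization.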
Localizing an audio source may be performed in a variety of different ways. In some cases, an AR or VR headset may initiate a direction of arrival (DOA) analysis to determine the location of a sound source. The DOA analysis may include analyzing the intensity, spectra, and/or arrival time of each sound at the AR/VR device to determine the direction from which the sounds originated. In some cases, the DOA analysis may include any suitable algorithm for analyzing the surrounding acoustic environment in which the artificial reality device is located.
For example, the DOA analysis may be designed to receive input signals from a microphone and apply digital signal processing algorithms to the input signals to estimate the direction of arrival. These algorithms may include, for example, delay-and-sum algorithms where the input signal is sampled, and the resulting weighted and delayed versions of the sampled signal are averaged together to determine a direction of arrival. A least mean squares (LMS) algorithm may also be implemented to create an adaptive filter. This adaptive filter may then be used to identify differences in signal intensity, for example, or differences in time of arrival. These differences may then be used to estimate the direction of arrival. In another embodiment, the DOA may be determined by converting the input signals into the frequency domain and selecting specific bins within the time-frequency (TF) domain to process. Each selected TF bin may be processed to determine whether that bin includes a portion of the audio spectrum with a direct-path audio signal. Those bins having a portion of the direct-path signal may then be analyzed to identify the angle at which a microphone array received the direct-path audio signal. The determined angle may then be used to identify the direction of arrival for the received input signal. Other algorithms not listed above may also be used alone or in combination with the above algorithms to determine DOA.
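The delay-and-sum approach mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation of the disclosed embodiments: it assumes a linear microphone array on the x axis, a far-field (plane-wave) source, equal weights, whole-sample delays, and a speed of sound of 343 m/s. The function names and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def steering_delays(mic_x, angle_deg, sample_rate):
    """Far-field arrival delay, in whole samples, at each microphone.

    Microphones lie on the x axis at positions mic_x (meters); the angle is
    measured from broadside, positive toward +x.
    """
    s = math.sin(math.radians(angle_deg))
    return [round(-x * s / SPEED_OF_SOUND * sample_rate) for x in mic_x]


def delay_and_sum_doa(channels, mic_x, sample_rate, candidate_angles):
    """Return the candidate angle whose aligned, averaged output is strongest."""
    n = len(channels[0])
    best_angle, best_power = None, float("-inf")
    for angle in candidate_angles:
        delays = steering_delays(mic_x, angle, sample_rate)
        power = 0.0
        for t in range(n):
            acc = 0.0
            for ch, d in zip(channels, delays):
                i = t + d  # undo the delay expected for this angle
                if 0 <= i < n:
                    acc += ch[i]
            power += (acc / len(channels)) ** 2
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle
```

Because the delays are rounded to whole samples, neighboring candidate angles can map to identical delays for small arrays, so the angular resolution of this sketch is limited by the array aperture and the sample rate. A practical system would likely use fractional delays or frequency-domain steering.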
In some embodiments, different users may perceive the source of a sound as coming from slightly different locations. This may be the result of each user having a unique head-related transfer function (HRTF), which may be dictated by a user’s anatomy including ear canal length and the positioning of the eardrum. The artificial reality device may provide an alignment and orientation guide, which the user may follow to customize the sound signal presented to the user based on their unique HRTF. In some embodiments, an artificial reality device may implement one or more microphones to listen to sounds within the user’s environment. The AR or VR headset may use a variety of different array transfer functions (e.g., any of the DOA algorithms identified above) to estimate the direction of arrival for the sounds. Once the direction of arrival has been determined, the artificial reality device may play back sounds to the user according to the user’s unique HRTF. Accordingly, the DOA estimation generated using the array transfer function (ATF) may be used to determine the direction from which the sounds are to be played. The playback sounds may be further refined based on how that specific user hears sounds according to the HRTF.
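One common way to render a sound from an estimated direction is to convolve the mono signal with a left and a right head-related impulse response (the time-domain counterpart of the HRTF). The sketch below illustrates that structure only. It does not use a measured, user-specific HRTF; instead, `toy_hrir_pair` is a hypothetical stand-in that models each ear as a pure delay and gain, using the spherical-head (Woodworth) approximation for the interaural time difference. The head radius, the level-difference factor, and the convention that positive azimuth is to the user's left are all assumptions.

```python
import math


def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def toy_hrir_pair(azimuth_deg, sample_rate, head_radius=0.0875):
    """Crude placeholder for a measured impulse-response pair (left, right).

    Valid for azimuths in [-90, 90] degrees; positive means the source is
    to the user's left. The far ear hears the sound later and quieter.
    """
    theta = abs(math.radians(azimuth_deg))
    itd = head_radius / 343.0 * (theta + math.sin(theta))  # seconds
    lag = round(itd * sample_rate)
    far_gain = 1.0 - 0.4 * math.sin(theta)  # assumed level difference
    near = [1.0] + [0.0] * lag
    far = [0.0] * lag + [far_gain]
    return (near, far) if azimuth_deg >= 0 else (far, near)


def spatialize(mono, azimuth_deg, sample_rate):
    """Return (left, right) channels for a mono signal at the given azimuth."""
    left_ir, right_ir = toy_hrir_pair(azimuth_deg, sample_rate)
    return convolve(mono, left_ir), convolve(mono, right_ir)
```

In a system that follows the approach described above, the placeholder pair would be replaced by impulse responses derived from the individual user's HRTF, while the convolution step would stay the same.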
In addition to or as an alternative to performing a DOA estimation, an artificial reality device may perform localization based on information received from other types of sensors. These sensors may include cameras, IR sensors, heat sensors, motion sensors, GPS receivers, or in some cases, sensors that detect a user’s eye movements. For example, as noted above, an artificial reality device may include an eye tracker or gaze detector that determines where the user is looking. Often, the user’s eyes will look at the source of the sound, if only briefly. Such clues provided by the user’s eyes may further aid in determining the location of a sound source. Other sensors such as cameras, heat sensors, and IR sensors may also indicate the location of a user, the location of an electronic device, or the location of another sound source. Any or all of the above methods may be used individually or in combination to determine the location of a sound source and may further be used to update the location of a sound source over time.
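One simple way to combine bearing estimates from several sensors, and to update the result over time, is sketched below. The weighted circular mean and the exponential smoothing step are illustrative choices rather than techniques named in the disclosure, and the function names, weights, and smoothing factor are hypothetical.

```python
import math


def fuse_bearings(estimates):
    """Weighted circular mean of (angle_deg, weight) pairs.

    Averaging unit vectors instead of raw angles avoids errors at the
    +/-180 degree wrap-around.
    """
    x = sum(w * math.cos(math.radians(a)) for a, w in estimates)
    y = sum(w * math.sin(math.radians(a)) for a, w in estimates)
    if x == 0.0 and y == 0.0:
        raise ValueError("estimates cancel out; bearing is undefined")
    return math.degrees(math.atan2(y, x))


def update_bearing(previous_deg, measured_deg, alpha=0.3):
    """Move the tracked bearing a fraction alpha toward a new measurement.

    The step is taken along the shorter arc; the result is in [-180, 180).
    """
    diff = (measured_deg - previous_deg + 180.0) % 360.0 - 180.0
    return (previous_deg + alpha * diff + 180.0) % 360.0 - 180.0
```

For instance, a DOA estimate, a camera detection, and a gaze direction could each be passed to `fuse_bearings` with a weight reflecting how much that sensor is trusted, and the fused value could then be fed to `update_bearing` on every new frame.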