Facebook Patent | Depth Measurement Assembly With A Structured Light Source And A Time Of Flight Camera

Patent: Depth Measurement Assembly With A Structured Light Source And A Time Of Flight Camera

Publication Number: 20200090355

Publication Date: 20200319

Applicants: Facebook

Abstract

A depth measurement assembly (DMA) includes an illumination source that projects pulses of light (e.g., structured light) at a temporal pulsing frequency into a local area. The DMA includes a sensor that captures images of the pulses of light reflected from the local area and determines, using one or more of the captured images, one or more TOF phase shifts for the pulses of light. The DMA includes a controller coupled to the sensor and configured to determine a first set of estimated radial distances to an object in the local area based on the one or more TOF phase shifts. The controller determines a second estimated radial distance to the object based on an encoding of structured light and at least one of the captured images. The controller selects an estimated radial distance from the first set of radial distances.

BACKGROUND

[0001] The present disclosure relates generally to systems for determining depth of a local area and more specifically to headsets for artificial reality systems that obtain depth information of a local area with a structured light source.

[0002] Localizing an object in an arbitrary environment may be useful in a number of different contexts, ranging from artificial reality to autonomous devices. A number of techniques exist to determine a three dimensional mapping of an arbitrary environment. Some rely on a time of flight (TOF) calculation to determine depth information, while others may use structured light patterns. However, both of these techniques have a number of drawbacks. A depth camera based on structured light may under-utilize sensor pixel density, its maximum range is limited by the baseline, and its computational costs are generally high. TOF based depth cameras suffer from multi-path error and require multiple pulsed light frequencies during a single exposure window.

SUMMARY

[0003] A structured light-based TOF depth measurement assembly (DMA) is described herein, which leverages the spatial encoding of structured light with a TOF calculation. The DMA may be incorporated into a head mounted display (HMD) to determine depth information in an arbitrary environment. In an artificial reality system, virtual content may be overlaid on top of a user’s environment based on the depth information determined by the DMA.

[0004] A DMA includes an illumination source which is configured to project pulses of light (e.g., where the intensity pattern is also structured spatially) at a plurality of temporal pulsing frequencies into a local area. The DMA includes a sensor configured to capture images of the pulses of light reflected from a local area and determine, using one or more of the captured images, one or more TOF phase shifts for the pulses of light. The DMA includes a controller coupled to the sensor and configured to determine a first set of estimated radial distances to an object in the local area based on the one or more TOF phase shifts. The controller determines a second estimated radial distance to the object based on an encoding of structured light and at least one of the captured images. The controller selects an estimated radial distance from the first set of radial distances based in part on the second estimated radial distance.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1 is a diagram of a HMD, in accordance with one or more embodiments.

[0006] FIG. 2 is a cross section of a front rigid body of an HMD, in accordance with one or more embodiments.

[0007] FIG. 3 is a diagram of operation of a conventional structured light DMA, in accordance with one or more embodiments.

[0008] FIG. 4 is a diagram of operation of a structured TOF depth sensor, in accordance with one or more embodiments.

[0009] FIG. 5 is a portion of a phase map of a structured TOF depth sensor, in accordance with one or more embodiments.

[0010] FIG. 6A is a pixel timing diagram for a structured TOF depth sensor with three capture windows, in accordance with one or more embodiments.

[0011] FIG. 6B is a pixel timing diagram for a structured TOF depth sensor with augmented pixels, in accordance with one or more embodiments.

[0012] FIG. 7 are timing diagrams relating to the operation of structured TOF depth sensors that utilize the photodiode sensors of FIGS. 6A and 6B, in accordance with one or more embodiments.

[0013] FIG. 8 is a flow chart of a method for determining a radial distance to an object, in accordance with one or more embodiments.

[0014] FIG. 9 is a block diagram of a system environment for providing artificial reality content, in accordance with one or more embodiments.

DETAILED DESCRIPTION OF THE DRAWINGS

[0015] Providing artificial reality content to users through a head mounted display (HMD) often relies on localizing a user’s position in an arbitrary environment and determining a three dimensional mapping of the surroundings within the arbitrary environment. The user’s surroundings within the arbitrary environment may then be represented in a virtual environment or the user’s surroundings may be overlaid with additional content.

[0016] Conventional HMDs include one or more quantitative depth cameras to determine surroundings of a user within the user’s environment. Typically, conventional depth cameras use structured light or time of flight (TOF) to determine the HMD’s location within an environment. Structured light depth cameras use an active illumination source to project known patterns into the environment surrounding the HMD. Structured light uses a pattern of light (e.g., dots, lines, fringes, etc.). The pattern is such that some portions of the environment are illuminated (e.g., illuminated with a dot) and others are not (e.g., the space between dots in the pattern). Images of the environment illuminated with the structured light are used to determine depth information. However, a structured light pattern causes significant portions of a resulting image of the projected pattern to not be illuminated. This inefficiently uses the pixel resolution of sensors capturing the resulting image; for example, projection of the pattern by a structured light depth camera results in less than 10% of sensor pixels collecting light from the projected pattern, while requiring multiple sensor pixels to be illuminated to perform a single depth measurement. In addition, the range is limited by the baseline distance between camera and illumination, even if the system is not limited by SNR. Furthermore, obtaining high quality depth from structured light can be computationally complex.

[0017] TOF depth cameras measure a round trip travel time of light projected into the environment surrounding a depth camera and returning to pixels on a sensor array. When a uniform illumination pattern is projected into the environment, TOF depth cameras are capable of measuring depths of different objects in the environment independently via each sensor pixel. However, light incident on a sensor pixel may be a combination of light received from multiple optical paths in the environment surrounding the depth camera. Existing techniques to resolve the optical paths of light incident on a sensor’s pixels are computationally complex and do not fully disambiguate between optical paths in the environment. Furthermore, TOF depth cameras often require multiple image captures over more than one illumination pulsing frequency. It is often difficult to maintain an adequate signal to noise ratio performance over a short exposure time, which may limit the ability of the sensor to reduce the total capture time.
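The continuous-wave TOF relationship underlying these measurements can be sketched as follows; the function name and example values are hypothetical illustrations rather than figures from the patent, and phase wrapping is ignored here.

```python
# Continuous-wave TOF: radial distance from a measured phase shift.
# Illustrative sketch; names and values are hypothetical.
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_shift_rad: float, pulse_freq_hz: float) -> float:
    """Phase shift of the returning pulse train -> radial distance.

    The light travels a round trip of 2*d, so d = c * phi / (4 * pi * f).
    Phase wrapping (the 2*pi ambiguity) is ignored in this sketch.
    """
    return C * phase_shift_rad / (4.0 * math.pi * pulse_freq_hz)

# Example: a pi/2 phase shift measured at a 40 MHz pulsing frequency
d = tof_depth(math.pi / 2, 40e6)  # roughly 0.94 m
```

Note that the unambiguous range at a single frequency is c/(2f), which is why a wrapped phase alone cannot distinguish between candidate distances separated by that interval.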

[0018] A structured light-based TOF depth measurement assembly (DMA) is described herein, which leverages the spatial encoding of structured light with a TOF calculation. The DMA emits structured light or a combination of structured light and uniform flood illumination into a local area. A camera assembly accumulates charge associated with a TOF phase shift, and a controller in signal communication with the camera assembly determines a number of estimated radial distances of an object in the local area based on the TOF phase shifts. Using spatial light encoding, the controller selects one of the estimated radial distances, and combines it with a triangulation calculation to determine depth information of an object. The DMA thus allows for improved efficiency of a camera sensor, since structured light can be detected along with uniform flood light. The DMA also improves the signal to noise ratio performance of conventional TOF depth cameras, since fewer image captures (and associated readout times) are required over the same exposure time. Additional improvements are described in further detail below. The DMA may be incorporated into a head mounted display (HMD) to determine depth information in an arbitrary environment. In an artificial reality system, virtual content may be overlaid on top of a user’s environment based on the depth information determined by the DMA.

[0019] Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

[0020] FIG. 1 is a diagram of a HMD 100, in accordance with one or more embodiments. The HMD 100 includes a front rigid body 120 and a band 130. In some embodiments, portions of the HMD 100 may be transparent or partially transparent, such as the sides of the HMD 100 on any of the sides of the front rigid body 120. The HMD 100 shown in FIG. 1 also includes an embodiment of a depth measurement assembly (not fully shown) including a camera assembly 180 and an illumination source 170, which are further described below in conjunction with FIGS. 2-9. The front rigid body 120 includes one or more electronic display elements of an electronic display (not shown). The front rigid body 120 optionally includes an inertial measurement unit (IMU) 140, one or more position sensors 150, and a reference point 160.

[0021] FIG. 2 is a cross section 200 of a front rigid body 120 of the HMD 100 of FIG. 1, in accordance with one or more embodiments. As shown in FIG. 2, the front rigid body 120 includes an electronic display 220 and an optics block 230 that together provide image light to an eye box 240. The eye box 240 is a region in space that is occupied by a user’s eye 250. In some embodiments, the front rigid body 120 further includes an eye tracker (not shown) for tracking position of the eye 250 in the eye box 240 (i.e., eye gaze), and a controller 216 coupled to a depth measurement assembly (DMA) 210 and the electronic display 220. For purposes of illustration, FIG. 2 shows a cross section 200 associated with a single eye 250, but another optics block (not shown), separate from the optics block 230, provides altered image light to another eye of the user.

[0022] In the embodiment shown by FIG. 2, the HMD 100 includes a DMA 210 comprising the illumination source 170, the camera assembly 180, and a controller 216. Note that in the illustrated embodiments, the DMA 210 is part of the HMD 100. In alternate embodiments, the DMA 210 may be part of a near-eye display, some other HMD, or some device for depth determination. The DMA 210 functions as a structured light-based TOF depth sensor, such as the structured TOF depth sensor 400 as described in further detail with reference to FIG. 4.

[0023] In various embodiments, the illumination source 170 emits structured light with an encoded periodic pattern, which may be any structured light pattern, such as a dot pattern, square wave pattern, sinusoid pattern, some other encoded structured light pattern, or some combination thereof. In some embodiments, the illumination source 170 emits structured light that is encoded with a non-periodic pattern (e.g., so that triangulation is not confused by identical periods), such as a pseudo-random dot pattern. In some embodiments, the illumination source 170 emits a series of sinusoids that each have a different phase shift into an environment surrounding the HMD 100. In various embodiments, the illumination source 170 includes an acousto-optic modulator configured to generate a sinusoidal interference pattern. However, in other embodiments the illumination source 170 includes one or more of an acousto-optic device, an electro-optic device, physical optics, optical interference, a diffractive optical device, or any other suitable components configured to generate the periodic illumination pattern.

[0024] In various embodiments, the illumination source 170 emits both structured light and uniform flood illumination into the local area 260. For example, the projected pulses of light can be composed of flood illumination overlaid with a structured light dot pattern, where each dot in the dot pattern has a brightness value that is more than a brightness value of the flood illumination. In some embodiments, the illumination source 170 may include a structured light source and a second light source that emits uniform flood illumination. Adding uniform flood illumination to the structured light improves efficiency of a sensor pixel utilization of the camera assembly 180, since the additional light augments any gaps between structured light beams.
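As a rough illustration of the combined illumination described above, the following sketch composes a uniform flood level with a brighter dot pattern; the array dimensions, brightness levels, and dot count are arbitrary assumptions, not parameters from the patent.

```python
import numpy as np

# Illustrative sketch (hypothetical values): flood illumination overlaid
# with a structured light dot pattern, where each dot is brighter than
# the flood level, so every pixel still receives some light.
H, W = 64, 64
flood_level, dot_level = 0.3, 1.0   # dots brighter than the flood

pattern = np.full((H, W), flood_level)

# Place pseudo-random dots (cf. the pseudo-random encoding above)
rng = np.random.default_rng(0)
dot_rows = rng.integers(0, H, size=100)
dot_cols = rng.integers(0, W, size=100)
pattern[dot_rows, dot_cols] = dot_level
```

Because no region of the pattern is fully dark, a TOF measurement is possible at every sensor pixel, while the brighter dots remain detectable for the structured light encoding.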

[0025] In other embodiments, an inverse dot pattern may be projected, whereby a smoothly varying illumination is projected into the area with “dark dots” positioned in various locations. In this embodiment, a dot is a location in the projection that has a brightness value that is at least a threshold amount dimmer than spaces between the dots. In some embodiments, a dot is represented by not emitting light, whereas the space between adjacent dots is represented using at least some level of illumination. For example, the projected pulses of light can be composed of flood illumination overlaid with a structured light dot pattern, where each dot in the dot pattern has a brightness value that is less than a brightness value of the flood illumination. In this scenario, structured light detection may identify regions where illumination is missing, and the TOF measurement will measure radial depth for areas where illumination is projected. The structured light detections can be used as interpolation points to disambiguate adjacent TOF measurements. Accordingly an inverse dot pattern can help increase sensor pixel utilization.

[0026] In various embodiments, the illumination source 170 emits light at a pulse rate frequency. A plurality of pulse rate frequencies of light may be emitted into the local area 260 for a single depth measurement. Thus during a single capture window, the illumination source 170 may emit light of different pulse rate frequencies. This is described in further detail with reference to FIGS. 5-8.

[0027] The camera assembly 180 captures images of the local area 260. The camera assembly includes one or more cameras that are sensitive to light emitted from the illumination source 170. At least one of the one or more cameras in the camera assembly 180 is used to detect structured light in a structured TOF depth sensor, such as the structured TOF depth sensor 400 as described in further detail with reference to FIG. 4. In some embodiments, the one or more cameras may also be sensitive to light in other bands (e.g., visible light). The captured images are used to calculate depths relative to the HMD 100 of various locations within the local area 260, as further described below in conjunction with FIGS. 3-9. The front rigid body 120 also has an optical axis corresponding to a path along which light propagates through the front rigid body 120. In some embodiments, the camera assembly 180 is positioned along the optical axis and captures images of a local area 260, which is a portion of an environment surrounding the front rigid body 120 within a field of view of the camera assembly 180. Objects within the local area 260 reflect incident ambient light as well as light projected by the illumination source 170, which is subsequently captured by the camera assembly 180.

[0028] The camera assembly 180 captures images of the periodic illumination patterns projected onto the local area 260 using a sensor comprising multiple pixels. The sensor may be the sensor 404 as described in further detail with reference to FIG. 4. A sensor of the camera assembly 180 may be comprised of a 2-dimensional array of pixels. Each pixel captures intensity of light emitted by the illumination source 170 from the local area 260. Thus the sensor of the camera assembly 180 may detect structured light emitted by the illumination source 170 and reflected from the local area 260, or a combination of structured light and uniform flood illumination and/or ambient light reflected from the local area 260. In some embodiments, the pixels detect phase shifts of different phases and light pulse frequencies. In some embodiments, the pixels of a sensor detect different phases and light pulse frequencies in sequential capture windows. In some embodiments, the pixels of a sensor of the camera assembly 180 are augmented pixels that have more than one on-pixel charge storage region (also referred to as bins), and collect charge of different phases during a single capture window. These embodiments are described in further detail with respect to FIGS. 6A-7.

[0029] The controller 216 determines depth information using information (e.g., images) captured by the camera assembly 180. The controller 216 estimates depths of objects in the local area 260. The controller 216 receives charge information from a sensor of the camera assembly 180. The sensor of the camera assembly 180 accumulates charge associated with different phases of light. The sensor of the camera assembly 180 conveys the charge information to the controller 216. The controller 216 estimates radial depth information based on the phase shift of the structured light detected by the camera assembly 180. The structured light encoding is then used to disambiguate between the estimated depths from a TOF calculation. This process is described in further detail with reference to FIGS. 3-9. The controller 216 is described in further detail with reference to FIG. 9.
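One common way accumulated charges at a CW-TOF pixel yield a phase shift is four-bin demodulation at 0, 90, 180, and 270 degree offsets. The patent does not mandate this exact scheme, so the following is an illustrative sketch with hypothetical function and variable names.

```python
import math

def phase_from_bins(q0: float, q90: float, q180: float, q270: float) -> float:
    """Estimate the TOF phase shift from four charge samples taken at
    0, 90, 180, and 270 degree demodulation offsets.

    A common CW-TOF arrangement; an illustrative sketch, not the
    patent's required pixel design. Differencing opposite bins cancels
    ambient offset, and atan2 recovers the phase in [0, 2*pi).
    """
    return math.atan2(q90 - q270, q0 - q180) % (2 * math.pi)

# Example: synthesize the four bin values a true phase of 1.0 rad
# would produce (each bin correlates with a shifted reference signal)
true_phi = 1.0
bins = [math.cos(true_phi - k * math.pi / 2) for k in range(4)]
est = phase_from_bins(*bins)  # recovers approximately 1.0 rad
```

With augmented pixels, all four bins could be filled during a single capture window; with a simpler pixel, the bins would be accumulated over sequential capture windows, as the paragraph above describes.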

[0030] The electronic display 220 may be configured to display images to the user in accordance with data received from a console (not shown in FIG. 1), such as the console 910 as described in further detail with reference to FIG. 9. The electronic display 220 may emit, during a defined time period, a plurality of images. In various embodiments, the electronic display 220 may comprise a single electronic display or multiple electronic displays (e.g., a display for each eye of a user). Examples of the electronic display include: a liquid crystal display (LCD), an organic light emitting diode (OLED) display, an inorganic light emitting diode (ILED) display, an active-matrix organic light-emitting diode (AMOLED) display, a transparent organic light emitting diode (TOLED) display, some other display, a projector, or some combination thereof.

[0031] The optics block 230 magnifies image light received from the electronic display 220, corrects optical aberrations associated with the image light, and presents the corrected image light to a user of the HMD 100. At least one optical element of the optics block 230 may be an aperture, a Fresnel lens, a refractive lens, a reflective surface, a diffractive element, a waveguide, a filter, or any other suitable optical element that affects the image light emitted from the electronic display 220. Moreover, the optics block 230 may include combinations of different optical elements. In some embodiments, one or more of the optical elements in the optics block 230 may have one or more coatings, such as anti-reflective coatings, dichroic coatings, etc. Magnification of the image light by the optics block 230 allows elements of the electronic display 220 to be physically smaller, weigh less, and consume less power than larger displays. Additionally, magnification may increase a field-of-view of the displayed media. For example, the field-of-view of the displayed media is such that the displayed media is presented using almost all (e.g., 110 degrees diagonal), and in some cases all, of the field-of-view. Additionally, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements.

[0032] FIG. 3 is a diagram of operation of a conventional structured light based depth determination device 300, in accordance with one or more embodiments. In a conventional structured light based depth determination device, a structured light source 302 emits structured light into an environment. The structured light has an encoded structured pattern and may be pulsed at a pulse frequency. The structured light is projected into an environment, and may reflect off of a surface or any three dimensional object in the environment back towards the sensor 304. Any surface or three dimensional object in the environment distorts the output pattern from the structured light source 302. Using a triangulation calculation, a controller (not shown) that receives information from the sensor 304 can compare the distorted pattern to the emitted pattern to determine a distance R 316 of an object in the environment from the sensor 304.

[0033] The triangulation calculation relies on the following relationship:

R = B sin(θ) / sin(α + θ)     (1)

where R is the distance R 316 of an object from the sensor 304, B is the baseline 306 distance from the structured light source 302 to the sensor 304, θ is the angle θ 314 between the projected light and the baseline 306, and α is the angle α 312 between the light reflected off of an object and the surface of the sensor 304. The baseline distance B 306 and the emitted light angle θ 314 are fixed and defined by the structure of the structured light based depth determination device and the encoded structured light. To determine α 312, a controller compares the 2-dimensional image of pixel intensities to the known structured pattern to identify the originating pattern from the structured light source 302. In a conventional structured light based depth determination device, this process would entail a full epipolar code search 308 across the full range of the structured light encoding. After determining the value of α 312, the controller carries out a triangulation calculation using relationship (1).
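Relationship (1) can be evaluated directly once the angles are known; the numeric values in this sketch are hypothetical examples, not figures from the patent.

```python
import math

def triangulation_range(baseline_m: float, theta: float, alpha: float) -> float:
    """Relationship (1): R = B * sin(theta) / sin(alpha + theta).

    baseline_m: distance B between illumination source and sensor
    theta:      angle between the projected light and the baseline (rad)
    alpha:      angle between the reflected light and the sensor surface (rad)
    """
    return baseline_m * math.sin(theta) / math.sin(alpha + theta)

# Hypothetical example: 50 mm baseline, 80 deg projection angle,
# 85 deg observed return angle
R = triangulation_range(0.050, math.radians(80), math.radians(85))
```

Note how R grows without bound as α + θ approaches 180 degrees: a small angular error then produces a large range error, which is why a short baseline limits the usable range of a pure triangulation system.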

[0034] This conventional method of determining the location of an object in the environment has a number of drawbacks. The full epipolar code search 308 may require significant computation time, thus increasing the time between output light from the structured light source 302 and a controller registering the presence and location of an object in the environment. This delay may be noticeable in an example in which the conventional structured light based depth determination device is used in an artificial reality system, since determination of the location of an object may be a step in displaying virtual content to a user, leading to a visual lag in the displayed image. Additionally, the conventional structured light based depth determination device has a range limit 310 that is defined by the baseline 306 distance from the structured light source 302 to the sensor 304. The longer the distance of the baseline 306 between the structured light source 302 and the sensor 304, the greater the depth range of the conventional structured light based depth determination device. In cases where the conventional structured light based depth determination device is incorporated into another device for which a form factor is important, this may lead to a range limit of a device or a size constraint on the device in order to achieve a large enough range. Furthermore, structured light contains patterned constellations of light surrounded by areas without illumination. This leads to a significant non-illuminated portion of the image, which in some cases may lead to underutilization of the pixels in a sensor, if the sensor 304 is a pixel-based sensor (e.g., less than 10% of a sensor array collects light from the active structured light source 302).

[0035] FIG. 4 is a diagram of a structured TOF depth sensor 400, in accordance with one or more embodiments. The structured TOF depth sensor 400 may be an embodiment of the DMA 210. In a structured TOF depth sensor 400, an illumination source 402 is combined with a TOF sensor 404 to leverage both a TOF calculation with a structured light encoding. The illumination source 402 may be the illumination source 170, while the TOF sensor 404 may be part of the camera assembly 180. The structured TOF depth sensor 400 also includes a controller 412, which may be the controller 216 of the DMA 210 as described in further detail with reference to FIG. 2. The combination of the structured light encoding with a TOF calculation allows for a reduced baseline 406 in comparison to the baseline 306 without sacrificing the depth of the sensing range. The structured TOF depth sensor 400 also reduces the computation associated with a code search, since a TOF calculation limits the full epipolar code search 308 to a TOF limited epipolar search 408. This is described in further detail below.

[0036] In some embodiments, the illumination source 402 emits structured light into an environment, such as the local area 260. The illumination source 402 may emit structured light at one or more pulse frequency rates. In some examples, the illumination source 402 sequentially emits structured light at different temporal pulsing frequency rates. This is described in further detail with reference to FIGS. 5-9. In some embodiments, the illumination source 402 emits any structured light pattern, such as a symmetric or quasi-random dot pattern, grid, horizontal bars, a periodic structure, or any other pattern. The structured light is projected into an environment, and may reflect off of a surface or any three dimensional object in the environment. The reflected structured light is then directed from the object back towards the sensor 404. In some embodiments, the illumination source 402 or any other light source described herein emits structured light simultaneously with, and in addition to, a uniform flood illumination. Thus the illumination source 402 may emit both structured light and uniform flood illumination. In other embodiments, the illumination source 402 or any other light source described herein may emit structured light, and a second light source emits uniform flood illumination.

[0037] The sensor 404 may be a fast photodiode array, or any other TOF sensor with a two-dimensional pixel array. The sensor 404 may be one of the sensors located in the camera assembly 180. The controller 412 determines, from information provided by the sensor 404, a time that light has taken to travel from the illumination source 402 to the object in the environment and back to the sensor 404 plane. This may be determined by accumulating charge at a pixel associated with different phases of reflected light. The pixel information is conveyed to the controller 412, which then performs a TOF phase shift calculation to generate estimated radial depths of an object in a local area. In some examples, the sensor 404 may measure different sets of phase shifts for different output pulse frequencies of the illumination source 402 during different exposure windows. This is described in further detail with reference to FIGS. 5-9. Unlike the sensor 304 in a conventional structured light based depth determination device, the sensor 404 thus measures a plurality of phase shifts of the structured light source, rather than accumulating charge for computing a triangulation measurement.

[0038] Referring to FIG. 4, a controller 412 causes an illumination source 402 to emit pulsed structured light at two different pulsing frequencies (e.g., 40 MHz and 100 MHz). A sensor 404 captures reflected pulses, and the controller determines a set of possible distances using the captured data and TOF depth determination techniques.

[0039] The TOF measurement of the illumination source 402 produced from the plurality of phase shifts detected by the sensor 404 may not be fully disambiguated. For example, using a single temporal pulsing frequency, the TOF measurement of the controller 412 may produce several depth estimates that each result from a 2π ambiguity in the TOF calculation. Thus the TOF measurement may result in a plurality of phase shift estimates that are each possible solutions to a TOF calculation and are separated from each other by 2π. Each of the plurality of phase shift estimates results in a different depth measurement of an object. This is shown in FIG. 4 as the phase estimate 410a, phase estimate 410b, and phase estimate 410c (collectively 410). The phase estimates 410 define discrete regions of possible radial depth of a detected object in an environment.
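The family of candidate radial distances produced by the 2π ambiguity can be enumerated directly, as in this sketch; the function name and example values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def candidate_distances(phase_rad: float, freq_hz: float,
                        max_range_m: float) -> list[float]:
    """All radial distances consistent with a wrapped phase measurement:
    d_k = c * (phase + 2*pi*k) / (4*pi*f), for k = 0, 1, 2, ...

    Successive candidates are spaced by the unambiguous range c/(2f).
    """
    out, k = [], 0
    while True:
        d = C * (phase_rad + 2 * math.pi * k) / (4 * math.pi * freq_hz)
        if d > max_range_m:
            return out
        out.append(d)
        k += 1

# Example: candidates for a 1.0 rad phase at 100 MHz, within 10 m
cands = candidate_distances(1.0, 100e6, 10.0)
```

Each entry of the list corresponds to one of the discrete regions of possible radial depth, analogous to the phase estimates 410a-410c in FIG. 4.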

[0040] To distinguish between the depth estimates produced from the TOF calculation, the controller 412 uses depth information from structured light in at least one of the images captured by the sensor 404. Thus the controller 412 may compare the image produced by the sensor 404 to the encoding of the illumination source 402 pattern. The controller 412 may be the controller 216 as shown in FIG. 2. This may be done by the controller 412 using a lookup table (LUT) containing the structured light encoding. Thus instead of a full epipolar code search 308, the controller 412 performs a TOF limited epipolar search 408 in the regions of the estimates made with a TOF calculation. By comparing the image from the sensor 404 to the structured light encoding, the controller 412 disambiguates the TOF estimate and selects one of the phase estimates as the correct phase, and the corresponding correct radial distance from a set of radial depth estimates. Note that the use of TOF along with SL allows for quick determination of depth and permits a relatively small baseline, as the accuracy of the SL only has to be sufficient to disambiguate the more accurate TOF measurements. Accordingly, the structured TOF depth sensor allows for a smaller baseline 406 in comparison to the baseline 306. In some examples, the baseline 406 may be 50 mm or less (e.g., 10 mm).
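The selection step can be read as choosing the TOF candidate nearest the coarse structured-light estimate, since the SL triangulation only needs to be accurate to within the candidate spacing. This is an illustrative sketch of that reading, with hypothetical names and values; the patent's LUT-based epipolar comparison would produce the coarse estimate.

```python
def select_distance(tof_candidates: list[float], sl_estimate: float) -> float:
    """Pick the TOF candidate closest to the coarse structured-light
    estimate. The SL estimate only needs to be accurate to within the
    c/(2f) spacing between candidates for the selection to succeed.
    """
    return min(tof_candidates, key=lambda d: abs(d - sl_estimate))

# Example: 2*pi-ambiguous TOF candidates (metres) and a coarse SL
# triangulation estimate of 3.1 m
picked = select_distance([0.24, 1.74, 3.24, 4.74], 3.1)
```

The final reported depth is then the selected TOF candidate, which carries the TOF measurement's finer accuracy rather than the triangulation's.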

[0041] FIG. 5 is a portion of a phase map 500 of a structured TOF depth sensor, in accordance with one or more embodiments. In some examples, the phase map 500 is produced from a structured TOF depth sensor, as described in FIG. 4. Thus the phase map 500 may be detected by the sensor 404 as described in further detail with reference to FIG. 4. The phase map 500 shows the disambiguated depth estimates produced from a TOF calculation following emission of structured light at two different pulse frequencies.

[0042] A structured light source may project structured light at a first phase shift frequency 502 into an environment. The first phase shift frequency 502 is a phase shift between 0 and 2π that corresponds to a first temporal frequency at which pulses are output (e.g., typical ranges are ~1-350 MHz, but could possibly go even higher, e.g., up to 1 GHz). The structured light source may then project structured light at a second phase shift frequency 504 that is different from the first phase shift frequency 502. The second phase shift frequency 504 is a phase shift between 0 and 2π that corresponds to a second temporal frequency at which pulses are output. For example, the structured light projector may output pulses at 10 MHz and may also emit pulses at 50 MHz. In some examples, the light emitted into a local environment at the first phase shift frequency 502 may be structured light, whereas the light emitted into the local environment at the second phase shift frequency 504 may be uniform flood light or any non-encoded light. The projection of light at the first phase shift frequency 502 and the second phase shift frequency 504 may occur at different times and correspond to different exposure windows of a sensor (e.g., the sensor 404). Timing of the structured light projection and sensing windows is described in further detail with reference to FIGS. 6A-7.
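The relationship between a radial distance and the wrapped phase observed at each temporal frequency can be illustrated as follows. This is a sketch under the usual round-trip TOF model; the function name is an assumption for illustration:

```python
import math

C = 299_792_458.0  # speed of light in m/s


def wrapped_phase(distance_m, pulse_freq_hz):
    """Wrapped round-trip TOF phase in [0, 2*pi) for an object at
    distance_m when pulses are emitted at pulse_freq_hz."""
    return (4.0 * math.pi * pulse_freq_hz * distance_m / C) % (2.0 * math.pi)
```

An object at 5 m produces a phase of about 2.10 rad at 10 MHz and about 4.20 rad at 50 MHz; measuring the pair of phases narrows the set of distances consistent with both frequencies.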

[0043] The phase map 500 shows the ambiguity in the TOF calculation. The y-axis shows the radial distance 506. The phase map 500 represents the detection of an object in an environment at a distance from a structured TOF depth sensor. The sets of detected ranges 508a, 508b, 508c, and 508d (collectively 508) each represent phase-wrapped solutions to a TOF calculation and correspond to a set of estimated radial distances based on a phase shift detected by a sensor. Note that what is illustrated is only a portion of the phase map 500, as there would be additional sets of detected ranges proceeding out to infinity (in practice, the range may be limited by the amount of light emitted into the scene and the reflectivity of the objects being imaged), which are omitted for ease of illustration. The sets of detected ranges 508 are referred to herein as estimated radial distances. Thus, the solutions in the detected range 508a are separated from the detected range 508b by a 2π phase ambiguity, as described in further detail above. For example, each of the detected ranges 508 may correspond to the regions of the TOF limited epipolar search 408 and the phase estimates 410a, 410b, and 410c shown in FIG. 4. Using the phase map 500, a controller compares the detected ranges 508 to a structured light encoding. The controller may be the controller 412. The structured light encoding may be stored in a lookup table (LUT). Thus, instead of a full epipolar code search 308, a controller performs a TOF limited epipolar search 408 in the regions of the estimates made with a TOF calculation. In some embodiments, based on a comparison between the detected ranges 508 and a LUT, the controller selects one of the detected ranges 508. The controller then performs a triangulation calculation using the relationship (1) to produce a triangulation depth estimate. In some embodiments, a controller may divide a local area illuminated by an illumination source into a number of different regions.
In some embodiments, the controller identifies a correspondence between a region of estimated radial distances from a TOF calculation and a region of triangulation depth estimates. The controller thus matches regions of TOF calculations to regions of triangulation depth estimates. In some embodiments, the controller then selects the radial depth estimate that is within a threshold distance of the triangulation depth estimate. In some embodiments, the controller selects the estimated radial distance based in part on a LUT and the second estimated radial distance. In some embodiments, the controller selects the estimated radial distance using machine learning. In some embodiments, in regions without structured light illumination or triangulation depth estimates, a controller may back-fill estimated radial distances from TOF calculations and/or interpolate between regions. In terms of interpolation, in some embodiments a uniform illumination pattern is modulated in some regions with brighter spots (e.g., dots) or null spots (e.g., dark dots, an inverse dot pattern). In this scenario, nearly all pixels would have TOF information, thus increasing their utility, whereas only a subset would have SL information. But the SL information could be used to locally disambiguate neighboring regions of TOF estimates via, for example, local interpolation. Thus, one of the estimated ranges in the detected ranges 508 is selected as the true distance of an object by comparing the results of the triangulation calculation to the detected ranges.
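The threshold-based selection and the back-fill behavior described above might be sketched as follows. The helper is hypothetical; the threshold value and the None-based fallback are assumptions for illustration, not taken from the disclosure:

```python
def select_radial_distance(tof_candidates, sl_estimate, threshold_m):
    """Return the TOF candidate within threshold_m of the
    structured-light triangulation estimate.

    Returns None when no candidate qualifies (e.g., a region without
    SL illumination), in which case a caller could back-fill the TOF
    estimate directly or interpolate from neighboring regions.
    """
    best = min(tof_candidates, key=lambda d: abs(d - sl_estimate))
    return best if abs(best - sl_estimate) <= threshold_m else None
```

A per-region pass over the scene would call this once per matched region pair, keeping the TOF precision while using the SL estimate only to pick the correct wrap.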

[0044] Combining the TOF calculation based on a phase shift with a triangulation calculation based on the structured light encoding allows for disambiguation of the TOF phase shift solutions shown in the phase map 500 without the need for detection of an object with additional output light frequencies. Thus, the total number of TOF captures within a limited exposure window of a sensor can be reduced, which is discussed in further detail below. The structured light TOF sensor also allows for reduced accuracy in a triangulation structured light calculation, since the structured light estimate may only need to be precise enough to disambiguate between the solutions of the TOF calculation, i.e., between each of the detected ranges 508, rather than across a full depth range, as shown in FIG. 4. This also allows for a substantial decrease in the baseline distance between a structured light source and a sensor (such as the baseline 406) without sacrificing the detection capabilities of the structured light TOF sensor. A reduced baseline may allow for a smaller form factor of an HMD or any other device in which a structured TOF sensor is incorporated. In addition, the complexity of the structured light computational algorithm can be reduced, since lower accuracy and precision are required. In some embodiments, the accuracy of the structured light estimation may be in the range of 0.5 to 3 meters.
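The relaxed accuracy requirement on the structured light estimate follows directly from the ambiguity interval: the SL error only needs to stay below half of c/(2f). A quick numeric check (illustrative only; the function name is an assumption):

```python
C = 299_792_458.0  # speed of light in m/s


def required_sl_accuracy(pulse_freq_hz):
    """Worst-case structured-light error that still disambiguates TOF
    phase wraps: half the ambiguity interval c / (2*f), i.e. c / (4*f)."""
    return C / (4.0 * pulse_freq_hz)
```

At 50 MHz the ambiguity interval is about 3 m, so SL errors up to roughly 1.5 m still select the correct wrap; at 10 MHz the tolerance grows to roughly 7.5 m. This is consistent with the 0.5 to 3 meter SL accuracy range cited above being sufficient.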