This document describes support for creating a digital representation of the observed environment and estimating the camera pose in real time by leveraging the 3D camera.

This document was published by the Crosswalk Project as an API Draft. If you wish to make comments regarding this document, please send them to crosswalk-dev@lists.crosswalk-project.org. All comments are welcome.

Introduction

Interfaces

ScenePerception

The ScenePerception interface provides methods to track and scan scenes for augmented reality applications. The ScenePerception interface is exposed through the realsense module.

Promise<void> init(optional InitialConfiguration config)
Initialize the Scene Perception module with a user-defined configuration.
optional InitialConfiguration config
The initial user-defined configuration that will be passed to the Scene Perception module.
Promise<void> start()
Start scene perception. This method should be called after init has been called and successfully resolved; otherwise it will fail with an error.
Promise<void> stop()
Stop scene perception. Calling this method will terminate the scene perception procedure. If scene perception has not been started, this method has no effect.
Promise<void> reset()
Reset the scene perception module to its initial state.
Promise<void> destroy()
Destroy the scene perception object. After calling this method the scene perception object will no longer be valid; call init again to re-initialize it before further use.
Promise<void> enableReconstruction(boolean enable)
Allows the user to enable or disable integration of the upcoming camera stream into the 3D volume. If disabled, the volume will not be updated; however, scene perception will still keep tracking the camera. This is a control parameter that can be updated before every frame is passed to the module.
boolean enable
Flag to enable/disable reconstruction.
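For example, a minimal sketch (assuming an initialized ScenePerception instance named sp, as in the examples at the end of this document) that pauses volume integration while camera tracking continues:

          // Pause integration of new frames into the volume.
          // Camera tracking keeps running while reconstruction is disabled.
          sp.enableReconstruction(false).then(function() {
            console.log('Reconstruction paused; tracking continues.');
          }, function(e) { console.log(e); });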
Promise<void> enableRelocalization(boolean enable)
Allows the user to enable or disable the re-localization feature of scene perception's camera tracking. By default, re-localization is enabled. This functionality is only available after init is called.
boolean enable
Flag to enable/disable re-localization.
Promise<boolean> isReconstructionEnabled()
Allows the user to check whether integration of the upcoming camera stream into the 3D volume is enabled or disabled.
Promise<Sample> getSample()
Allows the user to asynchronously access the captured sample (color and depth) corresponding to the camera's current pose.
Promise<Vertices> getVertices()
Allows the user to asynchronously access the surface vertices that are within view from the camera's current pose.
Promise<Normals> getNormals()
Allows the user to asynchronously access the normals of the surface that are within view from the camera's current pose.
Promise<Image> queryVolumePreview(sequence<float> cameraPose)
Allows the user to access a 2D projection image of the reconstructed volume from a given camera pose by ray-casting.
This method is optimized for real-time performance. It is also useful for visualizing the progress of scene reconstruction.
optional sequence<float> cameraPose
This is a sequence of 12 floats that stores the camera pose the user wishes to set, in row-major order. The camera pose is specified as a 3 by 4 matrix:
[R | T] = [Rotation Matrix | Translation Vector]
where R = [ r11 r12 r13 ]
          [ r21 r22 r23 ]
          [ r31 r32 r33 ]
      T = [ tx ty tz ]
The camera pose sequence layout is: [r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz]
The translation vector is in meters.
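For illustration, a minimal sketch (assuming an initialized and started ScenePerception instance named sp) that requests a preview from the identity pose, i.e. no rotation and zero translation:

          // Identity rotation, zero translation, laid out as
          // [r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz].
          var identityPose = [1, 0, 0, 0,
                              0, 1, 0, 0,
                              0, 0, 1, 0];
          sp.queryVolumePreview(identityPose).then(function(image) {
            console.log('Preview image: ' + image.width + 'x' + image.height);
          }, function(e) { console.log(e); });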
Promise<VolumePreviewData> getVolumePreview(sequence<float> cameraPose)
Returns the volume details for the specified camera pose: a 2D projection image of the reconstructed volume obtained by ray-casting, together with the surface volume normals and surface volume faces.

This method returns a promise.
The promise will be fulfilled with the VolumePreviewData instance if there are no errors.
The promise will be rejected with a DOMException object if there is a failure.

optional sequence<float> cameraPose
This is a sequence of 12 floats that stores the camera pose the user wishes to set, in row-major order. The camera pose is specified as a 3 by 4 matrix:
[R | T] = [Rotation Matrix | Translation Vector]
where R = [ r11 r12 r13 ]
          [ r21 r22 r23 ]
          [ r31 r32 r33 ]
      T = [ tx ty tz ]
The camera pose sequence layout is: [r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz]
The translation vector is in meters.
Promise<VoxelResolution> getVoxelResolution()
Gets the voxel resolution used by the scene perception module.
Promise<float> getVoxelSize()
Allows the user to get the side length of a voxel cube, in meters.
Promise<CameraIntrinsics> getInternalCameraIntrinsics()
Allows the user to get the intrinsic camera parameters the module is working with.

This method returns a promise.
The promise will be fulfilled with the CameraIntrinsics instance if there are no errors.
The promise will be rejected with an error message if there is a failure.

Promise<MeshingThresholds> getMeshingThresholds()
Allows the user to get the meshing thresholds used by scene perception.
Promise<MeshingResolution> getMeshingResolution()
Allows the user to get the meshing resolution.
Promise<MeshData> getMeshData()
Allows the user to retrieve the mesh data.
Promise<SurfaceVoxelsData> getSurfaceVoxels(optional InterestRegion region)

The getSurfaceVoxels function exports the centers of the voxels intersected by the scanned surface.
The voxel size is set based on the resolution of the color images.
Optionally, you can specify a region-of-interest bounding box.

The getSurfaceVoxels function returns a promise.
The promise will be fulfilled with surface voxels, delivered in batches, if there are no errors; the dataPending attribute of the result will be true while surface voxels remain.
Call the function again until the dataPending flag is false.
The promise will be rejected if there is a failure.

If useColor was set to false by calling the configureSurfaceVoxelsData function before getSurfaceVoxels, the surfaceVoxelsColor member of the result returned by getSurfaceVoxels will be null.

optional InterestRegion region
The optional region of interest, specified by the lower-left-front and upper-right-rear points of its bounding box.
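A minimal sketch of the batched retrieval loop described above (assuming an initialized and started ScenePerception instance named sp; the voxelCount value is illustrative):

          // Ask for at most 10000 colored voxels per batch (illustrative value).
          sp.configureSurfaceVoxelsData({voxelCount: 10000, useColor: true}).then(function() {
            function fetchBatch() {
              sp.getSurfaceVoxels().then(function(result) {
                console.log('Received ' + result.numberOfSurfaceVoxels + ' voxels');
                // Keep calling until no surface voxels remain.
                if (result.dataPending) fetchBatch();
              }, function(e) { console.log(e); });
            }
            fetchBatch();
          }, function(e) { console.log(e); });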
Promise<Blob> saveMesh(optional SaveMeshInfo info)

Save the mesh data of the volume in ASCII form. The method returns a Blob with the 'text/plain' type.

optional SaveMeshInfo info
Information controlling how the mesh data is saved.
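For example, a minimal sketch (assuming an initialized ScenePerception instance named sp; the file name is hypothetical) that saves the mesh and offers the resulting Blob as a download:

          sp.saveMesh({fillMeshHoles: true, saveMeshColor: true}).then(function(blob) {
            // Offer the ASCII mesh as a download; 'scene_mesh.txt' is a hypothetical name.
            var link = document.createElement('a');
            link.href = URL.createObjectURL(blob);
            link.download = 'scene_mesh.txt';
            link.click();
          }, function(e) { console.log(e); });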
Promise<void> setMeshingResolution(MeshingResolution resolution)
Allows the user to set the meshing resolution.
MeshingResolution resolution
The meshing resolution the user wishes to set.
Promise<void> setMeshingThresholds(MeshingThresholds thresholds)
An optional method intended for expert users. It allows the user to set the meshing thresholds.
The values set by this method will be used by subsequent calls to getMeshData(). The thresholds indicate the magnitude of change in a block that is considered significant enough to trigger re-meshing.
MeshingThresholds thresholds
The thresholds the user wants to set.
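For example (the threshold values below are purely illustrative, not recommendations):

          // Re-mesh a block when its maximum change exceeds 0.03 or its
          // average change exceeds 0.005 (illustrative values).
          sp.setMeshingThresholds({max: 0.03, avg: 0.005}).then(function() {
            console.log('Meshing thresholds updated.');
          }, function(e) { console.log(e); });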
Promise<void> setCameraPose(sequence<float> cameraPose)
Allows the user to enforce the supplied pose as the camera pose. The module will track the camera from this pose when the next frame is passed. This method can be called any time after the module finishes processing the first frame, or any time after the module successfully processes the first frame following a call to reset.
sequence<float> cameraPose
This is a sequence of 12 floats that stores the camera pose the user wishes to set, in row-major order. The camera pose is specified as a 3 by 4 matrix:

[R | T] = [Rotation Matrix | Translation Vector]
where R = [ r11 r12 r13 ]
          [ r21 r22 r23 ]
          [ r31 r32 r33 ]
      T = [ tx ty tz ]

The camera pose sequence layout is: [r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz]
The translation vector is in meters.
Promise<void> setMeshingUpdateConfigs(MeshingUpdateConfigs config)
Allows the user to set the meshing update configuration.
MeshingUpdateConfigs config
Argument indicating which mesh data you wish to receive.
Promise<void> configureSurfaceVoxelsData(VoxelsDataConfig config)

This method sets the configuration used by the getSurfaceVoxels function.

VoxelsDataConfig config
Argument indicating the configuration to use for surface voxels data.
Promise<void> setMeshingRegion(InterestRegion region)
The setMeshingRegion function sets the meshing region of interest for the getMeshData function.
The region of interest is specified as a bounding box.
InterestRegion region
The region of interest, specified by the lower-left-front and upper-right-rear points of its bounding box.
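For example, a sketch (with illustrative coordinates, in meters) that restricts meshing to a 2 m cube in front of the camera:

          sp.setMeshingRegion({
            lowerLeftFrontPoint: {x: -1, y: -1, z: 0},
            upperRightRearPoint: {x: 1, y: 1, z: 2}
          }).then(function() {
            console.log('Meshing region set.');
          }, function(e) { console.log(e); });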
Promise<void> clearMeshingRegion()
The clearMeshingRegion function removes any previously set meshing region of interest for the getMeshData function.
attribute EventHandler onchecking

A property used to set the EventHandler (described in [[!HTML]]) for the CheckingEvent that is dispatched to ScenePerception when a frame is checked.

attribute EventHandler onerror

A property used to set the EventHandler (described in [[!HTML]]) for the ErrorEvent that is dispatched to ScenePerception when there is an error.

attribute EventHandler onmeshupdated

A property used to set the EventHandler (described in [[!HTML]]) for the Event that is dispatched to ScenePerception when the mesh is updated.

attribute EventHandler onsampleprocessed

A property used to set the EventHandler (described in [[!HTML]]) for the SampleProcessedEvent that is dispatched to ScenePerception when a sample has been processed.

CheckingEvent

readonly attribute float quality

A positive value between 0 and 1 indicates how suitable the scene is for starting, tracking, or resetting scene perception.
1.0 -> represents an ideal scene for starting scene perception.
0.0 -> represents an unsuitable scene for starting scene perception.

A negative value indicates a potential reason for tracking failure:
-1.0 -> the scene lacks structural/geometry information.
-2.0 -> the scene lacks enough depth data (too far away from, or too close to, the camera).
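A minimal sketch of a checking handler that starts scene perception once the scene quality is acceptable (the 0.25 threshold is illustrative; the event payload is assumed to be accessed via e.data.quality, as in the examples at the end of this document):

          sp.onchecking = function(e) {
            var quality = e.data.quality;
            if (quality > 0.25) {  // Illustrative acceptance threshold.
              sp.start().then(function() {
                console.log('Scene perception started.');
              }, function(e) { console.log(e); });
            } else if (quality === -1.0) {
              console.log('Scene lacks structural/geometry information.');
            } else if (quality === -2.0) {
              console.log('Scene lacks enough depth data.');
            }
          };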

SampleProcessedEvent

readonly attribute float quality

A positive value between 0 and 1 indicates how suitable the scene is for starting, tracking, or resetting scene perception.
1.0 -> represents an ideal scene for starting scene perception.
0.0 -> represents an unsuitable scene for starting scene perception.

A negative value indicates a potential reason for tracking failure:
-1.0 -> the scene lacks structural/geometry information.
-2.0 -> the scene lacks enough depth data (too far away from, or too close to, the camera).

readonly attribute TrackingAccuracy accuracy

The tracking accuracy of the processed sample.

readonly attribute float[] cameraPose

The camera pose of the processed sample: a sequence of 12 floats in the row-major [R | T] layout described in setCameraPose.

Dictionaries

BlockMesh

unsigned long meshId
Unique ID to identify each BlockMesh object.
unsigned long vertexStartIndex
Starting index of the vertex inside the vertex buffer.
unsigned long numVertices
Total number of vertices inside this BlockMesh object.
unsigned long faceStartIndex
Starting index of the face list in the faces buffer.
unsigned long numFaces
Number of faces forming the mesh inside this BlockMesh object.

ImageSize

unsigned long width
Width of the image in pixels.
unsigned long height
Height of the image in pixels.

Image

PixelFormat format
Describes the pixel format of the image sample.
unsigned long width
Width of the image in pixels.
unsigned long height
Height of the image in pixels.
ArrayBuffer data
Represents the image data.

InitialConfiguration

boolean? useOpenCVCoordinateSystem;
Indicates whether to use the OpenCV coordinate system. The default value is false.
VoxelResolution? voxelResolution;
The initial voxel resolution. The voxelResolution is locked when init is called and remains the same throughout the runtime of the scene perception module. The default voxel resolution is low.
sequence<float>? initialCameraPose;
This is a sequence of 12 floats that stores the camera pose the user wishes to set, in row-major order. The camera pose is specified as a 3 by 4 matrix:

[R | T] = [Rotation Matrix | Translation Vector]
where R = [ r11 r12 r13 ]
          [ r21 r22 r23 ]
          [ r31 r32 r33 ]
      T = [ tx ty tz ]

The camera pose sequence layout is: [r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz]
The translation vector is in meters.
MeshingThresholds? meshingThresholds;
The meshing threshold of scene perception.
ImageSize? colorImageSize;
Indicates the captured color image size in pixels; the default is 320 x 240.
ImageSize? depthImageSize;
Indicates the captured depth image size in pixels; the default is 320 x 240.
float? captureFramerate;
Indicates the capture frame rate; the default value is 60 fps.
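A configuration sketch combining several of these members (all values are illustrative; see also the fuller example at the end of this document):

          var initConfig = {
            useOpenCVCoordinateSystem: false,
            voxelResolution: 'low',  // Room-sized scenario.
            colorImageSize: {width: 320, height: 240},
            depthImageSize: {width: 320, height: 240},
            captureFramerate: 60,
            meshingThresholds: {max: 0.03, avg: 0.005}  // Illustrative values.
          };
          sp.init(initConfig).then(function() {
            console.log('init succeeds');
          }, function(e) { console.log(e); });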

InterestRegion

Point3D lowerLeftFrontPoint;
The lower-left-front point of the interest region.
Point3D upperRightRearPoint;
The upper-right-rear point of the interest region.

MeshData

sequence<BlockMesh> blockMeshes
Sequence of BlockMesh objects.
unsigned long numberOfVertices;
Represents number of vertices present in the MeshData buffer.
Float32Array vertices;
Represents an array of floats with length 4*numberOfVertices. Each vertex consists of 4 floats: (x, y, z) coordinates in meters plus a confidence value. The confidence value is in the range [0, 1], indicating how confident scene perception is about the presence of the vertex.
Uint8Array colors;
Represents the array of colors with length 3*numberOfVertices. There are three color channels (RGB) per vertex.
unsigned long numberOfFaces;
Represents the number of faces in the buffer.
Int32Array faces;
Represents an array of faces forming the mesh (3 vertex indices per triangle); the valid range is [0, 3*numberOfFaces].
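A minimal sketch (assuming a WebGL context named gl and an initialized ScenePerception instance named sp) that uploads the vertex and face buffers of a MeshData result for rendering:

          sp.getMeshData().then(function(meshes) {
            // Vertices are (x, y, z, confidence) quadruples; faces hold triangle indices.
            var vertexBuffer = gl.createBuffer();
            gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
            gl.bufferData(gl.ARRAY_BUFFER, meshes.vertices, gl.STATIC_DRAW);

            var indexBuffer = gl.createBuffer();
            gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
            gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, new Uint32Array(meshes.faces),
                          gl.STATIC_DRAW);
            // Note: drawing with 32-bit indices requires the OES_element_index_uint
            // extension in WebGL 1.
          }, function(e) { console.log(e); });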

MeshingThresholds

float max
Represents the maximum threshold of meshing. If the maximum change in a block exceeds this value, then the block will be re-meshed. Setting this value to zero will retrieve all blocks.
float avg
Represents the average threshold of meshing. If the average change in a block exceeds this value, then the block will be re-meshed. Setting this value to zero will retrieve all blocks.

MeshingUpdateInfo

boolean countOfBlockMeshesRequired;
If set, a successful call will set the number of block meshes available for meshing.
boolean blockMeshesRequired;
Can only be set to true if countOfBlockMeshesRequired is set to true; otherwise this value will be ignored. If set to true, a successful call will update the block meshes array.
boolean countOfVerticesRequired;
If set, a successful call will set the number of vertices available for meshing.
boolean verticesRequired;
Can only be set if countOfVerticesRequired is set to true; otherwise the value is ignored. If set, a successful call will update the vertices array.
boolean countOfFacesRequired;
If set, a successful call will set the number of faces available for meshing.
boolean facesRequired;
Can only be set if countOfFacesRequired is set to true; otherwise the value is ignored. If set, a successful call will update the faces array.
boolean colorsRequired;
If set and the MeshData was created with color, a successful call will fill in the colors array.

MeshingUpdateConfigs

boolean? fillHoles;
Indicates whether to fill holes in mesh blocks. If set, it fills missing details (holes) in each mesh block that are visible from scene perception's current camera pose and completely surrounded by a closed surface, using smooth linear interpolation of adjacent mesh data.
MeshingUpdateInfo? updateInfo;
Argument indicating which mesh data you wish to receive.

Normals

Represents normal vector data.
unsigned long width
Width of the normals buffer.
unsigned long height
Height of the normals buffer.
Float32Array data
Raw data of a sequence of normals. Each normal consists of 3 float values (x, y, z).

Point3D

float x;
The x coordinate of the point.
float y;
The y coordinate of the point.
float z;
The z coordinate of the point.

Sample

The captured sample that contains multiple streams.
Image color
Color image of the sample.
Image depth
Depth image of the sample.

SaveMeshInfo

boolean? fillMeshHoles;
Flag indicating whether to fill holes in the saved mesh.
boolean? saveMeshColor;
Flag indicating whether you wish to save the mesh color.
MeshingResolution? meshResolution;
Indicates the resolution of the mesh to be saved.

SurfaceVoxelsData

boolean dataPending;
Indicates whether there is surface voxel data remaining.
Float32Array centerOfSurfaceVoxels;
The array of centers of the surface voxels.
unsigned long numberOfSurfaceVoxels;
The number of surface voxels.
Uint8Array surfaceVoxelsColor;
The array of color channels for the voxels, containing the three RGB channels for each voxel. The length is 3 * numberOfSurfaceVoxels.

VolumePreviewData

Represents the volume preview data, including the volume preview image dimensions, the volume preview image data, vertices, and normals.
unsigned long width
Width of the volume preview data.
unsigned long height
Height of the volume preview data.
Uint32Array imageData
Raw data of the projected volume image. Each pixel is 4 bytes of color data (a, r, g, b).
Float32Array vertices
Raw data of a sequence of vertices. Each vertex consists of 3 float values (x, y, z).
Float32Array normals
Raw data of a sequence of normals. Each normal consists of 3 float values (x, y, z).
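A minimal sketch (assuming a canvas element with id 'preview' and a 12-float cameraPose as described above; both are hypothetical) that blits the preview image onto a 2D canvas:

          sp.getVolumePreview(cameraPose).then(function(preview) {
            var canvas = document.getElementById('preview');  // Hypothetical element.
            canvas.width = preview.width;
            canvas.height = preview.height;
            var ctx = canvas.getContext('2d');
            var imgData = ctx.createImageData(preview.width, preview.height);
            // Reinterpret the Uint32Array pixels as bytes for the canvas buffer.
            // Depending on the actual channel order, a per-pixel swizzle to the
            // canvas RGBA layout may be needed.
            imgData.data.set(new Uint8ClampedArray(preview.imageData.buffer));
            ctx.putImageData(imgData, 0, 0);
          }, function(e) { console.log(e); });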

Vertices

Represents vertices data.
unsigned long width
Width of the vertices buffer.
unsigned long height
Height of the vertices buffer.
Float32Array data
Raw data of a sequence of vertices. Each vertex consists of 3 float values (x, y, z).

VoxelsDataConfig

unsigned long voxelCount;
The maximum number of voxels returned by a single call to the getSurfaceVoxels function.
boolean useColor;
Indicates whether color should be returned per voxel when calling the getSurfaceVoxels function.

CameraIntrinsics

ImageSize imageSize
The image size in pixels.
Point2D focalLength
The color focal length in pixels.
Point2D principalPoint;
The principal point in pixels.

Point2D

float x;
The x coordinate of the image pixel.
float y;
The y coordinate of the image pixel.

Enumerators

MeshingResolution

high

The high mesh resolution.

med

The medium mesh resolution.

low

The low mesh resolution.

TrackingAccuracy

high

The high tracking accuracy.

med

The medium tracking accuracy.

low

The low tracking accuracy.

failed

Tracking has failed.

VoxelResolution

high

The high voxel resolution. Use this resolution in an object-sized scenario (1/256 m).

med

The medium voxel resolution. Use this resolution in a table-top-sized scenario (2/256 m).

low

The low voxel resolution. Use this resolution in a room-sized scenario (4/256 m).

PixelFormat

rgba32

The 32-bit RGBA32 color format. When format of an Image instance is set to rgba32, the data of that image instance must follow the bytes layout of Canvas Pixel ArrayBuffer defined in [[!CANVAS-2D]].

depth

The depth map data in 16-bit unsigned integers. The values indicate the distance from an object to the camera's XY plane, or the Cartesian depth. The value precision is in millimeters.
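For instance, a sketch that reads the depth (in millimeters) at the image center from a depth Image returned by getSample (assuming an initialized and started ScenePerception instance named sp):

          sp.getSample().then(function(sample) {
            var depth = sample.depth;
            // View the raw ArrayBuffer as 16-bit unsigned depth values.
            var pixels = new Uint16Array(depth.data);
            var center = pixels[Math.floor(depth.height / 2) * depth.width +
                                Math.floor(depth.width / 2)];
            console.log('Depth at image center: ' + center + ' mm');
          }, function(e) { console.log(e); });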

Examples

Configure and Control the Scene Perception (SP) module

          // SP states and changing conditions.
          // -- IDLE
          // Before the pipeline is initialized.

          // -- INITIALIZED
          // Pipeline is initialized, SP module is paused.
          // Possible event in this state:
          //     checking event(depth quality)

          // -- STARTED
          // SP module starts to work, receiving raw samples and reconstructing the
          // volume space according to them.
          // Typically this state is triggered when the depth quality
          // is acceptable.
          // The SP module is tracking, and also reconstructing if the
          // reconstruction flag is enabled.
          // Possible events in this state:
          //     sampleProcessed event(quality, tracking accuracy, camera pose)
          //     meshupdated event(no data)

          // Interfaces get/queryXXX are used to get processed data from the SP module.
          // They are available after the module has been initialized.
          // Although they can be accessed in both "INITIALIZED" and "STARTED" states,
          // they may return unavailable data if the SP module hasn't successfully
          // established the coordinate system from the first frame.
          //
          //     getSample(processed sample including color/depth image)
          //     getMeshData
          //     get/query[XXX] interfaces to get configurations or data from SP module.

          // The state transition diagram:
          // --sp.destroy() from other states--> "IDLE" --sp.init()--> "INITIALIZED"
          // "INITIALIZED" --sp.start()--> "STARTED"
          // "STARTED" --sp.stop()--> "INITIALIZED"

          // This example shows skeleton code to init and control the SP module.

          var sp = realsense.ScenePerception;

          var initButton = document.getElementById('init');
          var destroyButton = document.getElementById('destroy');
          var startButton = document.getElementById('startSP');
          var stopButton = document.getElementById('stopSP');

          // Set the initial state of buttons.
          resetButtonState(true);

          function resetButtonState(beforeStart) {
            initButton.disabled = !beforeStart;
            destroyButton.disabled = beforeStart;
            startButton.disabled = beforeStart;
            stopButton.disabled = true;
          }

          initButton.onclick = function(e) {
            // Please refer to InitialConfiguration for more info.
            var initConfig = {
              useOpenCVCoordinateSystem: true,
              colorImageSize: {'width':320, 'height':240},
              depthImageSize: {'width':320, 'height':240},
              captureFramerate: 60
            };
            sp.init(initConfig).then(function() {
              // Set the state of buttons.
              resetButtonState(false);

              // Other initialization work can be done here.
              console.log('init succeeds');
            }, function(e) {console.log(e);});
          };

          startButton.onclick = function(e) {
            sp.start().then(function() {
              startButton.disabled = true;
              stopButton.disabled = false;
              console.log('SP started successfully');
            }, function(e) {console.log(e);});
          };

          stopButton.onclick = function(e) {
            sp.stop().then(function() {
              console.log('SP stops working.');
              startButton.disabled = false;
              stopButton.disabled = true;
            }, function(e) {console.log(e);});
          };

          destroyButton.onclick = function(e) {
            sp.destroy().then(function() {
              console.log('destroy succeeds');
              resetButtonState(true);
            }, function(e) {console.log(e);});
          };
        

Handle events of SP module

          var qualityElement = document.getElementById('quality');
          var accuracyElement = document.getElementById('accuracy');

          sp.onchecking = function(e) {
            var quality = e.data.quality;
            qualityElement.innerHTML = 'Quality: ' + quality.toFixed(2);
          };

          // Please refer to the SampleProcessedEvent interface for the event data.
          sp.onsampleprocessed = function(e) {
            accuracyElement.innerHTML = 'Accuracy: ' + e.data.accuracy;
            qualityElement.innerHTML = 'Quality: ' + e.data.quality.toFixed(2);

            sp.getSample().then(function(sample) {
              // The sample object contains color image and depth image.
              // The image data structure, which could possibly look like this:
              // sample.color = {
              //                  width: imageWidth,
              //                  height: imageHeight,
              //                  data: Uint8Array
              //                };
              // sample.depth = {
              //                  width: imageWidth,
              //                  height: imageHeight,
              //                  data: Uint16Array
              //                };
              // Please refer to getSample interface for more details.
            }, function(e) {console.log(e);});

            sp.queryVolumePreview(e.data.cameraPose).then(function(volumePreview) {
              // The volume preview image data structure is similar to the color
              // and depth images in the sample:
              // volumePreview = {
              //                   format: 'rgba32',
              //                   width: imageWidth,
              //                   height: imageHeight,
              //                   data: ArrayBuffer
              //                 };
              // Please refer to the Image dictionary for the detailed data format.
            }, function(e) {console.log(e);});
          };

          sp.onmeshupdated = function(e) {
            sp.getMeshData().then(function(meshes) {
              // The meshes object contains information about vertices, faces and colors.
              // Meshes can be used to reconstruct the scanned scene with WebGL.
              // Please refer to the MeshData interface for more details.
            }, function(e) {console.log(e);});
          };

          sp.onerror = function(e) {console.log(e);};
        

Acknowledgments