This specification describes support for real-time 3D hand motion tracking using a depth camera.

This document was published by the Crosswalk Project as an API Draft. If you wish to make comments regarding this document, please send them to crosswalk-dev@lists.crosswalk-project.org. All comments are welcome.

Introduction

The APIs described in this document are exposed through the realsense.HandTracking module.

Hand Tracking

The hand tracking capability is accessed via the HandModule interface. It lets an application control and configure hand tracking and retrieve hand information.

Hand Model

The tracked hand provides full 3D skeleton information, including all 22 joints and the finger data, as illustrated in the following diagram.

The tracked hand model can be accessed via the Hand interface.

Interfaces

HandModule

The HandModule is the main interface for hand tracking.

Promise<void> init()

The init() method initializes the hand tracking.

This method returns a promise. The promise will be fulfilled if there are no errors. The promise will be rejected with the DOMException object defined in [[!WEBIDL]] if there is a failure.

Promise<ImageSize> start()

The start() method starts the hand tracking.

This method returns a promise. The promise will be fulfilled with the image size if there are no errors. The promise will be rejected with the DOMException object if there is a failure.

Promise<void> stop()

The stop() method stops the hand tracking.

This method returns a promise. The promise will be fulfilled if there are no errors. The promise will be rejected with the DOMException object if there is a failure.

Promise<sequence<Hand>> track()

The track() method tracks the hands.

This method returns a promise. The promise will be fulfilled with an array of tracked hands if there are no errors. The promise will be rejected with the DOMException object if there is a failure.
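As a rough illustration of processing the array that track() is fulfilled with, the sketch below keeps only the hands whose trackingStatus is 'good', optionally restricted to one body side. The selectHands helper and the sample objects are hypothetical and not part of this API; the sample objects carry only the fields the helper reads.

```javascript
// Hypothetical helper (not part of the API): returns the hands whose
// trackingStatus is 'good', optionally limited to one bodySide value.
function selectHands(hands, side) {
  return hands.filter(function(hand) {
    if (hand.trackingStatus !== 'good')
      return false;
    return side === undefined || hand.bodySide === side;
  });
}

// Stand-in data shaped like Hand objects (only the fields used above).
var sampleHands = [
  {uniqueId: 1, bodySide: 'left', trackingStatus: 'good'},
  {uniqueId: 2, bodySide: 'right', trackingStatus: 'out-of-fov'},
  {uniqueId: 3, bodySide: 'right', trackingStatus: 'good'}
];

var goodRightHands = selectHands(sampleHands, 'right');
console.log(goodRightHands.map(function(hand) { return hand.uniqueId; }));
```

In an application, the same helper could be called inside the fulfillment callback of handModule.track().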

Promise<Image> getDepthImage()

The getDepthImage() method gets the latest processed depth image.

This method returns a promise. The promise will be fulfilled with the processed depth image if there are no errors. The promise will be rejected with the DOMException object if there is a failure.

Hand

The Hand is the interface of the tracked hand.

readonly attribute long uniqueId

The unique ID of the tracked hand.

readonly attribute long long timeStamp

The timestamp at which the collection of the hand data was completed.

readonly attribute boolean calibrated

true if there is a valid hand calibration, otherwise false.

readonly attribute BodySide bodySide

The side of the body to which the hand belongs (when known).

readonly attribute Rect boundingBoxImage

The location and dimensions of the tracked hand, represented by a 2D bounding box (defined in pixels).

readonly attribute Point2D massCenterImage

The 2D center of mass of the hand in image space (in pixels).

readonly attribute Point3D massCenterWorld

The 3D center of mass of the hand in world space (in meters).

readonly attribute Point4D palmOrientation

The quaternion representing the global 3D orientation of the palm.
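One common use of an orientation quaternion is to rotate a reference vector into the palm's frame. The sketch below assumes that the Point4D holds a unit quaternion with x, y, z as the vector part and w as the scalar part; this document does not state the convention, so treat that as an assumption. The rotateByQuaternion helper is hypothetical.

```javascript
// Hypothetical helper: rotates vector v = {x, y, z} by quaternion
// q = {x, y, z, w}. Assumes q is a unit quaternion with w as the scalar part.
function rotateByQuaternion(q, v) {
  // t = 2 * cross(q.xyz, v)
  var tx = 2 * (q.y * v.z - q.z * v.y);
  var ty = 2 * (q.z * v.x - q.x * v.z);
  var tz = 2 * (q.x * v.y - q.y * v.x);
  // v' = v + w * t + cross(q.xyz, t)
  return {
    x: v.x + q.w * tx + (q.y * tz - q.z * ty),
    y: v.y + q.w * ty + (q.z * tx - q.x * tz),
    z: v.z + q.w * tz + (q.x * ty - q.y * tx)
  };
}

// Stand-in orientation: a 90 degree rotation about the z axis.
var sampleOrientation = {x: 0, y: 0, z: Math.SQRT1_2, w: Math.SQRT1_2};
var rotatedXAxis = rotateByQuaternion(sampleOrientation, {x: 1, y: 0, z: 0});
console.log(rotatedXAxis);
```

With the stand-in orientation, the x axis is carried onto the y axis, up to floating-point rounding.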

readonly attribute double palmRadiusImage

The palm radius in image space (number of pixels).

readonly attribute double palmRadiusWorld

The palm radius in world space (meters).

readonly attribute ExtremityDataPoints extremityPoints

The extremity points of the tracked hand.

readonly attribute Fingers fingerData

The finger data of the tracked hand.

readonly attribute Joints trackedJoints

The joint data of the tracked hand.

readonly attribute TrackingStatus trackingStatus

The tracking status.

readonly attribute long openness

The degree of openness of the hand.

readonly attribute Joints normalizedJoints

The normalized joint data of the tracked hand.

Promise<Image> getSegmentationImage()

The getSegmentationImage() method retrieves the 2D image mask of the tracked hand.

This method returns a promise. The promise will be fulfilled with the mask image if there are no errors. The promise will be rejected with the DOMException object defined in [[!WEBIDL]] if there is a failure.

Promise<sequence<Contour>> getContours()

The getContours() method retrieves the contours of the tracked hand.

This method returns a promise. The promise will be fulfilled with the array of contours if there are no errors. The promise will be rejected with the DOMException object defined in [[!WEBIDL]] if there is a failure.

Dictionaries

Image

PixelFormat format
long width
long height
ArrayBuffer data

Point2D

double x

The x coordinate of the point.

double y

The y coordinate of the point.

Point3D

double x

The x coordinate of the point.

double y

The y coordinate of the point.

double z

The z coordinate of the point.

Point4D

double x

The x coordinate of the point.

double y

The y coordinate of the point.

double z

The z coordinate of the point.

double w

The w coordinate of the point.

Rect

unsigned long x

The horizontal coordinate of the top left pixel of the rectangle.

unsigned long y

The vertical coordinate of the top left pixel of the rectangle.

unsigned long w

The rectangle width in pixels.

unsigned long h

The rectangle height in pixels.
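To illustrate how the Rect members fit together, the sketch below computes the center of a rectangle and tests whether a Point2D (such as massCenterImage) falls inside it. Both helpers are hypothetical, and treating the right and bottom edges as exclusive is an assumption made here rather than something this document specifies.

```javascript
// Hypothetical helper: center of a Rect, in pixels.
function rectCenter(rect) {
  return {x: rect.x + rect.w / 2, y: rect.y + rect.h / 2};
}

// Hypothetical helper: true if point lies inside rect. The left and top
// edges are inclusive, the right and bottom edges exclusive (an assumption).
function rectContains(rect, point) {
  return point.x >= rect.x && point.x < rect.x + rect.w &&
         point.y >= rect.y && point.y < rect.y + rect.h;
}

// Stand-in values shaped like boundingBoxImage and massCenterImage.
var sampleBox = {x: 100, y: 50, w: 200, h: 120};
var sampleMassCenter = {x: 180.5, y: 110.25};

console.log(rectCenter(sampleBox));
console.log(rectContains(sampleBox, sampleMassCenter));
```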

ImageSize

unsigned long width

The image width in pixels.

unsigned long height

The image height in pixels.
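A typical use of the ImageSize returned by start() is to scale image-space coordinates onto a drawing surface of a different size. The sketch below shows one way to do that; the imageToCanvas helper and the sample sizes are hypothetical, and the canvas argument only needs width and height properties.

```javascript
// Hypothetical helper: scales a Point2D given in image pixels to the
// coordinate space of a canvas-like object with width and height.
function imageToCanvas(point, imageSize, canvas) {
  return {
    x: point.x * canvas.width / imageSize.width,
    y: point.y * canvas.height / imageSize.height
  };
}

// Stand-in values: a 640x480 tracking image drawn on a 1280x720 canvas.
var sampleImageSize = {width: 640, height: 480};
var sampleCanvas = {width: 1280, height: 720};
var canvasPoint = imageToCanvas({x: 320, y: 240}, sampleImageSize, sampleCanvas);
console.log(canvasPoint);
```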

ExtremityData

Point3D pointWorld

The 3D world coordinates of the extremity point.

Point2D pointImage

The 2D image coordinates of the extremity point.

ExtremityDataPoints

ExtremityData closest

The closest point to the camera in the tracked hand.

ExtremityData leftmost

The left-most point of the tracked hand.

ExtremityData rightmost

The right-most point of the tracked hand.

ExtremityData topmost

The top-most point of the tracked hand.

ExtremityData bottommost

The bottom-most point of the tracked hand.

ExtremityData center

The center point of the tracked hand.
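As an example of combining extremity points, the sketch below estimates the width of the hand as the distance between its leftmost and rightmost points in world space. The helpers are hypothetical, and the result is in meters only under the assumption that pointWorld uses the same units as the other world-space values in this document.

```javascript
// Hypothetical helper: Euclidean distance between two Point3D values.
function distance3D(a, b) {
  var dx = a.x - b.x;
  var dy = a.y - b.y;
  var dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical helper: distance between the leftmost and rightmost
// extremity points of an ExtremityDataPoints value, in world units.
function handWidthWorld(extremityPoints) {
  return distance3D(extremityPoints.leftmost.pointWorld,
                    extremityPoints.rightmost.pointWorld);
}

// Stand-in data with only the members used above.
var sampleExtremities = {
  leftmost: {pointWorld: {x: 0, y: 0, z: 0.5}},
  rightmost: {pointWorld: {x: 0.03, y: 0.04, z: 0.5}}
};
console.log(handWidthWorld(sampleExtremities));
```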

FingerData

long foldedness

The degree of "foldedness" of the tracked finger, ranging from 0 (least folded / straight) to 100 (most folded).

long radius

The radius of the tracked fingertip. While the hand is not calibrated, the default value is 0.017 m.

Fingers

FingerData thumb

The finger data of the thumb.

FingerData index

The finger data of the index finger.

FingerData middle

The finger data of the middle finger.

FingerData ring

The finger data of the ring finger.

FingerData pinky

The finger data of the pinky finger.
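The foldedness values can be combined into simple gestures. As a minimal sketch, the hypothetical helper below counts how many fingers of a Fingers value are extended, treating a finger as extended when its foldedness is at or below a threshold. The default threshold of 50 is an arbitrary illustrative choice, not a value taken from this specification.

```javascript
var FINGER_NAMES = ['thumb', 'index', 'middle', 'ring', 'pinky'];

// Hypothetical helper: counts fingers whose foldedness (0 = straight,
// 100 = most folded) is at or below maxFoldedness. The default of 50 is
// an illustrative assumption.
function countExtendedFingers(fingers, maxFoldedness) {
  var limit = maxFoldedness === undefined ? 50 : maxFoldedness;
  return FINGER_NAMES.filter(function(name) {
    return fingers[name].foldedness <= limit;
  }).length;
}

// Stand-in Fingers value: index and middle straight, the rest folded.
var sampleFingers = {
  thumb: {foldedness: 90, radius: 0},
  index: {foldedness: 5, radius: 0},
  middle: {foldedness: 10, radius: 0},
  ring: {foldedness: 95, radius: 0},
  pinky: {foldedness: 100, radius: 0}
};
console.log(countExtendedFingers(sampleFingers));
```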

JointData

long confidence

Reserved for a future confidence score feature.

Point3D positionWorld

The geometric position in 3D world coordinates, in meters.

Point3D positionImage

The geometric position in 2D image coordinates, in pixels. (Note: the Z coordinate is the point's depth in millimeters.)

Point4D localRotation

A quaternion representing the local 3D orientation of the joint, relative to its parent joint.

Point4D globalOrientation

A quaternion representing the global 3D orientation, relative to the "world" y axis.

Point3D speed

The speed of the joint in 3D world coordinates (X speed, Y speed, Z speed, in meters/second).
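Because speed is given per axis, an application that needs a single scalar value can take the length of the vector. The sketch below shows this with a hypothetical speedMagnitude helper and stand-in data.

```javascript
// Hypothetical helper: scalar speed of a JointData value, computed as the
// length of its per-axis speed vector (meters/second).
function speedMagnitude(joint) {
  var s = joint.speed;
  return Math.sqrt(s.x * s.x + s.y * s.y + s.z * s.z);
}

// Stand-in JointData with only the member used above.
var sampleJoint = {speed: {x: 0.3, y: 0.4, z: 0}};
console.log(speedMagnitude(sampleJoint));
```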

FingerJoints

JointData base

The joint data of the finger base.

JointData joint1

The joint data of finger joint1.

JointData joint2

The joint data of finger joint2.

JointData tip

The joint data of the fingertip.

Joints

JointData wrist

The center of the wrist.

JointData center

The center of the palm.

FingerJoints thumb

The joints of the thumb.

FingerJoints index

The joints of the index finger.

FingerJoints middle

The joints of the middle finger.

FingerJoints ring

The joints of the ring finger.

FingerJoints pinky

The joints of the pinky finger.
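The Joints dictionary nests the finger joints under each finger, which is awkward when code needs to visit every joint, for example to draw a skeleton. The sketch below flattens a Joints value into a list of named entries: wrist, center, and four joints for each of the five fingers, 22 in total. The listJoints helper and the dotted naming scheme are hypothetical.

```javascript
var JOINT_FINGERS = ['thumb', 'index', 'middle', 'ring', 'pinky'];
var JOINT_PARTS = ['base', 'joint1', 'joint2', 'tip'];

// Hypothetical helper: flattens a Joints value into an array of
// {name, data} entries, where data is the corresponding JointData.
function listJoints(joints) {
  var result = [
    {name: 'wrist', data: joints.wrist},
    {name: 'center', data: joints.center}
  ];
  JOINT_FINGERS.forEach(function(finger) {
    JOINT_PARTS.forEach(function(part) {
      result.push({name: finger + '.' + part, data: joints[finger][part]});
    });
  });
  return result;
}

// Builds a stand-in Joints value whose JointData entries are empty objects.
function makeSampleJoints() {
  var joints = {wrist: {}, center: {}};
  JOINT_FINGERS.forEach(function(finger) {
    joints[finger] = {base: {}, joint1: {}, joint2: {}, tip: {}};
  });
  return joints;
}

console.log(listJoints(makeSampleJoints()).length);
```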

Enumerators

PixelFormat

depth

Depth map data stored as 16-bit unsigned integers. Each value is the distance from an object to the camera's XY plane (the Cartesian depth), in millimeters.
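To make the depth format concrete, the sketch below reads a single depth value from an Image whose format is depth. It assumes the pixels are stored row by row with no padding between rows and in the platform's native byte order; this document does not specify the memory layout, so these are assumptions. The depthAt helper is hypothetical.

```javascript
// Hypothetical helper: returns the depth in millimeters at pixel (x, y),
// or undefined if the coordinates fall outside the image.
// Assumes row-major, tightly packed 16-bit pixels in native byte order.
function depthAt(image, x, y) {
  if (image.format !== 'depth')
    throw new Error('Expected an image in depth format');
  if (x < 0 || y < 0 || x >= image.width || y >= image.height)
    return undefined;
  var pixels = new Uint16Array(image.data);
  return pixels[y * image.width + x];
}

// Stand-in 2x2 depth image shaped like the Image dictionary.
var sampleDepthImage = {
  format: 'depth',
  width: 2,
  height: 2,
  data: new Uint16Array([500, 600, 700, 800]).buffer
};
console.log(depthAt(sampleDepthImage, 1, 1));
```

When many pixels are read from the same image, creating the Uint16Array view once and reusing it avoids repeated allocations.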

y8

The 8-bit gray format.

BodySide

unknown

The side was not determined.

left

Left side of the body.

right

Right side of the body.

TrackingStatus

good

Optimal tracking conditions.

out-of-fov

The hand is outside the field of view (along the x or y axis).

out-of-range

The hand is outside the depth range.

high-speed

The hand is moving at high speed.

pointing-fingers

The fingers are pointing at the camera.
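An application might turn these status values into guidance for the user. The sketch below maps each TrackingStatus value to a short hint; the trackingHint helper and the message texts are hypothetical and only illustrate one way of consuming the enumeration.

```javascript
// Hypothetical user-facing hints, keyed by TrackingStatus value.
var STATUS_HINTS = {
  'good': '',
  'out-of-fov': 'Move your hand back into the camera view.',
  'out-of-range': 'Move your hand closer to or farther from the camera.',
  'high-speed': 'Move your hand more slowly.',
  'pointing-fingers': 'Turn your hand so the fingers do not point at the camera.'
};

// Hypothetical helper: returns the hint for a status, or a fallback
// message for a value that is not listed above.
function trackingHint(status) {
  if (Object.prototype.hasOwnProperty.call(STATUS_HINTS, status))
    return STATUS_HINTS[status];
  return 'Unrecognized tracking status: ' + status;
}

console.log(trackingHint('high-speed'));
```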

Examples

Tracking Hands


var handModule;
var stopped = false;
var startButton = document.getElementById('start');
var stopButton = document.getElementById('stop');

function handleError(error) {
  // The error is an instance of DOMException.
  console.log(error.name + ': ' + error.message);
}

function trackHands() {
  handModule.track().then(
      function(hands) {
        // The hands argument is an array of Hand objects.

        // ... process hands ...

        // Continue to track hands.
        if (!stopped)
          trackHands();
      },
      handleError
  );
}

function main() {
  try {
    // Create an instance of the HandModule interface,
    // the main entry point for hand tracking.
    handModule = new realsense.Hand.HandModule();
  } catch (e) {
    // Without a HandModule instance nothing below can work, so stop here.
    console.log(e);
    return;
  }

  handModule.init().then(
      function() {
        console.log('Initialization succeeded.');
        startButton.disabled = false;
        stopButton.disabled = false;
      },
      handleError
  );

  startButton.onclick = function(e) {
    handModule.start().then(
        function(imageSize) {
          // The imageSize argument is an ImageSize object.
          console.log('Start succeeded.');
          stopped = false;
          trackHands();
        },
        handleError
    );
  };

  stopButton.onclick = function(e) {
    stopped = true;
    handModule.stop().then(
        function() {
          console.log('Stop succeeded.');
        },
        handleError
    );
  };
}
        

Acknowledgments