Introduction to Kinect in Customer Experience
Trying on clothes in a clothing store is usually a time-consuming activity. Moreover, it
might not be possible to try on clothes at all, such as when ordering
clothes online.
It would be easier if one could see whether the clothes fit without having
to take off one’s own clothes to try on the new ones, and without having to wait in a
long queue outside the fitting rooms. It is possible to order clothes on the internet,
but you never know whether the selected garments fit until you try them on at home.
A size medium, for example, can differ from one brand to another, and it is difficult to
judge the quality of the textile from a picture on a computer screen. A better scenario is
to see the clothes on one’s own body without actually having to put them on, which
saves time. The alternative to a fully virtual try-on is a
mixed reality where the body is real but the clothes are digital models shaped to fit
the individual body.
This is where AI-based virtual trial rooms are being implemented. One of the
most innovative and efficient devices used for this is Kinect, a line of motion-sensing
input devices produced by Microsoft.
The first problem addressed in the design of the virtual trial room application is the
correct positioning of the user and the virtual cloth models. Detection and skeletal
tracking of a user in a video stream can be implemented in several ways.
Recently, Shotton et al. have developed a real-time human pose recognition system
that predicts the 3D positions of body joints, using a single depth image without
visual tags.
While depth cameras are not conceptually new, Kinect has made such sensors
accessible to all. The quality of the depth sensing, given the low cost and real-time
nature of the device, is compelling and has made the sensor instantly popular with
researchers and enthusiasts alike.
How does Kinect work?
Kinect provides a depth sensor, an RGB camera, an accelerometer, a motor and a
multi-array microphone. The PrimeSense chip is the Kinect’s processing core.
The virtual trial room application consists of an interactive mirror (a vertical TV
screen), an HD camera and a desktop computer.
a) Depth Sensing System
It consists of the IR laser emitter and the IR camera. The IR laser emitter creates a
known noisy pattern of structured IR light.
The IR camera operates at 30 Hz and pushes images with 1200x960 pixels. These images
are downsampled to 640x480 pixels with 11 bits per pixel, which provides 2048 levels of
sensitivity.
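As a quick check of the figures quoted above, the following sketch computes the number of levels an 11-bit value can encode and the raw size of the downsampled depth stream. The variable names are illustrative, and the data rate assumes tightly packed 11-bit pixels with no transport overhead, which may not match how the device actually transmits frames.

```python
# Illustrative arithmetic for the depth stream figures quoted in the text.
BITS_PER_DEPTH_PIXEL = 11
WIDTH, HEIGHT = 640, 480      # downsampled depth image size
FRAMES_PER_SECOND = 30        # IR camera rate

# An n-bit value can take 2**n distinct levels.
depth_levels = 2 ** BITS_PER_DEPTH_PIXEL

# Raw payload, assuming packed 11-bit pixels (an assumption, see lead-in).
pixels_per_frame = WIDTH * HEIGHT
bits_per_second = pixels_per_frame * BITS_PER_DEPTH_PIXEL * FRAMES_PER_SECOND

print(depth_levels)           # 2048
print(pixels_per_frame)       # 307200
print(bits_per_second / 8e6)  # raw rate in megabytes per second
```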
b) RGB Camera
The RGB camera, which operates at 30 Hz, can push images at 640x480 pixels with
8 bits per channel. Kinect also has the option to switch the camera to high resolution,
running at 10 fps at 1280x1024 pixels. The camera itself possesses a set of features
including automatic white balancing, black reference, flicker avoidance, colour
saturation, and defect correction.
c) Motor, Accelerometer and Microphones
Kinect has two interrelated and important systems inside: a mechanism to tilt the
Kinect head up and down, and an accelerometer. The head tilting is done by a
motor with some gearing to drive the head up and down. The accelerometer is how
Kinect determines what position the head is in.
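To illustrate how an accelerometer reading can reveal the head position, here is a minimal sketch that estimates the tilt angle from the direction of gravity. The axis convention (y pointing out of the top of the sensor, z along the viewing direction), the units of g, and the function name `head_tilt_degrees` are all assumptions made for this example. They are not taken from the Kinect firmware or any SDK.

```python
import math

def head_tilt_degrees(ax, ay, az):
    """Estimate upward tilt of the sensor head from a resting accelerometer reading.

    Assumed convention (hypothetical): at rest the reading points along
    world "up"; y is the sensor's own up axis and z its viewing direction.
    A level sensor reads (0, 1, 0) and a sensor tilted up by t degrees
    reads (0, cos t, sin t).
    """
    return math.degrees(math.atan2(az, math.hypot(ax, ay)))

# A level head, then a head tilted up by 30 degrees.
level = head_tilt_degrees(0.0, 1.0, 0.0)
tilted = head_tilt_degrees(0.0, math.cos(math.radians(30)), math.sin(math.radians(30)))
print(level, tilted)
```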
The microphone array features four microphone capsules and operates with each
channel processing 16-bit audio at a sampling rate of 16 kHz.
Data Acquisition
The Kinect uses structured light and machine learning for Data Acquisition.
Stage 1: Structured light
Inferring body position is a two-stage process: first compute a depth map (using
structured light), and then infer body position (using machine learning).
The colour image is obtained by an RGB camera. The depth measurement is done
using an infrared emitter and camera. The computation is done using structured
light.
This approach consists of projecting a pattern of pixels onto the scene and capturing the
deformation of the projection, which allows the pixel distances (depths) to be computed.
The IR emitter and camera must be calibrated to perform this calculation.
The pattern used in Kinect is patented by PrimeSense. It is known that the pattern is
based on a speckle of infrared light, generated from a set of diffraction gratings.
Structured light general principle: project a known pattern onto the scene and infer
depth from the deformation of that pattern.
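One common way to turn the observed shift of a projected pattern into depth is triangulation, where depth is inversely proportional to the shift (disparity) between where a dot is expected and where it is seen. The sketch below shows only that relationship. The focal length and baseline are placeholder values, not Kinect calibration data, and the exact computation performed inside the device is not described here.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulated depth in metres: Z = f * b / d.

    disparity_px    : shift of a pattern dot in the IR image, in pixels
    focal_length_px : IR camera focal length, in pixels
    baseline_m      : distance between emitter and camera, in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical calibration values, chosen only for illustration.
FOCAL_LENGTH_PX = 580.0
BASELINE_M = 0.075

near = depth_from_disparity(40.0, FOCAL_LENGTH_PX, BASELINE_M)
far = depth_from_disparity(10.0, FOCAL_LENGTH_PX, BASELINE_M)
print(near, far)  # a larger shift means a closer surface
```

This is also why calibration matters: both the focal length and the emitter-to-camera baseline enter the depth estimate directly.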
Stage 2: Using Machine Learning
Body parts are inferred using a randomized decision forest, learned from over 1
million training examples.
Stage 2.1 starts with 100,000 depth images with known skeletons. Using computer
graphics, all sequences are rendered for 15 different body types, and several other
parameters are varied.
This yields over a million training examples.
A randomized decision forest is then learned from these examples. It maps depth
images to body parts, transforming a depth image into a body part image. A
randomized decision forest is a more sophisticated version of the classic decision
tree.
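As a rough illustration of how a forest extends a single tree, the sketch below walks several small trees and averages the label distributions stored at their leaves. The two trees are written by hand with made-up thresholds and probabilities. A real forest would learn them from the training examples, and that training step, where the randomization happens, is not shown.

```python
def tree_predict(node, features):
    """Walk one tree: each internal node asks one threshold question."""
    while "leaf" not in node:
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]  # dict mapping body-part label to probability

def forest_predict(trees, features):
    """Average the leaf distributions of all trees and pick the best label."""
    totals = {}
    for tree in trees:
        for label, p in tree_predict(tree, features).items():
            totals[label] = totals.get(label, 0.0) + p / len(trees)
    return max(totals, key=totals.get), totals

# Two toy trees with hypothetical values, for illustration only.
TREE_A = {"feature": 0, "threshold": 0.0,
          "left": {"leaf": {"hand": 0.8, "torso": 0.2}},
          "right": {"leaf": {"hand": 0.1, "torso": 0.9}}}
TREE_B = {"feature": 1, "threshold": 0.5,
          "left": {"leaf": {"hand": 0.6, "torso": 0.4}},
          "right": {"leaf": {"hand": 0.3, "torso": 0.7}}}

label, probs = forest_predict([TREE_A, TREE_B], [-0.2, 0.1])
print(label, probs)
```

Averaging over several trees means that one tree's poor split is less likely to decide the final label on its own.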
A decision tree works much like a game of twenty questions. What kind of “questions”
can the Kinect ask in its twenty questions?
• Simplified version:
– “Is the pixel at that offset in the background?”
• Real version:
– “How does the (normalized) depth at that pixel compare to this pixel?”
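The “real version” of the question can be sketched as a depth-comparison feature of the kind published by Shotton et al. Two offsets are scaled by the depth at the pixel being classified, which is the normalization, and the depths found at the two probe positions are subtracted. The toy image, the offsets and the background constant below are made up for illustration.

```python
BACKGROUND_DEPTH = 1e6  # large value returned for probes that fall off the image

def depth_at(depth, row, col):
    """Depth at (row, col), or a large constant outside the image."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND_DEPTH

def depth_feature(depth, pixel, offset_u, offset_v):
    """f = d(x + u / d(x)) - d(x + v / d(x)); assumes d(x) > 0."""
    row, col = pixel
    d = depth[row][col]

    def probe(offset):
        # Dividing the offset by d makes the feature depth invariant.
        return depth_at(depth, row + round(offset[0] / d), col + round(offset[1] / d))

    return probe(offset_u) - probe(offset_v)

# Toy depth image in metres: a near object (2.0) against a far wall (4.0).
DEPTH = [
    [4.0, 4.0, 4.0],
    [4.0, 2.0, 2.0],
    [4.0, 2.0, 2.0],
]

# Probe one pixel up (wall) against one pixel right (object).
value = depth_feature(DEPTH, (1, 1), (-2.0, 0.0), (0.0, 2.0))
print(value)  # 2.0
```

Each tree node then simply compares a value like this against a threshold, which is the yes-or-no question.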
Thus, after the body shape has been determined and inferences have been drawn from the
decision trees, the garments, which are already stored in the system as 3D models, are
displayed on the UI, fitted and matched to the person standing in front of
the interactive mirror.
Users can turn their body within a reasonable range in front of the interactive
mirror and still have the digital clothes fit their body properly, just as they
would see in a real mirror. The user selects menu items and outfit items using
hand gestures. Different tops, bottoms, and accessories can be added and matched
on the fly.
CONCLUSION
This application offers several advantages over traditional retailing. It attracts more
customers by providing a new and exciting retail concept, and creates interest
in the brand and store through viral marketing as customers share their
experiences on social media such as Facebook. Furthermore, it reduces the need for
floor space and fitting rooms, thereby reducing rental costs, and shortens the time
needed to try on different combinations and make purchase decisions.