© 2017 IJSRST | Volume 3 | Issue 3 | Print ISSN: 2395-6011 | Online ISSN: 2395-602X
Themed Section: Science and Technology
A Virtual Dressing Room Using Kinect
Jagtap Prajakta Bansidhar, Bhole Sheetal Hiraman, Mate Kanchan Tanaji, Prof. S. V. More, Prof. B. S. Shirole
Sanghavi College of Engineering, Varvandi, Nashik, Maharashtra, India
ABSTRACT
We present a novel virtual fitting room framework using a depth sensor, which provides a realistic fitting experience with customized motion filters, size adjustments, and physical simulation. The proposed scaling method adjusts the avatar and calculates a standardized apparel size according to the user's measurements, and prepares the collision mesh and the physics simulation, with a total preprocessing time of 1 s. The real-time motion filters prevent unnatural artifacts caused by depth-sensor noise or self-occluded body parts. We apply bone splitting to realistically render the body parts near the joints. All components are combined efficiently to keep the frame rate higher than that of previous works without sacrificing realism.
Keywords: virtual try-on, Kinect, HD camera, OpenNI, Kinect for Windows, Augmented Reality, Human-Computer Interaction
I. INTRODUCTION

Trying on clothes in clothing stores is usually a time-consuming activity. Moreover, it might not even be possible to try on clothes in the store, such as when ordering clothes online. Here we propose a simple virtual dressing room application to make shopping for clothing faster, easier, and more accessible. The first problem we address in the design of our application is the correct positioning of the user and the virtual cloth models. Detection and skeletal tracking of a user in a video stream can be implemented in several ways. For example, Kjaerside et al. [1] proposed a tag-based augmented reality dressing room, which requires sticking visual tags on the user for motion capture. More recently, Shotton et al. [2] have developed a real-time human pose recognition system that predicts the 3D positions of body joints from a single depth image, without visual tags. In this project, we use Shotton et al.'s method and a Microsoft Kinect sensor to create a tagless, real-time augmented reality dressing room application. Developer tools, such as the ones included in the OpenNI framework and the Microsoft Kinect SDK, ease developing applications based on the Kinect sensor. We used the Kinect SDK as it includes a robust real-time skeletal body tracker based on [2].

Figure 1. The user interface of the application

To provide a better fitting experience, we present a novel virtual fitting room framework that offers all the basic features expected from such an application, along with enhancements in various aspects for higher realism. These enhancements include motion filtering, customized user scaling, and the use of a physics engine. The motion filtering process starts with temporal averaging of joint positions in order to overcome the high noise of the depth sensor. However, temporal averaging does not prove to be sufficient, because unnatural movements take place due to limited recognition capabilities and self-occlusion. We implement customized joint angle filters, along with bone splitting, to let limbs twist in a more natural way. We also employ filtering on the hip and knee joints to overcome the foot-skating problem.
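The temporal averaging described above can be sketched as a simple per-joint exponential moving average; the smoothing weight below is an illustrative choice rather than a value taken from the paper.

```python
import numpy as np

# A minimal sketch of the temporal-averaging motion filter: each joint position
# reported by the depth sensor is blended with its running average to suppress
# high-frequency sensor noise.

class JointSmoother:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # weight given to the newest measurement (assumed)
        self.state = {}             # joint name -> filtered 3D position

    def update(self, joint: str, position) -> np.ndarray:
        p = np.asarray(position, dtype=float)
        if joint not in self.state:
            self.state[joint] = p
        else:
            self.state[joint] = self.alpha * p + (1.0 - self.alpha) * self.state[joint]
        return self.state[joint]

# Example: a noisy stream of right-elbow positions settles near its true value.
smoother = JointSmoother()
for sample in ([0.30, 1.10, 2.02], [0.33, 1.08, 1.97], [0.29, 1.12, 2.01]):
    print(smoother.update("elbow_right", sample))
```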
II. LITERATURE SURVEY

J. Shotton, T. Sharp et al. [2]: Markerless human motion tracking is a long-standing problem in computer vision. With the recent advances in depth cameras and sensors, especially the Kinect sensor, research on human skeletal pose tracking has made great improvements. Our system builds on top of these techniques by utilizing publicly available SDKs that incorporate some of these state-of-the-art algorithms.

K. Kjærside [1]; O. Hilliges, D. Kim et al. [4]: Kinect has also enabled various interactive applications that are creative and fun. Most relevant to our Interactive Mirror are the ever-growing virtual fitting room systems available on the market, such as Fitnect and TriMirror [4]. However, we have not been able to find any technical details of these systems. From their demo videos alone, the major difference between our system and TriMirror, for example, is that we do not simulate clothes in our system. We simply render the deformed clothes on top of the user's video stream, and this requires a high-quality calibration between the Kinect sensor and the HD camera.
III. RELATED WORK

A. Camera Calibration

Vision-based augmented reality systems need to trace the transformation relationship between the camera and the tracking target in order to augment the target with virtual objects. In our virtual try-on system, precise calibration between the Kinect sensor and the HD camera is crucial in order to register and overlay imaginary garments seamlessly onto the 2D HD video stream of the shoppers. Furthermore, we prefer a quick and semi-automatic calibration process because the layout of the Kinect and the HD camera with respect to the floor plane may be different for different stores, or even for the same store at different times. To this end, we use the CameraCalibrate and StereoCalibrate modules in OpenCV [3] for camera calibration. More specifically, we recommend collecting a minimum of 30 pairs of checkerboard pictures seen at the same instant of time from the Kinect and the HD camera, and calculating each pair's correspondences, as shown in Fig. 8. In addition, the Kinect sensor is usually not perfectly horizontal to the ground plane, and its tilting angle is needed to calculate the height of the user. We simply specify the floor area from the Kinect depth data manually, and the normal vector of the floor plane in Kinect's view can then be estimated.

To summarize, the result of the camera calibration procedure includes:
- the extrinsic camera parameters (translation and rotation) of the HD camera with respect to the Kinect depth camera,
- the tilting angles of the Kinect sensor and the HD camera with respect to the horizontal ground plane,
- the FoV of the HD camera.
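As a concrete illustration of this step, the sketch below (not the exact code used in the system) pairs checkerboard views from the two cameras and recovers the HD camera's extrinsics relative to the Kinect with OpenCV's calibrateCamera and stereoCalibrate; the board size, square size, and file names are assumptions.

```python
import glob
import cv2
import numpy as np

# Hedged sketch of the two-camera calibration step; assumes at least one
# checkerboard pair was captured simultaneously by both cameras.
BOARD = (9, 6)            # inner corners of the checkerboard (assumed)
SQUARE_SIZE = 0.025       # checkerboard square size in metres (assumed)

# One 3D reference grid per detected board.
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_pts, kinect_pts, hd_pts = [], [], []
for k_file, h_file in zip(sorted(glob.glob("kinect_*.png")),
                          sorted(glob.glob("hd_*.png"))):
    k_gray = cv2.cvtColor(cv2.imread(k_file), cv2.COLOR_BGR2GRAY)
    h_gray = cv2.cvtColor(cv2.imread(h_file), cv2.COLOR_BGR2GRAY)
    ok_k, corners_k = cv2.findChessboardCorners(k_gray, BOARD)
    ok_h, corners_h = cv2.findChessboardCorners(h_gray, BOARD)
    if ok_k and ok_h:                       # keep only pairs seen by both cameras
        obj_pts.append(objp)
        kinect_pts.append(corners_k)
        hd_pts.append(corners_h)

# Intrinsics of each camera (the HD camera FoV follows from its camera matrix).
_, K_k, d_k, _, _ = cv2.calibrateCamera(obj_pts, kinect_pts,
                                        k_gray.shape[::-1], None, None)
_, K_h, d_h, _, _ = cv2.calibrateCamera(obj_pts, hd_pts,
                                        h_gray.shape[::-1], None, None)

# Extrinsics (rotation R, translation T) of the HD camera w.r.t. the Kinect.
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, kinect_pts, hd_pts, K_k, d_k, K_h, d_h,
    k_gray.shape[::-1], flags=cv2.CALIB_FIX_INTRINSIC)
print("HD-to-Kinect rotation:\n", R, "\ntranslation (m):\n", T)
```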
Figure 2. Major steps for content creation. Catalogue images are first manually modeled and textured offline in 3DS Max. We then augment the digital clothes with relevant size and skinning information. At runtime, the 3D clothes are properly resized according to the user's height, skinned to the tracked skeleton, and then rendered with proper camera settings. Finally, the rendered clothes are combined with the HD recording of the user in real time.

Figure 3. Left: the UI for virtual try-on. Right: the UI for clothing item selection.

B. Content creation

Our virtual 3D clothes are based on actual catalogue images, so that new fashion lines can be added to the system quickly. Fig. 2 shows the major steps of converting catalogue images into 3D digital clothes. In the preprocessing stage, our artists manually generated one standard digital male mannequin and one female mannequin. Then they modeled the catalogue images into 3D clothes that fit the proportions of the default mannequins. Corresponding textures were also adapted and applied to the digital clothes. Then we augment the digital clothes with relevant size and skinning data. At runtime, the 3D clothes are properly resized according to the user's height, skinned to the tracked skeleton, and then rendered with proper camera settings. Lastly, the rendered clothes are merged with the HD recording of the user in real time. Our content development team modeled 115 clothing items in total, including male clothes, female clothes, and accessories. On average, it took about two man-days to create and test one item for its inclusion into the virtual try-on system.
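The runtime resizing can be pictured with the minimal sketch below, which scales the garment mesh by the ratio of the estimated user height to the mannequin height; the mannequin height and the uniform-scaling simplification are illustrative assumptions rather than the production pipeline.

```python
import numpy as np

# Digital clothes modelled for the default mannequin are rescaled to the tracked
# user before skinning and rendering.
MANNEQUIN_HEIGHT_M = 1.75   # height of the default mannequin (assumed value)

def resize_garment(vertices: np.ndarray, user_height_m: float) -> np.ndarray:
    """Uniformly scale garment vertices (N x 3, metres) to the user's height."""
    scale = user_height_m / MANNEQUIN_HEIGHT_M
    return vertices * scale

# Example: a 1.62 m user shrinks the garment to roughly 93% of its modelled size.
verts = np.array([[0.0, 1.4, 0.1], [0.2, 1.0, 0.1]])
print(resize_garment(verts, 1.62))
```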
C. User interface

Fig. 3 depicts the user interface of the Interactive Mirror. Because our clothes are 3D models rather than 2D images, users are able to turn their body within a reasonable range in front of the Interactive Mirror and still have the digital clothes properly fit to their body, just like what they would see in front of a real mirror. The user selects menu items and outfit items using hand gestures. Different tops, bottoms, and accessories can be added and matched on the fly.
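The paper does not detail the gesture mechanism, so the fragment below is a purely hypothetical dwell-time selector, assuming the tracked right-hand position has already been projected into screen coordinates; it is meant only to make the interaction concrete.

```python
import time

# Hypothetical dwell-based menu selection: an item is activated when the tracked
# right hand hovers over its screen region for a short dwell time.
DWELL_SECONDS = 1.5   # assumed dwell time before an item is selected

class DwellSelector:
    def __init__(self, items):
        self.items = items            # item name -> (x0, y0, x1, y1) screen rect
        self.current = None
        self.enter_time = 0.0

    def update(self, hand_xy, now=None):
        """Feed the projected right-hand position; return a selected item or None."""
        now = time.time() if now is None else now
        hovered = next((name for name, (x0, y0, x1, y1) in self.items.items()
                        if x0 <= hand_xy[0] <= x1 and y0 <= hand_xy[1] <= y1), None)
        if hovered != self.current:
            self.current, self.enter_time = hovered, now
            return None
        if hovered is not None and now - self.enter_time >= DWELL_SECONDS:
            self.current = None       # reset so the item is not fired again at once
            return hovered
        return None

# Example: the hand rests over the "tops" button long enough to trigger it.
selector = DwellSelector({"tops": (50, 50, 150, 150)})
print(selector.update((100, 100), now=0.0), selector.update((100, 100), now=2.0))
```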
D. Height estimation

Digital clothes need to be rescaled according to the user's body size for a good fitting and try-on experience. We propose two methods to estimate a user's shoulder height. The first one simply uses the neck-to-feet height difference, when both the neck and the feet joints are detected by the Kinect skeletal tracking SDKs. The second method is used when the user's feet are not in the field of view of Kinect: the tilting angle of the Kinect sensor, the depth of the neck joint, and the offset of the neck joint with respect to the center point of the depth image can jointly determine the physical height of the neck joint in the world space.

Figure 4. Shoulder height estimation when the user's feet are not in the field of view of Kinect.
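A minimal sketch of the two estimates follows; the depth-image resolution, field of view, and sensor height are assumed values used only for illustration.

```python
import numpy as np

# Method 1 uses the neck-to-feet difference from the tracked skeleton; method 2
# recovers the neck height from the sensor tilt, the neck depth, and the neck's
# vertical pixel offset from the image centre (pinhole model).
KINECT_VFOV_DEG = 43.5      # vertical field of view of the Kinect depth camera
DEPTH_IMAGE_HEIGHT = 480    # assumed depth image height in pixels

def shoulder_height_from_skeleton(neck_y_m, foot_y_m):
    """Method 1: vertical neck-to-feet difference in world coordinates (metres)."""
    return neck_y_m - foot_y_m

def shoulder_height_from_depth(neck_row, neck_depth_m, tilt_deg, sensor_height_m):
    """Method 2: neck height above the floor from tilt, depth, and pixel offset."""
    centre_row = DEPTH_IMAGE_HEIGHT / 2.0
    focal_px = centre_row / np.tan(np.radians(KINECT_VFOV_DEG / 2.0))
    offset_rad = np.arctan2(centre_row - neck_row, focal_px)  # angle above optical axis
    y_cam = neck_depth_m * np.tan(offset_rad)                 # camera-space height offset
    tilt = np.radians(tilt_deg)
    # Rotate the camera-space offset by the sensor tilt into world coordinates.
    return sensor_height_m + neck_depth_m * np.sin(tilt) + y_cam * np.cos(tilt)

# Example: neck seen 60 px above the image centre, 2 m away, sensor tilted 5
# degrees upward and mounted 1.0 m above the floor.
print(round(shoulder_height_from_depth(180, 2.0, 5.0, 1.0), 2))
```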
IV. PROPOSED SYSTEM

Our virtual try-on system consists of a vertical TV screen, a Microsoft Kinect sensor, an HD camera, and a desktop computer. Fig. 6 shows the front view of the Interactive Mirror together with the Kinect and the HD camera. The Kinect sensor is an input device marketed by Microsoft and intended as a gaming interface for Xbox 360 consoles and PCs. It consists of a depth camera, an RGB camera, and a microphone array. Both the depth and the RGB camera have a horizontal viewing range of 57.5 degrees and a vertical viewing range of 43.5 degrees. Kinect can also tilt up and down within -27 to +27 degrees. The range of the depth camera is [0.8, 4] m in the normal mode and [0.4, 3] m in the near mode. The HD camera supports a full resolution of 2080 x 1552.

Figure 6. The front view of the Interactive Mirror with Kinect and HD camera placed on top.
Proposed System Screenshots

Figure 7. Major software components of the virtual try-on system.

Fig. 7 illustrates the major software components of the virtual try-on system. During the offline preprocessing stage, we need to calibrate the Kinect and HD cameras and create the 3D clothes and accessories. During the online virtual try-on, we first detect the nearest person among the people in the area of interest. This person then becomes the subject of interest to be tracked by the motion tracking component, which is implemented on two publicly available Kinect SDKs, as discussed in the Introduction. The user interacts with the Interactive Mirror with her right hand to control the User Interface (UI) and select clothing items. The UI layout is described in the user interface subsection (Section III-C).
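The nearest-person selection mentioned above can be pictured with the following hypothetical sketch, which keeps the tracked skeleton closest to the sensor among those whose torso lies inside an assumed area of interest; the names and thresholds are illustrative, not the system's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Skeleton:
    tracking_id: int
    torso_x: float  # metres, lateral offset from the sensor axis
    torso_z: float  # metres, distance from the sensor

def select_subject(skeletons: List[Skeleton],
                   max_lateral_m: float = 1.0,
                   max_depth_m: float = 4.0) -> Optional[Skeleton]:
    """Return the nearest skeleton inside the area of interest, or None."""
    candidates = [s for s in skeletons
                  if abs(s.torso_x) <= max_lateral_m and s.torso_z <= max_depth_m]
    return min(candidates, key=lambda s: s.torso_z, default=None)

# Example: two people in range; the closer one becomes the subject of interest.
people = [Skeleton(1, 0.2, 2.8), Skeleton(2, -0.4, 1.9)]
print(select_subject(people).tracking_id)
```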
Figure 8. The camera calibration process. The checkerboard images seen by the Kinect RGB camera (left) and the HD camera (right) at the same instant of time.

Figure 9. Login Window

Figure 10. Splash Window

Figure 11. Dashboard
Figure 12. Dressing Room

Figure 13. A set of poses that are used in the evaluation of the performance.

V. EXPERIMENTAL RESULTS

We evaluated the performance of the application on a set of 12 poses with different angles of rotation and distances from the sensor (Fig. 13). We measured the performance as the amount of overlap between the constructed cloth model and manually labeled ground truth data as

P = (Ac ∩ Ag) / (Ac ∪ Ag),

where Ac is the area of the constructed model and Ag is the area of the ground truth model in terms of the number of pixels. We observed an average overlap of 83.97% or higher between the ground truth and the computed models within a rotation range of 0-45°. Rotations along the vertical axis dropped the performance, as the fitting is performed in only two dimensions. The best results were obtained when the distance from the sensor was 2 meters. The results are summarized in Table 1.

TABLE 1. Experimental results

Arm Rotation          0°       45°      90°
Performance           89.80%   90.54%   76.64%

Body Rotation         -45°     0°       45°
Performance           83.68%   90.19%   88.84%

Horizontal Rotation   -45°     0°       45°
Performance           74.88%   87.74%   77.87%

Distance From Sensor  1.5 m    2 m      3 m
Performance           86.64%   89.87%   70.98%

Average Overlap       83.97%
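The overlap measure P is the intersection-over-union of the constructed and ground-truth masks; a small sketch of how it can be computed from pixel masks is given below.

```python
import numpy as np

# P = |Ac ∩ Ag| / |Ac ∪ Ag|, with areas counted in pixels.
def overlap(constructed: np.ndarray, ground_truth: np.ndarray) -> float:
    """Both inputs are boolean masks of the same shape."""
    ac, ag = constructed.astype(bool), ground_truth.astype(bool)
    union = np.logical_or(ac, ag).sum()
    return np.logical_and(ac, ag).sum() / union if union else 0.0

# Example: two overlapping rectangular masks.
a = np.zeros((100, 100), dtype=bool); a[20:80, 20:80] = True
b = np.zeros((100, 100), dtype=bool); b[30:90, 30:90] = True
print(round(overlap(a, b), 4))
```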
VI. CONCLUSION AND FUTURE WORK

EON Interactive Mirror offers several advantages over traditional retailing. It attracts more customers by providing a new and exciting retail concept, and it creates interest in the brand and store through viral marketing campaigns in which customers share their experiences on social media such as Facebook. Furthermore, it reduces the need for floor space and fitting rooms, thereby reducing rental costs and shortening the time for trying on different combinations and making purchase decisions. We encourage interested readers to search for our demo videos with the keywords EON Interactive Mirror at http://www.youtube.com.

We developed a real-time virtual dressing room application that requires no visual tags. We tested our application under different conditions. Our experiments showed that the application performs well for regular postures. The application can be further improved towards creating more realistic models by using 3D cloth models and a physics engine.

VII. REFERENCES

[1]. K. J. Kortbek, H. Hedegaard, and K. Grønbæk, "ARDressCode: augmented dressing room with tag-based motion tracking and real-time clothes simulation," in Proceedings of the Central European Multimedia and Virtual Reality Conference, 2005.
[2]. J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore, "Real-time human pose recognition in parts from single depth images," Communications of the ACM, vol. 56, no. 1, pp. 116-124, 2013.
[3]. H. Benko, R. Jota, and A. D. Wilson, "MirageTable: freehand interaction on a projected augmented reality tabletop," in CHI '12, 2012.
[4]. O. Hilliges, D. Kim, S. Izadi, M. Weiss, and A. D. Wilson, "HoloDesk: direct 3D interactions with a situated see-through display," in Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems (CHI '12), 2012.
[5]. S. Izadi, R. A. Newcombe, D. Kim, O. Hilliges, D. Molyneaux, S. Hodges, P. Kohli, J. Shotton, A. J. Davison, and A. Fitzgibbon, "KinectFusion: real-time dynamic 3D surface reconstruction and interaction," in SIGGRAPH 2011 Talks, Article 23, 2011.
[6]. K. Kim, J. Bolton, A. Girouard, J. Cooperstock, and R. Vertegaal, "TeleHuman: effects of 3D perspective on gaze and pose estimation with a life-size cylindrical telepresence pod," in CHI '12, pp. 2531-2540, 2012.
[7]. L. A. Schwarz, A. Mkhitaryan, D. Mateus, and N. Navab, "Estimating human 3D pose from time-of-flight images based on geodesic distances and optical flow," in IEEE Conference on Automatic Face and Gesture Recognition (FG), pp. 700-706, 2011.
[8]. J. Shotton, A. W. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in CVPR, pp. 1297-1304, 2011.
[9]. T. Weise, S. Bouaziz, H. Li, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, Article 77, 2011.
[10]. M. Ye, X. Wang, R. Yang, L. Ren, and M. Pollefeys, "Accurate 3D pose estimation from a single depth image," in ICCV, pp. 731-738, 2011.
[11]. Y. Zhu, B. Dariush, and K. Fujimura, in CVPRW '08, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008.