A Virtual Try-On System Based On Deep Learning
Abstract—This study introduces a deep learning-driven virtual fitting system that allows users to virtually experiment with garments, producing visually appealing results. First, we utilize cutting-edge style transfer algorithms to apply the user's input image style onto the virtual scene. Then, we employ a generative adversarial network (GAN) on the modified image to generate content while maintaining image details. Specifically, the first step toward a realistic try-on with detailed clothing representation is to predict the semantic layout of the reference image as it will appear after the try-on; the image contents are then determined based on this predicted semantic layout. The network model obtained by training on a clothing dataset realizes the virtual try-on system for users over the network. Finally, the user's try-on operation is completed in the virtual environment and the final effect picture is generated. The overall system is implemented and deployed as a Python web application. Experimental results show that the system produces accurate and satisfactory results. In conclusion, the system can effectively perform virtual try-on operations with high visual quality.

Keywords: deep learning; virtual try-on; convolutional neural network; generative adversarial network

I. INTRODUCTION

Virtual try-on technology is an emerging technique that simulates the effect of wearing clothing in a virtual environment, improving the shopping experience and saving costs for businesses. However, existing virtual try-on systems face challenges such as low efficiency and unrealistic outcomes, which significantly restrict the technology's potential applications.

As deep learning continues to advance, an increasing number of researchers are investigating its potential for enhancing virtual try-on. For example, researchers have successfully used convolutional neural networks to implement virtual try-on, effectively generating realistic clothing images. Meanwhile, others have used generative adversarial networks to generate virtual try-on images, but this approach is time-consuming and the quality of the generated images cannot be guaranteed.

Virtual try-on is one of the fastest-growing e-commerce technologies of recent years. Many researchers have attempted various methods, including video streaming-based, 3D modeling-based, and image processing-based approaches. Despite some progress with these techniques, challenges such as low efficiency and unrealistic results persist.

The transfer of clothing items onto a reference person through image-based virtual try-on has garnered attention due to the swift advancement of image synthesis technology [1, 2, 3, 4], as evidenced by previous studies [5, 6]. Despite the advancements achieved [7, 8], constructing a practical virtual try-on system for real-world scenarios remains a formidable task due to disparities in semantics and geometry between the desired apparel and reference images, alongside challenges posed by interactions and occlusions involving the torso and limbs.

The present study introduces a deep learning-based virtual try-on system that emulates the experience of wearing clothing in a virtual setting.

II. RELATED WORK

Recently, virtual image try-on based on deep learning has gained prominence as a research area. These methods leverage deep generative models, casting virtual try-on as a conditional image generation problem. Given a user's personal photo along with a clothing model image, a neural network can be trained to generate an image showing how the clothing would look on the user.

Several novel approaches have been proposed in this field. Han et al. [15] introduced an image-based virtual try-on network (VITON) that leverages a shape context matching algorithm to accurately fit the fabric onto the target person and learns image synthesis using a U-Net generator. Wang
Authorized licensed use limited to: PES University Bengaluru. Downloaded on January 20,2025 at 14:44:28 UTC from IEEE Xplore. Restrictions apply.
et al. [16] introduced a characteristic-preserving image-based virtual try-on network (CP-VTON) that employs a convolutional neural network architecture for geometric matching. Pandey [17] proposed Poly-GAN, a multi-conditioned generative adversarial network that simplifies the pipeline by using a single network to perform tasks such as garment deformation and fusion. Marelli et al. [18] developed a virtual try-on system that employs deep learning and web technologies to enable users to virtually try on clothes.

Notably, deep learning-based virtual try-on requires no 3D information, which reduces the complexity and computational cost of the system. It can achieve relatively good try-on results while remaining cost-effective, making it a promising solution for various applications.

III. SYSTEM OVERVIEW AND ARCHITECTURE

Our system is a virtual dressing application designed to provide users with an online try-on service. It uses the ACGPN [9] algorithm for model training and has been optimized for challenging scenarios such as occluded arms. It offers high accuracy, fast processing, and a seamless user experience. The system architecture consists of two primary components: the front-end, which manages user interaction and presentation, and the back-end, which handles data processing and model computation. Figure 1 illustrates the architecture discussed in this paper.
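The request flow between the front-end and the back-end API can be sketched in plain Python as below. This is a minimal stand-in for illustration only: the JSON fields and function names are our assumptions, not the system's actual Django interface.

```python
import json

def run_tryon(person_img: str, clothing_img: str) -> dict:
    """Algorithm-layer stub: a real deployment would invoke the trained
    ACGPN model here; this stub only echoes a result record."""
    return {"person": person_img, "clothing": clothing_img,
            "result": f"tryon_{person_img}_{clothing_img}.png"}

def handle_request(body: str) -> str:
    """Service-layer stub: parse the front-end's JSON request, call the
    algorithm layer, and return a JSON response for the webpage."""
    req = json.loads(body)
    result = run_tryon(req["person"], req["clothing"])
    return json.dumps({"status": "ok", "data": result})

# The front-end would POST a request like this and render the result:
response = json.loads(handle_request('{"person": "user01", "clothing": "shirt42"}'))
```

In the deployed system this exchange is carried by the back-end API described below, with the heavy model computation kept entirely server-side.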
We developed the front-end of our virtual try-on system using HTML5, and we display the resulting try-on images on the front-end webpage. When designing the interface, we paid close attention to user experience, keeping it simple, clear, and easy to use. The front-end interacts with the back-end through the data exchange and interaction functions provided by the back-end API.

The back-end system is structured into four layers: data, service, algorithm, and application. The data layer is responsible for data storage and management, for which we use a MySQL database. The service layer handles data requests and model computation; it was developed with the Django framework, on which we performed encapsulation and optimization to improve the system's performance and efficiency. The algorithm layer implements the virtual try-on method, drawing on the ACGPN algorithm, which we selected for its suitability for try-on applications. The application layer interacts with the front-end system and provides API interfaces that receive user requests, process them, and return the corresponding results. Additionally, it manages and maintains data in the database, ensuring the system's stability and reliability.

The entire system is built on a distributed architecture, with the back-end deployed on cloud servers for high security and stability. Furthermore, we use techniques such as load balancing and disaster recovery backups to ensure the system's operational efficiency and reliability. In summary, our virtual try-on system is a complete application that combines high accuracy, speed, and a good user experience, enabling customers to access online fitting services.

IV. IMAGE VIRTUAL DRESSING ALGORITHM

Over the years, the rapid development of image synthesis technology has driven the advancement of image-based virtual dressing [10]. Popular algorithms in this field include CP-VTON [11], VTNCT [12], and ACGPN [9]. CP-VTON and VTNCT weaken the supervised information of the person image by applying the frontal view of the same clothing onto the processed representation, aiming to reduce the dominance
of strong references that may hinder the network's generalization to different clothing items. In summary, these algorithms involve extracting key points; a drawback of such methods is the loss of fine details, which ACGPN effectively addresses by preserving the correct semantic structure to a greater extent. Our system therefore employs ACGPN for image virtual dressing.

Our virtual dressing system combines style transfer and GANs to achieve virtual try-on. Specifically, we first employ cutting-edge style transfer algorithms to transfer the user-provided image style into the virtual scene. Then, we use GANs to adaptively generate and preserve the transformed image details.

To achieve this objective, our system predicts the semantic layout of the reference image as it will appear after the try-on. Based on the predicted semantic layout, the system determines whether each image region should be generated or retained and synthesizes the final image accordingly. This enables more realistic virtual dressing operations and richer clothing details.

We utilize extensive clothing datasets to train the network models, ensuring accurate predictions and high-quality results. Finally, we integrate the trained models into the virtual dressing system and generate the final output image.

A. Dataset

First, we downloaded the CP-VTON [13] and VITON [14] datasets, which together contain around 19,000 image pairs. Each pair consists of a front-view female image and a corresponding clothing image. After eliminating invalid pairs, we obtained a total of 16,253 image pairs, partitioned into 14,221 pairs for the training set and 2,032 pairs for the testing set.

We conducted our experiments on this dataset. Model training was performed on a GPU server, and the resulting model was embedded into the system for prediction.

B. Basic Principle of the Algorithm

The system uses the three modules of the ACGPN algorithm together with second-order difference constraints for processing. Once the generative adversarial network (GAN) is trained, the resulting try-on images are stored in the database and streamed to the front-end webpage for display. This approach allows us to omit a separate person-recognition step, enhancing the user experience. By training the neural networks and GANs on varied clothing and pants data, we improve the accuracy and applicability of the predictions. Additionally, based on the GAN's training results, we can render flat images on the webpage to provide more intuitive references for the data and predicted outcomes. These measures make our virtual try-on system more reliable, stable, and versatile, delivering a high-quality user experience.

Our virtual try-on application performs well in challenging scenarios, and with integrated database storage, streaming, and webpage display, it offers comprehensive, high-quality virtual dressing services. We will continue to explore cutting-edge technologies and continuously raise the level of the application to provide an even better user experience.

ACGPN is structured as three modules: the Semantic Generation Module (SGM), the Clothing Warping Module (CWM), and the Content Fusion Module (CFM). The SGM isolates the specific regions of clothing items and preserves body parts such as the arms without changing the pose or other body details. The CWM introduces second-order difference constraints to achieve geometric matching while preserving the person's characteristics. The CFM, which consists primarily of two sub-steps, completes the pipeline. Overall, ACGPN proceeds in three steps. In step 1, the SGM takes the clothing image, the human body pose, and the fused human body semantics from the target picture as input, and predicts the semantic distribution and the clothing template. In step 2, the CWM deforms the target clothing image based on the predicted semantic layout; second-order difference constraints are introduced to stabilize this process. In step 3, the CFM uses a random mask template to generate an RGB image with the torso occluded and fuses everything into the completed image through a fusion neural network. The core idea is awareness of the semantic layout: the network can adaptively retain all regions that do not change, and the idea of layers divides the hands and clothes into two separate layers to reduce their mutual influence.

C. Algorithm Effect Display

The ACGPN algorithm uses a convolutional neural network (CNN) trained with deep learning to automatically identify and classify the various components of clothing, providing the fundamental data and feature vectors for subsequent style matching. Additionally, this paper leverages texture mapping technology to map the input target clothing image onto the corresponding sections of the human body image through coordinate transformation. This then allows adaptive stretching, compression, and rotation based on the shape and curvature of each individual section. The effect after the replacement is shown in Figure 2:
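The second-order difference constraint that stabilizes the clothing warp in the CWM can be illustrated on a regular grid of warp control points. The numpy sketch below is our own illustration of the idea under the assumption of an (H, W, 2) control-point grid; it is not ACGPN's actual loss code.

```python
import numpy as np

def second_order_penalty(grid: np.ndarray) -> float:
    """Sum of squared second-order differences of the control-point
    grid along both axes; `grid` has shape (H, W, 2) holding (x, y)
    coordinates. A large value means the warp bends sharply, so adding
    this penalty to the warping loss discourages distorted clothing."""
    d2x = grid[:, 2:] - 2.0 * grid[:, 1:-1] + grid[:, :-2]  # along width
    d2y = grid[2:, :] - 2.0 * grid[1:-1, :] + grid[:-2, :]  # along height
    return float((d2x ** 2).sum() + (d2y ** 2).sum())

# An undistorted (affine) 5x5 grid incurs zero penalty, while bending
# one control point out of line is penalized:
ys, xs = np.mgrid[0:5, 0:5]
regular = np.stack([xs, ys], axis=-1).astype(float)
bent = regular.copy()
bent[2, 2, 0] += 1.0  # displace the center point horizontally
smooth_cost = second_order_penalty(regular)
bent_cost = second_order_penalty(bent)
```

A smooth grid costs nothing while the bent one is penalized, which is why this term keeps the predicted clothing deformation from collapsing into unrealistic folds.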
Figure 2. Effect diagram of the virtual clothes-changing algorithm

V. EXPERIMENTAL ANALYSIS AND RESULTS

We employed a diverse range of metrics to comprehensively evaluate the effectiveness of our virtual dress-up system. In particular, we assessed the accuracy, realism, and efficiency of the system in simulating clothing changes within a virtual environment. Our experimental results demonstrate that the system produces high-quality visuals and can effectively simulate clothing changes.

Importantly, our system outperforms previous virtual try-on systems in both accuracy and realism. To provide more insight into its performance, we conducted quantitative and qualitative analyses: we tested various virtual scenarios and evaluated both the quality of the generated images and the system's response time.

Our analysis yielded promising results, indicating that the system can generate high-quality virtual images quickly and efficiently. Additionally, it accommodates various types of clothing, enabling users to try on a wide variety of garments with ease. Overall, our findings suggest that the virtual dress-up system is a robust and versatile solution for fashion applications.

The system effect is shown in Fig. 3.

VI. DISCUSSION

The virtual dressing system proposed in this paper provides users with a more convenient and efficient virtual try-on experience. The system can simulate clothing changes in a virtual environment with good results and low time cost. In addition, it can be applied to e-commerce platforms to help customers better evaluate clothing. Of course, our virtual dressing system has limitations. For example, when using style transfer algorithms, we must consider the quality and quantity of the input images. Also, the system may not handle complex clothing textures or patterns with many layers. Subsequent studies could concentrate on tackling these challenges and optimizing the system's efficacy and scalability, rendering it suitable for larger-scale deployment.

VII. SUMMARY

This paper introduces a virtual dress-changing system based on deep learning that aims to address the challenges faced by existing virtual try-on systems and achieve better performance. The system combines style transfer and GANs to achieve virtual try-on and uses a large clothing dataset for network model training. Experimental results demonstrate that the system can simulate clothing changes in a virtual environment with good results and low time cost. Future research can focus on improving the efficiency and scalability of the system, making it suitable for large-scale deployment.

ACKNOWLEDGMENT

The authors acknowledge the 2022 Basic and Applied Basic Research Project of the Guangzhou Basic Research Plan (research on a video compression algorithm based on a dual neural network, Grant 202201011753), the Computer Vision Application Innovation Team (Grant 2022KCXTD047), the Guangdong Intelligent Vocational Education Engineering Technology Research Center (Grant 2021A118), a Guangdong Polytechnic of Science and Technology college-level project (Deep Learning-Based Classroom Teaching Quality Evaluation System, Grant XJPY202302), the Fundamental Research Funds for the Central Universities, Southwest Minzu University (Grant ZYN2022048), the Sichuan Science and Technology Program (Grant 2021JDRC0063), Scientific Research Projects of Colleges in Guangdong Province (Grant 2020KTSCX238), Special Projects in Key Fields of Colleges and Universities in Guangdong Province (new-generation electronic information, Grant 2022ZDZX1053), and a Guangdong Polytechnic of Science and Technology college-level project (Grant XJPY202306).

REFERENCES

[1] Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[2] Park, Taesung, et al. "Semantic image synthesis with spatially-adaptive normalization." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
[3] Karras, Tero, et al. "Progressive growing of GANs for improved quality, stability, and variation." arXiv preprint arXiv:1710.10196 (2017).
[4] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
[5] Jetchev, Nikolay, and Urs Bergmann. "The conditional analogy GAN: Swapping fashion articles on people images." Proceedings of the IEEE international conference on computer vision workshops. 2017.
[6] Han, Xintong, et al. "VITON: An image-based virtual try-on network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[7] Wang, Bochao, et al. "Toward characteristic-preserving image-based virtual try-on network." Proceedings of the European conference on computer vision (ECCV). 2018.
[8] Dong, Haoye, et al. "Towards multi-pose guided virtual try-on network." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
[9] Yang, Han, et al. "Towards photo-realistic virtual try-on by adaptively generating-preserving image content." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[10] Morelli, Davide, et al. "Dress Code: High-resolution multi-category virtual try-on." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[11] Fele, Benjamin, et al. "C-VTON: Context-driven image-based virtual try-on network." Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2022.
[12] Chang, Yuan, et al. "VTNCT: An image-based virtual try-on network by combining feature with pixel transformation." The Visual Computer (2022): 1-14.
[13] Wang, Bochao, et al. "Toward characteristic-preserving image-based virtual try-on network." Proceedings of the European conference on computer vision (ECCV). 2018.
[14] Han, Xintong, et al. "VITON: An image-based virtual try-on network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[15] Han, Xintong, et al. "VITON: An image-based virtual try-on network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[16] Wang, Bochao, et al. "Toward characteristic-preserving image-based virtual try-on network." Proceedings of the European conference on computer vision (ECCV). 2018.
[17] Pandey, Nilesh. Poly-GAN: A multi-conditioned GAN for multiple tasks. Rochester Institute of Technology, 2019.
[18] Marelli, Davide, Simone Bianco, and Gianluigi Ciocca. "Designing an AI-based virtual try-on web application." Sensors 22.10 (2022): 3832.