Smile.AI: Deep Learning for Digital Smile Design
Ganna allah Khaled Ali, Riham Essam Gamal, Sarah Hatem Nabil
Bioinformatics Department
Computer and Information Science
Ain Shams University
Cairo, Egypt
[email protected], [email protected], [email protected]
Authorized licensed use limited to: INDIAN INST OF INFO TECH AND MANAGEMENT. Downloaded on February 27,2024 at 09:11:35 UTC from IEEE Xplore. Restrictions apply.
2023 IEEE Eleventh International Conference on Intelligent Computing and Information Systems (ICICIS)
and image processing. The patient participates in the smile-designing process by providing an image of his/her smile, and the application visualizes and simulates the desired smile using stable diffusion models with ControlNet. The decision to proceed with treatment is a burden for both the dentist and the patient: the patient cannot make up his/her mind about improving his/her smile and going through the procedures of a smile makeover, while the dentist finds it difficult to persuade the patient to start treatment and cannot guarantee that the patient will like the result. Smile.AI assists with visual communication and engages patients in the process of designing their own smile, which leads to more predictable treatment results and a greater likelihood that patients will agree to the proposed treatment plan. In addition, Smile.AI helps the dentist deal with patients by giving them an understanding of the possible solution, thereby educating and motivating them about the benefits of the treatment and reducing their reluctance.

The first approach is based on image segmentation using deep learning segmentation models such as Unet++, MAnet, DeepLabv3+, and FPN. The second approach is based on ControlNet. ControlNet with Stable Diffusion is a type of deep learning model employed here for image segmentation; its main objective is to recognize and delimit objects or specific areas of interest in an image. The model uses a diffusion-based technique to spread information throughout the image while maintaining numerical stability. ControlNet with Stable Diffusion has been used in a wide range of applications, including medical imaging, object recognition, and natural language processing. In this paper, it is used to analyze dental images and improve the precision of Smile.AI in generating and presenting the desired smile. ControlNet is gaining popularity for several reasons. First, it produces higher-quality outputs with better control than alternatives such as Depth-to-Image. Second, it is highly efficient, with training times significantly shorter than those of traditional approaches. Finally, ControlNet helps prevent overfitting.

The rest of the paper is organized as follows. Related work, including comparisons between the proposed work and other approaches, is presented in Section II. Data collection and preprocessing are described in Section III. The system architecture is illustrated in Section IV, along with the application flow and how the deep learning model was built. Experimental results are shown in Section V. Finally, Section VI concludes the paper and suggests directions for future work.

II. RELATED WORK

Tian et al. [1] introduced a new system for automating the time-consuming and specialized task of dental crown restoration. The system used a two-stage generative adversarial network (GAN) to generate high-quality and diverse dental crown restorations comparable to those produced by expert dentists. The dataset was a set of dental crown images collected manually by the authors, and performance was evaluated using various metrics. The results showed that the proposed system can generate accurate and detailed restorations, improving the efficiency and accuracy of dental crown restoration while reducing the workload of dentists; the PAN [2] model achieved the highest result with an SSIM [3] score of 0.017, while the Pix2PixHD [4] model achieved the lowest result with an RMSE [5] score of 0.005.

In [6], Leonardi et al. utilized different techniques, such as CNNs [7], RNNs [8], and transfer learning, along with data pre-processing, to improve accuracy. AI was applied to the classification of growth patterns, achieving a 64% success rate in distinguishing between good and poor growth, on both 2D images (lateral cephalometric X-rays) and 3D images (CBCT). They gave examples of how deep learning and computer vision can be used in diagnosis and treatment planning, such as detecting dental caries and analyzing 3D models, and identified areas for future research, such as improving the interpretability of deep learning models and developing more robust models to handle variations in dental images.

Moreover, Zhou et al. [9] presented a method for improving facial attractiveness using Generative Adversarial Networks (GANs), a type of deep learning architecture that can produce new data resembling a given dataset. The proposed method involved training a GAN on a facial image dataset to generate new images that were more attractive than the originals. Evaluated on a facial image dataset, the generated images were perceived as more attractive by human evaluators; the BGAN [10] model achieved the highest result with an FID [11] score of 371.8. They also conducted a user study to assess the effectiveness of the method. The paper discussed potential applications in cosmetic surgery and social media, such as giving patients a realistic image of how they would look after cosmetic surgery, and enhancing facial attractiveness in social media photos to improve social interactions.

Zhang et al. [12] presented a novel technique for producing images from textual descriptions using diffusion models, trained on the LAION-5B dataset [13]; diffusion models are generative models that mimic the movement of particles in a fluid. Their method allowed conditional control over the generated images by conditioning the diffusion model on textual descriptions of the desired image. The effectiveness of the approach was demonstrated by generating high-quality images of birds and flowers from textual descriptions as input. They also compared their model to other state-of-the-art text-to-image generative models on various benchmark datasets, and their model outperformed the others; the VQGAN [14] model achieved the highest result with an FID score of 26.2. FID is used to evaluate the quality of text-to-image generative models and measures the similarity between the distribution of real images and the distribution of generated images.
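Conditioning of this kind is also what Smile.AI relies on: alongside the text prompt, ControlNet takes a spatial conditioning image (for example, an edge map of the patient's photo) that constrains where the diffusion model may change the picture. As a minimal sketch, the conditioning image can be built with a plain Sobel edge filter; the choice of edge detection here is an illustrative assumption, not the paper's stated pipeline.

```python
import numpy as np

def sobel_edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Build a binary edge map (a ControlNet-style conditioning image)
    from a grayscale image with values in [0, 1]."""
    # Sobel kernels for horizontal and vertical gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    h, w = gray.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]  # 3x3 window centered at (i, j)
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    magnitude = np.hypot(gx, gy)
    # Keep only strong gradients; the rest of the image is left unconstrained.
    return (magnitude > threshold * magnitude.max()).astype(np.uint8)

# A synthetic stand-in image: a bright rectangle on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edges = sobel_edge_map(img)
```

In a full pipeline, the edge map would be passed as the control image to a ControlNet-augmented Stable Diffusion model (for instance, via a ControlNet pipeline in a diffusion library); that tooling choice is likewise an assumption, not something the paper specifies.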
In [15], Saharia et al. proposed a novel method for generating photorealistic images from textual descriptions using diffusion models and deep language understanding. They suggested a new approach that combined a diffusion model with a deep language understanding module to generate images with fine-grained control from textual descriptions. They showed the efficiency of their method by generating high-quality images of the MS-COCO dataset [16] from textual descriptions as input and demonstrated that their model outperformed other text-to-image generative models on various benchmark datasets; additionally, the AttnGAN model [17] achieved the highest result with an FID score of 35.4.

III. DATA COLLECTION AND PREPROCESSING

It was a challenge to find a large number of before-and-after images of teeth and smiles due to the shortage of such data. This data is necessary to train a machine learning model, in our case ControlNet, while maintaining high prediction accuracy, one of the most important features of the application. 1022 images of teeth were manually collected to train the model, after contacting dentists and clinics abroad and searching various international medical and dental websites and clinics online. The images were divided into two halves: ‘before’, which shows teeth with various problems, such as lack of whiteness, receding gums, crookedness, spacing between teeth, and more,
and ‘after’, which shows the same teeth after treatment. Duplicate images were removed using the ”dupeGuru” software. A sample of the collected images is shown in Fig. 1 and Fig. 2. The data was divided into 90% for training and 10% for testing, each split into two folders, ‘before’ and ‘after’. Before dividing, the data was preprocessed as follows:
• Rename images: map each ‘before’ image and its ‘after’ counterpart to the same number.
• Resize images: resize all images to 224×224.
• Unify the number of channels so that all images are RGB.
• Change the file extension to ”jpg”.
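The four steps above can be sketched as a small routine, assuming the Pillow library and a flat folder of paired images; the folder names and the four-digit numbering scheme are illustrative assumptions, not the project's actual layout.

```python
import tempfile
from pathlib import Path
from PIL import Image

def preprocess_pair(before_path: Path, after_path: Path,
                    out_dir: Path, index: int) -> None:
    """Apply the four preprocessing steps to one before/after pair:
    rename to a shared number, resize to 224x224, force RGB, save as jpg."""
    for role, src in (("before", before_path), ("after", after_path)):
        img = Image.open(src)
        img = img.convert("RGB")      # unify channels: everything becomes RGB
        img = img.resize((224, 224))  # uniform input size for the model
        dest = out_dir / role
        dest.mkdir(parents=True, exist_ok=True)
        # The shared number maps each 'before' image to its 'after' counterpart.
        img.save(dest / f"{index:04d}.jpg", "JPEG")

# Demo with synthetic images standing in for real before/after photos.
tmp = Path(tempfile.mkdtemp())
Image.new("RGBA", (640, 480), (200, 180, 170, 255)).save(tmp / "b.png")
Image.new("L", (800, 600), 120).save(tmp / "a.png")
preprocess_pair(tmp / "b.png", tmp / "a.png", tmp / "processed", 1)
result = Image.open(tmp / "processed" / "before" / "0001.jpg")
```

The 90/10 train/test split would then be a matter of partitioning the numbered pairs before copying them into the training and testing folders.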
IV. SYSTEM ARCHITECTURE

The system architecture is presented in a layered-architecture style. It consists of three main layers, each responsible for delivering a specific service. The upper layer is the Presentation layer, which holds the frontend code of the user interface. The middle layer is the Business layer, which is responsible for the system's logic and fulfilling the functional requirements. Finally, the lower layer is the Database layer, which holds the application's data. The system architecture is shown in Fig. 3.
1. Presentation layer: holds the frontend code of the user's interface.
2. Business layer: implements the system's logic and the functional requirements.
3. Data layer: holds the two data entities:
• Dentist Data: the data related to the dentist.
• Patient Data: all patient data, including images entered into the application.

Fig. 3. The Proposed System Architecture
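The two entities of the data layer can be sketched as simple records; any field beyond those named in the text (the identifiers and names below) is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dentist:
    """Dentist Data: the data related to the dentist (fields are illustrative)."""
    dentist_id: int
    name: str

@dataclass
class Patient:
    """Patient Data: all patient data, including images entered into the app."""
    patient_id: int
    name: str
    images: List[str] = field(default_factory=list)  # paths of uploaded smile photos

p = Patient(patient_id=1, name="demo", images=["smile_before.jpg"])
```

In the running system these records would be persisted by the Database layer and populated through the Business layer.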
TABLE I
SEGMENTATION MODELS’ TRIALS
[19] Unet++ — 92, 73
[21] DeepLabv3+ — 96, 71
[22] FPN — 94, 72

TABLE II
CONTROLNET TRIALS
43, 40
41, 38
36, 33
29, 25
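Segmentation models such as Unet++, DeepLabv3+, and FPN are typically scored against ground-truth masks with overlap metrics like IoU and Dice; whether those are the exact metrics reported in the trials above is not stated, so the sketch below is illustrative only.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, truth: np.ndarray):
    """Intersection-over-Union and Dice coefficient for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

# Toy 4x4 masks: a predicted teeth region vs. the ground truth.
pred = np.array([[1, 1, 0, 0]] * 4)
truth = np.array([[1, 0, 0, 0]] * 4)
iou, dice = iou_and_dice(pred, truth)
```

Both metrics range over [0, 1], with 1 meaning a perfect match between the predicted and ground-truth regions.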