
2023 IEEE Eleventh International Conference on Intelligent Computing and Information Systems (ICICIS)

Smile.AI: A Deep Learning System for Digital Smile Design

979-8-3503-2210-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICICIS58388.2023.10391194

Ganna allah Khaled Ali, Riham Essam Gamal, Sarah Hatem Nabil, Khloud Aboelhasan Abdelbadei, Taghreed Samir Mohamed, Amr Mohamed Said
Bioinformatics Department, Computer and Information Science, Ain Shams University, Cairo, Egypt
[email protected]

Mohammad Essam, Maryam Nabil Al-Berry
Scientific Computing Department, Computer and Information Sciences, Ain Shams University, Cairo, Egypt
[email protected]

Abstract—Digital Smile Design (DSD) is an advanced technique that enables dentists to create customized treatment plans for each patient by creating a digital representation of their desired smile. This approach promotes collaboration between dentists and patients to meet individual needs and goals. Specialized DSD software allows dentists to plan treatments more precisely and achieve the patient’s desired outcome. Smile.AI is an application inspired by DSD that utilizes machine learning, deep learning, and image processing to simulate a desired smile. Smile.AI helps patients participate in the smile creation process and allows for more predictable outcomes, improving patient acceptance of suggested treatment plans. Smile.AI also helps dentists simplify communication with their patients, educate them about the benefits of treatment, and reduce patient reluctance. The first approach is based on image segmentation using segmentation models: Unet++, MAnet, DeepLabv3+, and FPN. The second approach is based on ControlNet, a type of neural network architecture that controls diffusion models by introducing additional constraints. This innovation represents a major advance in the field of AI imaging, as it allows excellent control over Stable Diffusion. ControlNet is growing in popularity for several reasons. First, it produces high-quality output with better control compared to other methods like depth-to-image. Second, it is highly efficient and requires significantly less training time than traditional approaches. Finally, ControlNet prevents overfitting. Overall, Smile.AI is a tool that can improve the quality of dental treatments and patient satisfaction.

Index Terms—DSD, Image Segmentation, ControlNet, Smile Makeover

I. INTRODUCTION

The desire for a beautiful and confident smile is universal. However, some patients may be hesitant to undergo treatment as they cannot visualize the outcome. To address this concern, DSD software was developed to assist clinicians in improving the patient’s visualization of possible solutions, ultimately increasing case acceptance. DSD is an advanced method that enables dental professionals to create personalized treatment plans for each patient. Essentially, DSD involves generating a digital representation of the patient’s desired smile. This approach promotes collaboration between the dentist and the patient in achieving their respective needs and objectives. Using specialized software, dentists can obtain a highly detailed view of the patient’s teeth and mouth, enabling them to plan treatments with greater precision and ensure that the final outcome aligns with the patient’s expectations. With DSD, patients can visualize the appearance of their new smile before making the decision to start treatment. DSD is commonly utilized in cosmetic dentistry procedures like dental implants, veneers, orthodontics, and teeth whitening. Additionally, it can effectively address various dental issues such as misaligned teeth and gaps.

“Smile.AI” is inspired by DSD; a similar approach has been implemented using machine learning, deep learning,


and image processing. The patient participates in the smile-designing process by giving an image of his/her smile, and the application visualizes and simulates the desired smile using Stable Diffusion models with ControlNet. Making the decision to proceed with treatment is a burden for both the dentist and the patient: the patient cannot make up his/her mind about improving his/her smile and going through the procedures of a smile makeover, while the doctor finds it difficult to persuade the patient to start treatment and cannot guarantee that the patient will like the result. Smile.AI assists with visual communication and engages patients in the process of designing their own smile, which leads to more predictable treatment results and a greater likelihood that patients will agree to the proposed treatment plan. In addition, Smile.AI helps the dentist in dealing with patients by giving an understanding of the possible solution, thereby educating and motivating them about the benefits of the treatment and reducing the patient’s reluctance.

The first approach is based on image segmentation using deep learning segmentation models, such as Unet++, MAnet, DeepLabv3+, and FPN. The second approach is based on ControlNet. ControlNet Stable Diffusion is a type of deep learning model that is employed for the process of image segmentation. Its main objective is to recognize and divide objects or specific areas of interest in an image. The model uses a technique based on diffusion to spread information throughout the image, while also ensuring that numerical instability is avoided and stability is maintained. ControlNet Stable Diffusion has been used in a wide range of applications, including medical imaging, object recognition, and natural language processing. In the context of this article, ControlNet Stable Diffusion is utilized to analyze dental images and enhance the precision of Smile.AI in generating and presenting the desired smile. ControlNet is becoming more popular for a number of reasons. Firstly, it produces higher-quality outputs with better control compared to other methods like Depth-to-Image. Secondly, it is highly efficient, with training times that are significantly shorter than traditional approaches. Finally, ControlNet prevents overfitting.

The rest of the paper is organized as follows: the related work is presented in Section II, including comparisons between the proposed work and other approaches. The process of data collection and data preprocessing is described in Section III. The system architecture is illustrated in Section IV, where the flow of the application is described along with how the deep learning model was built. After that, the experimental results are shown in Section V. Finally, Section VI concludes the paper and suggests directions for future work.

II. RELATED WORK

Tian et al. [1] introduced a new system for automating the time-consuming and specialized task of dental crown restoration. The system used a two-stage generative adversarial network (GAN) to generate high-quality and diverse dental crown restorations that are comparable to those produced by expert dentists. The dataset used in this paper was a dataset of dental crown images collected manually by the authors, and they evaluated the performance using various metrics. The results showed that their proposed system can generate accurate and detailed restorations, improving the efficiency and accuracy of dental crown restorations while reducing the workload of dentists; the PAN [2] model achieved the highest result with an SSIM [3] score of 0.017, while the Pix2PixHD [4] model achieved the lowest result with an RMSE [5] score of 0.005. In [6], Leonardi et al. utilized different techniques, such as CNNs [7], RNNs [8], and transfer learning, along with data pre-processing, to improve accuracy; AI was used for the classification of growth patterns, achieving a 64% success rate in distinguishing between good and poor growth, applied to both 2D images (lateral cephalometric X-rays) and 3D images (CBCT). They gave examples of how deep learning and computer vision can be used in diagnosis and treatment planning, such as in detecting dental caries and analyzing 3D models. They also identified areas for future research, such as improving the interpretability of deep learning models and developing more robust models to handle variations in dental images.

Moreover, Zhou et al. [9] presented a method for improving facial attractiveness using Generative Adversarial Networks (GANs). GANs are a type of deep learning architecture that can produce new data that resembles a given dataset. Their proposed method involved training a GAN on a facial image dataset to generate new images that were more attractive than the original images. They evaluated the method using a facial image dataset and found that the generated images were perceived as more attractive by human evaluators; the BGAN [10] model achieved the highest result with an FID [11] score of 371.8. They also conducted a user study to assess the effectiveness of the method. The paper discussed the potential applications of the method in cosmetic surgery and social media, such as giving patients a realistic image of how they would look after cosmetic surgery and enhancing facial attractiveness in social media photos to improve social interactions.

Zhang et al. [12] presented a novel technique for producing images from textual descriptions using diffusion models, which are generative models that mimic the movement of particles in a fluid; they used the LAION-5B dataset [13]. Their proposed method allowed for conditional control over the generated images by conditioning the diffusion model on textual descriptions of the desired image. The effectiveness of the approach was demonstrated by generating high-quality images of birds and flowers using textual descriptions as input. They also compared their model to other state-of-the-art text-to-image generative models on various benchmark datasets, and their model outperformed the others. Furthermore, the VQGAN [14] model achieved the highest result with an FID score of 26.2; FID is used to evaluate the quality of text-to-image generative models, measuring the similarity between the distribution of real images and the distribution of generated images.


Fig. 1. Sample of the dataset images

Fig. 2. Examples of Post-Processing Binary Segmentation of After Images

In [15], Saharia et al. proposed a novel method for generating photorealistic images from textual descriptions using diffusion models and deep language understanding. They suggested a new approach that combined a diffusion model and a deep language understanding module for generating images with fine-grained control from textual descriptions. They showed the efficiency of their method by generating high-quality images on the MS-COCO dataset [16] using textual descriptions as input and demonstrated that their model outperformed other text-to-image generative models on various benchmark datasets. Additionally, the AttnGAN model [17] achieved the highest result with an FID score of 35.4.

III. DATA COLLECTION

It was a challenge to find a large number of before-and-after images of teeth and smiles due to the shortage of availability of such data. This data is necessary to train a machine learning model, in our case ControlNet, while maintaining a high prediction accuracy, which is one of the most important features of the application. 1022 images of teeth were manually collected to train the model, after contacting dentists and clinics abroad and searching various international medical and dental websites and clinics online. The images were divided into two halves: ‘before’, which shows teeth with various problems, such as lack of whiteness, receding gums, crookedness, spacing between teeth, and more,


and ‘after’, which shows these teeth after treatment. Duplicate images were removed using the ”dupeGuru” software. A sample of the collected images is shown in Fig. 1 and Fig. 2. The data was divided into 90% for training, split into two folders, ‘before’ and ‘after’ processing, and 10% for testing, divided in the same way as the training set. Before dividing the data, it was treated according to the principles of data pre-processing as follows:
• Rename image: map each before and after image to the same number.
• Resize image: resize all images to 224×224.
• Unify the number of channels so that all images are RGB.
• Change the file extension to ”jpg”.
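The pre-processing steps above can be sketched as follows. This is a minimal illustration only: the paper does not name its tooling, so the numpy-based nearest-neighbor resize and the function names are our own assumptions.

```python
import numpy as np

def to_rgb(img: np.ndarray) -> np.ndarray:
    """Unify channels: grayscale -> 3-channel, RGBA -> RGB."""
    if img.ndim == 2:                      # grayscale (H, W)
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 4:                 # RGBA: drop the alpha channel
        return img[..., :3]
    return img                             # already RGB

def resize_nearest(img: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Nearest-neighbor resize to the paper's 224x224 resolution."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def preprocess(img: np.ndarray) -> np.ndarray:
    """Channel unification followed by resizing, as in the bullets above."""
    return resize_nearest(to_rgb(img))
```

Paired ‘before’/‘after’ files would then be renamed to a shared numeric stem (e.g. a hypothetical before/0001.jpg and after/0001.jpg) and saved with the ”jpg” extension so that the model sees aligned pairs.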

IV. SYSTEM ARCHITECTURE
The system architecture is presented in a layered-architecture style. It consists of three main layers; each layer is responsible for delivering a specific service. The upper layer is the Presentation Layer, which holds the frontend code of the user’s interface. The middle layer is the Business Layer, which is responsible for the system’s logic and fulfilling the functional requirements. Finally, the lower layer is the Database Layer, which holds the application’s data. The system architecture is shown in Fig. 3.

1. Presentation layer:

It holds the web user interfaces presented in six pages:

• Home page: Users view brief info about the app and choose to log in or sign up.
• Sign-up page: Users enter their sign-up data.
• Login page: Users enter their login credentials.
• Patient’s table: Shows patients for the logged-in doctor with brief info.
• Patient’s details: Shows the patient’s data in detail.
• Patient’s smile: Shows the model’s output for the patient.
2. Business layer:
It holds the functionalities of the application:
• Dentist Component: responsible for all the logic and functionalities related to the Dentist entity, including accessing the database.
• Patient Component: responsible for all the logic and functionalities related to the Patient entity, including accessing the database.
• Sending data to the database, receiving data from the database, and generating the smile image using the ControlNet model.
• Validation of the data entered by the dentist, and authentication to control the permissions of each user type.

3. Data layer:
It holds the two data entities:
• Dentist Data: holds the data related to the Dentist.
• Patient Data: contains all patient data, including images entered into the application.

Fig. 3. The Proposed System Architecture
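The validation-and-authentication responsibility of the business layer could look like the following minimal sketch. The role names and permission sets are our own illustrative assumptions; the paper only states that authentication controls the permissions of each user type.

```python
# Hypothetical role-to-permission mapping for the two user types the
# paper describes (dentist and patient); the action names are assumed.
PERMISSIONS = {
    "dentist": {"view_patients", "add_patient", "generate_smile"},
    "patient": {"view_own_smile"},
}

def authorize(role: str, action: str) -> bool:
    """Return True only if the role is known and permits the action."""
    return action in PERMISSIONS.get(role, set())
```

A request handler in the business layer would call `authorize` before touching the data layer, rejecting unknown roles by default.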

TABLE I
SEGMENTATION MODELS’ TRIALS

Model             Pixel Accuracy (%)
                  Train    Test
Unet++ [19]         92      73
MAnet [20]          89      73
DeepLabv3+ [21]     96      71
FPN [22]            94      72

TABLE II
CONTROLNET TRIALS

Model                          SSIM
                               Train    Test
stable-diffusion-2.1-base        43      40
stable-diffusion-2.1-base        30      28
stable-diffusion-2.1-base        41      38
stable-diffusion-2.1-base        36      33
stable-diffusion-2.1-base        29      25
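The five ControlNet trials behind Table II can be summarized as a configuration list using the hyperparameters reported in the Results section. Pairing row i of the table with trial i is our reading of the text (we also read “the last one” as the last trial), and the dictionary keys are our own naming:

```python
# Hyperparameters of the five ControlNet trials as reported in the paper;
# all trials used 512x512 images and a single training batch.
trials = [
    {"epochs": 10, "grad_accum_steps": 16, "lr": 1e-5},
    {"epochs": 10, "grad_accum_steps": 8,  "lr": 1e-5},
    {"epochs": 10, "grad_accum_steps": 16, "lr": 1e-5},
    {"epochs": 3,  "grad_accum_steps": 8,  "lr": 1e-5},
    {"epochs": 1,  "grad_accum_steps": 16, "lr": 5e-6},  # last trial: lower lr
]
```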

V. RESULTS

The conducted experiments evaluate the performance of two tasks: image segmentation and image generation. Each task is evaluated with respect to the chosen metrics, and the outcomes are highlighted below.

A. Segmentation
Segmentation models were used to segment the before and after images, covering both training and testing images. The resulting binary masks were used to mask the after images only, irrespective of their categorization into the train or test set. Pixel accuracy is used for evaluation (1), where n is the number of correctly classified pixels and t is the total number of pixels:

Pixel Accuracy = n / t    (1)

Table I shows the results obtained on the segmentation task using different models. All these models were applied to different encoders, namely mobileone_s4, ResNet34, ResNet152, timm-gernet-m, and ResNet34, using the same ImageNet pre-trained weights. The models were trained with a resolution of 224×224 and shared the same parameters: 10 epochs, a learning rate of 0.001, the RAdam optimizer, a batch size of 16, and Binary Cross Entropy as the loss function.

B. ControlNet
After applying the stable-diffusion-2.1-base ControlNet model to our dataset, which includes both train and test images in before-after pairs, SSIM (Structural Similarity Index) is used for evaluation:

SSIM(x, y) = ((2·µx·µy + c1)(2·σxy + c2)) / ((µx² + µy² + c1)(σx² + σy² + c2))

where x and y represent the two images being compared: x is the “before” image and y is the “after” image. µx and µy are the means of the pixel intensities in images x and y, respectively. σx² and σy² are the variances of the pixel intensities in images x and y, respectively; they measure the amount of dispersion in the pixel values. σxy is the covariance of the pixel intensities between images x and y; it quantifies the correlation or interdependence among the pixels in both images. In order to prevent instability in the SSIM equation when the means and variances approach zero, c1 and c2 are introduced as constants. These constants are small positive values implemented to prevent division by zero.

Table II shows the results obtained from the ControlNet model. The model was applied to images of the same resolution (512×512) using different parameters: epochs of 10, 10, 10, 3, and 1; gradient accumulation steps of 16, 8, 16, 8, and 16; and a learning rate of 1e-5 for all trials except the last one, which was set to 5e-6. All trials shared the same number of training batches, which was 1.

VI. CONCLUSION

Both dentists and patients may find the treatment procedure difficult. Patients frequently struggle to decide whether to improve their smile and undertake smile makeover operations, while dentists struggle to encourage them to proceed with treatment. Furthermore, there is no assurance that patients will be pleased with the result. Smile.AI overcomes these challenges by enabling visual communication and integrating patients in the creation of their own smiles. This technique supports predictable treatment outcomes and raises the possibility of patients accepting the recommended case. Furthermore, Smile.AI assists dentists in dealing with patients by teaching and persuading them about the benefits of treatment, as well as lowering patient hesitation. One of
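The two evaluation metrics can be sketched in a few lines of numpy. This is a single-window (global) SSIM rather than the sliding-window variant most toolkits compute, with the conventional c1 and c2 stabilizers assumed for a given data range; the paper does not specify its exact SSIM implementation.

```python
import numpy as np

def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Eq. (1): correctly classified pixels n over total pixels t."""
    return float(np.mean(pred == target))

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM between two images of equal shape."""
    c1 = (0.01 * data_range) ** 2   # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()     # means of pixel intensities
    vx, vy = x.var(), y.var()       # variances (dispersion)
    cov = ((x - mx) * (y - my)).mean()  # covariance between the images
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical pair of images scores 1.0 on both metrics; in Table II the SSIM is presumably computed between generated images and the ground-truth ‘after’ images.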


the difficulties encountered was the scarcity of dental imaging for basic dental disorders before and after treatment. As a result, we took the initiative to personally collect data from dental clinics throughout the world and establish contact with dentists and dental clinics. This was done to verify that we had enough data to start working on our project. We are presently working on gathering a large dataset appropriate for training our ControlNet model. We chose to employ segmentation models and binary segmentation masks to tackle this difficulty; this method enabled us to work with the existing data and speed up the training process. After addressing these issues, the model can be applied to new photos to create new images. Using segmentation models such as Unet++, MAnet, FPN, and DeepLabv3+ with binary masks and a learning rate of 0.001, we obtained test pixel accuracies of 73%, 73%, 72%, and 71%, respectively.
REFERENCES
[1] Tian, Sukun, et al. ”DCPR-GAN: dental crown prosthesis restoration
using two-stage generative adversarial networks.” IEEE Journal of
Biomedical and Health Informatics 26.1 (2021): 151-160.
[2] C. Wang, C. Xu, C. Wang, and D. Tao, “Perceptual adversarial networks
for image-to-image transformation,” IEEE Trans. Image Process., vol.
27, no. 8, pp. 4066–4079, Aug. 2018.
[3] J. Nilsson, “Understanding SSIM,” arXiv.org, Jun. 24, 2020.
[4] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro,
“High-resolution image synthesis and semantic manipulation with con-
ditional GANs,” in IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp.
8798–8807.
[5] D. S. K. Karunasingha, “Root mean square error or mean absolute error?
Use their ratio as well,” Information Sciences, vol. 585, pp. 609–629,
Mar. 2022, doi: 10.1016/j.ins.2021.11.036.
[6] R. Leonardi, et al., “Deep learning and computer vision: Two promising
pillars, powering the future in orthodontics,” Seminars in Orthodontics,
vol. 27, no. 2, pp. 62–68, Jun. 2021, doi: 10.1053/j.sodo.2021.05.002.
[7] L. Alzubaidi et al., “Review of deep learning: concepts, CNN architec-
tures, challenges, applications, future directions,” Journal of Big Data,
vol. 8, no. 1, Mar. 2021, doi: 10.1186/s40537-021-00444-8.
[8] J. Xiao and Z. Zhou, “Research Progress of RNN Language
Model,” 2020 IEEE International Conference on Artificial Intel-
ligence and Computer Applications (ICAICA), Jun. 2020, doi:
10.1109/icaica50127.2020.9182390.
[9] Zhou, Yuhongze, and Qinjie Xiao. ”Gan-based facial attractiveness
enhancement.” arXiv preprint arXiv:2006.02766 (2020).
[10] N. Diamant, “Beholder-GAN: Generation and Beautification of Facial
Images with Conditioning on Their Beauty Level,” arXiv.org, Feb. 07,
2019.
[11] M. Soloveitchik, T. Diskin, E. Morin, and A. Wiesel, “Conditional
frechet inception distance,” arXiv (Cornell University), Mar. 2021, doi:
10.48550/arxiv.2103.11521.
[12] Zhang, Lvmin, and Maneesh Agrawala. ”Adding conditional control
to text-to-image diffusion models.” arXiv preprint arXiv:2302.05543
(2023).
[13] C. Schuhmann, “LAION-5B: An open large-scale dataset for training
next generation image-text models,” arXiv.org, Oct. 16, 2022.
[14] P. Esser, “Taming Transformers for High-Resolution Image Synthesis,”
arXiv.org, Dec. 17, 2020. https://arxiv.org/abs/2012.09841
[15] Saharia, Chitwan, et al. ”Photorealistic text-to-image diffusion models
with deep language understanding.” Advances in Neural Information
Processing Systems 35 (2022): 36479-36494.
[16] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” in Lec-
ture Notes in Computer Science, 2014, pp. 740–755. doi: 10.1007/978-
3-319-10602-1_48.
[17] T. Xu, “AttnGAN: Fine-Grained Text to Image Generation with Atten-
tional Generative Adversarial Networks,” arXiv.org, Nov. 28, 2017.
[18] R. Buongiorno, D. Germanese, L. Colligiani, S. C. Fanni, C. Romei, and
S. Colantonio, “Artificial intelligence for chest imaging against
COVID-19: an insight into image segmentation methods,” in Elsevier
eBooks, 2023, pp. 167–200. doi: 10.1016/b978-0-323-90531-2.00008-4.
[19] Zhou, Zongwei, et al. ”UNet++: A nested U-Net architecture for medical
image segmentation.” Deep Learning in Medical Image Analysis and
Multimodal Learning for Clinical Decision Support, DLMIA 2018 and
ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain.
Springer International Publishing, 2018.
[20] Li, Rui, et al. ”Multiattention network for semantic segmentation of fine-
resolution remote sensing images.” IEEE Transactions on Geoscience
and Remote Sensing 60 (2021): 1-13.
[21] Chen, Liang-Chieh, et al. ”Encoder-decoder with atrous separable con-
volution for semantic image segmentation.” Proceedings of the European
Conference on Computer Vision (ECCV), 2018.
[22] Lin, Tsung-Yi, et al. ”Feature pyramid networks for object detection.”
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2017.

