Department of Information Technology
Artificial Intelligence
DC-323
Assignment 2
Dated: 3rd April, 2023
Section: Afternoon ‘A’
Submitted to: Sir Ali Raza
Submitted by: Group
Roll No(s):
BSCS-F20-A-:_______________
Natural Language Processing Techniques: Rule-Based vs. Machine Learning-Based
Abstract
Natural Language Processing (NLP) has become a prominent area of research. It involves developing tools
and techniques that enable machines to understand, interpret, and generate human language. Two broad
families of NLP techniques are rule-based systems and machine learning-based systems. This research paper
presents a comparative analysis of these two approaches, focusing on their strengths, weaknesses, and
applications. The paper also discusses current state-of-the-art research in this area and identifies
research gaps that need to be addressed.
Keywords: Natural Language Processing, Rule-Based Systems, Machine Learning-Based Systems,
Comparative Analysis.
I. Introduction
Natural Language Processing (NLP) is an interdisciplinary field that combines linguistics, computer
science, and artificial intelligence. NLP aims to enable machines to understand and process human
language. Two popular NLP techniques are rule-based systems and machine learning-based systems.
Rule-based systems use pre-defined rules and patterns to extract information from text, while machine
learning-based systems use statistical models and algorithms to identify patterns and extract meaning
from text.
The primary objective of this research paper is to compare these two techniques and identify their
strengths, weaknesses, and applications. Additionally, the paper aims to identify current research gaps in
this area and suggest areas for future research.
II. Problem Statement
The problem addressed in this paper is to compare rule-based and machine learning-based NLP
techniques to identify their strengths, weaknesses, and applications. This comparison will help in
understanding which technique is more suitable for different types of NLP tasks.
III. Literature Review
The literature review provides an overview of the current state-of-the-art research on rule-based and
machine learning-based NLP techniques. Several research articles were analyzed to understand the
strengths, weaknesses, and applications of these techniques.
A. Rule-Based Systems
Rule-based systems are widely used for NLP tasks that involve specific rules and patterns. These
systems are effective in extracting structured information from text, such as dates, names, and addresses.
Rule-based systems use domain-specific knowledge and expertise to create rules that can extract
information from text. However, they are less effective in handling complex and ambiguous text. Rule-
based systems also require manual intervention to create and update rules, which can be time-consuming
and costly.
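As a minimal illustration of the pattern-based extraction described above, the sketch below uses hand-written regular expressions to pull dates and title-prefixed names out of free text. The patterns and the function name `extract_entities` are hypothetical and deliberately narrow. A real rule set would be much larger and tuned to its domain.

```python
import re

# Hand-written rules: each pattern encodes one expert-defined surface form (hypothetical).
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+"
    + r"(January|February|March|April|May|June|July|August|"
    + r"September|October|November|December),?\s+(\d{4})\b"
)
# Assumption for illustration: a name is a title followed by capitalised words.
NAME_PATTERN = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Sir)\.?\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_entities(text):
    """Return the dates and names matched by the fixed rules above."""
    dates = [" ".join(m.groups()) for m in DATE_PATTERN.finditer(text)]
    names = [m.group(0) for m in NAME_PATTERN.finditer(text)]
    return {"dates": dates, "names": names}


print(extract_entities("Submitted to Sir Ali Raza on 3rd April, 2023."))
# → {'dates': ['3 April 2023'], 'names': ['Sir Ali Raza']}
```

Text that deviates from the encoded forms, such as "04/03/23", is simply missed. Covering it means writing and maintaining another rule by hand, which is the manual cost mentioned above.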
B. Machine Learning-Based Systems
Machine learning-based systems are effective in handling complex and ambiguous text. These systems
use statistical models and algorithms to identify patterns and extract meaning from text. They are
particularly useful for tasks such as sentiment analysis, topic modeling, and text classification. Machine
learning-based systems require large amounts of labeled data for training, which can be a challenge for
some NLP tasks.
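For contrast, the sketch below learns sentiment from labeled examples instead of hand-written rules. It uses multinomial Naive Bayes with add-one smoothing, which is one classic statistical text classifier. The paper does not name a specific algorithm, so this choice is an assumption. The four training sentences are invented toy data. Real systems need far larger labeled corpora, which is the data requirement noted above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for w in tokenize(doc):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
docs = ["great service and friendly staff", "loved the quick response",
        "terrible wait and rude staff", "awful slow response"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("friendly and quick"))  # → pos
```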
The literature review also revealed that hybrid systems, which combine rule-based and machine
learning-based techniques, are becoming increasingly popular in NLP. Hybrid systems can leverage the
strengths of both rule-based and machine learning-based techniques to achieve better performance on
NLP tasks.
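One common hybrid pattern, sketched below under simple assumptions, lets a small set of high-precision rules decide the cases they cover and passes everything else to a learned model. Here the rules handle negations such as "not good", which a bag-of-words model misreads. The rules, the toy training data, and the per-word log-odds scorer are illustrative stand-ins, not a method taken from the reviewed literature.

```python
import math
import re
from collections import Counter

# Hypothetical high-precision rules: (pattern, label).
RULES = [
    (re.compile(r"\bnot (good|great|happy)\b"), "neg"),
    (re.compile(r"\b(refund|complaint)\b"), "neg"),
]


def train_log_odds(docs, labels):
    """Learn a naive per-word log-odds score (pos vs. neg) with add-one smoothing."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == "pos" else neg).update(doc.lower().split())
    vocab = set(pos) | set(neg)
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in vocab}


def hybrid_classify(text, weights):
    """Return (label, source), where source says whether a rule or the model decided."""
    text = text.lower()
    # 1) Rules fire first: they cover cases the statistical model tends to get wrong.
    for pattern, label in RULES:
        if pattern.search(text):
            return label, "rule"
    # 2) Otherwise fall back to the learned scores (unknown words contribute 0).
    score = sum(weights.get(w, 0.0) for w in text.split())
    return ("pos" if score >= 0 else "neg"), "model"


weights = train_log_odds(["good food", "great staff", "bad food", "rude staff"],
                         ["pos", "pos", "neg", "neg"])
print(hybrid_classify("the food was not good", weights))  # → ('neg', 'rule')
print(hybrid_classify("great food", weights))             # → ('pos', 'model')
```

Without the first rule, the word "good" alone would push "the food was not good" to a positive score. The rule layer corrects exactly that case and leaves the rest to the model.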
IV. Methodologies
To compare rule-based and machine learning-based NLP techniques, several research articles that used
these techniques for different NLP tasks were analyzed. The articles were selected based on their
relevance to the problem statement and the quality of their methodology and results.
For each article, the NLP task, the NLP technique used, the dataset used, and the results obtained were
identified. The strengths and weaknesses of each technique were analyzed, and their performance on
different NLP tasks was compared.
Scenario-based observation:
Case Scenario I (Collecting Data from Social Media) (Murphy et al., 2023):
This scenario, taken from a book (Sun, Zhang, Xing, & Zhang, 2018), describes a rule-based NLP technique
that mines social media data to extract adverse drug events (ADEs). An ADE is any unintended, negative
effect of a medication. Although social media sites such as Twitter and Facebook offer a wealth of
information on ADEs, extracting this data manually is time-consuming and costly.
The authors created a rule-based NLP technique that uses regular expressions to mine social media posts
for ADE-related information. The technique was evaluated on a dataset of tweets mentioning five widely
used drugs and extracted ADE-related information with an accuracy of over 80%.
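The paper does not reproduce the authors' actual expressions, so the following is only a hedged sketch of what regular-expression ADE mining can look like. The drug list, the symptom list, the causal cue phrases, and the 40-character window are all assumptions chosen for illustration.

```python
import re

# Hypothetical lexicons; a real system would use curated drug and symptom vocabularies.
DRUGS = ["lisinopril", "metformin", "ibuprofen", "codeine"]
ADE_TERMS = ["cough", "nausea", "dizziness", "headache", "rash"]

# Rule: <drug> ... <causal cue> ... <ADE term>, each gap at most 40 characters.
ADE_RULE = re.compile(
    r"\b(?P<drug>" + "|".join(DRUGS) + r")\b"
    + r".{0,40}?\b(?:gave me|gives me|giving me|caused|causing|causes)\b"
    + r".{0,40}?\b(?P<ade>" + "|".join(ADE_TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_ades(tweets):
    """Return (drug, ADE) pairs matched by the rule in each tweet."""
    pairs = []
    for tweet in tweets:
        for m in ADE_RULE.finditer(tweet):
            pairs.append((m.group("drug").lower(), m.group("ade").lower()))
    return pairs


tweets = ["Started lisinopril last week and it gave me this awful dry cough",
          "Taking codeine for my cough"]
print(extract_ades(tweets))  # → [('lisinopril', 'cough')]
```

The second tweet mentions cough only as the reason for taking codeine. No causal cue appears, so the rule stays silent. This is a crude handling of the context problem discussed below, and a rule like this breaks as soon as users phrase the cause differently.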
Although the systems' overall performance was usually excellent, performance dropped significantly when
it was measured on the ADE entity or ADE relation class alone. This is because non-ADE entities such as
drug names are fairly consistent in the data ("furosemide" always refers to a drug name in the text),
whereas ADEs are not: "cough" can be an ADE in the context of lisinopril, an indication in the context of
codeine, or a symptom in the context of tuberculosis. This makes ADEs harder to locate. Since the goal is
to detect ADEs, a system's ability to recognize ADE mentions in the text matters more than its overall
performance. Because overall performance artificially inflates the impression that ADEs can be
identified in text, evaluations should report performance on the ADE class rather than only overall
performance. This supports the authors' argument that, to enable accurate and consistent labeling, an ADE
should be annotated as a relation between a drug and non-drug entities.
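The gap between overall and ADE-specific performance comes down to simple arithmetic. The sketch below computes overall accuracy alongside per-class precision, recall, and F1 on invented entity labels. In this data, consistent drug mentions vastly outnumber ambiguous ADE mentions. The numbers are illustrative only and are not results from the cited study.

```python
from collections import Counter


def per_class_report(gold, pred):
    """Overall accuracy plus (precision, recall, F1) for every label."""
    overall = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    report = {}
    for label in set(gold) | set(pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[label] = (round(prec, 2), round(rec, 2), round(f1, 2))
    return overall, report


# Hypothetical labels: 18 drug mentions (all correct), 2 ADEs (one misread as an indication).
gold = ["DRUG"] * 18 + ["ADE", "ADE"]
pred = ["DRUG"] * 18 + ["ADE", "INDICATION"]
overall, report = per_class_report(gold, pred)
print(overall)        # → 0.95
print(report["ADE"])  # → (1.0, 0.5, 0.67)
```

Overall accuracy of 95% hides the fact that half of the ADEs were missed, which is why ADE-class figures should be reported separately.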
Case Scenario II (Detection of cognitive impairment) (Charles P. Larson, 2019):
This case concerns an AI-based NLP technique for the automated detection of cognitive impairment in
elderly individuals using non-linguistic vocal biomarkers. The technique uses machine learning
algorithms to analyze voice features such as pitch, jitter, and shimmer, detecting subtle changes that
may indicate cognitive impairment. The authors evaluated the technique on a dataset of audio recordings
from elderly individuals with and without cognitive impairment and achieved an accuracy of 82% in
detecting cognitive impairment. The study demonstrates the potential of AI-based NLP techniques for the
early detection of cognitive impairment and for improving clinical outcomes in elderly individuals.
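The sketch below shows what such voice features are by computing them from per-cycle measurements. It uses the common "local" definitions. Jitter is the mean absolute difference between consecutive pitch periods divided by the mean period. Shimmer is the same ratio computed over cycle amplitudes. The sketch assumes the periods and amplitudes have already been extracted from the recording by a separate pitch-tracking step, which is not shown, and the sample values are invented. In a full system these values would be fed to a classifier. The study's actual feature set and model are not specified here.

```python
def mean(xs):
    return sum(xs) / len(xs)


def local_jitter(periods):
    """Mean |T_i - T_{i+1}| over consecutive pitch periods, divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """Mean |A_i - A_{i+1}| over consecutive cycle amplitudes, divided by the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


def mean_pitch_hz(periods):
    """Average fundamental frequency (pitch) from periods given in seconds."""
    return 1.0 / mean(periods)


# Hypothetical per-cycle measurements from one short voiced segment.
periods = [0.010, 0.011, 0.010, 0.011]  # seconds
amplitudes = [1.0, 0.9, 1.0, 0.9]       # arbitrary units

features = [mean_pitch_hz(periods), local_jitter(periods), local_shimmer(amplitudes)]
print([round(f, 3) for f in features])  # about 95.2 Hz, 9.5% jitter, 10.5% shimmer
```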
Case Scenario III (Detection of dementia) (Renjie Li, 2022):
A smart environment can be used to identify wandering behaviors, which are a symptom of early dementia
(Batista, 2016). Position data from GPS, infrared, and RFID proximity devices were used to calculate
movement trajectories, and the movement data was fed into machine learning techniques to differentiate
between wandering and non-wandering patterns. The authors concluded that identifying wandering behaviors
in intelligent environments is still difficult because of the environment's unpredictability and
individuals' distinctive behavioral patterns.
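Before movement data can be given to such models, each trajectory has to be summarized as features. The sketch below computes one hypothetical feature, tortuosity, defined as path length divided by straight-line displacement. It assumes positions are already available as (x, y) coordinates. Back-and-forth pacing scores high, and a direct walk scores close to 1. The fixed threshold stands in for the decision boundary a trained classifier would learn and is not taken from Batista et al.

```python
import math


def path_length(points):
    """Total distance travelled along consecutive (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def tortuosity(points):
    """Path length divided by straight-line displacement (1.0 = perfectly direct)."""
    displacement = math.dist(points[0], points[-1])
    length = path_length(points)
    if displacement == 0:
        return math.inf if length > 0 else 1.0
    return length / displacement


def looks_like_wandering(points, threshold=3.0):
    # Hypothetical hand-set threshold; a real system would learn this boundary from data.
    return tortuosity(points) > threshold


direct = [(0, 0), (1, 0), (2, 0), (3, 0)]
pacing = [(0, 0), (3, 0), (0, 0), (3, 0), (0.5, 0)]
print(tortuosity(direct), tortuosity(pacing))  # → 1.0 23.0
print(looks_like_wandering(pacing))            # → True
```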
To identify wandering behavior, Khodabandehloo and Riboni (Khodabandehloo, 2020) used ambient sensors to
track people's actual activities. They applied a collaborative learning technique to two tasks:
trajectory segmentation and wandering-episode detection. Machine learning techniques were used to
identify wandering episodes from the segmented trajectories of location data collected by the ambient
sensors. Among the machine learning techniques tested, the random forest model performed best, with an
overall accuracy of 80.7% for personalized selection and 70.5% for non-personalized selection.
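The trajectory segmentation step can be illustrated, very loosely, with the sketch below. It is not the collaborative method of Khodabandehloo and Riboni. It is a simple stand-in that assumes sensor events arrive as (time, room) pairs and starts a new segment whenever no event occurs for more than a hypothetical `max_gap` of 60 seconds. It then computes one per-segment feature, the number of room changes, that a classifier such as a random forest could use.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float    # seconds since the start of recording
    room: str   # room reported by the ambient sensor


def segment_trajectory(events, max_gap=60.0):
    """Split a time-ordered event stream into segments at gaps longer than max_gap seconds."""
    segments, current = [], []
    for ev in events:
        if current and ev.t - current[-1].t > max_gap:
            segments.append(current)
            current = []
        current.append(ev)
    if current:
        segments.append(current)
    return segments


def room_transitions(segment):
    """Count room changes within one segment (one candidate feature per segment)."""
    return sum(a.room != b.room for a, b in zip(segment, segment[1:]))


# Hypothetical event stream: rapid kitchen/hall alternation, then a quiet spell in the bedroom.
events = [Event(0, "kitchen"), Event(10, "hall"), Event(20, "kitchen"), Event(30, "hall"),
          Event(200, "bedroom"), Event(210, "bedroom")]
segments = segment_trajectory(events)
print([room_transitions(s) for s in segments])  # → [3, 0]
```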
Similarly, Lotfi et al. (2012) installed passive infrared (PIR) motion sensors throughout the house,
together with flood sensors, bed/sofa pressure sensors, electricity usage sensors, and door entry-point
sensors, to predict subjects' abnormal behavior through movement analysis. Different activities were
classified using an echo state network (ESN). The training root mean square error (RMSE) was about 1% for
the kitchen sensor and about 7% for the back-entrance sensor.
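For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The short sketch below implements that formula. The sensor readings are invented and assumed to be normalised to a 0-1 scale. The paper does not state how the percentages above were normalised.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((actual_i - predicted_i) ** 2))."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical normalised sensor activations vs. a model's predictions.
observed = [1.0, 0.0, 1.0, 0.0]
predicted = [0.9, 0.1, 0.8, 0.2]
print(round(rmse(observed, predicted), 4))  # → 0.1581
```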
In conclusion, capturing everyday behaviors in smart home environments may eventually offer a new way to
identify the earliest signs of dementia. However, the diversity of events and the complexity of the
surroundings present further difficulties, and privacy and public acceptance also need to be
investigated.
Case Scenario IV (Erica Chatbot) (Taylor, 2016):
In 2018, Bank of America released Erica, a chatbot built on rule-based NLP. Erica was designed to help
users manage their money, make payments, and access account information through a conversational
interface. To understand customer inquiries and give relevant answers, Erica combines rule-based
algorithms with natural language processing. It can handle many tasks, including checking account
balances, transferring money between accounts, and setting up bill payments. One of Erica's key features
is proactive financial guidance. For example, if a customer's checking account balance is low, Erica
might suggest ways to cut costs or move money from another account to avoid overdraft fees.
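The rule-based behaviour described above can be sketched as keyword rules that map a message to an intent, plus a threshold rule that triggers proactive advice. This is a toy under stated assumptions. The intent names, the trigger patterns, and the $100 low-balance threshold are all hypothetical, and the sketch does not describe how Erica is actually implemented. The `fallback` result marks the out-of-domain case discussed below.

```python
import re

# Hypothetical intent rules, checked in order: (intent name, trigger pattern).
INTENT_RULES = [
    ("check_balance", re.compile(r"\bbalance\b")),
    ("transfer", re.compile(r"\btransfer\b")),
    ("pay_bill", re.compile(r"\bpay\b.*\bbill\b")),
    ("report_lost_card", re.compile(r"\b(lost|stolen)\b.*\bcard\b")),
]

LOW_BALANCE_THRESHOLD = 100.0  # assumed threshold, in dollars


def match_intent(message):
    """Return the first intent whose rule matches, or 'fallback' if none does."""
    text = message.lower()
    for intent, pattern in INTENT_RULES:
        if pattern.search(text):
            return intent
    return "fallback"  # outside the predefined rules: ask to rephrase or hand off


def proactive_tips(checking_balance, savings_balance):
    """Threshold rule that produces proactive guidance when checking is low."""
    tips = []
    if checking_balance < LOW_BALANCE_THRESHOLD:
        tips.append("Your checking balance is low; consider reducing spending.")
        if savings_balance >= LOW_BALANCE_THRESHOLD:
            tips.append("You could move money from savings to avoid an overdraft fee.")
    return tips


print(match_intent("I lost my card"))                      # → report_lost_card
print(match_intent("Is it a good time to buy a house?"))  # → fallback
print(len(proactive_tips(40.0, 500.0)))                   # → 2
```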
The bank's customers made more than 246 billion payments in the third quarter, and mobile banking
customers logged into their accounts more than 950 million times (46 times per user) over the same
period. With all that data, BofA has a good idea of which queries to program into the algorithm behind
Erica, said Moore. The bank has continued to add features; for example, the chatbot can now report a lost
or stolen card and request a credit limit increase. One benefit of a rule-based NLP system like Erica is
that it can be tailored to a particular domain, such as banking, which increases its accuracy and
effectiveness. However, rule-based systems can struggle with complicated or nuanced language and may be
unable to handle queries outside their predefined domain.
V. Results and Discussion
The analysis showed that both rule-based and machine learning-based NLP techniques have strengths and
weaknesses, and that their suitability depends on the specific NLP task. Rule-based systems are effective
at handling structured information and are easy to understand and interpret, while machine learning-based
systems adapt better to unstructured, raw data and can use it for decision making. The scenarios above
show that both kinds of system support a wide range of tasks but also have drawbacks. The rule-based ADE
extractor and Erica work well within a narrow, predefined domain but struggle with ambiguous or
out-of-domain language. The machine learning-based systems for detecting cognitive impairment and dementia
handle noisy real-world signals but reach only moderate accuracy (roughly 70-82% in the reported studies)
and raise privacy and acceptance concerns. Nevertheless, both approaches are already widely used in
medical, financial, educational, and many other domains.
VI. References:
Batista, E., Casino, F., & Solanas, A. (2016, July). On wandering detection methods in context-aware
scenarios. In 2016 7th International Conference on Information, Intelligence, Systems & Applications
(IISA) (pp. 1-6). IEEE.
Khodabandehloo, E., & Riboni, D. (2020). Collaborative trajectory mining in smart-homes to support early
diagnosis of cognitive decline. IEEE Transactions on Emerging Topics in Computing, 9(3), 1194-1205.
Lotfi, A., Langensiepen, C., Mahmoud, S. M., & Akhlaghinia, M. J. (2012). Smart homes for the elderly
dementia sufferers: Identification and prediction of abnormal behaviour. Journal of Ambient Intelligence
and Humanized Computing, 3, 205-218.
Murphy, R. M., Klopotowska, J. E., de Keizer, N. F., Jager, K. J., Leopold, J. H., Dongelmans, D. A., ...
& Schut, M. C. (2023). Adverse drug event detection using natural language processing: A scoping review
of supervised learning methods. PLOS ONE, 18(1), e0279842.
Taylor, H. (2016, October 24). Bank of America launches AI chatbot Erica: Here's what it does. CNBC.
https://www.cnbc.com/2016/10/24/bank-of-america-launches-ai-chatbot-erica--heres-what-it-does.html