IMPORTANT MILESTONES OF NLP
The journey of NLP began in the 1950s, when signal processing scientists started processing
speech signals. Work on machine translation also started during that period. Some of the landmark
work in NLP is listed below.
1954 Automatic translation from Russian to English was demonstrated using the IBM 701 mainframe
computer, combining statistics and grammatical rules for translation.
1956 McCarthy coined the term "Artificial Intelligence" at the Dartmouth Conference.
1957 Chomsky's language models (published in Syntactic Structures) brought a remarkable change to the field of NLP
1958 McCarthy introduced the LISP programming language
1963 Giuliano introduced the concept of automatic language processing
1964 ELIZA (an NLP computer program) was developed at MIT by Joseph Weizenbaum
1966 Research on machine translation was halted (following the ALPAC report) because it proved far more
costly than human translation
1970 SHRDLU, Terry Winograd's natural language understanding program for rearranging blocks,
was able to understand sentences like 'Put the blue cube on top of the red cube.'
1975 Parsing programs for automatic text-to-speech conversion
1979 The n-gram concept
1981 Knowledge-based machine translation
1982 The concept of the chatbot emerged, and the Jabberwacky project began
1985-1990 Natural language processing using knowledge bases
1990 Speech recognition using Hidden Markov Models (HMMs)
1992 Neural networks for knowledge extraction
1995 Use of linguistic patterns for knowledge-based information extraction
1998 Classification of text documents
1999 New methods for syntactic and semantic analysis
By the beginning of the 21st century, NLP research had become more advanced with the evolution
of modern computers and increased computational power and memory. New technologies such as
machine learning, neural networks, probabilistic methods, and statistics were used to develop NLP
applications with cognitive abilities. Apple's Siri and IBM Watson are examples of systems with
cognitive skills. NLP research continues to seek more efficient methods for natural language
processing, natural language understanding, and natural language generation.
1950s
The Birth of NLP: In the 1950s, computer scientists began to explore the possibilities of
teaching machines to understand and generate human language.
1960s-1970s
Rule-based Systems: During the 1960s and 1970s, NLP research focused on rule-based
systems. These systems used a set of predefined rules to analyze and process text.
1980s-1990s
Statistical Approaches and Machine Learning: In the 1980s and 1990s, statistical
approaches and machine learning techniques started gaining prominence in NLP. One
groundbreaking example during this period is the development of Hidden Markov
Models (HMMs) for speech recognition.
2000s-2010s
Deep Learning and Neural Networks: The 2000s and 2010s witnessed the rise of deep
learning and neural networks, propelling NLP to new heights. One of the most significant
breakthroughs was the development of word embeddings, such as Word2Vec and GloVe.
2017
In 2017, Google introduced Google Translate’s neural machine translation (NMT)
system, which used deep learning techniques to improve translation accuracy. The system
provided more fluent and accurate translations compared to traditional rule-based
approaches. This development made it easier for people to communicate and understand
content across different languages.
Present Day
Transformer Models and Large Language Models: In recent years, transformer
models like OpenAI’s GPT (Generative Pre-trained Transformer) have made significant
strides in NLP. These models can process and generate human-like text by capturing the
contextual dependencies within large amounts of training data.
Language and Grammar - Processing Indian Languages
Language and grammar are fundamental for understanding and generating human
language.
Grammar in NLP is a set of rules for constructing sentences in a language; it is used to
understand and analyze the structure of sentences in text data.
Grammar is defined as the rules for forming well-structured sentences and plays an
essential role in specifying the syntactic rules used for communication in natural
languages.
This includes identifying parts of speech such as nouns, verbs, and adjectives,
determining the subject and predicate of a sentence, and identifying the relationships
between words and phrases.
To make computers understand language, they must have a structure to follow.
Syntax is the branch of grammar dealing with the structure and formation of sentences in
a language; it governs how words are arranged to form phrases, clauses, and sentences.
Regular languages and parts of speech describe how words are arranged together, but they
cannot easily capture relations such as grammatical or dependency relations.
Representation of Grammar:
Any grammar can be represented by a 4-tuple (N, T, P, S):
• N – Finite Non-Empty Set of Non-Terminal Symbols.
• T – Finite Set of Terminal Symbols.
• P – Finite Non-Empty Set of Production Rules.
• S – Start Symbol (Symbol from where we start producing our sentences or strings).
Grammar is basically composed of two basic elements:
• Terminal Symbols: Terminal symbols are the components of the sentences generated using the
grammar and are represented using lowercase letters like a, b, c, etc.
• Non-Terminal Symbols: Non-terminal symbols take part in the generation of the sentence but are
not components of the sentence. Non-terminal symbols are also called auxiliary symbols or
variables. These symbols are represented using capital letters like A, B, C, etc.
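As a concrete illustration, the 4-tuple (N, T, P, S) can be written out directly in code. The sketch below is a minimal toy grammar in Python; the specific symbols and rules are made up for illustration and are not taken from any particular source.

```python
# A toy grammar G = (N, T, P, S) written as plain Python data structures.
# Non-terminals are uppercase, terminals are lowercase words (illustrative only).

N = {"S", "NP", "VP", "Det", "Noun", "Verb"}          # non-terminal symbols
T = {"the", "a", "dog", "cat", "chases", "sees"}      # terminal symbols
S = "S"                                               # start symbol

# P: production rules, mapping each non-terminal to its possible expansions.
P = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Det":  [["the"], ["a"]],
    "Noun": [["dog"], ["cat"]],
    "Verb": [["chases"], ["sees"]],
}

def expand(symbol):
    """Expand a symbol into one string of terminals (always taking the first rule)."""
    if symbol in T:
        return [symbol]
    result = []
    for sym in P[symbol][0]:
        result.extend(expand(sym))
    return result

print(" ".join(expand(S)))   # -> "the dog chases the dog"
```

Starting from S and repeatedly applying productions until only terminals remain is exactly how a grammar generates sentences of the language.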
Types of Grammar in NLP
There are three main types of grammar in NLP: context-free grammar, constituency grammar, and dependency grammar.
Key Aspects of Language and Grammar in NLP
1. Syntactic Analysis
Syntactic analysis in natural language processing (NLP) refers to the process of analyzing the
structure and grammar of a sentence or text in order to understand the relationships between
words and phrases. This analysis involves identifying the syntactic categories of words (such as
nouns, verbs, adjectives, etc.) and how they are organized in a sentence according to the rules of
a given language.
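A simple way to see syntactic categories in practice is part-of-speech tagging. The sketch below uses the NLTK library (assumed to be installed, with its tokenizer and tagger resources downloaded); the sentence is just an illustrative example, not one from the text above.

```python
import nltk

# One-time downloads; resource names may vary slightly by NLTK version.
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

sentence = "The quick brown fox jumps over the lazy dog"
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tags = nltk.pos_tag(tokens)             # assign a part-of-speech tag to each token

print(tags)
# Roughly: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
#           ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
```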
2. Parsing
Parsing techniques in Natural Language Processing (NLP) are methods used to analyze and
understand the structure of sentences and text. They break down sentences into their grammatical
components, revealing the relationships between words and phrases.
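One common parsing technique is chart parsing over a small context-free grammar. The sketch below is a minimal example using NLTK's CFG and ChartParser; the toy grammar and sentence are invented for illustration.

```python
import nltk

# A toy context-free grammar for very simple English sentences.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chases' | 'sees'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chases a cat".split()

# Print every parse tree the grammar licenses for this sentence.
for tree in parser.parse(sentence):
    tree.pretty_print()
```

The printed tree shows the grammatical components (NP, VP, etc.) and how the words group into phrases.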
3. Context-Free Grammar
In Natural Language Processing (NLP), a Context-Free Grammar (CFG) is a set of rules
defining the syntactic structure of a language. It allows for the generation of all possible well-
formed sentences by describing how symbols (words and phrases) can be combined to form valid
sentences.
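Because a CFG describes how symbols combine, it can also be used generatively, enumerating the well-formed sentences it licenses. A minimal sketch using NLTK's generate helper (reusing a toy grammar like the one above):

```python
from nltk import CFG
from nltk.parse.generate import generate

grammar = CFG.fromstring("""
S -> NP VP
NP -> 'the' N
VP -> V NP
N -> 'dog' | 'cat'
V -> 'chases'
""")

# Enumerate up to five sentences licensed by the grammar.
for words in generate(grammar, n=5):
    print(" ".join(words))
# the dog chases the dog
# the dog chases the cat
# the cat chases the dog
# ...
```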
4. Dependency Grammars:
Dependency grammars represent sentence structure as a tree where words are linked based on
their grammatical relationships.
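A dependency tree can be inspected directly with a library such as spaCy (assumed installed, with the small English model downloaded via `python -m spacy download en_core_web_sm`); the example sentence and the printed relations are illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")    # small English pipeline (assumed downloaded)
doc = nlp("The cat sat on the mat")

# Each token points to its syntactic head via a labelled dependency relation.
for token in doc:
    print(f"{token.text:<6} --{token.dep_:<7}--> {token.head.text}")
# Roughly:
# The    --det    --> cat
# cat    --nsubj  --> sat
# sat    --ROOT   --> sat
# on     --prep   --> sat
# the    --det    --> mat
# mat    --pobj   --> on
```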
5. Semantic Analysis
Semantic analysis in Natural Language Processing (NLP) focuses on understanding the meaning
of words, phrases, and sentences in context, going beyond simply analyzing individual words or
their grammatical structure.
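One classic, lightweight form of semantic analysis is looking up word senses and measuring semantic relatedness with WordNet. The sketch below uses NLTK's WordNet interface (the 'wordnet' corpus is assumed to be downloaded); the chosen words are only examples.

```python
from nltk.corpus import wordnet as wn

# nltk.download("wordnet")   # one-time download of the WordNet corpus

# A word like "bank" has several senses (synsets); each synset has a definition.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Path similarity gives a rough measure of how close two senses are in meaning.
dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
print("dog/cat similarity:", dog.path_similarity(cat))   # roughly 0.2
```

Resolving which sense of a word applies in a given sentence, and how word meanings combine, is what semantic analysis addresses beyond the grammatical structure discussed above.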