Data Structures for Language Processing
Agenda
Classification
Search Data Structure
Other Organization
Fixed Size Record
Variable Size Record
Hybrid Record
Tree Representation
Hashed Representation
Allocation Data Structure
Stack & Extended Stack
Heap
Classification
1. Based on nature: linear or non-linear
   e.g. Linear = array, stack, etc.
        Non-linear = tree, graph, etc.
2. Based on purpose: search or allocation
   e.g. Search = binary search tree
        Allocation = stacks, heaps
3. Based on lifetime: whether used during language processing or during target program execution
   e.g. Language processing = object-based data model
        Target program = hash tables
Search Data Structures
A search data structure (or search structure) is a set of entries, each entry accommodating the information concerning one entity. Each entry is assumed to contain a key field, which forms the basis for a search.
Record formats:
Fixed Size Record
Variable Size Record
Hybrid Record
Fixed Size Record
Each entry has the same type and size, e.g. the elements of an array.
Variable Size Record
The type and size of each record may differ.
Hybrid Record
Each entry has both a fixed-length part and a variable-length part.
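The three record formats can be sketched as byte layouts. This is only an illustration: the field names (name, type code, address, dimensions) and their sizes are assumptions chosen for the example, not taken from the slides.

```python
import struct

# Fixed-size record: every entry has the same layout and size, so
# entry i of a table starts at offset i * FIXED.size.
# Assumed fields: 8-byte name, 2-byte type code, 2-byte address.
FIXED = struct.Struct("<8sHH")

def pack_fixed(name: str, type_code: int, address: int) -> bytes:
    # struct pads a short name with zero bytes and truncates a long one.
    return FIXED.pack(name.encode("ascii"), type_code, address)

# Variable-size record: a length prefix says how many bytes follow,
# so each record can have a different size.
def pack_variable(name: str, dims: list[int]) -> bytes:
    body = name.encode("ascii") + b"\0"
    body += b"".join(struct.pack("<H", d) for d in dims)
    return struct.pack("<H", len(body)) + body

# Hybrid record: a fixed-length part followed by a variable-length part.
def pack_hybrid(name: str, type_code: int, address: int, dims: list[int]) -> bytes:
    fixed_part = pack_fixed(name, type_code, address)
    variable_part = struct.pack("<H", len(dims))
    variable_part += b"".join(struct.pack("<H", d) for d in dims)
    return fixed_part + variable_part
```

A fixed-size table can be indexed directly, while variable and hybrid records have to be walked or located through stored lengths.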
Entry Format
Generic Search Procedure for Locating the Entry of a Symbol
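The procedure itself did not survive on this slide. One plausible reading, offered only as a sketch, is a predict-and-compare loop: predict an entry e, compare the symbol s with the symbol stored in e, and repeat until s is found or the predictions run out. The names `generic_search`, `predict` and `linear_predictions` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Entry = Optional[Tuple[str, object]]   # (key symbol, other information) or empty

def generic_search(entries: Sequence[Entry], s: str,
                   predict: Callable[[str], Iterable[int]]) -> Optional[int]:
    """Return the index of the entry holding symbol s, or None."""
    for e in predict(s):                       # step 1: predict an entry e
        entry = entries[e]
        if entry is not None and entry[0] == s:  # step 2: compare s with s(e)
            return e
    return None                                # step 3: no predictions left

def linear_predictions(n: int) -> Callable[[str], Iterable[int]]:
    """Simplest predictor (an assumption): try entries 0, 1, ..., n-1 in order."""
    return lambda s: range(n)
```

Under this reading, each search organization differs only in how it predicts the next entry to examine.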
Binary Search Organization
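The slide body is missing here, so the following is a minimal sketch of a binary search over a table whose entries are kept sorted by key. The entry layout (a tuple whose first element is the key) is an assumption for the example.

```python
from typing import Optional, Sequence, Tuple

def binary_search(entries: Sequence[Tuple[str, object]], s: str) -> Optional[int]:
    """Return the index of symbol s in entries sorted by key, or None."""
    lo, hi = 0, len(entries) - 1
    while lo <= hi:
        e = (lo + hi) // 2          # predict the middle entry
        key = entries[e][0]
        if key == s:
            return e
        if key < s:
            lo = e + 1              # s can only be in the upper half
        else:
            hi = e - 1              # s can only be in the lower half
    return None
```

Each comparison halves the remaining range, so a table of n entries needs about log2(n) comparisons, at the cost of keeping the table sorted on every insertion.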
Hash Tables
h is the hashing function.
s is the symbol to be searched for or entered.
s(e) is the symbol currently stored in entry e.
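Using this notation, a lookup computes e = h(s) and compares s with s(e). The sketch below assumes a simple character-sum hashing function and, when the entry is occupied by a different symbol, tries the next entry in turn; both choices are assumptions for illustration, not the slides' method.

```python
from typing import List, Optional, Tuple

class HashTable:
    def __init__(self, size: int = 11) -> None:
        self.slots: List[Optional[Tuple[str, object]]] = [None] * size

    def h(self, s: str, attempt: int = 0) -> int:
        # Assumed hashing function: character sum, shifted on each retry.
        return (sum(ord(c) for c in s) + attempt) % len(self.slots)

    def _find(self, s: str) -> Tuple[Optional[int], bool]:
        """Return (entry number, found). Entry number is None if the table is full."""
        for attempt in range(len(self.slots)):
            e = self.h(s, attempt)
            if self.slots[e] is None:
                return e, False          # free entry: s is not in the table
            if self.slots[e][0] == s:    # compare s with s(e)
                return e, True
        return None, False

    def insert(self, s: str, info: object) -> int:
        e, _ = self._find(s)
        if e is None:
            raise OverflowError("hash table is full")
        self.slots[e] = (s, info)
        return e

    def lookup(self, s: str) -> Optional[object]:
        e, found = self._find(s)
        return self.slots[e][1] if found else None
```

With this hashing function the symbols "ab" and "ba" hash to the same entry, so the second one inserted is placed in the following entry.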
Hashing Function
A hashing function is used to make searching faster.
It transforms a source symbol, or a group of symbols, into a number, so that comparison and searching become faster.
Hashing does not change the original symbol; it only derives another representation of it.
The size of the unit that is transformed is decided in advance.
If a symbol is shorter than that size, it is padded with 0s to make up the size.
In folding, the padded symbol is split into parts of that size, and the parts are combined (for example, by addition) into a single number.
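A minimal sketch of a hashing function based on padding and folding follows. The part size, the table size, and the choice of addition to combine the parts are assumptions for illustration.

```python
def fold_hash(symbol: str, part_size: int = 4, table_size: int = 101) -> int:
    """Map a symbol to an entry number in range(table_size)."""
    data = symbol.encode("utf-8")
    # Pad with zero bytes so the length is a multiple of part_size.
    data += b"\0" * ((-len(data)) % part_size)
    total = 0
    # Fold: split into fixed-size parts and add them together.
    for i in range(0, len(data), part_size):
        total += int.from_bytes(data[i:i + part_size], "big")
    # Reduce to a valid entry number.
    return total % table_size
```

The symbol itself is left untouched; only the derived number is used to choose the entry, and the stored symbol is still compared during the search.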
Properties of a good hashing function
Collision in hashing
A hashing function can map several different symbols to the same number. This is a collision, and unless it is handled, a search can fail or return the wrong entry.
Collisions are therefore handled with one of the following techniques:
1. Rehashing technique
2. Overflow chaining technique
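As an illustration of the second technique, the sketch below keeps a chain of entries for each hash value, so colliding symbols share a chain instead of overwriting one another. Using a Python list per entry is a simplification (a separate overflow table with links is another common arrangement), and the character-sum hashing function is an assumption.

```python
from typing import List, Optional, Tuple

class ChainedTable:
    def __init__(self, size: int = 11) -> None:
        self.buckets: List[List[Tuple[str, object]]] = [[] for _ in range(size)]

    def h(self, s: str) -> int:
        # Assumed hashing function: character sum modulo table size.
        return sum(ord(c) for c in s) % len(self.buckets)

    def insert(self, s: str, info: object) -> None:
        chain = self.buckets[self.h(s)]
        for i, (key, _) in enumerate(chain):
            if key == s:                 # symbol already present: update it
                chain[i] = (s, info)
                return
        chain.append((s, info))          # collision or first use: extend the chain

    def lookup(self, s: str) -> Optional[object]:
        for key, info in self.buckets[self.h(s)]:
            if key == s:
                return info
        return None
```

Rehashing, by contrast, keeps every symbol inside the main table and applies a further hashing function to find another entry when the first one is occupied.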
Allocation Data Structure
Important Allocation Data Structures
Stack & Extended Stack
Heap
Stacks
Extended Stack model
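An extended stack is needed for handling variable-length records. A record consists of a set of consecutive stack entries.
In addition to the base and top pointers, a new pointer, Previous, is used. It links each record to the record below it, so that a whole record can be popped at once.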
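A minimal sketch of the extended stack model is given below. It assumes that the first entry of every record is reserved for the previous pointer, and that a `record_base` index marks the start of the newest record; these names and the layout are assumptions for illustration.

```python
from typing import List

class ExtendedStack:
    def __init__(self) -> None:
        self.cells: List[object] = []   # stack entries; base is index 0
        self.record_base = -1           # start of the newest record, -1 if none

    @property
    def top(self) -> int:
        return len(self.cells) - 1      # index of the last occupied entry

    def push_record(self, values: List[object]) -> None:
        previous = self.record_base
        self.record_base = len(self.cells)
        self.cells.append(previous)     # reserved entry: the previous pointer
        self.cells.extend(values)       # the record's own consecutive entries

    def pop_record(self) -> List[object]:
        if self.record_base < 0:
            raise IndexError("no record to pop")
        base = self.record_base
        values = self.cells[base + 1:]
        self.record_base = self.cells[base]   # follow the previous pointer
        del self.cells[base:]                 # top drops below the popped record
        return values
```

Because every record stores where the record beneath it starts, records of different lengths can be pushed and popped without knowing their sizes in advance.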
Heaps
Use of Heap in Memory management
Repeated allocation and deallocation of memory areas creates holes in memory.
Memory management keeps track of these holes and reuses them for later allocation requests.
Managing the holes well improves memory utilization and the speed of allocation and deallocation.
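How holes arise and are reused can be sketched with a small free-list allocator. The first-fit policy and the merging of adjacent holes are assumptions chosen for illustration; the slides do not name a specific policy.

```python
from typing import Dict, List, Optional, Tuple

class Heap:
    def __init__(self, size: int) -> None:
        self.free: List[Tuple[int, int]] = [(0, size)]  # holes as (start, length)
        self.used: Dict[int, int] = {}                  # start -> length

    def allocate(self, n: int) -> Optional[int]:
        """First fit: use the first hole that is large enough."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]                    # hole used up entirely
                else:
                    self.free[i] = (start + n, length - n)
                self.used[start] = n
                return start
        return None                                     # no single hole is big enough

    def release(self, start: int) -> None:
        """Return an area to the free list and merge adjacent holes."""
        length = self.used.pop(start)
        self.free.append((start, length))
        self.free.sort()
        merged: List[Tuple[int, int]] = []
        for s, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((s, l))
        self.free = merged
```

A request can fail even when the total free space is sufficient, because the free space is scattered over several small holes; merging adjacent holes on release reduces this fragmentation.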