Pulsating STM – The in-memory Optimistic Concurrency Control Technique for Multi Core Systems
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: A9525109119/2019©BEIESP
DOI: 10.35940/ijeat.A9525.109119
the elements in the read/write set respectively or it has to abort and restart. Aborting and restarting is costly. Also, a centralized timestamp manager may become a bottleneck in a highly loaded system. Timestamping alone is therefore not always a good solution.

Optimistic concurrency control is, as the name suggests, optimistic in its approach towards accessing and writing the elements in a shared data structure. This technique initially allows all transactions to access and update the data elements in the shared data structure, and the validity of each transaction is decided later, before the final commit. Being optimistic, this technique lets the maximum number of threads or transactions participate in the system, thus increasing concurrency. However, in a highly contended workload there will still be a number of aborts and restarts. The challenge for a good concurrency control algorithm is to minimize these aborts and restarts and to ensure maximum concurrency among transactions or threads.

Multi-version concurrency control is an optimistic concurrency control technique. In it, each in-memory data object is assumed to have multiple versions. This is done to ensure maximum concurrency. The transactions are free to update as many versions as they want concurrently, provided they pass the validity test.

II. RELATED WORK

A lot of work has been done on Transactional Memory and much is still going on. Some of the relevant studies are mentioned below.

In [3], Nir Shavit and Dan Touitou introduced Software Transactional Memory as a novel method for translating sequential implementations of objects into highly concurrent non-blocking ones using a k-word compare&swap STM transaction. The work was based on multiprocessors.

Mohammed El-Shambakey and Binoy Ravindran have studied the Software Transactional Memory approach in real-time embedded systems in [1]. They have analytically established upper bounds on the transactional retry and response time.

Bratin Saha et al. have presented a novel high-performance software transactional memory system for a multi-core runtime in [4]. This paper studies in detail various STM tradeoffs: optimistic versus pessimistic concurrency control, undo logging versus write buffering, and object-based versus cache-line-based conflict detection. The authors have also developed novel STM designs that work in cooperation with other components of the McRT system to prevent inactive transactions from blocking active ones. McRT-STM is a read-versioning and undo-logging system and implements both object-based and cache-line-based conflict detection. The scheme is based on locking and versioning of the locks.

Yunlong Xu et al. have developed an STM-based technique for GPU-based systems in [5]. The authors claim that their technique is free from livelocks and is scalable. The technique has three characteristics. The first is hierarchical validation, which implements the conflict detection. Hierarchical validation is said to be the combination of value-based validation and timestamp-based validation, and it involves the use of global version locks. The second characteristic is encounter-time lock sorting, which is introduced in order to avoid livelocks. The third characteristic is coalesced read/write set organization, in which the read/write sets of all transactions within a warp are merged to reduce the overhead of transaction bookkeeping.

In [6], the authors observe that prior ways of amortizing commit latencies in GPU SIMT applications, for example reducing the transactional warps to very few per SIMT core or aborting and restarting the transactions, have resulted in poor performance and no actual reduction in commit latencies. The authors have thus proposed a new GPU hardware TM called GETM (GPU TM) based on eager conflict detection and lazy version management. The solution is based on timestamping and a lock mechanism.

In [7], the authors have presented APUTM, a transactional memory approach for Accelerated Processing Units (APUs). They minimize access to the shared memory in order to reduce conflicts among transactions, and adopt a lazy conflict detection and lazy version management approach. One implementation is based on a global sequence lock for reducing the commit latency of the transactions, and the other implementation checks for transactional conflicts by using a private read set.

In [8], the authors have reviewed the currently existing concurrency control techniques for in-memory databases. Three such techniques are discussed with their pros and cons: Cicada [9], MOCC [10] and TicToc [11]. MOCC is based on optimistic concurrency control with a slight variation: it uses the concept of temperature to acquire selective read locks and minimize aborts. TicToc is a timestamp ordering scheme and is optimistic in nature. The commit timestamp for a transaction is calculated dynamically and is allotted just before the commit point. Cicada is an optimistic, multi-version and multi-clock concurrency control scheme.

In [12], the authors of NEMO, a NUMA-aware TM algorithm, have proposed a well-optimized solution for providing scalability to applications running on NUMA architectures. NEMO is tested using well-known and synthetic OLTP transactional workloads. The authors have performed two tests whose results form the basis of the design of NEMO. In the first test, various STM algorithms, TL2 [13], SwissTM [14], TinySTM [15], RingSTM [16] and NOrec [17], implementing a version of the Bank benchmark, partition the accounts across different NUMA zones, and threads operate only on accounts stored in their local NUMA zones. The test shows that on increasing the number of threads beyond 16, the algorithms cease to scale and the cost of updating the global metadata also goes up significantly. In the second test, the authors have measured the latency of incrementing the logically shared timestamp through Compare-and-Swap (CAS). They have deployed two configurations: one in which there is a single timestamp in one NUMA zone incremented by multiple threads; the second configuration in which there are 8 timestamps located in 8 different NUMA zones and incremented by 8
threads in their local NUMA zones. The result shows that the former configuration provides almost no scalability due to heavy traffic when the number of threads is high. The latter configuration, on the other hand, provides better scalability even when the CAS primitive is used.

III. PULSATINGSTM

In an effort to develop a more efficient concurrency control scheme for in-memory data structures, the authors have developed PulsatingSTM. This STM approach is a lock-free as well as deadlock-free scheme. It is primarily inspired by TicToc [11], the timestamp ordering scheme for in-memory databases. The authors consider it an improvement over TicToc because it uses no locks. Also, TicToc is for in-memory databases, whereas PulsatingSTM is for in-memory data structures.

PulsatingSTM is primarily based upon the optimistic concurrency control scheme and is free from the overheads of a centralized timestamp manager. Here the commit timestamp of a transaction is computed no earlier than the commit phase. The authors have adopted lazy conflict detection [6]. The three phases in this scheme are: (i) Read phase, (ii) Validation phase and (iii) Write phase.

A. Read Phase

In this phase the threads, acting as transactions, are allowed to read the elements from the shared memory into their private read/write sets based upon the purpose of access. If the access is a read access, the thread reads that element into its private read set. It notes down the read and write timestamps of that element in the set, displays the element and sets the pointer to the element in the shared array. If the access is a write access, the thread reads the element into its private write set, notes down the read and write timestamps of the element, updates the element in the write set, and sets the pointer to the element in the shared array.

B. Validation Phase

In this phase the commit timestamp of the transaction is calculated based upon the read and write timestamps of the elements in its read/write set. It has the following three major steps:
a) Firstly, the transaction's current timestamp is checked against the read and write timestamps of the elements in the read set of that transaction. The transaction's timestamp must be greater than the write timestamp and less than the read timestamp. If not, it is adjusted to some value that satisfies this constraint.
b) Secondly, the validation is done against the read set. If the transaction's timestamp lies between the write timestamp and the read timestamp for every element in the read set, then the transaction is assumed to have read a valid version; otherwise the version read by the transaction is assumed to be invalid. In that case the changes made by the transaction in its write set are rolled back.
c) Thirdly, if the transaction has read a valid version, then its final commit timestamp is calculated, which should be greater than the read timestamp of all the elements in the write set.

C. Write Phase

After successful validation, each transaction does the following:
a) Sets the read and write timestamps of its write set to its own commit timestamp.
b) Copies the write set to the shared memory.

IV. DESIGN OF PULSATINGSTM

PulsatingSTM is a timestamp-based optimistic concurrency control system. The main features of this design are:
1. It avoids deadlock as there are no locks.
2. Being optimistic, it allows all the threads to access and update a copy of the element in the data structure without worrying about conflicts, thus increasing the concurrency among threads.
3. Each element maintains metadata. This metadata stores the read and write timestamps of the element, the data held in the element and a pointer into the original data structure. A thread (transaction) accessing this element will use its metadata.
4. Every transaction has a timestamp associated with it, which is the unique value allotted to it when it enters the system.
5. Every transaction has a read set and a write set associated with it. The read set contains copies of all the elements that the transaction has read, and the write set contains copies of all the elements that it has updated along with their updated values.
6. The commit timestamp of the transaction is computed late in the execution, just before commit, from the read and write timestamps of the elements in its read/write set.

Due to the isolation property of transactions, they do not interfere with each other's read/write sets. Since the write operation is atomic in nature, the transaction will roll back on reading an invalid version or incorrect data. Consistency is maintained as the transactions are serializable.

Each node of the in-memory data structure has some metadata associated with it, which gives the read/write timestamps, a pointer to the node in the original data structure and the data value of the node, as mentioned above. The read/write timestamp associated with an element of the data structure is the timestamp value of the last committed transaction that read or wrote that element. These metadata are tabulated in Table I.

Table I: Metadata of a node and of a transaction in PulsatingSTM

rtime          Read timestamp
wtime          Write timestamp
point          Pointer to the node in the original data structure
data           Data value of the node
tranread [ ]   An array maintaining the read set of the transaction
tranwrite [ ]  An array maintaining the write set of the transaction
timestamp      Timestamp associated with the transaction when it enters the system

Algorithm 1 shows the BeginTX, ReadTX, ValidateTX, and WriteTX procedures of PulsatingSTM. BeginTX begins by allotting
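The metadata of Table I and the read phase of Section III.A can be pictured with a minimal single-threaded sketch. The field names `rtime`, `wtime`, `point`, `data`, `tranread`, `tranwrite` and `timestamp` come from Table I. The class names `Node`, `Copy` and `Transaction`, the helpers `read_access` and `write_access`, and the use of an array index for `point` are assumptions made for illustration and are not taken from Algorithm 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Shared element carrying the per-node metadata of Table I."""
    data: int
    rtime: int = 0  # timestamp of the last committed reader
    wtime: int = 0  # timestamp of the last committed writer


@dataclass
class Copy:
    """Private copy of an element held in a read or write set (hypothetical name)."""
    rtime: int
    wtime: int
    data: int
    point: int  # assumption: index of the element in the shared array


@dataclass
class Transaction:
    timestamp: int  # allotted when the transaction enters the system
    tranread: List[Copy] = field(default_factory=list)
    tranwrite: List[Copy] = field(default_factory=list)


def read_access(tx: Transaction, shared: List[Node], index: int) -> int:
    """Read access: copy the element and its timestamps into the read set."""
    node = shared[index]
    tx.tranread.append(Copy(node.rtime, node.wtime, node.data, index))
    return node.data


def write_access(tx: Transaction, shared: List[Node], index: int, value: int) -> None:
    """Write access: copy the element, then update only the private copy."""
    node = shared[index]
    tx.tranwrite.append(Copy(node.rtime, node.wtime, value, index))


# Usage: one transaction reads element 1 and buffers a write to element 0.
shared = [Node(10), Node(20, rtime=3, wtime=2)]
tx = Transaction(timestamp=1)
value = read_access(tx, shared, 1)
write_access(tx, shared, 0, 99)
```

The shared array is left untouched during this phase; the new value lives only in `tx.tranwrite` until validation succeeds.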
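The validation steps a) to c) of Section III.B and the write steps of Section III.C can be read as the following single-threaded sketch. It is one possible interpretation and makes several assumptions. The bounds are treated as inclusive (the text says "greater than" and "less than"). The timestamps compared are the ones noted in the private copies during the read phase. No atomic publication is modelled. The names `validate_tx`, `write_phase`, `Node` and `Copy` are hypothetical and do not reproduce Algorithm 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    data: int
    rtime: int = 0
    wtime: int = 0


@dataclass
class Copy:
    rtime: int
    wtime: int
    data: int
    point: int  # assumption: index into the shared array


@dataclass
class Transaction:
    timestamp: int
    tranread: List[Copy] = field(default_factory=list)
    tranwrite: List[Copy] = field(default_factory=list)


def validate_tx(tx: Transaction) -> Optional[int]:
    """Return a commit timestamp, or None when the read set is invalid."""
    ts = tx.timestamp
    if tx.tranread:
        low = max(c.wtime for c in tx.tranread)
        high = min(c.rtime for c in tx.tranread)
        # Step a: move the timestamp into the common window if one exists.
        if low <= high:
            ts = min(max(ts, low), high)
        # Step b: the timestamp must fit the window of every element read.
        if not all(c.wtime <= ts <= c.rtime for c in tx.tranread):
            tx.tranwrite.clear()  # roll back the private updates
            return None
    # Step c: commit later than the last reader of every element written.
    for c in tx.tranwrite:
        ts = max(ts, c.rtime + 1)
    return ts


def write_phase(tx: Transaction, shared: List[Node], commit_ts: int) -> None:
    """Stamp the write set with the commit timestamp and publish it."""
    for c in tx.tranwrite:
        node = shared[c.point]
        node.rtime = node.wtime = commit_ts
        node.data = c.data


# Usage: a transaction that read element 1 and wants to write element 0.
shared = [Node(10, rtime=4, wtime=1), Node(20, rtime=6, wtime=3)]
tx = Transaction(timestamp=1,
                 tranread=[Copy(6, 3, 20, 1)],
                 tranwrite=[Copy(4, 1, 99, 0)])
commit_ts = validate_tx(tx)
if commit_ts is not None:
    write_phase(tx, shared, commit_ts)

# A transaction whose read windows do not overlap fails validation.
tx2 = Transaction(timestamp=2,
                  tranread=[Copy(2, 1, 5, 0), Copy(6, 3, 20, 1)],
                  tranwrite=[Copy(6, 3, 7, 1)])
result2 = validate_tx(tx2)
```

In the first case the timestamp is adjusted from 1 into the window [3, 6] and then raised above the read timestamp of the element being written. In the second case no timestamp fits both read windows, so the private write set is discarded.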
and write phase covers around 78% of CPI. The total time taken by this algorithm with 50 threads is 1.095 milliseconds.

VI. EVALUATION

We have closely observed the results obtained on Sniper by comparing the values obtained by running PulsatingSTM with 16, 32, 40 and 50 threads on 64 cores employing the gainestown configuration. The results suggest that the throughput increases as the number of threads increases. This is attributable to the fact that as the thread count increases, the branch misses, L1 and L2 cache misses and DRAM accesses per kilo instruction reduce, thus giving better performance. The results obtained on running PulsatingSTM on Sniper are tabulated in Table II.

Table II: Parametric values from Sniper for running PulsatingSTM employing different numbers of threads on 64 cores

Threads        16         32         40         50
Instructions   5.580 m    19.35 m    29.25 m    44.25 m
IPC            0.064      0.145      0.186      0.237
Cycles         1.352 m    2.093 m    2.455 m    2.913 m
Time           508.3 μs   786.7 μs   922.8 μs   1.095 ms
Branch MPKI    1.997      0.855      0.655      0.508
L1-I MPKI      1.090      0.577      0.469      0.302
L1-D MPKI      1.291      0.596      0.470      0.371
L2 MPKI        2.202      1.115      0.899      0.725
DRAM APKI      0.912      0.402      0.312      0.249

IPC: Instructions Per Cycle, MPKI: Misses Per Kilo Instructions, L1-I: Instruction level L1 Cache, L1-D: Data level L1 Cache, L2: L2 cache, DRAM: Dynamic Random Access Memory, APKI: Access Per Kilo Instructions

Fig. 5 shows the increase in throughput obtained by running PulsatingSTM on a higher number of threads.

Fig. 5. Throughput Versus Number of Threads

VII. CONCLUSION

concurrency control technique based on timestamping. It employs lazy conflict detection among the threads. The authors consider the algorithm better than lock-based protocols and other STM protocols as it is free from locks. The algorithm is built on multiple threads doing the same job of reading and writing. It is optimistic as all the threads are allowed to read a copy of data from the shared memory into their private read/write sets and perform write operations in their private sets. Only when the writing is complete does the validation phase begin. In the validation phase, the transaction's timestamp is validated and its commit timestamp is calculated just before the write phase. The algorithm is run on 64 cores using 16, 32, 40 and 50 threads on Sniper, the multi-core simulator. The results show that the throughput obtained on running this algorithm increases with the number of threads.

VIII. FUTURE WORK

The authors will next implement the algorithm with different numbers of threads doing different jobs of reading and writing. The authors propose a multi-version flavor of this algorithm in upcoming work, wherein each element of the shared data structure will have multiple versions and threads writing to it will write new versions to the data structure. Also, the authors propose applying this algorithm to techniques like parallel sorting of enormous arrays and reporting the results.

REFERENCES

1. El-Shambakey, Mohammed and Binoy Ravindran, "STM concurrency control for multicore embedded real-time software: time bounds and tradeoffs." In Proceedings of SAC (2012), Riva del Garda, Italy, March 25-29, 2012, pp. 1602-1609.
2. Yan Solihin, Fundamentals of Parallel Multi core systems, Broken Sound Parkway NW: CRC Press, Taylor and Francis Group, 2016.
3. Nir Shavit and Dan Touitou, "Software Transactional memory." In Proceedings of the 14th Annual ACM Symposium of PODC 95, Ottawa, Ontario, Canada, August 20-23, 1995, pp. 204-213.
4. Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Benjamin Hertzberg, "McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime." In Proceedings of 11th ACM SIGPLAN symposium on PPoPP '06, New York, NY, USA, March 29-31, 2006, pp. 187-197.
5. Yunlong Xu, Rui Wang, Nilanjan Goswami, Tao Li, Lan Gao, Depei Qian, "Software Transactional Memory for GPU Architectures", In Proceedings of IEEE/ACM International Symposium on CGO '14, Orlando, FL, USA, February 15-19, 2014, pp. 1
6. Xiaowei Ren and Mieszko Lis, "High-performance GPU Transactional Memory via Eager Conflict Detection", In Proceedings of 2018 International Symposium on High Performance Computer Architecture, Vienna, Austria, Feb 24-28, 2018, pp. 235-246.
7. Alejandro Villegas, Angeles Navarro, Rafael Asenjo, Oscar Plata, "Toward a software transactional memory for heterogeneous CPU–GPU processors", The Journal of Supercomputing, https://doi.org/10.1007/s11227-018-2347-0, pp. 1-16.
8. Sana Jafar, Pankaj Kumar, Ranjana Rajnish, "Reviewing the Current Concurrency Control Techniques in Multi and Many core systems", In Proceedings of the 12th INDIACom; INDIACom-2018 5th 2018 International Conference on "Computing for Sustainable Global Development", Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA), March 14th – 16th, 2018, pp. 525-530.
9. [...] the 2017 ACM International Conference on Management of Data SIGMOD, Chicago, Illinois, USA, May 14-19, 2017, pp. 21-35.
10. T. Wang and H. Kimura, "Mostly-Optimistic Concurrency Control for Highly contended dynamic workloads on a thousand cores", In Proceedings of VLDB Endowment, vol. 10, no. 2, 2016, pp. 49-60.
11. X. Yu, A. Pavlo, D. Sanchez, and S. Devadas, "TicToc: Time travelling Optimistic Concurrency Control", In Proceedings of the 2016 International Conference on Management of Data SIGMOD, San Francisco, California, USA, June 26 - July 01, 2016, pp. 1629-1642.
12. Mohamed Mohamedin, Sebastiano Peluso, Masoomeh Javidi Kishi, Ahmed Hassan, Roberto Palmieri, "Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory", In Proceedings of 47th International Conference on Parallel Processing, Eugene, OR, USA, August 13–16, 2018, Article No. 38.
13. Dave Dice, Ori Shalev, and Nir Shavit, "Transactional Locking II.", In Proceedings of the 20th international conference on Distributed Computing, Stockholm, Sweden, September 18-20, 2006, pp. 194–208.
14. Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka, "Stretching Transactional Memory", In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, Dublin, Ireland, June 15-21, 2009, pp. 155-165.
15. Pascal Felber, Christof Fetzer, and Torvald Riegel, "Dynamic Performance Tuning of Word-based Software Transactional Memory", In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, Salt Lake City, UT, USA, February 20-23, 2008, pp. 237–246.
16. Michael F. Spear, Maged M. Michael, and Christoph von Praun, "RingSTM: Scalable Transactions with a Single Atomic Instruction", In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, Munich, Germany, June 14-16, 2008, pp. 275–284.
17. Luke Dalessandro, Michael F. Spear, and Michael L. Scott, "NOrec: Streamlining STM by Abolishing Ownership Records", In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, January 09-14, 2010, pp. 67–78.
18. T. E. Carlson, W. Heirman, and L. Eeckhout, "Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulations", In Proceedings of International Conference on High Performance Analysis, Networking, Storage and Analysis, Seattle, WA, USA, Nov. 12-18, 2011, pp. 1-12.

AUTHORS PROFILE

Sana Jafar is currently working as an IT consultant with Argus Technology LLC. She is a research scholar in the faculty of Information Technology at Amity University Uttar Pradesh Lucknow Campus, enrolled since January 2015. She worked as an Assistant Professor (Computer Science & IT) in the Department of Amity School of Engineering and Technology, Amity University Uttar Pradesh Lucknow Campus from 2009 till 2018. She completed her MCA with a silver medal and received her degree with honors in 2009. Her area of research is Parallel Computing and High Performance Computing. She is a student member of IEEE. She has 4 papers published and presented in IEEE sponsored International and National conferences and one book chapter published in the Scopus Indexed Ebook series titled "Advances in Parallel Computing", IOS Press, Netherlands. Sana Jafar participated in the Short Term Course (under QIP IIT Delhi) on many core parallel Programming at IIT Delhi from 4th June to 15th June 2018, learning hands-on Nvidia CUDA, the API for parallel programming on GPU based architectures, and accessed the HPC clusters at IIT Delhi (PADUM). She has also worked as an intern under Prof Subodh Kumar (Dept. CSE at IIT Delhi) under the Summer Faculty Research Fellow Program from 4th June to 13th July 2018 at IIT Delhi. She has published a workbook on Object oriented programming using C++ as main author (publishers: Alok Prakashan) for the B.Tech students of Amity University and is in the process of generalizing it for the B.Tech students of all the engineering colleges in Uttar Pradesh. She has attended various faculty development programs and workshops in Amity University Lucknow campus and outside, and has played an important part in conducting such programs within the Amity University Lucknow campus. She attended the five-day military training camp organized by Amity University Manesar in 2016 as faculty guide with post graduate students. She also secured second position in women's badminton in the annual sports meet of Amity University Lucknow (Sangathan) in 2015. Sana Jafar is working towards innovative and efficient ways of improving concurrency control methods in multi and many core systems using STM and optimistic methods.

Dr. Ranjana Rajnish is an Assistant Professor at Amity Institute of Information Technology at Amity University, Lucknow. Dr. Ranjana possesses approximately 25 years of experience in academics/research. She has been engaged with institutions like U.P. Technical University and Amity University in roles ranging from faculty in computer science to Academic Head. Her areas of interest include Software Engineering, Opinion Mining/Sentiment Analysis and Healthcare. She has several publications in national and international journals and in proceedings of national and international conferences of repute. She is also a member of various professional bodies like Computer Society of India (CSI), Association of Computing Machinery (ACM), International Association of Engineers (IAENG), Internet Society and Computer Science Teaching Association (CSTA). Along with being a committed teacher and a passionate researcher, Dr. Ranjana is a reviewer for various International Journals and a member of the editorial board of different International Journals. She is also a reviewer and member of the technical programme committee of various conferences of repute in and outside India. She has many Ph.D. scholars pursuing Ph.D. under her.

Dr. Pankaj Kumar is currently working as Assistant Professor (Reader) in the Department of Computer Science & Engineering in Sri Ramswaroop Group of Professional College, Lucknow. He has more than 18 years of teaching experience. He received his MCA degree in 2001, M.Tech in 2010 and PhD degree in Computer Application in 2011. His area of expertise is Parallel Computing/Mining/Security. More than 50 research papers of Dr. Pankaj Kumar have been published in various national/international journals and IEEE proceeding publications. He is a Senior Member of IEEE, Professional Member of ACM and Life member of CSI, IETE, ISTE, IAENG, ISOC and IACSIT. He is a member of the Management Committee of the CSI and IETE Lucknow Chapter. He is a reviewer for various International Journals and a member of the editorial board of different International Journals. He has also participated in various conferences as reviewer, technical committee member, and co-chair. One PhD thesis has been awarded and eight students are enrolled as PhD scholars under his guidance. More than 10 students have been guided by him in their M.Tech theses.