Exadata Platform Deepdive PDF
Deepanshu Agarwal
Cyril Malaki
The following is intended to outline our general product direction. It is intended for information purposes
only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing decisions. The development,
release, timing, and pricing of any features or functionality described for Oracle’s products may change
and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and
prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed
discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and
Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q
under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website
at http://www.oracle.com/investor. All information in this presentation is current as of September 2019
and Oracle undertakes no duty to update any statement in light of new information or future events.
• 100 Gb RDMA over Converged Ethernet (RoCE) Network Fabric
Smart Storage
• Scale-Out Intelligent 2-Socket Storage Servers
• 32 cores for SQL offload
• 192 GB DRAM
• 1.5 TB Persistent Memory
5.8 GB/sec > 5 GB/sec
• InfiniBand was the only viable RDMA-capable network at the inception of Exadata
• Ethernet has caught up
[Figure: RDMA Read and RDMA Write paths between two servers (labels: CPU, CPU)]
[Figure: network diagram with labels Management, Client Access, Private RoCE, and two RoCE switches]
* BONDETH0 can be either copper or optical links
• RoCE Priority-based Flow Control (PFC)
• Tells the sender to pause messages if the switch buffer is full
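The pause behavior described above can be illustrated with a small Python simulation. This is only a toy model under stated assumptions: `SwitchPort`, `run`, the buffer capacity, the resume threshold, and the drain rate are all hypothetical names and values, and real PFC works per traffic priority using pause frames between Ethernet devices.

```python
from collections import deque


class SwitchPort:
    """Toy model of one switch buffer (hypothetical, not a real switch API)."""

    def __init__(self, capacity, resume_below):
        self.capacity = capacity          # frames the buffer can hold
        self.resume_below = resume_below  # queue depth at which sending resumes
        self.queue = deque()
        self.paused = False               # True while the sender is told to pause

    def receive(self, frame):
        if len(self.queue) >= self.capacity:
            raise OverflowError("buffer full: frame would be dropped")
        self.queue.append(frame)
        if len(self.queue) >= self.capacity:
            self.paused = True            # buffer is full: tell the sender to pause

    def drain(self):
        if self.queue:
            self.queue.popleft()          # forward one frame out of the buffer
        if len(self.queue) < self.resume_below:
            self.paused = False           # room again: let the sender resume


def run(frames, capacity=4, resume_below=2, drain_every=3):
    """Send `frames` frames; return (frames sent, ticks the sender spent paused)."""
    port = SwitchPort(capacity, resume_below)
    sent = pause_ticks = tick = 0
    while sent < frames or port.queue:
        tick += 1
        if sent < frames:
            if port.paused:
                pause_ticks += 1          # sender honors the pause, nothing is dropped
            else:
                port.receive(sent)
                sent += 1
        if tick % drain_every == 0:
            port.drain()                  # the switch forwards more slowly than it receives
    return sent, pause_ticks


if __name__ == "__main__":
    print(run(20))
```

Because the sender waits instead of overrunning the buffer, every frame is delivered and `receive` never raises; the cost shows up as paused ticks rather than dropped frames.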
• Writes survive power failure unlike DRAM
Persistent Memory with Traditional Storage
• Persistent Memory usage with traditional storage:
  • Database issues read I/O call to OS
  • OS sends message to storage
  • Storage CPU issues read to Persistent Memory
  • Storage CPU sends reply to Server OS
  • Server OS wakes up Database
• Speed of Persistent Memory read is overwhelmed by high cost of network and I/O software, interrupts, and context switches
• Performance benefit is limited
[Figure labels: Compute Server, SAN, Storage Server, Persistent Memory, Hot]
[Figure labels: Storage Server Software; Context Switch: 10s of µs]
Database 8K Read End-to-end Latency: ~200 µsec
Copyright © 2020 Oracle
What if we drop in PMEM as is?
[Figure labels: Database Server, Database Software, Kernel/OS (Database Server), Storage Server, Kernel/OS (Storage Server), Storage Server Software, PMEM]
• Context Switch: 10s of µs (annotated at two points on the path)
• PMEM Read Raw Latency: ~1 µs
• Database 8K Read End-to-end Latency: ~100 µsec
Over 90% of the Time Wasted.
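The "over 90%" figure follows from the two latencies quoted on this slide. A short Python sketch of the arithmetic, where `overhead_fraction` is a hypothetical helper name and the inputs are the slide's approximate values:

```python
def overhead_fraction(end_to_end_us, media_us):
    """Share of end-to-end latency that is not spent in the storage media itself."""
    return (end_to_end_us - media_us) / end_to_end_us


END_TO_END_US = 100.0  # ~100 µsec database 8K read, from the slide
PMEM_READ_US = 1.0     # ~1 µs raw PMEM read, from the slide

wasted = overhead_fraction(END_TO_END_US, PMEM_READ_US)

if __name__ == "__main__":
    print(f"{wasted:.0%} of the read latency is software and network overhead")
```

With these rounded inputs the overhead comes to about 99%, which is consistent with the slide's more conservative "over 90%".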
Conventional Two-Sided Read
• Conventional Storage Performs Server-side Cache Read
• DB Sends READ Request [Disk, Offset] to Storage
• Storage Looks up Flash Cache: [Disk, Offset] -> Location on Flash
• Storage Issues Read to Flash
[Figure labels: Database Server, Storage Server]
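The two-sided read steps above can be sketched as follows. This is a minimal illustration rather than Exadata's implementation: `StorageServer`, `cache_block`, and `handle_read` are hypothetical names, flash is modeled as an in-memory `bytearray`, and the 8K block is assumed to be 8192 bytes.

```python
from typing import Dict, Optional, Tuple

BLOCK = 8192  # assumed size of an 8K database block


class StorageServer:
    """Toy storage server that serves reads from a flash cache (hypothetical)."""

    def __init__(self) -> None:
        self.flash = bytearray()                          # stand-in for the flash device
        self.cache_index: Dict[Tuple[str, int], int] = {}  # (disk, offset) -> flash location
        self.cpu_requests = 0                             # requests handled by storage software

    def cache_block(self, disk: str, offset: int, data: bytes) -> None:
        if len(data) != BLOCK:
            raise ValueError("expected exactly one block")
        self.cache_index[(disk, offset)] = len(self.flash)
        self.flash += data

    def handle_read(self, disk: str, offset: int) -> Optional[bytes]:
        """Server-side cache read: the storage CPU does the lookup and the flash read."""
        self.cpu_requests += 1
        location = self.cache_index.get((disk, offset))
        if location is None:
            return None  # cache miss; the disk read path is out of scope here
        return bytes(self.flash[location:location + BLOCK])


if __name__ == "__main__":
    server = StorageServer()
    server.cache_block("disk0", 0, b"\x01" * BLOCK)
    print(len(server.handle_read("disk0", 0)), server.cpu_requests)
```

Every read increments `cpu_requests`, which is the point of the "two-sided" label: the storage server's software takes part in each request.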
[Figure labels: Warm (FLASH), Cold]
• Persistent Memory mirrored automatically across storage servers for fault-tolerance
• 16 Million IOPS, <19 µs latency for 8K I/Os from the database
Enabled with Exadata System Software 19.3 and Database Software 19c
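As a rough cross-check of the IOPS and latency figures above, the implied throughput and concurrency can be worked out in a few lines of Python. The sketch assumes an 8K I/O is 8192 bytes and treats the "<19 µs" figure as an upper bound; the slides do not state the configuration to which the 16 million IOPS applies.

```python
IOPS = 16_000_000     # 16 million 8K read IOPS, from the slide
LATENCY_S = 19e-6     # upper bound on per-I/O latency, from the slide
IO_BYTES = 8 * 1024   # assumption: 8K means 8192 bytes

# Aggregate throughput implied by the IOPS figure.
bandwidth_gb_s = IOPS * IO_BYTES / 1e9

# Little's law: average I/Os in flight = arrival rate x time in system.
in_flight = IOPS * LATENCY_S

if __name__ == "__main__":
    print(f"{bandwidth_gb_s:.1f} GB/s, at most about {in_flight:.0f} I/Os in flight")
```

Under these assumptions the figures correspond to roughly 131 GB/s of 8K reads with at most about 304 I/Os outstanding at any moment.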
Exadata Data Access Tiers
MAA Characteristics
• Not drawn to scale
• Primary copy of data placed in PMEM cache on a read miss
[Figure labels: Database Node, Storage Cell, Database Read, PMEM]
• Secondary copy of data placed in Buffer Cache Hot
[Figure labels: PMEM; Kernel/OS (Storage Server); Storage Server Software; Context Switch: 10s of µs]
Database 8K Read End-to-end Latency: <19 µsec
10x Faster than Exadata X8
Ultra Fast One-Sided Read
• New Disruptive Technology Enables PMEM Cache Read via RDMA
Enabled with Exadata System Software 19.3 and Database Software 19c
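The difference between a two-sided and a one-sided read can be sketched in Python. This is a conceptual model only, with no real RDMA involved: `RemoteRegion`, `DatabaseClient`, and the client-side `directory` are hypothetical, and the slides do not describe how the database learns where a block lives in the PMEM cache.

```python
BLOCK = 8192  # assumed size of an 8K database block


class RemoteRegion:
    """Stand-in for storage-server memory exposed for remote reads (hypothetical)."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.remote_cpu_requests = 0  # requests handled by storage server software

    def two_sided_read(self, addr, length):
        # Two-sided: the storage server's software receives and answers the request.
        self.remote_cpu_requests += 1
        return bytes(self.memory[addr:addr + length])

    def rdma_read(self, addr, length):
        # One-sided: modeled as a direct copy; no request handler runs remotely.
        return bytes(self.memory[addr:addr + length])


class DatabaseClient:
    def __init__(self, region):
        self.region = region
        self.directory = {}  # assumption: (disk, offset) -> remote address, known to the client

    def read(self, disk, offset):
        addr = self.directory.get((disk, offset))
        if addr is None:
            return None      # not in the modeled cache; fallback path is not shown
        return self.region.rdma_read(addr, BLOCK)


if __name__ == "__main__":
    region = RemoteRegion(2 * BLOCK)
    client = DatabaseClient(region)
    client.directory[("disk0", 0)] = BLOCK
    print(len(client.read("disk0", 0)), region.remote_cpu_requests)
```

In this model the one-sided path leaves `remote_cpu_requests` at zero, which is the property that removes the storage-side context switches from the read latency.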
What about Redo Log Writes to PMEM?
[Figure labels: Database Server, Database Software, Kernel/OS (Database Server), Storage Server, Storage Server Software]
• Context Switch: 10s of µs (annotated at two points on the path)
• Database Log Write End-to-end Latency: ~100 µsec
Over 90% of the Time Wasted.
Conventional Two-Sided Log Write
• DB Sends Requests to Storage
• Storage Writes to Flash Log and Sends Ack
[Figure labels: Database Server, Storage Server, PMEM Log Buffer]
• Secure Erase
• Crypto Erase option
• Drop CellDisk Erase=1pass/3pass/7pass does crypto erase on underlying PMEM DIMMs
[Figure: axes More Isolation and More Efficient; labels: Machines, Multiple DBs on one Server, Multitenant Databases]
• … efficient but less isolated
• DB Resource manager isolation adds no overhead
• Resources can be shared much more dynamically
• Must configure systems correctly – shared resources
• Best strategy is to combine VMs with database native consolidation
• Multiple trusted DBs or Pluggable DBs in a VM
• Few VMs per server to limit overhead of fragmenting CPUs/memory/patching etc.
Just in Time Smart Columnar Decryption
• Columns are decrypted only when needed
• If many columns are in a cache-line, decrypt only columns of interest
• If first predicate returns nothing, don’t access columns for second predicate
• Example Scan
  • select ename from emp where job like '%VP' and sal + bonus > 500000
  • Projected columns (ename) and one or more predicated columns are decrypted (not entire cache-line)
  • For data regions that do not have a VP, salary is not accessed
• Up to 30% faster encrypted smart scans and reduced storage server CPU usage

Encrypted:
ename   job         sal
John    Staff       120k
Joe     Staff       230k
Jane    Sr. Staff   440k
Mary    Sr. Staff   380k

Decrypted:
job
Staff
Staff
Sr. Staff
Sr. Staff

Enabled with Exadata Storage Server 19.3 for in-memory customers
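The example scan above can be sketched in Python to show how decrypting column by column avoids unnecessary work. This is an illustration under loud assumptions: the XOR "cipher" is a toy stand-in and not real encryption, `EncryptedRegion` and `scan` are hypothetical names, the `bonus` values are invented because the slide's table does not show them, and the exact order in which Exadata decrypts columns is not stated in the slides.

```python
KEY = 0x5A  # toy XOR key; a placeholder for real encryption, not secure


def _xor(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)


class EncryptedRegion:
    """One data region whose columns are encrypted independently (hypothetical)."""

    def __init__(self, columns):
        self._encrypted = {
            name: [_xor(str(v).encode()) for v in values]
            for name, values in columns.items()
        }
        self.decrypted = []  # names of columns decrypted so far, in order

    def column(self, name):
        self.decrypted.append(name)
        return [_xor(cell).decode() for cell in self._encrypted[name]]


def scan(region):
    """select ename from emp where job like '%VP' and sal + bonus > 500000"""
    jobs = region.column("job")                       # first predicate column only
    rows = [i for i, job in enumerate(jobs) if job.endswith("VP")]
    if not rows:
        return []                                     # no VP: sal is never decrypted
    sal, bonus = region.column("sal"), region.column("bonus")
    rows = [i for i in rows if int(sal[i]) + int(bonus[i]) > 500000]
    if not rows:
        return []
    ename = region.column("ename")                    # projected column, decrypted last
    return [ename[i] for i in rows]


if __name__ == "__main__":
    staff_only = EncryptedRegion({
        "ename": ["John", "Joe", "Jane", "Mary"],
        "job": ["Staff", "Staff", "Sr. Staff", "Sr. Staff"],
        "sal": [120000, 230000, 440000, 380000],
        "bonus": [0, 0, 0, 0],                        # invented values
    })
    print(scan(staff_only), staff_only.decrypted)
```

For the slide's four-row region, which contains no VP, only the `job` column is decrypted and the salary column is never touched.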
Fast In-Flash Columnar Cache Creation
• Runtime analyzer finds best compression algorithm
• Reuse dictionary created in-line during analysis
• Faster Symbol lookup and insert using new dictionary data structure
• Up to 35% faster columnar cache creation
[Figure labels: In-Memory Columnar scans, In-Flash Columnar scans]
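The idea of reusing the dictionary built during analysis can be sketched in Python. This is a simplified illustration, not the Exadata algorithm: `analyze`, `encode`, and the `max_dict_ratio` threshold are hypothetical, and the choice here is only between dictionary encoding and leaving values raw.

```python
def analyze(values, max_dict_ratio=0.5):
    """Single pass over a column: build a dictionary and pick an encoding.

    The dictionary maps each distinct symbol to a small integer code. The
    threshold is an invented heuristic for deciding whether it pays off.
    """
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # hash-based lookup and insert
    use_dictionary = bool(values) and len(dictionary) <= max_dict_ratio * len(values)
    return ("dictionary" if use_dictionary else "raw"), dictionary


def encode(values, choice, dictionary):
    """Encode the column, reusing the dictionary from analysis instead of rebuilding it."""
    if choice == "dictionary":
        return [dictionary[value] for value in values]
    return list(values)


if __name__ == "__main__":
    jobs = ["Staff", "Staff", "Sr. Staff", "Sr. Staff"]
    choice, dictionary = analyze(jobs)
    print(choice, encode(jobs, choice, dictionary))
```

Passing the analysis-time dictionary straight into `encode` means the column is scanned for distinct values only once, which is the saving the second bullet describes.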
References
• Exadata Manuals (Security and Maintenance)
  https://docs.oracle.com/en/engineered-systems/exadata-database-machine/books.htm
• Oracle Exadata Best Practices (Doc ID 757552.1)
• Exadata Patching Overview Including Database and Storage Server Upgrade Demo (Doc ID 1584200.1)
• Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)
• https://blogs.oracle.com/in-memory/how-to-determine-if-columnar-format-on-exadata-flash-cache-is-being-used