GRAU DATA: Scalable Open Source Storage with CEPH, LIO, OPENARCHIVE
GRAU DATA: More than 20 years of experience in storage
OPEN ARCHIVE
Company timeline, 1992 to 2009: Mainframe Tape Libraries, Open System Tape Libraries, Archive Systems, FMA Software, HP OEM FSE & FMA, ARCHIVEMANAGER, OPENARCHIVE
OPENARCHIVE - Homepage
OPENARCHIVE: 2nd generation of archiving software
Start of development in 2000
100 man-years of development
Scalable archive solution from 1 TB up to several PB
Common code base for Windows and Linux
HSM approach: data gets migrated to tapes or disks
Filesystem interface (NTFS, POSIX) simplifies integration
Support for all kinds of SCSI backend devices (FC, iSCSI)
API only necessary for special purposes
> 150 customers worldwide
References: Bundesarchiv, Charite, many DMS vendors
OPENARCHIVE: internal architecture
(Architecture diagram: CLI, GUI and API front ends; HSMnet Server with Event Manager, Management Interface, Library Agent and Resource Manager; CIFS and NFS clients accessing the client file system through the HSMnet interface and Partition Manager; Back End Agents moving data to disks and the tape library.)
Current limitations of OPENARCHIVE
Number of files and performance is limited by the native filesystems (ext3/ext4, NTFS)
Archiving and recalling many files in parallel is slow
Cluster file systems are not supported
Performance is not appropriate for HPC environments
Scalability might not be appropriate for cloud environments
High performance SCSI Target: LIO
LIO is the new standard SCSI target in Linux since 2.6.38
LIO is developed and supported by RisingTide Systems
Completely kernel-based SCSI target engine
Support for cluster systems (PR, ALUA)
Fabric modules for:
iSCSI (TCP/IP)
FCoE
FC (QLogic)
InfiniBand (SRP)
Transparent block-level caching for SSDs
High-speed iSCSI initiator (MP for performance and HA)
RTSadmin for easy administration
LIO performance comparison (IOPS)
Based on 2 processes: 25% read, 75% write
iSCSI numbers with standard Debian client. RTS iSCSI Initiator plus kernel patches give much higher results.
Next generation ClusterFS: CEPH
Scalable storage system
1 to 1000s of nodes
Gigabytes to exabytes
Reliable
No single point of failure
All data is replicated
Self-healing
Self-managing
Automatically (re)distributes stored data
Developed by Sage Weil (Dreamhost) based on his Ph.D. thesis
Upstream in Linux since 2.6.34 (client), 2.6.37 (RBD)
Ceph design goals
Avoid traditional system designs
Single server bottlenecks, points of failure
Symmetric shared disk (SAN)
Avoid manual workload partitioning
Data sets and usage grow over time
Data migration is tedious
Avoid passive storage devices
Storage servers have CPUs; use them
Object storage
Objects
Alphanumeric name
Data blob (bytes to gigabytes)
Named attributes (foo=bar)
Object pools
Separate flat namespace
A cluster of servers stores all objects
RADOS: Reliable autonomic distributed object store
Low-level storage infrastructure
librados, radosgw, RBD, Ceph distributed file system
Ceph data placement
Files striped over objects
4 MB objects by default
Objects mapped to placement groups (PGs)
pgid = hash(object) & mask
PGs mapped to sets of OSDs (grouped by failure domain)
crush(cluster, rule, pgid) = [osd2, osd3]
~100 PGs per node
Pseudo-random, statistically uniform distribution
Fast: O(log n) calculation, no lookups
Reliable: replicas span failure domains
Stable: adding/removing OSDs moves few PGs
(Diagram: file -> objects -> PGs -> OSDs)
A toy sketch of this two-step mapping follows below.
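The two-step mapping described above can be illustrated with a small, self-contained C sketch. The hash function, the PG count, the replica count and the crush_stub() selection below are hypothetical stand-ins rather than Ceph's actual implementations; only the shape of the calculation (pgid = hash(object) & mask, then a deterministic PG-to-OSD mapping with no lookup table) follows the slides.

#include <stdint.h>
#include <stdio.h>

#define PG_COUNT 1024                 /* must be a power of two for the mask */
#define REPLICAS 2

/* Simple FNV-1a string hash as a stand-in for Ceph's object-name hash. */
static uint32_t hash_name(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in for crush(): deterministically pick REPLICAS distinct OSDs for a
 * PG. Real CRUSH walks a hierarchy of failure domains; this is only a toy. */
static void crush_stub(uint32_t pgid, int num_osds, int out[REPLICAS])
{
    for (int r = 0; r < REPLICAS; r++)
        out[r] = (int)((pgid * 2654435761u + (uint32_t)r * 40503u) % (uint32_t)num_osds);
    if (out[1] == out[0])             /* keep replicas on distinct OSDs */
        out[1] = (out[1] + 1) % num_osds;
}

int main(void)
{
    const char *object = "rb.0.0.000000000004";            /* example object name */
    uint32_t pgid = hash_name(object) & (PG_COUNT - 1);    /* pgid = hash(object) & mask */

    int osds[REPLICAS];
    crush_stub(pgid, 12, osds);                            /* toy cluster with 12 OSDs */

    printf("object %s -> pg %u -> osds [%d, %d]\n", object, pgid, osds[0], osds[1]);
    return 0;
}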
Ceph storage servers
Ceph storage nodes (OSDs)
cosd object storage daemon
btrfs volume of one or more disks
(Diagram: each storage node runs cosd on top of btrfs, exposing the object interface)
Actively collaborate with peers
Replicate data (n times, admin can choose)
Consistently apply updates
Detect node failures
Migrate data
It's all about object placement
OSDs act intelligently: everyone knows where objects are located
Coordinate writes with replica peers
Copy or migrate objects to the proper location
OSD map completely specifies data placement
OSD cluster membership and state (up/down etc.)
CRUSH function mapping objects -> PGs -> OSDs
cosd will peer on startup or map change
Contact other replicas of the PGs they store
Ensure PG contents are in sync, and stored on the correct nodes
Identical, robust process for any map change
Node failure
Cluster expansion/contraction
Change in replication level
Why btrfs?
Featureful
Copy on write, snapshots, checksumming, multi-device
Leverage internal transactions, snapshots
OSDs need consistency points for sane recovery
Hooks into copy-on-write infrastructure
Clone data content between files (objects); see the clone sketch below
ext[34] can also work...
Inefficient snapshots, journaling
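Because "clone data content between files (objects)" is one of the btrfs hooks the OSD relies on, here is a minimal user-space sketch of that operation. It uses the generic FICLONE ioctl, the current name for the btrfs clone ioctl (older kernels expose it as BTRFS_IOC_CLONE); the file paths are examples. The destination shares the source's extents copy-on-write, so no data is copied.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>        /* FICLONE */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }

    /* Make dst reference the same extents as src, copy-on-write. */
    if (ioctl(dst, FICLONE, src) < 0) {
        perror("ioctl(FICLONE)");
        return 1;
    }

    close(src);
    close(dst);
    return 0;
}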
Object storage interfaces: librados
Direct, parallel access to entire OSD cluster
PaaS, SaaS applications
When objects are more appropriate than files
C, C++, Python, Ruby, Java, PHP bindings
rados_pool_t pool;
rados_connect(...);
rados_open_pool("mydata", &pool);
rados_write(pool, foo, 0, buf1, buflen);
rados_read(pool, bar, 0, buf2, buflen);
rados_exec(pool, baz, class, method, inbuf, inlen, outbuf, outlen);
rados_snap_create(pool, newsnap);
rados_set_snap(pool, oldsnap);
rados_read(pool, bar, 0, buf2, buflen); /* old! */
rados_close_pool(pool);
rados_deinitialize();
Object storage interfaces: radosgw
Proxy: no direct client access to storage nodes
HTTP RESTful gateway
S3 and Swift protocols
(Diagram: clients speak HTTP to radosgw, which speaks the ceph protocol to the OSD cluster)
RBD: RADOS block device
Virtual disk image striped over objects
Reliable shared storage
Centralized management
VM migration between hosts
Thinly provisioned
Consume disk only when image is written to
Per-image snapshots
Layering (WIP)
Copy-on-write overlay over existing 'gold' image
Fast creation or migration
RBD: RADOS block device
Native Qemu/KVM (and libvirt) support
$ qemu-img create -f rbd rbd:mypool/myimage 10G
$ qemu-system-x86_64 --drive format=rbd,file=rbd:mypool/myimage
Linux kernel driver (2.6.37+)
$ echo 1.2.3.4 name=admin mypool myimage > /sys/bus/rbd/add
$ mke2fs -j /dev/rbd0
$ mount /dev/rbd0 /mnt
Simple administration
CLI, librbd
$ rbd create foo --size 20G
$ rbd list
foo
$ rbd snap create --snap=asdf foo
$ rbd resize foo --size=40G
$ rbd snap create --snap=qwer foo
$ rbd snap ls foo
2 asdf 20971520
3 qwer 41943040
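Besides the CLI, the same operations are available programmatically through librbd. The sketch below mirrors the "rbd create foo --size 20G" step above; it assumes the current librados/librbd C API (rados_ioctx_create() and friends), which has renamed some of the calls shown on the earlier librados slide, and the pool and image names are examples.

#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rbd_image_t image;
    int order = 0;                        /* 0 = default object size */
    const char buf[] = "hello rbd";

    /* Connect to the cluster using the default ceph.conf and open the pool. */
    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0) {
        fprintf(stderr, "cannot connect to cluster\n");
        return 1;
    }
    rados_ioctx_create(cluster, "mypool", &io);

    /* Equivalent of "rbd create foo --size 20G". */
    rbd_create(io, "foo", 20ULL << 30, &order);

    /* Open the image and write at offset 0; the data is striped over objects. */
    if (rbd_open(io, "foo", &image, NULL) == 0) {
        rbd_write(image, 0, sizeof(buf), buf);
        rbd_close(image);
    }

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}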
Object classes
Start with basic object methods
{read, write, zero} extent; truncate
{get, set, remove} attribute
delete
Dynamically loadable object classes
Implement new methods based on existing ones
e.g. calculate SHA1 hash, rotate image, invert matrix, etc.
Moves computation to data (see the sketch below)
Avoid read/modify/write cycle over the network
e.g., MDS uses simple key/value methods to update objects containing directory content
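To make "moves computation to data" concrete, the sketch below asks the OSD that stores an object to execute a method of a loadable class and ship back only the result. The "hash" class and its "sha1" method are hypothetical examples; rados_exec() itself appears on the librados slide, and the connection calls use the current librados C API rather than the older names shown there.

#include <stdio.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    char digest[20];                      /* a SHA1 digest is 20 bytes */

    /* Connection boilerplate: default ceph.conf, then open the pool. */
    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0) {
        fprintf(stderr, "cannot connect to cluster\n");
        return 1;
    }
    rados_ioctx_create(cluster, "mydata", &io);

    /* Run the (hypothetical) "sha1" method of a loadable "hash" class on the
     * OSD that stores object "baz"; only the digest crosses the network
     * instead of the whole object. */
    int r = rados_exec(io, "baz", "hash", "sha1", NULL, 0, digest, sizeof(digest));
    if (r < 0)
        fprintf(stderr, "rados_exec failed: %d\n", r);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}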
POSIX filesystem
Create file system hierarchy on top of objects
Cluster of cmds (metadata server) daemons
No local storage: all metadata is stored in objects
Lots of RAM: functions as a large, distributed, coherent cache arbitrating file system access
Dynamic cluster
New daemons can be started up dynamically
Automagically load balanced
POSIX example
(Sequence: client, MDS cluster, object store)
fd = open("/foo/bar", O_RDONLY)
Client: requests open from MDS
MDS: reads directory /foo from the object store
MDS: issues capability for file content
read(fd, buf, 1024)
Client: reads data directly from the object store
close(fd)
Client: relinquishes capability to MDS
The MDS is out of the I/O path
Object locations are well known: calculated from the object name
An annotated C version of this sequence follows below.
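The same sequence as an ordinary C program, with comments noting which part of the cluster handles each call; the /mnt/ceph mount point is an example. From the application's point of view this is plain POSIX.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[1024];

    /* open(): the client asks an MDS for the inode and receives a capability
     * for the file content; the MDS reads directory /foo from the object store. */
    int fd = open("/mnt/ceph/foo/bar", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* read(): holding the capability, the client computes the object names
     * from the inode and offset and reads them directly from the OSDs; no
     * MDS sits in the I/O path. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0) {
        perror("read");
        return 1;
    }

    /* close(): the client relinquishes its capability to the MDS. */
    close(fd);
    return 0;
}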
Dynamic subtree partitioning
(Diagram: the directory tree is partitioned across MDS 0 to MDS 4; a busy directory is fragmented across many MDSs)
Scalable
Arbitrarily partition metadata, 10s-100s of nodes
Adaptive
Move work from busy to idle servers
Replicate popular metadata on multiple nodes
Workload adaptation
Extreme shifts in workload result in redistribution of metadata across cluster
Metadata initially managed by mds0 is migrated
(Figure: shown for workloads spread over many directories and for workloads within the same directory)
Metadata scaling
Up to 128 MDS nodes, and 250,000 metadata ops/second
I/O rates of potentially many terabytes/second
File systems containing many petabytes of data
Recursive accounting
Subtree-based usage accounting
Recursive file, directory, byte counts, mtime
$ ls -alSh | head
total 0
drwxr-xr-x 1 root       root      9.7T 2011-02-04 15:51 .
drwxr-xr-x 1 root       root      9.7T 2010-12-16 15:06 ..
drwxr-xr-x 1 pomceph    pg4194980 9.6T 2011-02-24 08:25 pomceph
drwxr-xr-x 1 mcg_test1  pg2419992  23G 2011-02-02 08:57 mcg_test1
drwx--x--- 1 luko       adm        19G 2011-01-21 12:17 luko
drwx--x--- 1 eest       adm        14G 2011-02-04 16:29 eest
drwxr-xr-x 1 mcg_test2  pg2419992 3.0G 2011-02-02 09:34 mcg_test2
drwx--x--- 1 fuzyceph   adm       1.5G 2011-01-18 10:46 fuzyceph
drwxr-xr-x 1 dallasceph pg275     596M 2011-01-14 10:06 dallasceph
$ getfattr -d -m ceph. pomceph
# file: pomceph
ceph.dir.entries="39"
ceph.dir.files="37"
ceph.dir.rbytes="10550153946827"
ceph.dir.rctime="1298565125.590930000"
ceph.dir.rentries="2454401"
ceph.dir.rfiles="1585288"
ceph.dir.rsubdirs="869113"
ceph.dir.subdirs="2"
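These recursive statistics can also be read programmatically, since they are exposed as virtual extended attributes on the directory. Below is a minimal sketch using the standard Linux getxattr() call; the attribute name ceph.dir.rbytes is taken from the output above, and the directory path is an example.

#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    const char *dir  = argc > 1 ? argv[1] : "pomceph";   /* example directory */
    const char *attr = "ceph.dir.rbytes";                /* recursive byte count */
    char buf[64];

    ssize_t n = getxattr(dir, attr, buf, sizeof(buf) - 1);
    if (n < 0) {
        perror("getxattr");
        return 1;
    }
    buf[n] = '\0';
    printf("%s on %s = %s bytes\n", attr, dir, buf);
    return 0;
}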
Fine-grained snapshots
Snapshot arbitrary directory subtrees
Volume or subvolume granularity is cumbersome at petabyte scale
Simple interface
$ mkdir foo/.snap/one            # create snapshot
$ ls foo/.snap
one
$ ls foo/bar/.snap
_one_1099511627776               # parent's snap name is mangled
$ rm foo/myfile
$ ls -F foo
bar/
$ ls foo/.snap/one
myfile bar/
$ rmdir foo/.snap/one            # remove snapshot
Efficient storage
Leverages copy-on-write at storage layer (btrfs)
POSIX filesystem client
POSIX; strong consistency semantics
Processes on different hosts interact as if on the same host
Client maintains consistent data/metadata caches
Linux kernel client
# modprobe ceph
# mount -t ceph 10.3.14.95:/ /mnt/ceph
# df -h /mnt/ceph
Filesystem    Size  Used  Avail  Use%  Mounted on
10.3.14.95:/   95T   29T    66T   31%  /mnt/ceph
Userspace client
cfuse: FUSE-based client
libceph library (ceph_open(), etc.; see the sketch below)
Hadoop, Hypertable client modules (libceph)
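A minimal sketch of using the userspace library the slide calls libceph (shipped nowadays as libcephfs). It assumes the current libcephfs C API (ceph_create(), ceph_mount(), ceph_open(), ...), which may differ in detail from the 2011-era library; the file path is an example.

#include <fcntl.h>
#include <stdio.h>
#include <cephfs/libcephfs.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    char buf[1024];

    /* Create a handle, read the default ceph.conf and mount the root. */
    if (ceph_create(&cmount, NULL) < 0 ||
        ceph_conf_read_file(cmount, NULL) < 0 ||
        ceph_mount(cmount, "/") < 0) {
        fprintf(stderr, "cannot mount ceph filesystem\n");
        return 1;
    }

    /* Open and read a file without any kernel client or FUSE mount. */
    int fd = ceph_open(cmount, "/foo/bar", O_RDONLY, 0);
    if (fd >= 0) {
        int n = ceph_read(cmount, fd, buf, sizeof(buf), 0);   /* offset 0 */
        printf("read %d bytes\n", n);
        ceph_close(cmount, fd);
    }

    ceph_unmount(cmount);
    ceph_release(cmount);
    return 0;
}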
Integration of CEPH, LIO, OpenArchive
CEPH implements complete POSIX semantics
OpenArchive plugs into the VFS layer of Linux
An integrated setup of CEPH, LIO and OpenArchive provides:
A scalable cluster filesystem
A scalable HSM for cloud and HPC setups
Scalable, highly available SAN storage (iSCSI, FC, IB)
I have a dream...
(Diagram of the envisioned stack: OpenArchive provides the tape/disk archive layer on top of the CEPH cluster filesystem; CEPH MDS and OSD nodes serve the filesystem; LIO targets export RBD devices as SAN block storage to iSCSI/FC/IB initiators.)
Contact
Thomas Uhl
Cell: +49 170 7917711
[email protected]
[email protected]
www.twitter.com/tuhl
de.wikipedia.org/wiki/Thomas_Uhl