(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 307 days.

(21) Appl. No.: 10/681,757

(22) Filed: Oct. 8, 2003

(65) Prior Publication Data
     US 2005/0080990 A1     Apr. 14, 2005

(51) Int. Cl.
     G06F 12/00 (2006.01)

(52) U.S. Cl. ............ 711/114; 711/112; 711/4; 714/100

(58) Field of Classification Search ............ 711/114, 711/112, 4; 714/100, 1, 6
     See application file for complete search history.

* cited by examiner

Primary Examiner—Kimberly McLean

(74) Attorney, Agent, or Firm—Christopher P. Maiorana, PC

(57) ABSTRACT

An apparatus generally having a plurality of disk drives and a controller is disclosed. Each of the disk drives may have a first region and a second region. The first regions may have a performance parameter faster than the second regions. The controller may be configured to (i) write a plurality of data items in the first regions and (ii) write a plurality of fault tolerance items for the data items in the second regions.

16 Claims, 7 Drawing Sheets
[Drawing sheets 1 through 7 (FIGS. 1-8) are not reproduced in this text extraction. Recoverable labels: FIG. 1 — RAID 1 virtual disk 10 mapped to a data drive 12 and a parity drive 14 (DATA LBA 0 ... N, PARITY LBA 0 ... N); FIG. 2 — virtual disk LBAs 104; FIG. 3 — apparatus 120 with a controller 122, a disk array 124 and signals DATA, ADDR and STATUS.]
HIGH PERFORMANCE RAID MAPPING

FIELD OF THE INVENTION

The present invention relates to disk drive arrays generally and, more particularly, to an apparatus and method for mapping in a high performance redundant array of inexpensive disks.

BACKGROUND OF THE INVENTION

Circular disk media devices rotate at a constant angular velocity while accessing the media. Therefore, a read/write rate to and from the media depends on the particular track being accessed. Access rates conventionally increase as distance increases from a center of rotation for the media.

Referring to FIG. 1, a block diagram illustrating a conventional mapping for a redundant array of inexpensive disks (RAID) level 1 system is shown. The RAID 1 system is commonly viewed by software as a virtual disk 10 having multiple contiguous logical block addresses (LBA). Each virtual LBA may be stored on a data drive 12 at a physical LBA. Each physical LBA may also be mirrored to and stored in a parity drive 14. Therefore, each virtual LBA is stored at the same physical LBA location in both the data drive 12 and the parity drive 14.

The RAID 1 system can be implemented with more than two disk drives. An equal number of data drives 12 and parity drives 14 will exist for the RAID 1 virtual disk 10, with each of the parity drives 14 containing a mirror image of a data drive 12. Furthermore, access time to the virtual disk 10 will increase as a position of the LBA number moves closer (i.e., increases) to the axis of rotation for the disk drive media. In the LBA mapping scheme illustrated, a random read performance of the RAID 1 virtual disk 10 matches that of either single disk 12 or 14.

SUMMARY OF THE INVENTION

The present invention concerns an apparatus generally comprising a plurality of disk drives and a controller. Each of the disk drives may have a first region and a second region. The first regions may have a performance parameter faster than the second regions. The controller may be configured to (i) write a plurality of data items in the first regions and (ii) write a plurality of fault tolerance items for the data items in the second regions.

The objects, features and advantages of the present invention include providing an apparatus and/or method that may (i) improve random access read performance, (ii) improve overall read performance, (iii) utilize different access rates in different regions of a medium, (iv) provide a fault detection capability and/or (v) enable data recovery upon loss of a drive.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects, features and advantages of the present invention will be apparent from the following detailed description and the appended claims and drawings in which:

FIG. 1 is a block diagram illustrating a conventional mapping for a RAID 1 system;

FIG. 2 is a block diagram of a mapping for a virtual disk in accordance with a preferred embodiment of the present invention;

FIG. 3 is a block diagram of an example implementation of a disk array apparatus;

FIG. 4 is a block diagram of an example implementation of a high performance RAID 1 apparatus;

FIG. 5 is a block diagram of an example implementation of a RAID 5 apparatus;

FIG. 6 is a block diagram of an example implementation of a RAID 6 apparatus;

FIG. 7 is a block diagram of an example implementation of a RAID 10 apparatus; and

FIG. 8 is a block diagram of an example implementation of a RAID 0+1 apparatus.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 2, a block diagram of a mapping for a virtual disk 100 is shown in accordance with a preferred embodiment of the present invention. The present mapping scheme is generally based on a physical orientation and one or more properties of circular disk media. The virtual disk 100 may have an overall address range 102 divided into N logical block addresses (LBA) 104a-104n. Generally, the LBAs 104a-104n may be disposed within the address range 102 with a first LBA 104a having a lowest address number and a last LBA 104n having a highest address number. Other addressing arrangements may be implemented to meet the criteria of a particular application.

The virtual disk 100 may be mapped into two or more disk drives 106 and 108. The disk drives 106 and 108 may be arranged and operated as a level 1 redundant array of inexpensive disks (RAID). The disk drive 106 may be designated as a first drive (e.g., DRIVE 1). The disk drive 108 may be designated as a second drive (e.g., DRIVE 2). The mapping may be organized such that the primary virtual-to-physical association may locate data in higher performance areas (e.g., a data region) of the disk drives 106 and 108 while parity or fault tolerance information is located in lower performance areas (e.g., a parity region) of the disk drives 106 and 108.

Each block of data for a particular LBA (e.g., 104x) of the virtual drive 100 may be primarily mapped in either the first drive 106 or the second drive 108, depending upon the address value for the particular LBA 104x. A mirror of the data for the particular LBA 104x may be mapped at a different location in the other drive. In one embodiment, the LBAs 104a-104n within a first address range 110 of the overall address range 102 may be mapped primarily to the first drive 106 and mirrored to the second drive 108. The LBAs 104a-104n within a second address range 112 of the overall address range 102 may be mapped primarily to the second drive 108 and mirrored to the first drive 106.

Each of the disk drives 106 and 108 is generally arranged such that one or more performance parameters of a media within may be better in the first address range 110 as compared with the second address range 112. In one embodiment, a bit transfer rate to and from the media may be faster in the first address range 110 than in the second address range 112. Therefore, data may be read from and written to the media at different rates depending upon the address. For rotating media, the performance generally increases linearly as a distance from an axis of rotation increases.

Mapping may be illustrated by way of the following example. Data (e.g., 5) at the LBA 104e from the virtual disk 100 may be mapped to the same LBA 104e for the first disk 106 since the address value for the LBA 104e is in the first
address range 110. A mirror image of the data (e.g., 5) from the virtual disk 100 LBA 104e may be mapped to a different address (e.g., 104g) in the second drive 108. Furthermore, data (e.g., N-4) at the LBA 104i for the virtual disk 100 may be mapped to the LBA 104i in the second disk 108 since the address value of the LBA 104i is within the second address range 112. A mirror image of the data (e.g., N-4) from the virtual disk 100 LBA 104i may be mapped to the same address (e.g., LBA 104f) in the first disk 106.

Any subsequent read for the data 5 or the data N-4 generally accesses the first drive 106 or the second drive 108 respectively from within the faster first address range 110. By comparison, a conventional RAID 1 system would have mapped the data N-4 into the second address range 112 of the first drive 106. Therefore, conventional accessing of the data N-4 within the second address range is generally slower than accessing the data N-4 within the first address range of the second drive 108 per the present invention. Overall, a random read performance and/or a general read performance of the RAID 1 virtual disk 100 may be better than the performance of the individual disk drives 106 and 108.

Experiments performed on a two-disk RAID 1 system implementing the present invention generally indicate that a performance gain of approximately 20% to 100% may be achieved for random reads as compared with a conventional RAID 1 mapping.
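As an editorial illustration only (not part of the original disclosure), the following Python sketch models the two-drive mapping described above. The even split of the virtual address range, the exact physical offsets and all names are assumptions made for the example; the patent only requires that primary copies land in the faster first address range and mirrors in the slower second address range of the other drive.

```python
# Hypothetical sketch of the FIG. 2 style mapping (assumed even split of N LBAs).
N = 1000          # total virtual LBAs (example value)
HALF = N // 2

def map_virtual_lba(vlba):
    """Return (primary, mirror) placements as (drive, physical_lba) tuples."""
    if vlba < HALF:
        primary = (1, vlba)            # fast (outer) region of DRIVE 1
        mirror = (2, HALF + vlba)      # slow (inner) region of DRIVE 2
    else:
        primary = (2, vlba - HALF)     # fast (outer) region of DRIVE 2
        mirror = (1, vlba)             # slow (inner) region of DRIVE 1
    return primary, mirror

def read_placement(vlba, failed_drive=None):
    """Prefer the fast primary copy; fall back to the mirror if that drive failed."""
    primary, mirror = map_virtual_lba(vlba)
    return mirror if primary[0] == failed_drive else primary
```

Under these assumptions every random read that hits a healthy drive is serviced from an outer, higher transfer rate region, which is the source of the gain reported above.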
Referring to FIG. 3, a block diagram of an example implementation of a disk array apparatus 120 is shown. The apparatus 120 generally comprises a circuit (or device) 122 and a circuit (or device) 124. The circuit 122 may be implemented as a disk array controller. The circuit 124 may be implemented as a disk array (e.g., a RAID configuration). A signal (e.g., DATA) may transfer data items to and from the controller 122. A signal (e.g., ADDR) may transfer an address associated with the data to the controller 122. One or more optional signals (e.g., STATUS) may present status information from the controller 122. One or more signals (e.g., D) may exchange the data items between the controller 122 and the disk array 124. One or more signals (e.g., FT) may exchange fault tolerance items between the controller 122 and the disk array 124.

The controller 122 may be operational to map the information in the signal DATA to the individual disk drives within the disk array 124. The mapping may be dependent on the particular configuration of disk drives that make up the disk array 124. The disk array 124 may be configured as a level 1 RAID, a level 5 RAID, a level 6 RAID, a level 10 RAID or a level 0+1 RAID. Other RAID configurations may be implemented to meet the criteria of a particular application.

The signal DATA may carry user data and other data to and from the apparatus 120. The data items within the signal DATA may be arranged in blocks, segments or the like. Addressing for the data items may be performed in the signal ADDR using logical blocks, sectors, cylinders, heads, tracks or other addressing scheme suitable for use with the disk drives. The signal STATUS may be deasserted (e.g., a logical FALSE level) when error detection circuitry within the controller 122 detects an error in the data read from the disk array 124. In situations where no errors are detected, the signal STATUS may be asserted (e.g., a logical TRUE level).

The signal D may carry the data information. The data information may be moved as blocks or stripes to and from the disk array 124. The signal FT may carry fault tolerance information related to the data information. The fault tolerant information may be moved as blocks or stripes to and from the disk array 124. In one embodiment, the fault tolerant information may be mirrored (copied) versions of the data information. In another embodiment, the fault tolerance information may include error detection and/or error correction items, for example parity values.
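Purely as an editorial aid (not part of the original disclosure), the sketch below restates the FIG. 3 contract in Python. The class, the array interface and the RAID 1 style mirror used for the fault tolerance item are all assumptions; the point is only that data items travel on D into high performance regions, fault tolerance items travel on FT into low performance regions, and STATUS reflects read-time verification.

```python
# Hypothetical controller sketch; write_fast/write_slow/read_fast/read_slow are an
# assumed interface standing in for the disk array 124.
class DiskArrayController:
    def __init__(self, array):
        self.array = array

    def write(self, addr, data):
        ft = self.make_fault_tolerance(data)   # mirror copy or parity, per RAID level
        self.array.write_fast(addr, data)      # signal D  -> high performance regions
        self.array.write_slow(addr, ft)        # signal FT -> low performance regions

    def read(self, addr):
        data = self.array.read_fast(addr)
        ft = self.array.read_slow(addr)
        status = self.verify(data, ft)         # STATUS asserted only if no error found
        return data, status

    def make_fault_tolerance(self, data):
        return bytes(data)                     # placeholder: RAID 1 style mirror copy

    def verify(self, data, ft):
        return self.make_fault_tolerance(data) == ft
```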
Referring to FIG. 4, a block diagram of an example implementation of a high performance RAID 1 apparatus 140 is shown. In the example, the disk array 124 may be implemented with a first disk drive 142 and a second disk drive 144 (e.g., collectively a disk array 124a). Additional disk drives may be included in the disk array 124a to meet the criteria of a particular application. The controller 122 may include a circuit (or block) 145 and a multiplexer 146.

The high performance RAID 1 mapping method generally does not assign a distinct data drive and a parity drive. Instead, the fault tolerance/parity items may be rotated among all of the drives. The data items may be stored in the higher performance regions (e.g., faster address ranges) and the parity items may be stored in the lower performance regions (e.g., slower address ranges) of the media. Each disk drive 142 and 144 generally comprises one or more disk media 148. Each medium 148 may have an outer edge 150 and an axis of rotation 152.

Each medium 148 may be logically divided into two regions 154 and 156 based upon the addressing scheme used by the drive. The first region 154 generally occupies an annular area proximate the outer edge 150 of the medium 148. The first region 154 may be addressable within the first address range 110. Due to a high bit transfer rate, the first region 154 may be referred to as a high performance region. The second region 156 may occupy an annular area between the first region 154 and the axis of rotation 152. The second region 156 may be addressable within the second address range 112. Hereafter, the second region 156 may be referred to as a low performance region. In one embodiment, the high performance region 154 and the low performance region 156 may be arranged to have approximately equal storage capacity on each active surface of each medium 148. In another embodiment, the storage capacity of the high performance region 154 may be greater than the storage capacity of the low performance region 156. For example, in a RAID 5 configuration having n drives, the high performance region 154 may occupy an (n-1)/n fraction of the medium 148 and the low performance region 156 may occupy a 1/n fraction of the medium 148. Generally, the delineation of the high performance region 154 and the low performance region 156 may be determined by criteria of the particular RAID process being implemented.
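As a small illustrative helper (an editorial addition, not part of the disclosure), the region boundary implied by the fractions above can be expressed as an LBA cutoff. The assumption that a drive's LBAs run from the outer edge inward, and the function name, are choices made for the example.

```python
# Hypothetical helper: first LBA of the low performance region 156.
def region_boundary(capacity_lbas, n_drives, raid_level):
    """Return the first LBA of the low performance region."""
    if raid_level == 1:
        fast_fraction = 0.5                        # equal data and mirror capacity
    elif raid_level == 5:
        fast_fraction = (n_drives - 1) / n_drives  # parity consumes 1/n of each medium
    else:
        raise ValueError("add other RAID levels as needed")
    return int(capacity_lbas * fast_fraction)

# Example: a 4-drive RAID 5 with 1,000,000-LBA media keeps LBAs 0..749999 for data.
print(region_boundary(1_000_000, 4, 5))  # -> 750000
```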
Other partitions may be made on the media 148 to account for multiple virtual drives on the physical drives. For example, a first high performance region 154 and a first low performance region 156 allocated to a first virtual RAID may be physically located adjoining the outer edge 150, with the first high performance region 154 outside the first low performance region 156. A second high performance region (not shown) and a second low performance region (not shown) may be physically located between the first low performance region 156 and the axis 152. In another example, the first and the second high performance regions may be located outside the first and the second low performance regions. Other partitions among the high performance regions and the low performance regions may be created to meet the criteria of a particular application.

The data signal D may carry a block (B) of data (A) to be written to the disk array 124a (e.g., D_BA). The data item D_BA may be written to a track 160, sector or other appropriate area of the first drive 142. The track 160 may be
physically located within the high performance region 154 of the medium 148 for the first drive 142. The track 160 is generally addressable in the first address range 110.

The circuit 145 may be implemented as a mirror circuit. The mirror circuit 145 may generate a copy (e.g., FT_BA) of the data item D_BA. The data item copy FT_BA may be written to a track 162 of the second drive 144. The track 162 may be physically located within the low performance region 156 of the medium 148 for the second drive 144. The track 162 is generally addressable in the second address range 112. Therefore, a write to the disk array 124a may store the data item D_BA in the higher performance region 154 of the first disk drive 142 and the data item copy FT_BA in the low performance region 156 of the second disk drive 144. An access to the disk array 124a to read the data item D_BA may primarily access the track 160 from the first disk drive 142 instead of the track 162 from the second disk drive 144. If the first drive 142 fails, the multiplexer 146 may generate the signal D by routing the data item copy FT_BA from the second drive 144.

As the high performance region 154 of the first drive 142 becomes full, additional data items may be written to the high performance region 154 of the second drive 144. Conversely, additional mirrored (fault tolerance) data items may be stored in the low performance region 156 of the first drive 142. For example, a second data item (e.g., D_XB) may be read from a track 164 in the high performance region 154 of the second drive 144. Substantially simultaneously, a second data item copy (e.g., FT_XB) may be read from a track 166 within the low performance region 156 of the first drive 142. The multiplexer 146 generally returns the second data item D_XB to the controller circuit 122 as a block for the second data item (e.g., D_BB). If the second drive 144 fails, the multiplexer 146 may route the mirrored second data item FT_XB to present the second data item D_BB.

Since the (primary) tracks 160 and 164 are located in the high performance regions 154 and the (fault tolerance) tracks 162 and 166 are located in the low performance regions 156 of the drives 142 and 144, respectively, random accesses to the data stored in the tracks 160 and 164 are generally faster than random accesses to the mirror data stored in the tracks 162 and 166. As such, a read performance of the apparatus 140 may be improved as compared with conventional RAID 1 systems.
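To summarize the FIG. 4 write/read behaviour, the following Python sketch is offered as an editorial illustration only; the dictionary-based "drives", the "fast"/"slow" keys standing in for regions 154/156, and the failover flag modelling the multiplexer 146 are all assumptions for the example.

```python
# Hypothetical two-drive, high performance RAID 1 model.
drives = {1: {"fast": {}, "slow": {}}, 2: {"fast": {}, "slow": {}}}

def write_block(addr, data, primary_drive):
    mirror_drive = 2 if primary_drive == 1 else 1
    drives[primary_drive]["fast"][addr] = data      # e.g., D_BA to track 160
    drives[mirror_drive]["slow"][addr] = data       # e.g., FT_BA to track 162

def read_block(addr, primary_drive, failed=None):
    mirror_drive = 2 if primary_drive == 1 else 1
    if primary_drive != failed:
        return drives[primary_drive]["fast"][addr]  # normal case: fast primary copy
    return drives[mirror_drive]["slow"][addr]       # failover via the mirror copy

write_block(0x10, b"A", primary_drive=1)
assert read_block(0x10, primary_drive=1) == b"A"
assert read_block(0x10, primary_drive=1, failed=1) == b"A"
```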
Referring to FIG. 5, a block diagram of an example implementation of a RAID 5 apparatus 180 is shown. The disk array 124 for the RAID 5 apparatus 180 generally comprises a drive 182, a drive 184, a drive 186 and a drive 188 (e.g., collectively a disk array 124b). The controller 122 for the RAID 5 apparatus 180 generally comprises a circuit (or block) 190, a circuit (or block) 192 and a circuit (or block) 194. The circuit 190 may be implemented as a parity generator circuit. The circuit 192 may also be implemented as a parity generator circuit. The circuit 194 may be implemented as a compare circuit. In one embodiment, the parity generator circuit 190 and the parity generator circuit 192 may be the same circuit.

The parity generator circuit 190 may be operational to generate a parity item from three tracks at the same rank (e.g., address) from three of the drives 182-188. The parity item may be error detection information and optionally an error correction block of information. The parity item may be written to the fourth of the drives 182-188.

As illustrated in the example, the parity generator circuit 190 may generate a parity item, also referred to as a fault tolerance block (e.g., FT_B123), based on data stored in the first drive 182, the second drive 184 and the third drive 186. In particular, the parity generator circuit 190 may receive the data item D_BA being written to a track 200 in the high performance region 154 of the first drive 182. A block for a second data item (e.g., D_BB), previously stored in a track 202 within the high performance region 154 of the second drive 184, may also be presented to the parity generator circuit 190. A block for a third data item (e.g., D_BC), previously stored in a track 204 within the high performance region 154 of the third drive 186, may be received by the parity generator circuit 190. The fault tolerance block FT_B123 may be conveyed to the fourth drive 188 for storage in a track 206 within the low performance region 156. The parity generator circuit 192 may include a first-in-first-out buffer (not shown) to temporarily queue portions of the fault tolerance block FT_B123 while the fault tolerance block FT_B123 is being written to the relatively slower low performance region 156 of the fourth drive 188. Since the track 206 is at a different radius from the axis of rotation 152 than the tracks 200, 202 and 204, writing of the fault tolerance block FT_B123 may be performed asynchronously with respect to writing the data item D_BA.

The parity generator circuit 192 and the compare circuit 194 may be utilized to read from the disk array 124b. For example, a data item (e.g., D_BF) may be read from the fourth drive 188. Reading the data item D_BF may include reading other data items (e.g., D_BD and D_BE) from the second drive 184 and the third drive 186 at the same rank. The parity generator circuit 192 may generate a block for a parity item (e.g., PARITY) based upon the three received data items D_BD, D_BE and D_BF. Substantially simultaneously, a fault tolerance block (e.g., FT_B234) may be read from the low performance region 156 of the first drive 182. The compare circuit 194 may compare the fault tolerance block FT_B234 with the calculated parity item PARITY. If the fault tolerance block FT_B234 is the same as the parity item PARITY, the compare circuit 194 may assert the signal STATUS in the logical TRUE state. If the compare circuit 194 detects one or more discrepancies between the fault tolerance block FT_B234 and the parity item PARITY, the signal STATUS may be deasserted to the logical FALSE state. Repair of a faulty data item D_BF may be performed per conventional RAID 5 procedures. Replacement of a drive 182-188 may also be performed per conventional RAID 5 procedures with the exceptions that reconstructed data may only be stored in the high performance regions 154 and fault tolerance (parity) information may only be stored in the low performance regions 156 of the replacement disk.
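The patent only requires "error detection and/or error correction items"; XOR parity, the usual RAID 5 choice, is assumed in the following editorial sketch of parity generation and read-time verification (the compare circuit 194 behaviour). Names and values are illustrative.

```python
# Hypothetical XOR parity sketch for one rank of the RAID 5 example.
def xor_parity(*blocks):
    """Byte-wise XOR of equal-length blocks (e.g., FT_B123 from D_BA, D_BB, D_BC)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def verify_rank(data_blocks, stored_parity):
    """Model of the compare circuit: STATUS is TRUE only if the parity matches."""
    return xor_parity(*data_blocks) == stored_parity

d_ba, d_bb, d_bc = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
ft_b123 = xor_parity(d_ba, d_bb, d_bc)
assert verify_rank([d_ba, d_bb, d_bc], ft_b123)             # clean read
assert not verify_rank([d_ba, d_bb, b"\x00\x00"], ft_b123)  # corrupted block detected
```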
Referring to FIG. 6, a block diagram of an example implementation of a RAID 6 apparatus 210 is shown. The disk array 124 for the RAID 6 apparatus 210 generally comprises a drive 212, a drive 214, a drive 216 and a drive 218 (e.g., collectively a disk array 124c). The controller 122 for the RAID 6 apparatus 210 generally comprises a circuit (or block) 220, a circuit (or block) 222, a circuit (or block) 224, a circuit (or block) 226, a circuit (or block) 228 and a circuit (or block) 230. The circuits 220, 222, 224 and 228 may each be implemented as parity generator circuits. The circuits 226 and 230 may each be implemented as compare circuits. In one embodiment, the parity generator circuit 222, the parity generator circuit 228 and the compare circuit 230 may be implemented as part of each drive 212-218.

An example write of the data item D_BA generally involves writing to the high performance region 154 of the first drive 212. Substantially simultaneously, the parity generator circuit 220 may receive the data item D_BA and additional data items (e.g., D_BB and D_BC) at the same
rank (e.g., address) from two other drives (e.g., 214 and 216). Similar to the RAID 5 apparatus 180 (FIG. 5), the parity generator circuit 220 may generate a fault tolerance block (e.g., FT_B123A) that is subsequently stored in the low performance region 156 of the fourth drive (e.g., 218). Furthermore, the parity generator circuit 222 may generate a second fault tolerance block (e.g., FT_B1A) based on the data item block D_BA and other data item blocks (e.g., D_BD and D_BE) previously stored in the same tracks of the first drive 212 on different disks. The fault tolerance block FT_B1A may be stored within the low performance region 156 of the first drive 212.

An example read of a data item (e.g., D_BH) from the first drive 212 generally includes reading other data items (e.g., D_BI and D_BJ) at the same rank from two other drives (e.g., 214 and 216), a first fault tolerance block (e.g., FT_B123B) from the low performance region 156 of the fourth drive (e.g., 218) and a second fault tolerance block (e.g., FT_B1B) from the low performance region 156 of the first drive 212. The parity generator circuit 224 may be operational to compare the data items D_BH, D_BI and D_BJ at the same rank to generate a first parity item (e.g., PARITY1). The compare circuit 226 may compare the first parity item PARITY1 with the first fault tolerance block FT_B123B to determine a first portion of the status signal (e.g., STATUS1). Substantially simultaneously, the parity generator circuit 228 may generate a second parity item (e.g., PARITY2) based on the data items D_BH, D_BF and D_BG stored in the same tracks of the first drive 212 on different disks. The compare circuit 230 may compare the second parity item PARITY2 with the fault tolerance block FT_B1B read from the low performance region 156 of the first drive 212 to determine a second portion of the status signal (e.g., STATUS2).
signal (e.g., STATUS2). region 154 of the second drive 274. The first parity stripe
Referring to FIG. 7, a block diagram of an example 35 FT SA1 may be stored in the low performance region 156
implementation of a RAID 10 apparatus 240 is shown. The of the third drive 276. The second parity stripe FT SA2 may
disk array 124 for the RAID 10 apparatus 240 generally be stored in the low performance region 156 of the fourth
comprises a drive 242, a drive 244, a drive 246 and a drive drive 278. Once the drives 272-278 are approximately half
248 (e.g., collectively a disk array 124d). The controller 122 full of data, the mirror circuit 280 may be operational to
for the RAID 10 apparatus 230 generally comprises a circuit 40 route the data item D BA to the stripe circuit 284 for storage
(or block) 250, a circuit (or block) 252, a circuit (or block) in the high performance regions 154 of the third drive 276
254, a multiplexer 256, a multiplexer 258 and a circuit (or and the fourth drive 278. Likewise, the mirrored data
block) 260. FT BA may be sent to the stipe circuit 282 for storage in the
The block 250 may be implemented as a stripe circuit. The low performance regions 156 of the first drive 272 and the
second drive 274.
stripe circuit 250 may be operational to convert the data item 45
D BA from a block form to a stipe form. Each circuit 252 Each circuit 286 and 288 may be implemented as a
and 254 may be implemented as a mirror circuit. The circuit combine circuit that reassembles stripes back into blocks.
260 may be operational to transform data in stripe form back The combine circuit 286 may be operational to regenerate
into the block form. the data item D BA from the data stripes D SA1 and
The stripe circuit 250 may transform the data item D BA 50 D SA2. The combine circuit 288 may be operational to
into a first stripe data (e.g., D SA1) and a second stripe data regenerate the fault tolerance block FT BA from the stripes
(e.g., D SA2). The mirror circuit 252 may generate a first FT SA1 and FT SA2. The multiplexer 290 may be config
fault tolerance stripe (e.g., FT SA1) by copying the first ured to return one of the regenerated data item D BA or the
stripe data D SA1. The mirror circuit 254 may generate a regenerated fault tolerance block FT BA as the read data
second fault tolerance stripe (e.g., FT SA2) by copying the 55 item D BA.
second stripe data D SA2. The first drive 242 may store the The mapping scheme of the present invention may be
first stripe data D SA1 in the high performance region 154. generally applied to any RAID level. However, the perfor
The second drive 244 may store the first fault tolerance mance gain may depend on the amount of media used to
stripe FT SA1 in the low performance region 156. The third store parity/fault tolerance information. The various signals
drive 246 may store the second stripe data D-SA2 in the high 60 of the present invention are generally TRUE (e.g., a digital
performance region 154. The fourth drive 248 may store the HIGH, “on” or 1) or FALSE (e.g., a digital LOW, “off” or
second fault tolerance stripe FT SA2 in the low perfor O). However, the particular polarities of the TRUE (e.g.,
mance region 156. asserted) and FALSE (e.g., de-asserted) states of the signals
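For comparison with the FIG. 7 sketch, the following editorial illustration (not part of the disclosure) models the FIG. 8 placement rule, in which the block is mirrored first and both copies are then striped; the half-full role swap described above is deliberately omitted, and the drive keys are assumptions.

```python
# Hypothetical RAID 0+1 mirror-then-stripe model.
def write_raid01(block, array):
    mirror = bytes(block)                        # FT_BA, produced by the mirror circuit 280
    half = len(block) // 2
    array["drive1_fast"] = block[:half]          # D_SA1  -> drive 272, region 154
    array["drive2_fast"] = block[half:]          # D_SA2  -> drive 274, region 154
    array["drive3_slow"] = mirror[:half]         # FT_SA1 -> drive 276, region 156
    array["drive4_slow"] = mirror[half:]         # FT_SA2 -> drive 278, region 156

array = {}
write_raid01(b"WXYZ", array)
assert array["drive1_fast"] + array["drive2_fast"] == b"WXYZ"
assert array["drive3_slow"] + array["drive4_slow"] == b"WXYZ"
```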
The mapping scheme of the present invention may be generally applied to any RAID level. However, the performance gain may depend on the amount of media used to store parity/fault tolerance information. The various signals of the present invention are generally TRUE (e.g., a digital HIGH, "on" or 1) or FALSE (e.g., a digital LOW, "off" or 0). However, the particular polarities of the TRUE (e.g., asserted) and FALSE (e.g., de-asserted) states of the signals may be adjusted (e.g., reversed) accordingly to meet the design criteria of a particular implementation. As used herein, the term "simultaneously" is meant to describe events that share some common time period but the term is
not meant to be limited to events that begin at the same point in time, end at the same point in time, or have the same duration.

While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the spirit and scope of the invention.

The invention claimed is:

1. An apparatus comprising:
a plurality of disk drives each having a first region and a second region, wherein said first regions have a performance parameter faster than said second regions; and
a controller configured to (i) write a first data block at a particular address in said first region of a first drive of said disk drives, (ii) read a second data block from said particular address of a second drive of said disk drives, (iii) calculate a first parity item based on said first data block and said second data block and (iv) write said first parity item in said second region of a third drive of said disk drives.

2. The apparatus according to claim 1, wherein said first region for each of said disk drives comprises an annular area of a storage medium proximate an outer edge of said storage medium.

3. The apparatus according to claim 2, wherein said second region for each of said disk drives comprises an area of said storage medium between said first region and a rotational axis of said storage medium.

4. The apparatus according to claim 1, wherein said disk drives comprise a redundant array of inexpensive disks level 5.

5. The apparatus according to claim 1, wherein said disk drives comprise a redundant array of inexpensive disks level 6.

6. The apparatus according to claim 1, wherein said performance parameter is a bit transfer rate to a storage medium within said disk drives.

7. A method for operating a plurality of disk drives, comprising the steps of:
(A) partitioning an address range for said disk drives into a first range and a second range, where said first range has a performance parameter faster than said second range;
(B) writing a first data block at a particular address in said first range of a first drive of said disk drives;
(C) reading a second data block from said particular address of a second drive of said disk drives;
(D) calculating a first parity item based on said first data block and said second data block; and
(E) writing said first parity item in said second range of a third drive of said disk drives.

8. The apparatus according to claim 1, wherein said controller is further configured to (i) write said first data block in a first disk of said first drive, (ii) read a third data block from said predetermined address of a second disk of said first drive, (iii) calculate a second parity item based on said first data block and said third data block and (iv) store said second parity item in said second region of said first drive.

9. The apparatus according to claim 8, wherein said controller is further configured to write said second parity item in said first disk of said first drive.

10. The method according to claim 7, wherein said performance parameter is a bit transfer rate to a storage medium within said disk drives.

11. The method according to claim 7, wherein said first data block is written in a first disk of said first drive, the method further comprising the steps of:
reading a third data block from said predetermined address of a second disk of said first drive;
calculating a second parity item based on said first data block and said third data block; and
storing said second parity item in said second range of said first drive.

12. The method according to claim 11, wherein said second parity item is written in said first disk of said first drive.

13. A method for operating a plurality of disk drives, comprising the steps of:
(A) partitioning an address range for said disk drives into a first range and a second range, where said first range has a performance parameter faster than said second range;
(B) generating both a second data block and a third data block by striping a first data block;
(C) writing said second data block in said first range of a first drive of said disk drives;
(D) writing said third data block in said first range of a third drive of said disk drives;
(E) generating a first mirrored data block by mirroring said first data block; and
(F) writing said first mirrored data block in said second range of a second drive of said disk drives.

14. The method according to claim 13, further comprising the step of:
generating both a second mirrored data block and a third mirrored data block by striping said first mirrored data block, wherein the step of writing said first mirrored data block comprises the sub-steps of:
writing said second mirrored data block in said second drive; and
writing said third mirrored data block in a fourth drive of said disk drives.

15. The method according to claim 13, wherein said performance parameter is a bit transfer rate to a storage medium within said disk drives.

16. The method according to claim 13, wherein said disk drives comprise a redundant array of inexpensive disks level 0+1.