@@ -17,6 +17,25 @@ All code snippets on this page assume that the following has been executed:
- {Parsing from an IO Stream}[#label-Parsing+from+an+IO+Stream]
- {Recipe: Parse from IO Stream with Headers}[#label-Recipe-3A+Parse+from+IO+Stream+with+Headers]
- {Recipe: Parse from IO Stream Without Headers}[#label-Recipe-3A+Parse+from+IO+Stream+Without+Headers]
+ - {RFC 4180 Compliance}[#label-RFC+4180+Compliance]
+ - {Row Separator}[#label-Row+Separator]
+ - {Recipe: Handle Compliant Row Separator}[#label-Recipe-3A+Handle+Compliant+Row+Separator]
+ - {Recipe: Handle Non-Compliant Row Separator}[#label-Recipe-3A+Handle+Non-Compliant+Row+Separator]
+ - {Column Separator}[#label-Column+Separator]
+ - {Recipe: Handle Compliant Column Separator}[#label-Recipe-3A+Handle+Compliant+Column+Separator]
+ - {Recipe: Handle Non-Compliant Column Separator}[#label-Recipe-3A+Handle+Non-Compliant+Column+Separator]
+ - {Quote Character}[#label-Quote+Character]
+ - {Recipe: Handle Compliant Quote Character}[#label-Recipe-3A+Handle+Compliant+Quote+Character]
+ - {Recipe: Handle Non-Compliant Quote Character}[#label-Recipe-3A+Handle+Non-Compliant+Quote+Character]
+ - {Recipe: Allow Liberal Parsing}[#label-Recipe-3A+Allow+Liberal+Parsing]
+ - {Special Handling}[#label-Special+Handling]
+ - {Special Line Handling}[#label-Special+Line+Handling]
+ - {Recipe: Ignore Blank Lines}[#label-Recipe-3A+Ignore+Blank+Lines]
+ - {Recipe: Ignore Selected Lines}[#label-Recipe-3A+Ignore+Selected+Lines]
+ - {Special Field Handling}[#label-Special+Field+Handling]
+ - {Recipe: Strip Fields}[#label-Recipe-3A+Strip+Fields]
+ - {Recipe: Handle Null Fields}[#label-Recipe-3A+Handle+Null+Fields]
+ - {Recipe: Handle Empty Fields}[#label-Recipe-3A+Handle+Empty+Fields]
- {Converting Fields}[#label-Converting+Fields]
- {Converting Fields to Objects}[#label-Converting+Fields+to+Objects]
- {Recipe: Convert Fields to Integers}[#label-Recipe-3A+Convert+Fields+to+Integers]
@@ -164,6 +183,143 @@ Output:
  ["bar", "1"]
  ["baz", "2"]

+ === RFC 4180 Compliance
+
+ By default, \CSV parses data that is compliant with
+ {RFC 4180}[https://tools.ietf.org/html/rfc4180]
+ with respect to:
+ - Row separator.
+ - Column separator.
+ - Quote character.
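The following is a small sketch (the record is made up for illustration) that passes the RFC 4180 values for options +:row_sep+, +:col_sep+, and +:quote_char+ explicitly. It also shows two RFC 4180 quoting rules: a quoted field may contain the column separator, and a quote inside a quoted field is written as two quotes:

```ruby
require 'csv'

# A made-up RFC 4180-style record: CRLF row separator, comma column separator,
# and double-quoted fields, one containing a comma, one containing "" (an escaped quote).
source = "a,\"b,c\",\"d \"\"e\"\"\"\r\n"

# The options are set to the RFC 4180 values explicitly, for illustration only.
rows = CSV.parse(source, row_sep: "\r\n", col_sep: ",", quote_char: '"')
p rows # => [["a", "b,c", "d \"e\""]]
```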
+
+ ==== Row Separator
+
+ RFC 4180 specifies the row separator CRLF (Ruby "\r\n").
+
+ Although the \CSV default row separator is "\n",
+ the parser by default also handles row separators "\r" and the RFC-compliant "\r\n".
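A quick way to check this is to parse the same rows ended by each of the three separators, without option +:row_sep+. A sketch; the sample data is illustrative:

```ruby
require 'csv'

# Parse the same three rows, ended by "\n", "\r", and "\r\n" in turn,
# without passing option :row_sep.
results = ["\n", "\r", "\r\n"].map do |sep|
  source = "foo,1#{sep}bar,1#{sep}baz,2#{sep}"
  CSV.parse(source)
end

# Each element is [["foo", "1"], ["bar", "1"], ["baz", "2"]].
results.each { |rows| p rows }
```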
+
+ ===== Recipe: Handle Compliant Row Separator
+
+ For strict compliance, use option +:row_sep+ to specify row separator "\r\n",
+ which allows the compliant row separator:
+   source = "foo,1\r\nbar,1\r\nbaz,2\r\n"
+   CSV.parse(source, row_sep: "\r\n") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
+ But it rejects other row separators:
+   source = "foo,1\nbar,1\nbaz,2\n"
+   CSV.parse(source, row_sep: "\r\n") # Raises MalformedCSVError
+   source = "foo,1\rbar,1\rbaz,2\r"
+   CSV.parse(source, row_sep: "\r\n") # Raises MalformedCSVError
+   source = "foo,1\n\rbar,1\n\rbaz,2\n\r"
+   CSV.parse(source, row_sep: "\r\n") # Raises MalformedCSVError
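If you want this strictness but need to handle rejected input gracefully, you can rescue the error. A minimal sketch; +strict_parse+ is a hypothetical helper, not part of \CSV:

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): returns the parsed rows,
# or nil if the data does not use the RFC 4180 row separator "\r\n".
def strict_parse(source)
  CSV.parse(source, row_sep: "\r\n")
rescue CSV::MalformedCSVError => e
  warn "Rejected non-compliant data: #{e.message}"
  nil
end

p strict_parse("foo,1\r\nbar,1\r\n") # => [["foo", "1"], ["bar", "1"]]
p strict_parse("foo,1\nbar,1\n")     # => nil (a warning goes to $stderr)
```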
+
+ ===== Recipe: Handle Non-Compliant Row Separator
+
+ For data with non-compliant row separators, use option +:row_sep+.
+ This example source uses semicolon (';') as its row separator:
+   source = "foo,1;bar,1;baz,2;"
+   CSV.parse(source, row_sep: ';') # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
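Option +:row_sep+ need not be a single character, just as the compliant "\r\n" is two characters. In this sketch (sample data made up) each row ends with the two-character string '||':

```ruby
require 'csv'

# Illustrative data whose rows end with the two-character separator "||".
source = "foo,1||bar,1||baz,2||"
rows = CSV.parse(source, row_sep: '||')
p rows # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
```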
+
+ ==== Column Separator
+
+ RFC 4180 specifies column separator COMMA (Ruby ',').
+
+ ===== Recipe: Handle Compliant Column Separator
+
+ Because the \CSV default column separator is ',',
+ you need not specify option +:col_sep+ for compliant data:
+   source = "foo,1\nbar,1\nbaz,2\n"
+   CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
+
+ ===== Recipe: Handle Non-Compliant Column Separator
+
+ For data with non-compliant column separators, use option +:col_sep+.
+ This example source uses TAB ("\t") as its column separator:
+   source = "foo\t1\nbar\t1\nbaz\t2\n"
+   CSV.parse(source, col_sep: "\t") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
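Quoting still applies with a custom column separator: a field that contains the separator must be quoted. A sketch with semicolon-separated sample data:

```ruby
require 'csv'

# Semicolon-separated sample line; the middle field contains the separator,
# so it is enclosed in the (default) double quotes.
line = 'foo;"1;5";bar'
fields = CSV.parse_line(line, col_sep: ';')
p fields # => ["foo", "1;5", "bar"]
```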
+
+ ==== Quote Character
+
+ RFC 4180 specifies quote character DQUOTE (Ruby '"').
+
+ ===== Recipe: Handle Compliant Quote Character
+
+ Because the \CSV default quote character is '"',
+ you need not specify option +:quote_char+ for compliant data:
+   source = "\"foo\",\"1\"\n\"bar\",\"1\"\n\"baz\",\"2\"\n"
+   CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
+
+ ===== Recipe: Handle Non-Compliant Quote Character
+
+ For data with non-compliant quote characters, use option +:quote_char+.
+ This example source uses SQUOTE ("'") as its quote character:
+   source = "'foo','1'\n'bar','1'\n'baz','2'\n"
+   CSV.parse(source, quote_char: "'") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]]
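As with the default quote character, a quote character inside a quoted field is escaped by doubling it. A sketch with single quotes; the data is illustrative:

```ruby
require 'csv'

# With quote_char "'", the doubled '' inside the first field is an escaped quote.
line = "'it''s',ok"
fields = CSV.parse_line(line, quote_char: "'")
p fields # => ["it's", "ok"]
```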
+
+ ==== Recipe: Allow Liberal Parsing
+
+ Use option +:liberal_parsing+ to specify that \CSV should
+ attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields:
+   source = 'is,this "three, or four",fields'
+   CSV.parse(source) # Raises MalformedCSVError
+   CSV.parse(source, liberal_parsing: true) # => [["is", "this \"three", " or four\"", "fields"]]
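A smaller sketch of the same contrast, using a made-up line with a stray double quote in the middle of an unquoted field:

```ruby
require 'csv'

line = 'a,b"c,d' # the second field contains a bare double quote

# Without liberal_parsing, the stray quote is rejected.
strict_error = nil
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  strict_error = e
end
puts "strict parse raised #{strict_error.class}" if strict_error

# With liberal_parsing, the quote is kept as part of the field.
fields = CSV.parse_line(line, liberal_parsing: true)
p fields # => ["a", "b\"c", "d"]
```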
+
+ === Special Handling
+
+ You can use parsing options to specify special handling for certain lines and fields.
+
+ ==== Special Line Handling
+
+ Use parsing options to specify special handling for blank lines, or for other selected lines.
+
+ ===== Recipe: Ignore Blank Lines
+
+ Use option +:skip_blanks+ to ignore blank lines:
+   source = <<-EOT
+   foo,0
+
+   bar,1
+   baz,2
+
+   ,
+   EOT
+   parsed = CSV.parse(source, skip_blanks: true)
+   parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]]
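For comparison, without +:skip_blanks+ each blank line is returned as an empty row. A line such as ',' is not blank (it holds two null fields), which is why [nil, nil] remains in the recipe above. A sketch with illustrative data:

```ruby
require 'csv'

source = "foo,0\n\nbar,1\n"
# By default, the blank line becomes an empty row.
p CSV.parse(source) # => [["foo", "0"], [], ["bar", "1"]]
# With skip_blanks: true, it is dropped.
p CSV.parse(source, skip_blanks: true) # => [["foo", "0"], ["bar", "1"]]
```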
+
+ ===== Recipe: Ignore Selected Lines
+
+ Use option +:skip_lines+ to ignore selected lines:
+   source = <<-EOT
+   # Comment
+   foo,0
+   bar,1
+   baz,2
+   # Another comment
+   EOT
+   parsed = CSV.parse(source, skip_lines: /^#/)
+   parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
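Option +:skip_lines+ also accepts a \String; a line is then skipped if it contains that string, matched literally rather than as a pattern. A sketch with made-up data:

```ruby
require 'csv'

# Illustrative data with a divider line that contains "--".
source = "foo,0\n-- divider --\nbar,1\n"
rows = CSV.parse(source, skip_lines: '--')
p rows # => [["foo", "0"], ["bar", "1"]]
```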
+
+ ==== Special Field Handling
+
+ Use parsing options to specify special handling for certain field values.
+
+ ===== Recipe: Strip Fields
+
+ Use option +:strip+ to strip leading and trailing whitespace from parsed field values:
+   CSV.parse_line(' a , b ', strip: true) # => ["a", "b"]
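For comparison, here is the same line parsed with and without +:strip+; by default, surrounding whitespace is kept:

```ruby
require 'csv'

line = ' a , b '
p CSV.parse_line(line)              # => [" a ", " b "]
p CSV.parse_line(line, strip: true) # => ["a", "b"]
```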
+
+ ===== Recipe: Handle Null Fields
+
+ Use option +:nil_value+ to specify a value that will replace each field
+ that is null (no text):
+   CSV.parse_line('a,,b,,c', nil_value: 0) # => ["a", 0, "b", 0, "c"]
+
+ ===== Recipe: Handle Empty Fields
+
+ Use option +:empty_value+ to specify a value that will replace each field
+ that is empty (\String of length 0):
+   CSV.parse_line('a,"",b,"",c', empty_value: 'x') # => ["a", "x", "b", "x", "c"]
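The two recipes above target different fields: in 'a,,b' the middle field is null (+nil+ by default), while in 'a,"",b' it is an empty \String. A sketch showing both defaults, then both options together (the replacement symbols are arbitrary):

```ruby
require 'csv'

# ',,' produces a null field; '""' produces an empty (zero-length) String.
line = 'a,,"",b'
p CSV.parse_line(line) # => ["a", nil, "", "b"]

# The options are independent, so each kind of field can get its own value.
p CSV.parse_line(line, nil_value: :null, empty_value: :empty) # => ["a", :null, :empty, "b"]
```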
+

=== Converting Fields

You can use field converters to change parsed \String fields into other objects,
@@ -180,49 +336,49 @@ There are built-in field converters for converting to objects of certain classes
- \DateTime

Other built-in field converters include:
- - <tt>:numeric</tt>: converts to \Integer and \Float.
- - <tt>:all</tt>: converts to \DateTime, \Integer, \Float.
+ - +:numeric+: converts to \Integer and \Float.
+ - +:all+: converts to \DateTime, \Integer, \Float.

You can also define field converters to convert to objects of other classes.

===== Recipe: Convert Fields to Integers

- Convert fields to \Integer objects using built-in converter <tt>:integer</tt>:
+ Convert fields to \Integer objects using built-in converter +:integer+:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: :integer)
  parsed.map {|row| row['Value'].class} # => [Integer, Integer, Integer]

===== Recipe: Convert Fields to Floats

- Convert fields to \Float objects using built-in converter <tt>:float</tt>:
+ Convert fields to \Float objects using built-in converter +:float+:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: :float)
  parsed.map {|row| row['Value'].class} # => [Float, Float, Float]

===== Recipe: Convert Fields to Numerics

- Convert fields to \Integer and \Float objects using built-in converter <tt>:numeric</tt>:
+ Convert fields to \Integer and \Float objects using built-in converter +:numeric+:
  source = "Name,Value\nfoo,0\nbar,1.1\nbaz,2.2\n"
  parsed = CSV.parse(source, headers: true, converters: :numeric)
  parsed.map {|row| row['Value'].class} # => [Integer, Float, Float]

===== Recipe: Convert Fields to Dates

- Convert fields to \Date objects using built-in converter <tt>:date</tt>:
+ Convert fields to \Date objects using built-in converter +:date+:
  source = "Name,Date\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2001-02-03\n"
  parsed = CSV.parse(source, headers: true, converters: :date)
  parsed.map {|row| row['Date'].class} # => [Date, Date, Date]

===== Recipe: Convert Fields to DateTimes

- Convert fields to \DateTime objects using built-in converter <tt>:date_time</tt>:
+ Convert fields to \DateTime objects using built-in converter +:date_time+:
  source = "Name,DateTime\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2020-05-07T14:59:00-05:00\n"
  parsed = CSV.parse(source, headers: true, converters: :date_time)
  parsed.map {|row| row['DateTime'].class} # => [DateTime, DateTime, DateTime]

===== Recipe: Convert Assorted Fields to Objects

- Convert assorted fields to objects using built-in converter <tt>:all</tt>:
+ Convert assorted fields to objects using built-in converter +:all+:
  source = "Type,Value\nInteger,0\nFloat,1.0\nDateTime,2001-02-04\n"
  parsed = CSV.parse(source, headers: true, converters: :all)
  parsed.map {|row| row['Value'].class} # => [Integer, Float, DateTime]
@@ -265,12 +421,12 @@ then refer to the converter by its name:
==== Using Multiple Field Converters

You can use multiple field converters in either of these ways:
- - Specify converters in option <tt>:converters</tt>.
+ - Specify converters in option +:converters+.
- Specify converters in a custom converter list.

- ===== Recipe: Specify Multiple Field Converters in Option <tt>:converters</tt>
+ ===== Recipe: Specify Multiple Field Converters in Option +:converters+

- Apply multiple field converters by specifying them in option <tt>:conveters</tt>:
+ Apply multiple field converters by specifying them in option +:converters+:
  source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n"
  parsed = CSV.parse(source, headers: true, converters: [:integer, :float])
  parsed['Value'] # => [0, 1.0, 2.0]
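The order of the list can matter: each field is passed through the converters in the order given, and a field that is no longer a \String is not converted further. A sketch using the same source as the recipe, with the list reversed:

```ruby
require 'csv'

source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n"
# :integer first: "0" becomes 0; "1.0" is not an integer, so :float converts it.
int_first = CSV.parse(source, headers: true, converters: [:integer, :float])
# :float first: every value, including "0", becomes a Float before :integer runs.
float_first = CSV.parse(source, headers: true, converters: [:float, :integer])

p int_first['Value'].map(&:class)   # => [Integer, Float, Float]
p float_first['Value'].map(&:class) # => [Float, Float, Float]
```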
@@ -291,21 +447,21 @@ Apply multiple field converters by defining and registering a custom converter l
You can use header converters to modify parsed \String headers.

Built-in header converters include:
- - <tt>:symbol</tt>: converts \String header to \Symbol.
- - <tt>:downcase</tt>: converts \String header to lowercase.
+ - +:symbol+: converts \String header to \Symbol.
+ - +:downcase+: converts \String header to lowercase.

You can also define header converters to otherwise modify header \Strings.

==== Recipe: Convert Headers to Lowercase

- Convert headers to lowercase using built-in converter <tt>:downcase</tt>:
+ Convert headers to lowercase using built-in converter +:downcase+:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: :downcase)
  parsed.headers # => ["name", "value"]

==== Recipe: Convert Headers to Symbols

- Convert headers to downcased Symbols using built-in converter <tt>:symbol</tt>:
+ Convert headers to downcased Symbols using built-in converter +:symbol+:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: :symbol)
  parsed.headers # => [:name, :value]
@@ -334,12 +490,12 @@ then refer to the converter by its name:
==== Using Multiple Header Converters

You can use multiple header converters in either of these ways:
- - Specify header converters in option <tt>:header_converters</tt>.
+ - Specify header converters in option +:header_converters+.
- Specify header converters in a custom header converter list.

===== Recipe: Specify Multiple Header Converters in Option :header_converters

- Apply multiple header converters by specifying them in option <tt>:header_conveters</tt>:
+ Apply multiple header converters by specifying them in option +:header_converters+:
  source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n"
  parsed = CSV.parse(source, headers: true, header_converters: [:downcase, :symbol])
  parsed.headers # => [:name, :value]