Slow performance due to lack of buffering for GzipFile.write #760

@svisser

Problem description

When writing a .csv.gz file to S3, performance was slow because GzipFile.write() is called once for every single line in the CSV file. The frequent calls are not the problem by themselves; the problem is that GzipFile.write() does no buffering of its own (see: python/cpython#89550). I was able to improve performance with a solution similar to python/cpython#101251, i.e., registering a custom compression handler for the '.gz' file extension in which GzipFile.write does have buffering.
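
For illustration, here is a rough sketch of that kind of handler. It assumes smart_open's register_compressor extension point; the _ClosingBufferedWriter helper, the 128 KiB buffer size, and the close handling are choices made for this sketch, not something smart_open prescribes:

import gzip
import io

import smart_open


class _ClosingBufferedWriter(io.BufferedWriter):
    """io.BufferedWriter that also closes an attached inner file object."""

    inner_file_obj = None

    def close(self):
        if not self.closed:
            super().close()  # flush the buffer and let GzipFile write its trailer
            if self.inner_file_obj is not None:
                # GzipFile does not close a fileobj it was handed, so close the
                # underlying stream (e.g. the S3 upload) explicitly.
                self.inner_file_obj.close()


def _buffered_gzip(file_obj, mode):
    # Sketch of a '.gz' handler that coalesces many small writes into larger
    # chunks before they reach GzipFile.write().
    if "w" in mode or "a" in mode:
        gz = gzip.GzipFile(fileobj=file_obj, mode=mode)
        writer = _ClosingBufferedWriter(gz, buffer_size=128 * 1024)
        writer.inner_file_obj = file_obj
        return writer
    # Reads are left unchanged here; a complete handler would also make sure
    # file_obj is closed on the read path, like smart_open's built-in one.
    return gzip.GzipFile(fileobj=file_obj, mode=mode)


# Replace the default handler for the '.gz' extension with the buffered one.
smart_open.register_compressor(".gz", _buffered_gzip)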

I'm opening this issue for the smart_open library to discuss:

  • Should the smart_open library make it easier to enable and/or configure buffering for GzipFile.write() for Python versions that don't yet have the above fix?
  • If not, should we document this or otherwise make it clear to users of the smart_open library that performance can be improved by adding buffering themselves?

Steps/code to reproduce the problem

Use smart_open.open in write mode to write a .csv.gz file to S3:

import csv

import smart_open

column_names = ("a", "b")
my_data = [.......]  # list of dicts keyed by column_names

# Text mode so that csv can write str rows; gzip compression is inferred
# from the .gz extension.
with smart_open.open("s3://.../example.csv.gz", "w", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=column_names)
    writer.writerows(my_data)

(writerows, implemented in C, calls writerow for each row, which in turn calls GzipFile.write() for each line)
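
Buffering can also be added at the call site, without touching the compressor registry. A rough sketch, under the assumption that io.BufferedWriter can wrap the binary file object smart_open returns here (the buffer size and the sample rows are illustrative):

import csv
import io

import smart_open

column_names = ("a", "b")
my_data = [{"a": 1, "b": 2}]  # illustrative rows

# Open the compressed binary stream, then add an explicit buffer layer so that
# csv's many small writes are coalesced before they reach GzipFile.write().
with smart_open.open("s3://.../example.csv.gz", "wb") as raw:
    buffered = io.BufferedWriter(raw, buffer_size=128 * 1024)
    # Closing the text wrapper flushes the buffer and closes the chain
    # down to the underlying S3 writer.
    with io.TextIOWrapper(buffered, encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=column_names)
        writer.writerows(my_data)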

Versions

  • Python 3.8
  • smart_open 6.2.0
