Upload Module: Uploading an image to file storage is not deterministic #118

@nhuray

Hi Globo Team,

I think there's an issue with the upload module when it is used with the file storage.

When you upload an image to thumbor:

curl -XPOST -F '[email protected];filename=cat-eye.jpg' http://thumbor-server/upload

The response is:

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: 2012/08/12/cat-eye.jpg
Server: TornadoServer/2.1.1

and the image is created on the filesystem:

└── 2012
    └── 08
        └── 12
            └── cat-eye.jpg

If we want to replace the image the next day (i.e. 2012-08-13), we make this request:

curl -XPUT -F '[email protected];filename=cat-eye.jpg' http://thumbor-server/upload

The response is:

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: 2012/08/13/cat-eye.jpg
Server: TornadoServer/2.1.1

and a second image is created on the filesystem instead of replacing the first one:

└── 2012
    └── 08
        ├── 12
        │   └── cat-eye.jpg
        └── 13
            └── cat-eye.jpg

This happens because the file storage's distribution algorithm is based on the upload date, as discussed in issue #113: the same filename maps to a different directory each day.

So I think we have to change the strategy for filesystem distribution.

We could use a strategy similar to the one Git uses to store its objects: the first 2 hex digits of sha1(path) name a directory, and the remaining digits name the file.

Following this strategy, normalize_path in file_storage.py would become:

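    # Requires `import hashlib` and `from os.path import join` at module level.
    def normalize_path(self, path):
        # Shard by the SHA-1 of the path: the first 2 hex digits name the directory.
        digest = hashlib.sha1(path).hexdigest()
        return join(self.context.config.FILE_STORAGE_ROOT_PATH.rstrip('/'), digest[:2], digest[2:])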

With this strategy, files would be distributed on the filesystem like this:

├── 6e
│   └── 7ea22ec6a03708fc2ac674580ee2c2fed26f36
├── 73
│   └── dc4c10a915fb41578a0e9dcaf3a99d53e2a785
├── 75
│   └── 47efb441e2bc461b54603e584cd936745b5935
├── 77
│   └── 4555f047b92136a4e65b7f5034f8faeb79a76b
├── 78
│   ├── 4008d66d4b9675a58e5b8faa2ec09b0c7bdb49
│   ├── 9260f3a7034ca116e063388cc33e65941d398b
│   └── b545ac7b9d8ac6d15a7bd2b54d42795c3405ad
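
As a standalone illustration of the idea (a minimal sketch, not thumbor's actual code: `sharded_path` and the `/tmp/thumbor/storage` root are hypothetical names chosen for the example), the same logical path always maps to the same location, whatever day the upload happens:

```python
import hashlib
from os.path import join


def sharded_path(root, path):
    """Map a logical image path to a Git-style sharded location under root."""
    # Encode explicitly so the sketch also runs on Python 3, where sha1() needs bytes.
    digest = hashlib.sha1(path.encode('utf-8')).hexdigest()
    # The first 2 hex digits name the directory, the remaining 38 name the file.
    return join(root.rstrip('/'), digest[:2], digest[2:])


first = sharded_path('/tmp/thumbor/storage', 'cat-eye.jpg')
second = sharded_path('/tmp/thumbor/storage/', 'cat-eye.jpg')
# Both calls resolve to the same location, so a PUT would overwrite the POSTed file.
print(first == second)
```

Because the location depends only on the path, no date component is involved and a later upload of cat-eye.jpg lands on the same file.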

So if we choose this implementation, we should remove resolve_original_photo_path from file_storage.py and implement normalize_path as above.

More generally, each storage system MAY implement a path_on_storage method according to its own constraints.
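
One possible shape for that, purely as a sketch: the class names `BaseStorage` and `FileStorage` and the constructor argument below are assumptions made for illustration, not thumbor's existing classes. A base storage could return the path unchanged, and the file storage could override it with the hash-based layout:

```python
import hashlib
from os.path import join


class BaseStorage(object):
    """Hypothetical base class: by default a storage uses the path as-is."""

    def path_on_storage(self, path):
        return path


class FileStorage(BaseStorage):
    """Hypothetical file storage that shards files by the SHA-1 of the path."""

    def __init__(self, root_path):
        self.root_path = root_path.rstrip('/')

    def path_on_storage(self, path):
        # Same logical path always yields the same digest, hence the same file.
        digest = hashlib.sha1(path.encode('utf-8')).hexdigest()
        return join(self.root_path, digest[:2], digest[2:])
```

A key-value backend, for instance, could keep the default and use the path directly as its key, while only filesystem-backed storages would need sharding.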

Nicolas
