
@Mountchicken
Collaborator

Multi-scale training is an attractive trick for text detection, since the scale of text is highly variable.

Supporting multi-scale training is simple: we only need to modify the generation of text targets to use data_sample.batch_input_shape instead of data_sample.img_shape. This modification will not affect the existing detectors in MMOCR, because their input size is fixed, i.e. data_sample.img_shape == data_sample.batch_input_shape.
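
For illustration, here is a minimal sketch of the change described above. The function name generate_text_targets and the mask-building details are hypothetical, not the actual MMOCR code; only the switch from data_sample.img_shape to data_sample.batch_input_shape reflects the proposed modification.

import numpy as np


def generate_text_targets(data_sample):
    # Hypothetical target generator; everything except the two shape
    # attributes is illustrative.
    # Previously the target map was rendered on the per-image canvas:
    #   h, w = data_sample.img_shape
    # With multi-scale training, render it on the padded batch canvas:
    h, w = data_sample.batch_input_shape

    # Example target (e.g. a binary text-region mask) at the padded
    # resolution; the concrete target type depends on the detector head.
    target = np.zeros((h, w), dtype=np.float32)
    for polygon in data_sample.gt_instances.polygons:
        ...  # rasterize each ground-truth polygon onto `target`
    return target

Here, batch_input_shape is the common shape of the batch after padding, while img_shape is the per-image shape after resizing. With a fixed input size the two coincide, but under RandomResize they can differ, so targets must be rendered at the padded shape to stay aligned with the network input.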

To enable multi-scale training, here is a simple pipeline config:

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True,
    ),
    # Sample a target scale between (1280, 800) and (1280, 1024) for each
    # image, keeping the aspect ratio.
    dict(
        type='RandomResize',
        scale=[(1280, 800), (1280, 1024)],
        keep_ratio=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
