Description: This package includes the Python code of the EDF algorithm for multi-view fusion and its application to chemical structure recognition [1] (closed-set and open-set). It solves the multi-view feature fusion problem by searching for an optimal combination scheme of different basic fusion operators.
A simple implementation of EDF is available at https://github.com/xinyanliang/EDFv0.1.
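For intuition, the basic fusion operators EDF combines are those listed under 'fusion_ways' in config.py ('add', 'mul', 'cat', 'max', 'avg'). The following is a minimal NumPy sketch of these operators applied to two view-feature vectors; it is an illustration only, not the package's implementation.

import numpy as np

# Two view-feature vectors of the same dimension (e.g. fused_nb_feats = 128).
a = np.random.rand(128)
b = np.random.rand(128)

# The five basic fusion operators searched over by EDF ('fusion_ways' in config.py).
fused = {
    'add': a + b,                   # element-wise sum
    'mul': a * b,                   # element-wise product
    'cat': np.concatenate([a, b]),  # concatenation (doubles the dimension)
    'max': np.maximum(a, b),        # element-wise maximum
    'avg': (a + b) / 2.0,           # element-wise average
}
for name, v in fused.items():
    print(name, v.shape)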
@ARTICLE{Liang2021Evolutionary,
author={X. {Liang} and Q. {Guo} and Y. {Qian} and W. {Ding} and Q. {Zhang}},
journal={IEEE Transactions on Evolutionary Computation},
title={Evolutionary Deep Fusion Method and Its Application in Chemical Structure Recognition},
year={2021},
volume={25},
number={5},
pages={883-893},
doi={10.1109/TEVC.2021.3064943}}
Requirement: The package was developed with Python 3 and tensorflow-gpu (2.0.3).
ATTN: This package is free for academic use. Run it at your own risk. For other purposes, please contact Dr. Xinyan Liang ([email protected]).
Download data from the following links.
| Datasets | URL | Extraction code |
|---|---|---|
| ChemBook-10k | https://pan.baidu.com/s/1G1P-_YyDhBTeWXhTeyOaBw | 4fcj |
| ChEMBL-10k | https://pan.baidu.com/s/1ZcPyJq8C7EEV0Trmc37U8g | 69n3 |
| PubChem-10k | https://pan.baidu.com/s/1ha8a119gyMul2rzT_aoUlA | olhr |
| tiny-imagenet200 | https://pan.baidu.com/s/1v5g9j_drRYNK9M3lOjXCqg | tacd |
Take the "ChemBook-10k" dataset as an example:
- Download the "ChemBook-10k" dataset;
- Put the dataset into the "ChemBook-10k" folder;
- Set the parameter 'data_name' to 'ChemBook-10k' in the config.py file.
The folder structure should be:
.
├── EDF
│ ├── ChemBook-10k
│ │ └── view
│ ├── ChEMBL-10k
│ │ └── view
│ ├── PubChem-10k
│ │ └── view
│ └── ...
└── ...
Before running train_EDF.py, you need to set some parameters in the config.py file.
def get_configs():
    paras = {
        'data_name': 'ChemBook',
        'fusion_ways': ['add', 'mul', 'cat', 'max', 'avg'],  # basic fusion operators
        'fused_nb_feats': 128,  # dimension of the fused features
        'nb_view': 5,           # number of views
        'pop_size': 28,         # population size
        'nb_iters': 20,         # number of evolutionary iterations
        'idx_split': 1,
        # training parameter settings
        'result_save_dir': 'EDF-True' + '-128-5' + 'result-1',
        'gpu_list': [0, 1, 2, 3, 4, 5, 6],  # GPU ids used to train EDF
        'epochs': 100,
        'batch_size': 64,
        'patience': 10,
        # EDF
        'is_remove': True,
        'crossover_rate': 0.9,
        'mutation_rate': 0.2,
        'noisy': True,
        'max_len': 40,
        # data set information
        'image_size': {'w': 230, 'h': 230, 'c': 1},
        'classes': 10000,  # number of classes
    }
    return paras
options:
data_name <string> the name of the dataset to process; options: ChemBook, Chembl, PubChem and tiny-imagenet200
gpu_list <list> list of GPU ids used to train EDF. The more GPUs are used, the less time EDF takes.
The maximum number of GPUs equals the population size.
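As a rough illustration of why the maximum useful number of GPUs equals the population size, the sketch below (a hypothetical helper, not the package's actual scheduler) assigns the 28 individuals of one generation to the GPUs in 'gpu_list' in round-robin fashion, so each GPU evaluates several candidate fusion schemes in turn.

# Hypothetical illustration: round-robin assignment of population members to GPUs.
def assign_individuals_to_gpus(pop_size, gpu_list):
    assignment = {gpu: [] for gpu in gpu_list}
    for idx in range(pop_size):
        gpu = gpu_list[idx % len(gpu_list)]
        assignment[gpu].append(idx)
    return assignment

print(assign_individuals_to_gpus(pop_size=28, gpu_list=[0, 1, 2, 3, 4, 5, 6]))
# Each of the 7 GPUs evaluates 4 candidate fusion schemes per generation;
# with 28 GPUs, every individual would get its own GPU.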
$python train_EDF.py

Setting: EDF and the view extractors are trained on the PubChem-10k dataset; the retrieval database is constructed from the training set of the ChEMBL-10k dataset; images from the test set of the ChEMBL-10k dataset are used as query images.
- Download the trained EDF and view extractor models from the link https://pan.baidu.com/s/1RtV3QACJpTPtWLJ7NdlSzg, then put them into the models folder;
- Download the datasets from the link https://pan.baidu.com/s/1LIE2ti2c9f4r9wuW3oUARQ (extraction code: ejx5), then put them into the database folder;
- Run the following script, open_set_report.py.
# open_set_report.py
from sklearn.metrics import pairwise_distances
import numpy as np
import os
from retrieve import databaseUtil

databaseUtil.construct_retrieve_database_test()

data_dir = 'database'
code = '3-2-0-1-0-4-0'  # name of the selected EDF fusion model


def get_data():
    train_x = np.load(os.path.join(data_dir, code + 'train_X.npy'))
    train_y = np.load(os.path.join(data_dir, 'train_Y.npy'))
    test_x = np.load(os.path.join(data_dir, code + 'test_X.npy'))
    test_y = np.load(os.path.join(data_dir, 'test_Y.npy'))
    return train_x, train_y, test_x, test_y


def cal_dist(topk=[1, 5, 10], metric="euclidean"):
    train_x, train_y, test_x, test_y = get_data()
    num_test = test_x.shape[0]
    # Repeat the database labels for every query so they can be indexed per query.
    train_y = np.tile(train_y, (num_test, 1))
    # Pairwise distances between every query image and every database image.
    dis = pairwise_distances(X=test_x, Y=train_x, metric=metric, n_jobs=-1)
    sort_idx1 = np.argsort(dis, axis=1)

    def report_topk(k):
        # A query counts as correct if its label appears among its k nearest database images.
        sort_idx = sort_idx1[:, :k]
        count = 0
        for i in range(num_test):
            if test_y[i] in train_y[i, sort_idx[i, :]]:
                count += 1
        print(count / num_test)

    for ki in topk:
        report_topk(ki)


if __name__ == '__main__':
    os.environ["CUDA_VISIBLE_DEVICES"] = '7'
    cal_dist(topk=[1, 5, 10, 15, 20, 50], metric="euclidean")

Run the above open_set_report.py script as follows:
$python open_set_report.py

To build the retrieval service on your own dataset, follow these steps:

- Step 1: Train your own view extractor models on your own dataset, or download our trained EDF and view extractor models from the link https://pan.baidu.com/s/1RtV3QACJpTPtWLJ7NdlSzg, then put them into the models folder. Note: the performance may be better if all models are trained on your own dataset.
- Step 2: Find a proper deep fusion model based on the generated view features;
- Step 3: Generate the retrieval database by extracting the fusion-layer features of the deep fusion network;
- Step 4: Provide the query service via the query() API.
Preprocess your dataset: save the data as a NumPy array by running the following imgs2npy() function.
from features import feature
from data_utils import npy_util
import os
import numpy as np
from data_utils import data_uitl


def imgs2npy(imgs_file_list, save_dir='database', save_name='x'):
    '''
    Read images according to their paths, then save them in npy format.
    :param imgs_file_list: paths of the images to read
    :param save_dir: directory where the npy file is saved
    :param save_name: name of the npy file to save
    :return: images as a numpy array
    '''
    imgs = []
    for img_fn in imgs_file_list:
        imgs.append(npy_util.read_image(img_fn))
    imgs = np.array(imgs)
    np.save(os.path.join(save_dir, save_name), imgs)
    return imgs
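A possible usage of imgs2npy(), assuming the raw images sit in a hypothetical images/ directory (the directory name and glob pattern below are examples only):

import glob

# Hypothetical example: gather image paths and convert them into database/x.npy.
imgs_file_list = sorted(glob.glob('images/*.png'))  # 'images/' is an example directory
x = imgs2npy(imgs_file_list, save_dir='database', save_name='x')
print(x.shape)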
Train view extractor models on your own dataset by running python train_view_extractor.py.

python train_view_extractor.py -g 0 -m 0

options:
-g, --gpus  <int> GPU id on which the model runs
-m, --model <int> view extractor id; it takes one value from 0-9. Ten model types are supported as extractors: ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3', 'resnet18', 'resnet34', 'desnet169', 'desnet201', 'NASNetMobile']
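If you want to train all ten extractors on one machine, a simple (hypothetical) convenience loop is to launch the script once per model id with subprocess, using the command-line interface shown above; the GPU id below is an example only.

import subprocess

# Hypothetical convenience loop: train each of the 10 view extractors in turn on GPU 0.
for model_id in range(10):
    subprocess.run(['python', 'train_view_extractor.py', '-g', '0', '-m', str(model_id)],
                   check=True)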
Extract multi-view features by calling the following extract_multi_view_feats() function, based on the trained view extractor models.

def extract_multi_view_feats():
    x = data_uitl.preprocess_input(data_saved_dir='database', save_name='x')
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature(model_dir='models', save_data_dir='database', database_name='database')
    views = Feats.get_feats_multi_views(view_models, x=x, save_data_suffix=None)

Find a proper deep fusion model based on the generated view features:

$python train_EDF.py

Generate the retrieval database by calling the following construct_retrieve_database() function.
from features import feature
from data_utils import npy_util
import os
import numpy as np
from data_utils import data_uitl


def construct_retrieve_database():
    x = data_uitl.preprocess_input(data_saved_dir='database', save_name='x')
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature(model_dir='models', save_data_dir='database', database_name='database')
    views = Feats.get_feats_multi_views(view_models, x=x, save_data_suffix=None)
    Feats.get_feats_by_edf(views=views, save_data_suffix=None, edf_model_name='3-2-0-1-0-4-0')
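For orientation, a possible order of the calls shown above when building your own retrieval database, assuming the functions are collected in one script and that the EDF fusion model (e.g. '3-2-0-1-0-4-0') has already been found by train_EDF.py; the images/ directory is the hypothetical example from the preprocessing step.

import glob

# Hypothetical end-to-end sketch; 'images/' is an example directory, not part of the package.
imgs_file_list = sorted(glob.glob('images/*.png'))
imgs2npy(imgs_file_list, save_dir='database', save_name='x')  # save raw images as database/x.npy
extract_multi_view_feats()      # extract multi-view features with the trained extractors
construct_retrieve_database()   # fuse the views with the EDF model and save the database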
Call the query() API function to use the service by passing an image URL to it.
import os
import numpy as np
from data_utils.npy_util import read_image
from features import feature
from retrieve import retrieve

database_vecs = np.load(os.path.join('database', 'database.npy'))  # fused features of the database
database_imgs = np.load(os.path.join('database', 'x.npy'))         # the database images themselves
edf_model_name = '3-2-0-1-0-4-0'


def query(query_img_url):
    # Read and preprocess the query image.
    query_img = read_image(query_img_url)
    query_img = np.expand_dims(query_img, axis=0)
    query_img = np.expand_dims(query_img, axis=-1)
    query_img = (query_img / 127.5) - 1.
    # Extract multi-view features and fuse them with the selected EDF model.
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature()
    views = Feats.get_feats_multi_views(view_models, x=query_img, save_data_suffix=None)
    x_feats = Feats.get_feats_by_edf(views=views, save_data_suffix=None, edf_model_name=edf_model_name)
    # Return the top-k most similar database images.
    topk_imgs = retrieve.topk_imgs(query_img=x_feats,
                                   database_vecs=database_vecs,
                                   database_imgs=database_imgs,
                                   topk=10,
                                   metric="euclidean")
    return topk_imgs
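A brief usage example: pass the path or URL of a query image (the file name below is hypothetical) and receive the top-10 most similar images from the retrieval database.

if __name__ == '__main__':
    # 'query.png' is a hypothetical example image path/URL.
    results = query('query.png')
    print(results)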