Description: This package includes the Python code of the EDF algorithm for multi-view fusion and its application to chemical structure recognition [1] (closed-set and open-set). It solves the multi-view feature fusion problem by searching for an optimal combination scheme of different basic fusion operators.
A simple implementation of EDF is available at https://github.com/xinyanliang/EDFv0.1.
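For intuition, the basic fusion operators EDF combines are those listed under 'fusion_ways' in config.py ('add', 'mul', 'cat', 'max', 'avg'). The following is a minimal NumPy sketch of these operators applied to two view-feature vectors; it is an illustration only, not the package's implementation.

import numpy as np

# Two view-feature vectors of the same dimension (e.g. fused_nb_feats = 128).
a = np.random.rand(128)
b = np.random.rand(128)

# The five basic fusion operators searched over by EDF ('fusion_ways' in config.py).
fused = {
    'add': a + b,                   # element-wise sum
    'mul': a * b,                   # element-wise product
    'cat': np.concatenate([a, b]),  # concatenation (doubles the dimension)
    'max': np.maximum(a, b),        # element-wise maximum
    'avg': (a + b) / 2.0,           # element-wise average
}
for name, v in fused.items():
    print(name, v.shape)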
@ARTICLE{Liang2021Evolutionary,
author={X. {Liang} and Q. {Guo} and Y. {Qian} and W. {Ding} and Q. {Zhang}},
journal={IEEE Transactions on Evolutionary Computation},
title={Evolutionary Deep Fusion Method and Its Application in Chemical Structure Recognition},
year={2021},
volume={25},
number={5},
pages={883-893},
doi={10.1109/TEVC.2021.3064943}}
Requirement: The package was developed with Python 3 and tensorflow-gpu (2.0.3).
ATTN: This package is free for academic use. Run it at your own risk. For other purposes, please contact Dr. Xinyan Liang ([email protected]).
Download data from the following links.
| Datasets | URL | Extraction code |
|---|---|---|
| ChemBook-10k | https://pan.baidu.com/s/1G1P-_YyDhBTeWXhTeyOaBw | 4fcj |
| ChEMBL-10k | https://pan.baidu.com/s/1ZcPyJq8C7EEV0Trmc37U8g | 69n3 |
| PubChem-10k | https://pan.baidu.com/s/1ha8a119gyMul2rzT_aoUlA | olhr |
| tiny-imagenet200 | https://pan.baidu.com/s/1v5g9j_drRYNK9M3lOjXCqg | tacd |
Take the "ChemBook-10k" dataset as an example:
- Download the "ChemBook-10k" dataset;
- Put the dataset into the "ChemBook-10k" folder;
- Set the parameter 'data_name' to 'ChemBook-10k' in the config.py file.
The folder structure should be:
.
├── EDF
│ ├── ChemBook-10k
│ │ └── view
│ ├── ChEMBL-10k
│ │ └── view
│ ├── PubChem-10k
│ │ └── view
│ └── ...
└── ...
Before running train_EDF.py, you need to set some parameters in the config.py file.
def get_configs():
    paras = {
        'data_name': 'ChemBook',
        'fusion_ways': ['add', 'mul', 'cat', 'max', 'avg'],  # basic fusion operators
        'fused_nb_feats': 128,  # dimension of the fused features
        'nb_view': 5,           # number of views
        'pop_size': 28,         # population size
        'nb_iters': 20,         # number of evolutionary iterations
        'idx_split': 1,
        # training parameter settings
        'result_save_dir': 'EDF-True' + '-128-5' + 'result-1',
        'gpu_list': [0, 1, 2, 3, 4, 5, 6],  # GPU ids used to train EDF
        'epochs': 100,
        'batch_size': 64,
        'patience': 10,
        # EDF
        'is_remove': True,
        'crossover_rate': 0.9,
        'mutation_rate': 0.2,
        'noisy': True,
        'max_len': 40,
        # data set information
        'image_size': {'w': 230, 'h': 230, 'c': 1},
        'classes': 10000,  # number of classes
    }
    return paras
options:
data_name <string> the name of the dataset to process; options: ChemBook, Chembl, PubChem and tiny-imagenet200
gpu_list <list> list of GPU ids used to train EDF. The more GPUs are used, the less time EDF takes.
The maximum number of GPUs equals the population size.
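As a rough illustration of why the maximum useful number of GPUs equals the population size, the sketch below (a hypothetical helper, not the package's actual scheduler) assigns the 28 individuals of one generation to the GPUs in 'gpu_list' in round-robin fashion, so each GPU evaluates several candidate fusion schemes in turn.

# Hypothetical illustration: round-robin assignment of population members to GPUs.
def assign_individuals_to_gpus(pop_size, gpu_list):
    assignment = {gpu: [] for gpu in gpu_list}
    for idx in range(pop_size):
        gpu = gpu_list[idx % len(gpu_list)]
        assignment[gpu].append(idx)
    return assignment

print(assign_individuals_to_gpus(pop_size=28, gpu_list=[0, 1, 2, 3, 4, 5, 6]))
# Each of the 7 GPUs evaluates 4 candidate fusion schemes per generation;
# with 28 GPUs, every individual would get its own GPU.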
$python train_EDF.py

Setting: EDF and the view extractors are trained on the PubChem-10k dataset; the retrieval database is constructed from the training set of the ChEMBL-10k dataset; images from the test set of the ChEMBL-10k dataset are used as query images.
- Download the trained EDF and view extractor models from the link https://pan.baidu.com/s/1RtV3QACJpTPtWLJ7NdlSzg, then put them into the models folder;
- Download the datasets from the link https://pan.baidu.com/s/1LIE2ti2c9f4r9wuW3oUARQ (extraction code: ejx5), then put them into the database folder;
- Run the following script, open_set_report.py.
# open_set_report.py
from sklearn.metrics import pairwise_distances
import numpy as np
import os
from retrieve import databaseUtil

databaseUtil.construct_retrieve_database_test()

data_dir = 'database'
code = '3-2-0-1-0-4-0'  # name of the selected EDF fusion model


def get_data():
    train_x = np.load(os.path.join(data_dir, code + 'train_X.npy'))
    train_y = np.load(os.path.join(data_dir, 'train_Y.npy'))
    test_x = np.load(os.path.join(data_dir, code + 'test_X.npy'))
    test_y = np.load(os.path.join(data_dir, 'test_Y.npy'))
    return train_x, train_y, test_x, test_y


def cal_dist(topk=[1, 5, 10], metric="euclidean"):
    train_x, train_y, test_x, test_y = get_data()
    num_test = test_x.shape[0]
    # Repeat the database labels for every query so they can be indexed per query.
    train_y = np.tile(train_y, (num_test, 1))
    # Pairwise distances between every query image and every database image.
    dis = pairwise_distances(X=test_x, Y=train_x, metric=metric, n_jobs=-1)
    sort_idx1 = np.argsort(dis, axis=1)

    def report_topk(k):
        # A query counts as correct if its label appears among its k nearest database images.
        sort_idx = sort_idx1[:, :k]
        count = 0
        for i in range(num_test):
            if test_y[i] in train_y[i, sort_idx[i, :]]:
                count += 1
        print(count / num_test)

    for ki in topk:
        report_topk(ki)


if __name__ == '__main__':
    os.environ["CUDA_VISIBLE_DEVICES"] = '7'
    cal_dist(topk=[1, 5, 10, 15, 20, 50], metric="euclidean")

Run the above open_set_report.py script as follows:
$python open_set_report.py

To build the retrieval service on your own dataset, follow these steps:

- Step 1: Train your own view extractor models on your own dataset, or download our trained EDF and view extractor models from the link https://pan.baidu.com/s/1RtV3QACJpTPtWLJ7NdlSzg, then put them into the models folder. Note: the performance may be better if all models are trained on your own dataset.
- Step 2: Find a proper deep fusion model based on the generated view features;
- Step 3: Generate the retrieval database by extracting the fusion-layer features of the deep fusion network;
- Step 4: Provide the query service via the query() API.
Preprocess your dataset: save the data as a NumPy array by running the following imgs2npy() function.
from features import feature
from data_utils import npy_util
import os
import numpy as np
from data_utils import data_uitl


def imgs2npy(imgs_file_list, save_dir='database', save_name='x'):
    '''
    Read images according to their paths, then save them in npy format.
    :param imgs_file_list: paths of the images to read
    :param save_dir: directory where the npy file is saved
    :param save_name: name of the npy file to save
    :return: images as a numpy array
    '''
    imgs = []
    for img_fn in imgs_file_list:
        imgs.append(npy_util.read_image(img_fn))
    imgs = np.array(imgs)
    np.save(os.path.join(save_dir, save_name), imgs)
    return imgs
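A possible usage of imgs2npy(), assuming the raw images sit in a hypothetical images/ directory (the directory name and glob pattern below are examples only):

import glob

# Hypothetical example: gather image paths and convert them into database/x.npy.
imgs_file_list = sorted(glob.glob('images/*.png'))  # 'images/' is an example directory
x = imgs2npy(imgs_file_list, save_dir='database', save_name='x')
print(x.shape)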
Train view extractor models on your own dataset by running python train_view_extractor.py.

python train_view_extractor.py -g 0 -m 0

options:
-g, --gpus  <int> GPU id on which the model runs
-m, --model <int> view extractor id; it takes one value from 0-9. Ten model types are supported as extractors: ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3', 'resnet18', 'resnet34', 'desnet169', 'desnet201', 'NASNetMobile']
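If you want to train all ten extractors on one machine, a simple (hypothetical) convenience loop is to launch the script once per model id with subprocess, using the command-line interface shown above; the GPU id below is an example only.

import subprocess

# Hypothetical convenience loop: train each of the 10 view extractors in turn on GPU 0.
for model_id in range(10):
    subprocess.run(['python', 'train_view_extractor.py', '-g', '0', '-m', str(model_id)],
                   check=True)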
Extract multi-view features by calling the following extract_multi_view_feats() function, based on the trained view extractor models.

def extract_multi_view_feats():
    x = data_uitl.preprocess_input(data_saved_dir='database', save_name='x')
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature(model_dir='models', save_data_dir='database', database_name='database')
    views = Feats.get_feats_multi_views(view_models, x=x, save_data_suffix=None)

Find a proper deep fusion model based on the generated view features:

$python train_EDF.py

Generate the retrieval database by calling the following construct_retrieve_database() function.
from features import feature
from data_utils import npy_util
import os
import numpy as np
from data_utils import data_uitl


def construct_retrieve_database():
    x = data_uitl.preprocess_input(data_saved_dir='database', save_name='x')
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature(model_dir='models', save_data_dir='database', database_name='database')
    views = Feats.get_feats_multi_views(view_models, x=x, save_data_suffix=None)
    Feats.get_feats_by_edf(views=views, save_data_suffix=None, edf_model_name='3-2-0-1-0-4-0')
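For orientation, a possible order of the calls shown above when building your own retrieval database, assuming the functions are collected in one script and that the EDF fusion model (e.g. '3-2-0-1-0-4-0') has already been found by train_EDF.py; the images/ directory is the hypothetical example from the preprocessing step.

import glob

# Hypothetical end-to-end sketch; 'images/' is an example directory, not part of the package.
imgs_file_list = sorted(glob.glob('images/*.png'))
imgs2npy(imgs_file_list, save_dir='database', save_name='x')  # save raw images as database/x.npy
extract_multi_view_feats()      # extract multi-view features with the trained extractors
construct_retrieve_database()   # fuse the views with the EDF model and save the database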
Call the query() API function to use the service by passing an image URL to it.
import os
import numpy as np
from data_utils.npy_util import read_image
from features import feature
from retrieve import retrieve

database_vecs = np.load(os.path.join('database', 'database.npy'))  # fused features of the database
database_imgs = np.load(os.path.join('database', 'x.npy'))         # the database images themselves
edf_model_name = '3-2-0-1-0-4-0'


def query(query_img_url):
    # Read and preprocess the query image.
    query_img = read_image(query_img_url)
    query_img = np.expand_dims(query_img, axis=0)
    query_img = np.expand_dims(query_img, axis=-1)
    query_img = (query_img / 127.5) - 1.
    # Extract multi-view features and fuse them with the selected EDF model.
    view_models = ['resnet50', 'desnet121', 'MobileNetV2', 'Xception', 'InceptionV3']
    Feats = feature.Feature()
    views = Feats.get_feats_multi_views(view_models, x=query_img, save_data_suffix=None)
    x_feats = Feats.get_feats_by_edf(views=views, save_data_suffix=None, edf_model_name=edf_model_name)
    # Return the top-k most similar database images.
    topk_imgs = retrieve.topk_imgs(query_img=x_feats,
                                   database_vecs=database_vecs,
                                   database_imgs=database_imgs,
                                   topk=10,
                                   metric="euclidean")
    return topk_imgs
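A brief usage example: pass the path or URL of a query image (the file name below is hypothetical) and receive the top-10 most similar images from the retrieval database.

if __name__ == '__main__':
    # 'query.png' is a hypothetical example image path/URL.
    results = query('query.png')
    print(results)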