MolCycleGAN-Optimator is an open-source generative model for molecular optimization. It combines the CycleGAN architecture with the Junction Tree Variational Autoencoder (JT-VAE): the model is trained on JT-VAE encodings of molecules, and its translated outputs are decoded back into molecules with JT-VAE.
We highly recommend using conda for package management; an environment.yml file is provided.
The environment can be created by running:
conda env create -f environment.yml
After cloning this repository, execute the following script to initialize submodules:
./scripts/init_repo.sh
All necessary datasets for aromatic ring experiments are provided.
To download the input data (ZINC 250k dataset and JT-VAE encodings), run:
./scripts/download_input_data.sh
To download the aromatic rings experiment data (train/test splits, returned molecules, SMILES data), run:
./scripts/download_ar_data.sh
Train the model by running:
python train.py
Specify appropriate training parameters for the selected dataset.
Once the model is trained and translations for the test set have been generated, decode them with JT-VAE:
python decode.py
Specify your decoding parameters appropriately.
The repository includes all data and code required to reproduce the aromatic rings experiment.
- Dataset Creation: Use the notebook data/input_data/aromatic_rings/datasets_generator_aromatic_rings.ipynb to create train/test sets for experiments.
- Model Training: Perform training using:
  ./scripts/run_aromatic_rings_training.sh
  This runs train.py with predefined parameters for aromatic rings data.
- Decoding Molecules: Decode molecules using:
  ./scripts/run_aromatic_rings_decoding.sh
  This executes decode.py with base parameters preconfigured for aromatic rings.
- Analysis: Analyze output data using experiments/aromatic_rings.ipynb.
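The scripted steps above can be chained from a single Python entry point. The following is only a minimal sketch, not part of the repository: the function name `run_aromatic_rings_pipeline` and its parameters are hypothetical, the script paths are the ones listed in this README, and the two notebooks (dataset creation and analysis) are assumed to be run by hand.

```python
import subprocess

# Script paths as listed in this README, relative to the repository root.
SETUP_SCRIPTS = [
    "./scripts/init_repo.sh",
    "./scripts/download_input_data.sh",
    "./scripts/download_ar_data.sh",
]
EXPERIMENT_SCRIPTS = [
    "./scripts/run_aromatic_rings_training.sh",
    "./scripts/run_aromatic_rings_decoding.sh",
]


def run_aromatic_rings_pipeline(repo_root=".", include_setup=False, runner=subprocess.run):
    """Run the scripted steps in order, stopping at the first failure.

    `runner` is injectable (hypothetical convenience) so the ordering can be
    checked without executing anything. Returns the list of scripts passed to it.
    """
    scripts = (SETUP_SCRIPTS if include_setup else []) + EXPERIMENT_SCRIPTS
    for script in scripts:
        # With the default runner, check=True raises CalledProcessError
        # as soon as a script exits with a non-zero status.
        runner([script], cwd=repo_root, check=True)
    return scripts
```

Calling `run_aromatic_rings_pipeline(include_setup=True)` from the repository root would also run the initialization and download scripts first. Training is always placed before decoding, because decoding needs the translations produced by the trained model.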
The MolCycleGAN-Optimator code is written in Python 3, while the JT-VAE package relies on Python 2. To simplify usage, the environment is configured so that both run in a single environment, which requires downgraded library versions. Create the environment only from the provided environment.yml file to avoid compatibility issues.
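A quick sanity check before training can catch the most common setup mistake, which is running outside the conda environment. The sketch below is an illustration only: `check_environment` is a hypothetical helper, and it relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated. It cannot confirm that the active environment was actually built from environment.yml.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # The MolCycleGAN-Optimator code itself targets Python 3.
    if version_info[0] != 3:
        problems.append("expected a Python 3 interpreter, found Python %d" % version_info[0])
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    if not environ.get("CONDA_DEFAULT_ENV"):
        problems.append("no active conda environment detected")
    return problems
```

Printing the result of `check_environment()` at the top of a session shows an empty list when a conda environment is active under Python 3.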