About

This started as a CNN model for transcription factor binding prediction in DNA. Ported to PyTorch from the Jax implementation of chapter 3 of "Deep Learning for Biology" (Ravarani et al). The book informs that this task is inspired by one of the evaluation challenges presented in [1] and which sourced the dataset from [2].

In this version, a basic two layer CNN with batch normalization is trained on 10 transcription factors binding datasets, resulting in 10 models. The mean AUC is calculated and written to metrics.

We then use a modified program.md from Andrej Karpathy's autoresaerch project to instruct an LLM to make sequential improvements to the model to improve the mean validation auc (mean auc for all 10 models).

And therefore also the training loop runs for a fixed number of seconds (defined in constants.py) rather than a fixed number of training steps.

Usage

source a venv and install requirements.
python download.py # downloads the dataset from the deeplearningforbiology website
check if the base model training works:

#this trains 10 models using the same , produces loss logs in ./metrics
python main.py --op train 

# run prediction, produces auc validation metrics json in ./metrics
python main.py --op predict

Kick off your dev agent with confirmations disabled such as

claude -p "Hi have a look at program.md and let's kick off a new experiment! let's do the setup first." --dangerously-skip-permissions

Of course run this on a sandbox if using that flag. I use https://ResearchBox.ai

References

[1] Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Ziqi Tang, Nirali Somia, Yiyang Yu, Peter K Koo (2024). Biorxiv doi: https://doi.org/10.1101/2024.02.29.582810 Now published in Genome Biology doi: 10.1186/s13059-025-03674-8

[2] Majdandzic, A., Rajesh, C. & Koo, P.K. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol 24, 109 (2023). https://doi.org/10.1186/s13059-023-02956-3

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
compare.py		compare.py
constants.py		constants.py
download.py		download.py
main.py		main.py
program.md		program.md
requirements.txt		requirements.txt
tf_predict.py		tf_predict.py
tfbind.py		tfbind.py
tfbind_utils.py		tfbind_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Usage

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Usage

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages