Genome modelling and design across all domains of life with Evo 2

  • Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).

  • Merchant, A. T., King, S. H., Nguyen, E. & Hie, B. L. Semantic design of functional de novo genes from a genomic language model. Nature 649, 749–758 (2026).

  • Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  • Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).

  • Ku, J. et al. Systems and algorithms for convolutional multi-hybrid language models at scale. Preprint at https://doi.org/10.48550/arXiv.2503.01868 (2025).

  • Vaswani, A. et al. Attention is all you need. In Adv. Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (NIPS, 2017).

  • Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://doi.org/10.48550/arXiv.2001.08361 (2020).

  • Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).


  • Gao, T., Wettig, A., Yen, H. & Chen, D. How to train long-context language models (effectively). In Proc. 63rd Annual Meeting of the Association for Computational Linguistics 1, 7376–7399 (ACL, 2025).

  • Dubey, A. et al. The Llama 3 herd of models. Preprint at https://doi.org/10.48550/arXiv.2407.21783 (2024).

  • Liu, S. J. et al. In vivo perturb-seq of cancer and microenvironment cells dissects oncologic drivers and radiotherapy responses in glioblastoma. Genome Biol. 25, 256 (2024).

  • Poli, M. et al. Hyena hierarchy: towards larger convolutional language models. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 28043–28078 (2023).

  • Poli, M. et al. Mechanistic design and scaling of hybrid architectures. In Proc. 41st International Conference on Machine Learning 235, 40908–40950 (2024); https://proceedings.mlr.press/v235/poli24a.html.

  • Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. Preprint at bioRxiv https://doi.org/10.1101/2021.07.09.450648 (2021).

  • Notin, P. et al. ProteinGym: large-scale benchmarks for protein design and fitness prediction. Adv. Neural Inf. Process. Syst. 36, 64331–64379 (2023).

  • Benegas, G., Albors, C., Aw, A. J., Ye, C. & Song, Y. S. A DNA language model based on multispecies alignment predicts the effects of genome-wide variants. Nat. Biotechnol. 43, 1960–1965 (2025).

  • Shine, J. & Dalgarno, L. The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl Acad. Sci. USA 71, 1342–1346 (1974).

  • Kozak, M. The scanning model for translation: an update. J. Cell Biol. 108, 229–241 (1989).

  • Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2022).

  • Li, F.-Z., Amini, A. P., Yue, Y., Yang, K. K. & Lu, A. X. Feature reuse and scaling: understanding transfer learning with protein language models. In Proc. 41st International Conference on Machine Learning 235, 27351–27375 (2024).

  • Weinstein, E. N., Amin, A. N., Frazer, J. & Marks, D. Non-identifiability and the blessings of misspecification in models of molecular fitness. In Adv. Neural Information Processing Systems https://proceedings.neurips.cc/paper_files/paper/2022/file/247e592848391fe01f153f179c595090-Paper-Conference.pdf (2022).

  • Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2024).

  • Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).

  • de Almeida, B. P. et al. SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models. Nat. Methods 22, 2301–2315 (2025).

  • Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).

  • Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

  • Huang, H. et al. Functional evaluation and clinical classification of BRCA2 variants. Nature 638, 528–537 (2025).

  • Patel, A. et al. DART-Eval: a comprehensive DNA language model evaluation benchmark on regulatory DNA. Adv. Neural Inf. Process. Syst. 37, 62024–62061 (2024).


  • Cunningham, H., Ewart, A., Smith, L. R., Huben, R. & Sharkey, L. Sparse autoencoders find highly interpretable features in language models. Preprint at https://doi.org/10.48550/arXiv.2309.08600 (2023).

  • Bricken, T. et al. Towards monosemanticity: decomposing language models with dictionary learning. Transformer Circuits Thread https://transformer-circuits.pub/2023/monosemantic-features (2023).

  • Templeton, A. et al. Scaling monosemanticity: extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread https://transformer-circuits.pub/2024/scaling-monosemanticity/ (2024).

  • Bussmann, B., Leask, P. & Nanda, N. BatchTopK Sparse Autoencoders. Preprint at https://doi.org/10.48550/arXiv.2412.06410 (2024).

  • Camargo, A. et al. Identification of mobile genetic elements with geNomad. Nat. Biotechnol. 42, 1303–1312 (2023).

  • Vorontsov, I. E. et al. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res. 52, D154–D163 (2024).

  • Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).

  • Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  • Sandoval-Velasco, M. et al. Three-dimensional genome architecture persists in a 52,000-year-old woolly mammoth skin sample. Cell 187, 3541–3562.e51 (2023).

  • Meng, G., Li, Y., Yang, C. & Liu, S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res. 47, e63 (2019).

  • Gibson, D. G. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215–1220 (2008).

  • Karr, J. R. et al. A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401 (2012).

  • Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

  • Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).

  • Li, Y. et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022).

  • Brown, B. et al. Large language monkeys: scaling inference compute with repeated sampling. Preprint at https://doi.org/10.48550/arXiv.2407.21787 (2024).

  • Allis, C. D. & Jenuwein, T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 17, 487–500 (2016).

  • Schreiber, J., Lu, Y. Y. & Noble, W. S. Ledidi: Designing genomic edits that induce functional activity. Preprint at bioRxiv https://doi.org/10.1101/2020.05.21.109686 (2020).

  • Linder, J. & Seelig, G. Fast activation maximization for molecular sequence design. BMC Bioinformatics 22, 510 (2020).

  • Zrimec, J. et al. Controlling gene expression with deep generative design of regulatory DNA. Nat. Commun. 13, 5099 (2022).

  • de Almeida, B. P. et al. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 626, 207–211 (2023).

  • DaSilva, L. F. et al. DNA-diffusion: leveraging generative models for controlling chromatin accessibility and gene expression via synthetic regulatory elements. Nat. Genet. 58, 180–194 (2026).

  • Sarkar, A. et al. Designing DNA with tunable regulatory activity using score-entropy discrete diffusion. Preprint at bioRxiv https://doi.org/10.1101/2024.05.23.595630 (2024).

  • Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

  • Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).

  • Bloomfield, D. et al. AI and biosecurity: The need for governance. Science 385, 831–833 (2024).

  • Pathak, A. K. et al. Pervasive ancestry bias in variant effect predictors. Preprint at bioRxiv https://doi.org/10.1101/2024.05.20.594987 (2025).

  • Schubach, M., Maass, T., Nazaretyan, L., Röner, S. & Kircher, M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res. 52, D1143–D1154 (2024).

  • Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

  • Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv https://doi.org/10.1101/2024.12.25.630221 (2025).

  • Durrant, M. G. et al. Bridge RNAs direct programmable recombination of target and donor DNA. Nature 630, 984–993 (2024).
