Optimizing generative AI by backpropagating language model feedback

  • Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  • Trinh, T. H., Wu, Y., Le, Q. V., He, H. & Luong, T. Solving olympiad geometry without human demonstrations. Nature 625, 476–482 (2024).

  • Li, Y. et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022).

  • Yang, J. et al. SWE-agent: agent–computer interfaces enable automated software engineering. In Adv. Neural Inf. Process. Syst. 37 (NeurIPS, 2024).

  • Khattab, O. et al. DSPy: compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations (2024).

  • Zaharia, M. et al. The shift from models to compound AI systems. BAIR https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/ (2024).

  • Zhou, Y. et al. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations (2023).

  • Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Adv. Neural Inf. Process. Syst. 25 (NeurIPS, 2012).

  • Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  • Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).

  • Mankowitz, D. J. et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature 618, 257–263 (2023).

  • Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).

  • Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  • Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

  • Pryzant, R. et al. Automatic prompt optimization with “gradient descent” and beam search. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H. et al.) 7957–7968 (Association for Computational Linguistics, 2023).

  • Zheng, L. et al. Judging LLM-as-a-judge with MT-bench and chatbot arena. Adv. Neural Inf. Process. Syst. 36, 46595–46623 (2023).

  • Li, X. et al. AlpacaEval: an automatic evaluator of instruction-following models. GitHub https://github.com/tatsu-lab/alpaca_eval (2023).

  • Bai, Y. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. Preprint at https://arxiv.org/abs/2204.05862 (2022).

  • Madaan, A. et al. Self-refine: iterative refinement with self-feedback. In Adv. Neural Inf. Process. Syst. 36 (NeurIPS, 2023).

  • Stiennon, N. et al. Learning to summarize with human feedback. Adv. Neural Inf. Process. Syst. 33, 3008–3021 (2020).

  • Yuan, W. et al. Self-rewarding language models. In Forty-first International Conference on Machine Learning (2024).

  • Dubois, Y. et al. AlpacaFarm: a simulation framework for methods that learn from human feedback. In Adv. Neural Inf. Process. Syst. 36 (NeurIPS, 2023).

  • Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K. & Yao, S. Reflexion: language agents with verbal reinforcement learning. Adv. Neural Inf. Process. Syst. 36, 8634–8652 (2023).

  • Rein, D. et al. GPQA: a graduate-level Google-proof Q&A benchmark. In First Conference on Language Modeling (2024).

  • Hendrycks, D. et al. Measuring massive multitask language understanding. In The Ninth International Conference on Learning Representations (2021).

  • Lu, P. et al. MathVista: evaluating mathematical reasoning of foundation models in visual contexts. In The Twelfth International Conference on Learning Representations (2024).

  • Lu, P. et al. Learn to explain: multimodal reasoning via thought chains for science question answering. Adv. Neural Inf. Process. Syst. 35, 2507–2521 (2022).

  • Liu, P. et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55, 1–35 (2023).

  • Suzgun, M. et al. Challenging BIG-bench tasks and whether chain-of-thought can solve them. In Findings of the Association for Computational Linguistics: ACL 2023 13003–13051 (Association for Computational Linguistics, 2023).

  • Cobbe, K. et al. Training verifiers to solve math word problems. Preprint at https://arxiv.org/abs/2110.14168 (2021).

  • Yang, C. et al. Large language models as optimizers. In The Twelfth International Conference on Learning Representations (2024).

  • Dubey, A. et al. The Llama 3 herd of models. Preprint at https://arxiv.org/abs/2407.21783 (2024).

  • Yang, A. et al. Qwen2 technical report. Preprint at https://arxiv.org/abs/2407.10671 (2024).

  • Khan, F. M., Gibbons, J. P. & Sperduto, P. W. Khan’s Treatment Planning in Radiation Oncology (Lippincott Williams & Wilkins, 2016).

  • Hussein, M., Heijmen, B. J. M., Verellen, D. & Nisbet, A. Automation in intensity modulated radiotherapy treatment planning—a review of recent innovations. Br. J. Radiol. 91, 20180270 (2018).

  • Kisling, K. et al. Radiation planning assistant – a streamlined, fully automated radiotherapy treatment planning system. J. Vis. Exp. 134, e57411 (2018).

  • Huang, C., Nomura, Y., Yang, Y. & Xing, L. Meta-optimization for fully automated radiation therapy treatment planning. Phys. Med. Biol. 67, 055011 (2022).

  • Yang, Y. & Xing, L. Clinical knowledge-based inverse treatment planning. Phys. Med. Biol. 49, 5101 (2004).

  • Liu, S. et al. Automated radiotherapy treatment planning guided by GPT-4Vision. Preprint at https://arxiv.org/abs/2406.15609 (2024).

  • Lu, P. et al. Chameleon: plug-and-play compositional reasoning with large language models. Adv. Neural Inf. Process. Syst. 36, 43447–43478 (2023).

  • Yan, B., Zhang, J., Yuan, Z., Shan, S. & Chen, X. Evaluating the quality of hallucination benchmarks for large vision-language models. Preprint at https://arxiv.org/abs/2406.17115 (2024).

  • Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).

  • Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proc. COMPSTAT’2010 (eds Lechevallier, Y. & Saporta, G.) 177–186 (Physica-Verlag, 2010).

  • Wang, Q. et al. High-dimensional automated radiation therapy treatment planning via Bayesian optimization. Med. Phys. 50, 3773–3787 (2023).

  • Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: a next-generation hyperparameter optimization framework. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019).

  • Bianchi, F. et al. zou-group/textgrad: v0.1.6. Zenodo https://doi.org/10.5281/zenodo.14497017 (2024).
