LLEMA: Accelerating Materials Design via LLM-Guided Evolutionary Search

Nikhil Abhyankar¹*, Sanchit Kabra¹*, Saaketh Desai², Chandan K. Reddy¹

¹Virginia Tech ²Sandia National Laboratories

ICLR 2026

*Equal Contribution

Abstract

Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLEMA (LLM-guided Evolution for MAterials design), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes crystallographically specified candidates under explicit property constraints; a surrogate-augmented oracle estimates physicochemical properties; and a multi-objective scorer updates success/failure memories to guide subsequent generations. Evaluated on 14 realistic tasks spanning electronics, energy, coatings, optics, and aerospace, LLEMA discovers candidates that are chemically plausible, thermodynamically stable, and property-aligned, achieving higher hit-rates and stronger Pareto fronts than generative and LLM-only baselines. Ablation studies confirm the importance of rule-guided generation, memory-based refinement, and surrogate prediction. By enforcing synthesizability and multi-objective trade-offs, LLEMA delivers a principled pathway to accelerate practical materials discovery.

Novel Materials Discovery Benchmark

Overview of the 14 comprehensive materials discovery tasks and their associated properties.

We introduce a new benchmark for multi-objective materials discovery, evaluating LLEMA on 14 comprehensive materials discovery tasks spanning diverse industrial applications across electronics (semiconductors, transparent conductors, thermoelectric materials), energy (battery electrodes, photovoltaics, fuel cell components), coatings (corrosion-resistant alloys, wear-resistant ceramics), optics (high-refractive materials, optical filters), and aerospace (high-temperature alloys, lightweight structural materials). Each task requires optimizing multiple competing objectives while maintaining chemical plausibility, thermodynamic stability, and synthesizability constraints. This benchmark provides a standardized evaluation framework for future materials discovery methods.

Method Overview

LLEMA is formulated as an agentic AI system for materials discovery, consisting of four interconnected components that interact in a closed-loop optimization process. Central to the system is a large language model that operates as an autonomous hypothesis-generation agent, proposing candidate materials at each iteration. Its behavior is governed by dynamically constructed prompts that encode task objectives, chemistry-informed design constraints, demonstrations from prior system trajectories, and structured output specifications, enabling systematic and adaptive exploration of the materials design space.

Overview of the LLEMA framework. An LLM proposes material candidates under task constraints, which are then evaluated and refined using chemistry-informed rules, memory-based guidance, and surrogate property prediction. The iterative process balances exploration and exploitation, enhancing multi-objective materials discovery.
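The prompt structure described above (task objectives, design constraints, demonstrations from earlier iterations, and an output specification) could be assembled roughly as follows. This is a minimal sketch, not the authors' implementation: `TaskSpec`, `build_prompt`, the JSON keys, and the example task, property range, rule, and formulas are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for one design task (not from the paper)."""
    objective: str
    constraints: dict  # property name -> (lower, upper) bound
    rules: list = field(default_factory=list)  # chemistry-informed hints


def build_prompt(task, successes, failures, k=2):
    """Assemble the four prompt parts: objectives, design constraints,
    demonstrations from earlier iterations, and an output specification."""
    lines = [f"Task: {task.objective}", "Property constraints:"]
    for name, (lo, hi) in task.constraints.items():
        lines.append(f"- {lo} <= {name} <= {hi}")
    lines.append("Design rules:")
    lines += [f"- {rule}" for rule in task.rules]
    # Only the k most recent entries of each memory are shown to the model.
    lines.append("Previously successful candidates:")
    lines += [f"- {c}" for c in successes[-k:]]
    lines.append("Previously failed candidates (avoid):")
    lines += [f"- {c}" for c in failures[-k:]]
    lines.append('Answer as JSON with keys "formula" and "space_group".')
    return "\n".join(lines)


task = TaskSpec(
    objective="wide-band-gap semiconductor",  # illustrative task name
    constraints={"band_gap_eV": (2.5, 3.5)},  # made-up range
    rules=["substitute within the same periodic group"],
)
# The formulas below are placeholders, not results from the paper.
prompt = build_prompt(task, successes=["GaN"], failures=["NaCl"])
print(prompt)
```

Rebuilding the prompt at every iteration is what lets the demonstrations change as the memories fill up, which is the "dynamically constructed" aspect mentioned above.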

LLM-Guided Generation

Leverages scientific knowledge embedded in large language models to propose chemically plausible material candidates

Chemistry-Informed Rules

Integrates domain-specific evolutionary operators for compositional substitution and crystal structure manipulation

Memory-Based Refinement

Maintains reward and error buffers to guide exploration and exploitation across generations

Multi-Objective Optimization

Balances multiple property constraints including thermodynamic stability and synthesizability

Island-Based Evolution

Manages independent evolution across multiple islands for diverse candidate exploration
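The components listed above can be read as one closed loop per island: propose, evaluate, score, update memory. The sketch below illustrates that control flow under strong simplifying assumptions. The LLM call and the surrogate-augmented oracle are replaced by toy stand-ins (`toy_propose`, `toy_oracle`), candidates are plain number pairs rather than crystal structures, and the property names, scoring rule, and parameters are hypothetical rather than taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Island:
    """One independent population with its own success/failure memories."""
    successes: list = field(default_factory=list)  # (score, candidate)
    failures: list = field(default_factory=list)   # (score, candidate)


def toy_propose(rng, island):
    """Stand-in for the LLM proposal step: perturb the best remembered
    success most of the time, otherwise sample a fresh candidate."""
    if island.successes and rng.random() < 0.7:
        _, parent = max(island.successes)
        return tuple(min(1.0, max(0.0, x + rng.gauss(0, 0.1))) for x in parent)
    return (rng.random(), rng.random())


def toy_oracle(candidate):
    """Stand-in for the property oracle: maps a candidate to properties."""
    x, y = candidate
    return {"band_gap": 4.0 * x, "formation_energy": y - 1.0}


def score(props, constraints):
    """Fraction of property constraints that are satisfied (0.0 to 1.0)."""
    ok = [lo <= props[name] <= hi for name, (lo, hi) in constraints.items()]
    return sum(ok) / len(ok)


def run(constraints, n_islands=3, n_iters=20, seed=0):
    rng = random.Random(seed)
    islands = [Island() for _ in range(n_islands)]
    for _ in range(n_iters):
        for island in islands:  # islands evolve independently
            cand = toy_propose(rng, island)
            s = score(toy_oracle(cand), constraints)
            # A candidate counts as a hit only if every constraint holds.
            memory = island.successes if s == 1.0 else island.failures
            memory.append((s, cand))
    return islands


constraints = {"band_gap": (1.0, 3.0), "formation_energy": (-1.0, -0.5)}
islands = run(constraints)
n_hits = sum(len(i.successes) for i in islands)
print(f"{n_hits} hits out of {3 * 20} evaluations")
```

Keeping a separate memory per island means a lucky early success on one island does not immediately steer the others, which is one simple way to preserve diversity across the search.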

Qualitative Results

Impact of Surrogate Predictors

Integrating surrogate models for property prediction improves LLEMA's overall performance.

Diversity of Candidates

LLEMA achieves higher diversity of candidates compared to base LLMs.

Stronger Pareto Front

LLEMA achieves a stronger Pareto front than the other baselines.
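For readers unfamiliar with the term, a Pareto front is the set of candidates that no other candidate beats on every objective at once. The snippet below is a generic illustration of that definition, assuming every objective is to be maximized. The points are invented for the example and are not results from the paper, and `dominates` and `pareto_front` are hypothetical helper names.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented two-objective scores, one tuple per candidate.
points = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.0, 2.0)]
front = pareto_front(points)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Here (1.5, 3.0) and (2.0, 2.0) drop out because (2.0, 4.0) is at least as good on both objectives. An objective that should be minimized can be handled by negating it before the comparison.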

Ablation Study

By incorporating memory-based evolution and chemistry-informed design principles, LLEMA achieves the highest stability and hit-rate and the lowest memorization rate.

Lower Memorization

LLEMA achieves the lowest memorization rate, which leads to more novel candidates.

Quantitative Results

BibTeX

@inproceedings{abhyankar2026llema,
  title={LLEMA: Accelerating Materials Design via {LLM}-Guided Evolutionary Search},
  author={Abhyankar, Nikhil and Kabra, Sanchit and Desai, Saaketh and Reddy, Chandan K.},
  booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://openreview.net/forum?id=TIqzhBvCNB}
}
