LLEMA: Accelerating Materials Design via LLM-Guided Evolutionary Search

¹Virginia Tech, ²Sandia National Laboratories
ICLR 2026

*Equal Contribution

Abstract

Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLEMA (LLM-guided Evolution for MAterials design), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes crystallographically specified candidates under explicit property constraints; a surrogate-augmented oracle estimates physicochemical properties; and a multi-objective scorer updates success/failure memories to guide subsequent generations. Evaluated on 14 realistic tasks spanning electronics, energy, coatings, optics, and aerospace, LLEMA discovers candidates that are chemically plausible, thermodynamically stable, and property-aligned, achieving higher hit-rates and stronger Pareto fronts than generative and LLM-only baselines. Ablation studies confirm the importance of rule-guided generation, memory-based refinement, and surrogate prediction. By enforcing synthesizability and multi-objective trade-offs, LLEMA delivers a principled pathway to accelerate practical materials discovery.

Novel Materials Discovery Benchmark

Overview of the 14 comprehensive materials discovery tasks and their associated properties.

We introduce a new benchmark for multi-objective materials discovery, evaluating LLEMA on 14 comprehensive materials discovery tasks spanning diverse industrial applications across electronics (semiconductors, transparent conductors, thermoelectric materials), energy (battery electrodes, photovoltaics, fuel cell components), coatings (corrosion-resistant alloys, wear-resistant ceramics), optics (high-refractive materials, optical filters), and aerospace (high-temperature alloys, lightweight structural materials). Each task requires optimizing multiple competing objectives while maintaining chemical plausibility, thermodynamic stability, and synthesizability constraints. This benchmark provides a standardized evaluation framework for future materials discovery methods.
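
To make the notion of a multi-objective task concrete, here is a minimal Python sketch of how one task and its hit-rate might be represented. The `Task` class, the property names, the thresholds, and the toy candidates are all hypothetical illustrations, not the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical task spec: a name plus one predicate per property constraint."""
    name: str
    constraints: Dict[str, Callable[[float], bool]]

    def is_hit(self, props: Dict[str, float]) -> bool:
        # A candidate counts as a hit only if every constraint is satisfied.
        return all(
            key in props and check(props[key])
            for key, check in self.constraints.items()
        )


def hit_rate(task: Task, candidates: List[Dict[str, float]]) -> float:
    """Fraction of candidates that satisfy all constraints of the task."""
    if not candidates:
        return 0.0
    return sum(task.is_hit(c) for c in candidates) / len(candidates)


# Illustrative thresholds only (assumed, not taken from the benchmark).
demo_task = Task(
    name="wide-band-gap semiconductor (toy)",
    constraints={
        "band_gap_eV": lambda x: x >= 2.5,
        "formation_energy_eV_per_atom": lambda x: x <= -0.5,
    },
)

demo_candidates = [
    {"band_gap_eV": 3.1, "formation_energy_eV_per_atom": -1.2},  # satisfies both
    {"band_gap_eV": 1.0, "formation_energy_eV_per_atom": -1.5},  # gap too small
    {"band_gap_eV": 3.4, "formation_energy_eV_per_atom": 0.2},   # fails stability
    {"band_gap_eV": 2.8, "formation_energy_eV_per_atom": -0.7},  # satisfies both
]

demo_rate = hit_rate(demo_task, demo_candidates)
```

Requiring every constraint to hold at once is what makes the objectives compete: a candidate with an excellent band gap still misses if it fails the stability check.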

Method Overview

LLEMA is formulated as an agentic AI system for materials discovery, consisting of four interconnected components that interact in a closed-loop optimization process. Central to the system is a large language model that operates as an autonomous hypothesis-generation agent, proposing candidate materials at each iteration. Its behavior is governed by dynamically constructed prompts that encode task objectives, chemistry-informed design constraints, demonstrations from prior system trajectories, and structured output specifications, enabling systematic and adaptive exploration of the materials design space.

LLEMA Framework Diagram

Overview of the LLEMA framework. An LLM proposes material candidates under task constraints, which are then evaluated and refined using chemistry-informed rules, memory-based guidance, and surrogate property prediction. The iterative process balances exploration and exploitation, enhancing multi-objective materials discovery.
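
The closed loop described above (propose, evaluate, score, update memories) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `run_loop`, `toy_propose`, `toy_oracle`, `toy_score`, the element list, and the 0.5 threshold are all hypothetical, and the LLM and surrogate oracle are replaced by simple stand-in functions.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = str                 # a formula string; real candidates also specify crystal structure
Properties = Dict[str, float]


def run_loop(
    propose: Callable[[List[Candidate], List[Candidate]], List[Candidate]],
    oracle: Callable[[Candidate], Properties],
    score: Callable[[Properties], float],
    iterations: int = 5,
    threshold: float = 0.5,
) -> Tuple[List[Tuple[float, Candidate]], List[Candidate]]:
    successes: List[Tuple[float, Candidate]] = []  # success memory: (score, candidate)
    failures: List[Candidate] = []                 # failure memory
    for _ in range(iterations):
        # 1. Propose: the generator sees top successes and recent failures as context.
        top = [c for _, c in sorted(successes, reverse=True)[:3]]
        batch = propose(top, failures[-3:])
        for cand in batch:
            props = oracle(cand)      # 2. Evaluate: estimate properties
            s = score(props)          # 3. Score: aggregate the objectives
            if s >= threshold:        # 4. Update memories
                successes.append((s, cand))
            else:
                failures.append(cand)
    return sorted(successes, reverse=True), failures


rng = random.Random(0)
ELEMENTS = ["Ga", "Al", "In", "Zn", "Sn"]


def toy_propose(good: List[Candidate], bad: List[Candidate]) -> List[Candidate]:
    # Stand-in for the LLM call; a real system would build a prompt from `good` and `bad`.
    return [f"{rng.choice(ELEMENTS)}{rng.choice(['N', 'O', 'S'])}" for _ in range(4)]


def toy_oracle(cand: Candidate) -> Properties:
    # Stand-in for the surrogate-augmented oracle: deterministic fake properties.
    h = sum(ord(ch) for ch in cand)
    return {"band_gap": (h % 50) / 10.0, "stability": (h % 7) / 7.0}


def toy_score(props: Properties) -> float:
    # Product of two normalized objectives (an assumed aggregation rule).
    return min(props["band_gap"] / 5.0, 1.0) * props["stability"]


best, failed = run_loop(toy_propose, toy_oracle, toy_score)
```

Passing the proposer, oracle, and scorer in as functions keeps the loop itself independent of any particular LLM or property predictor.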

LLM-Guided Generation

Leverages scientific knowledge embedded in large language models to propose chemically plausible material candidates

Chemistry-Informed Rules

Integrates domain-specific evolutionary operators for compositional substitution and crystal structure manipulation

Memory-Based Refinement

Maintains reward and error buffers to guide exploration and exploitation across generations

Multi-Objective Optimization

Balances multiple property constraints including thermodynamic stability and synthesizability

Island-Based Evolution

Manages independent evolution across multiple islands for diverse candidate exploration
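
As a rough illustration of how the memory buffers and islands listed above could fit together, the Python sketch below gives each island its own bounded reward buffer and error buffer. The `Island` class, the buffer sizes, the round-robin assignment, and the toy candidates and scores are assumptions made for this example and are not taken from the LLEMA implementation.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class Island:
    """One independent population with its own reward and error buffers."""
    capacity: int = 3
    rewards: List[Tuple[float, str]] = field(default_factory=list)  # min-heap of (score, candidate)
    errors: Deque[Tuple[str, str]] = field(default_factory=lambda: deque(maxlen=3))

    def record_success(self, candidate: str, score: float) -> None:
        # Keep only the `capacity` highest-scoring candidates.
        if len(self.rewards) < self.capacity:
            heapq.heappush(self.rewards, (score, candidate))
        else:
            heapq.heappushpop(self.rewards, (score, candidate))

    def record_failure(self, candidate: str, reason: str) -> None:
        # The oldest failure is dropped automatically once the deque is full.
        self.errors.append((candidate, reason))

    def demonstrations(self) -> dict:
        # Context that could be placed in the next prompt for this island.
        return {
            "good": [c for _, c in sorted(self.rewards, reverse=True)],
            "bad": list(self.errors),
        }

    def best_score(self) -> float:
        return max((s for s, _ in self.rewards), default=float("-inf"))


islands = [Island() for _ in range(2)]

# Toy evaluated proposals: (candidate, score, failure reason or None).
toy_results = [
    ("GaN", 0.9, None), ("ZnO", 0.7, None), ("XeF9", 0.0, "implausible valence"),
    ("AlN", 0.8, None), ("SnO2", 0.6, None), ("InN", 0.5, None),
    ("SiC", 0.95, None), ("BN", 0.85, None),
]

# Assign results to islands round-robin so each island evolves independently.
for i, (cand, score, reason) in enumerate(toy_results):
    island = islands[i % len(islands)]
    if reason is None:
        island.record_success(cand, score)
    else:
        island.record_failure(cand, reason)

demo_context = [isl.demonstrations() for isl in islands]
```

Because each island only ever sees its own buffers, the islands can drift toward different regions of the design space, which is one simple way to keep the candidate pool diverse.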

Qualitative Results

Impact of Surrogate Predictors

Integrating surrogate models for property prediction in LLEMA achieves higher overall performance.

Higher Stability and Hit-Rate

Diversity of Candidates

LLEMA generates a more diverse set of candidates than the base LLMs.

Stronger Pareto Plot

LLEMA achieves a stronger Pareto front than the other baselines.
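
For readers unfamiliar with the term, a Pareto front is the set of candidates that no other candidate beats on every objective at once. The short Python sketch below computes it for a handful of made-up two-objective points, assuming both objectives are to be maximized; the function names and the data are illustrative only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (objective_1, objective_2) pairs, both maximized.
toy_points = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.5, 4.5)]
front = pareto_front(toy_points)
```

Here `(2.0, 2.0)` is dominated by `(2.0, 4.0)` and `(0.5, 4.5)` by `(1.0, 5.0)`, so only the remaining three points lie on the front.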

Ablation Study

LLEMA achieves the highest stability and hit-rate and the lowest memorization rate by incorporating memory-based evolution and chemistry-informed design principles.

Memory Analysis

Lower Memorization

LLEMA achieves the lowest memorization rate, which leads to more novel candidates.
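
This page does not define the memorization rate. One plausible reading, assumed here purely for illustration, is the fraction of generated candidates that already appear in a reference set of known materials. The Python sketch below computes that quantity by exact string match on made-up formulas; the function name, the matching rule, and the data are all hypothetical.

```python
from typing import Iterable, Set


def memorization_rate(generated: Iterable[str], known: Set[str]) -> float:
    """Fraction of generated formulas found verbatim in `known` (assumed definition)."""
    gen = [g.strip() for g in generated]
    if not gen:
        return 0.0
    return sum(g in known for g in gen) / len(gen)


toy_known = {"GaN", "ZnO", "SiC"}                        # stand-in for a materials database
toy_generated = ["GaN", "AlScN2", "ZnO", "Mg2SnSe4"]     # stand-in for model proposals
toy_rate = memorization_rate(toy_generated, toy_known)
```

A real comparison would likely need to match compositions and structures rather than raw strings, since the same material can be written in several equivalent ways.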

Quantitative Results

BibTeX

@inproceedings{abhyankar2026llema,
        title={LLEMA: Accelerating Materials Design via {LLM}-Guided Evolutionary Search},
        author={Abhyankar, Nikhil and Kabra, Sanchit and Desai, Saaketh and Reddy, Chandan K},
        booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
        year={2026},
        url={https://openreview.net/forum?id=TIqzhBvCNB}}