LLM-FE: Automated Feature Engineering
with Large Language Models

Virginia Tech

Abstract

Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within fixed, manually designed search spaces, often neglecting domain knowledge. Recent advances using Large Language Models (LLMs) have enabled the integration of domain knowledge into the feature engineering process. However, existing LLM-based approaches use direct prompting or rely solely on validation scores for feature selection, failing to leverage insights from prior feature discovery experiments or establish meaningful reasoning between feature generation and data-driven performance. To address these challenges, we propose LLM-FE, a novel framework that combines evolutionary search with the domain knowledge and reasoning capabilities of LLMs to automatically discover effective features for tabular learning tasks. LLM-FE formulates feature engineering as a program search problem, where LLMs propose new feature transformation programs iteratively, and data-driven feedback guides the search process. Our results demonstrate that LLM-FE consistently outperforms state-of-the-art baselines, significantly enhancing the performance of tabular prediction models across diverse classification and regression benchmarks.

Method Overview

LLM-FE casts feature engineering as a program search problem. Given dataset metadata and task objectives, a large language model generates executable feature-transformation programs using structured prompts and in-context examples. Each program is executed to augment the dataset and evaluated by training a downstream model on a validation split. High-performing programs are stored in an island-based memory buffer and reused as demonstrations to guide subsequent generations. This closed-loop process enables iterative refinement, balancing exploration and exploitation while producing interpretable, reusable features.
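The closed-loop search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `llm_propose` (an LLM call that returns a candidate transform function given in-context examples) and `score_fn` (validation score of a downstream model) are hypothetical stand-ins.

```python
import random

def evolve_features(df, llm_propose, score_fn, n_islands=4, iters=50):
    """Minimal sketch of the LLM-FE search loop (hypothetical helper names).

    llm_propose(examples) -> candidate transform function produced by an LLM
    score_fn(data)        -> validation score of a downstream model
    """
    islands = [[] for _ in range(n_islands)]  # island-based memory buffer
    baseline = score_fn(df)
    for _ in range(iters):
        island = random.choice(islands)
        # Reuse high-scoring prior programs as in-context demonstrations.
        examples = sorted(island, key=lambda p: -p["score"])[:2]
        transform = llm_propose(examples)
        try:
            augmented = transform(df.copy())  # execute the feature program
            score = score_fn(augmented)       # model-in-the-loop evaluation
        except Exception:
            continue                          # discard invalid programs
        if score > baseline:
            island.append({"program": transform, "score": score})
    # Return the best program found across all islands, if any.
    return max((p for isl in islands for p in isl),
               key=lambda p: p["score"], default=None)
```

Keeping several islands (populations) rather than one global pool is what balances exploration against exploitation: weak islands can still pursue diverse hypotheses while strong ones are refined.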


Overview of the LLM-FE Framework. For a given dataset, LLM-FE follows these steps: (a) New Hypothesis Generation, where an LLM generates feature transformation hypotheses as programs; (b) Feature Engineering, where the program is applied to create a modified dataset; (c) Model Fitting, where a prediction model is fitted and evaluated on validation data; (d) Multi-Population Memory, which maintains high-scoring programs as in-context samples for iterative refinement.
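To make step (a) concrete, here is an illustrative example of the kind of feature-transformation program the LLM emits: a plain function that maps a DataFrame to an augmented DataFrame. The column names (`weight_kg`, `height_m`, `age`) are hypothetical, chosen only to show how domain knowledge can shape a feature.

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative generated feature program (hypothetical column names)."""
    out = df.copy()
    # Domain-inspired ratio feature: body-mass index from weight and height.
    out["bmi"] = out["weight_kg"] / (out["height_m"] ** 2)
    # Interaction feature combining two existing columns.
    out["age_x_bmi"] = out["age"] * out["bmi"]
    return out
```

Because each candidate is executable code rather than a fixed formula template, it can be run directly in step (b), scored in step (c), and stored verbatim in the memory of step (d).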

Key Features

Program-Based Feature Search
Features are represented as executable transformation programs rather than static formulas.
LLM-Guided Evolution
Large language models act as evolutionary optimizers using performance feedback.
Model-in-the-Loop Evaluation
Feature quality is assessed via downstream learning performance.
Interpretable Features
Generated features are human-readable and reusable.
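Model-in-the-loop evaluation can be sketched as below. This is an assumption-laden simplification: a decision tree stands in for whichever downstream prediction model is used, and cross-validated accuracy stands in for the validation protocol.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_features(X, y, transform=None):
    """Score a candidate feature program by downstream model performance.

    A sketch: the tree classifier is a placeholder for the actual
    prediction model; any transform is applied before fitting.
    """
    if transform is not None:
        X = transform(X)  # augment inputs with the candidate features
    model = DecisionTreeClassifier(random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()
```

A feature program is kept only if the score on the augmented data beats the score on the raw data, so feature quality is defined entirely by measured downstream performance rather than by heuristics.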

Quantitative Results

Across 16 classification and 10 regression datasets, LLM-FE consistently improves performance over raw features and outperforms AutoFeat, OpenFE, and prior LLM-based baselines.

Classification Dataset Performance


Regression Dataset Performance


Qualitative Results

Ablation studies confirm the importance of domain-aware prompting and evolutionary memory in driving performance gains.

Impact of Domain Knowledge

Integrating domain knowledge helps generate interpretable features that improve downstream model performance.


Ablation Study

Ablation study showing the contribution of each component to overall performance.


Computational Efficiency

Among the evaluated methods, LLM-FE achieves the best trade-off between computational cost and performance improvement.


Feature Analysis

LLM-FE generates useful features through both simple and complex transformations.


BibTeX

@article{abhyankar2025llmfe,
  title={LLM-FE: Automated Feature Engineering with Large Language Models},
  author={Abhyankar, Nikhil and Shojaee, Parshin and Reddy, Chandan K.},
  journal={arXiv preprint arXiv:2503.14434},
  year={2025}
}