VGG-Flow

NeurIPS 2025

Value Gradient Guidance for Flow Matching Alignment

Fast and prior-preserving reward finetuning for flow matching models via value gradient matching.

Zhen Liu1,†, Tim Z. Xiao2,*, Carles Domingo-Enrich3,*, Weiyang Liu4, Dinghuai Zhang3,5,†

1The Chinese University of Hong Kong (Shenzhen) | 2University of Tübingen | 3Microsoft Research | 4The Chinese University of Hong Kong | 5Mila - Quebec AI Institute

*Equal contribution | †Corresponding author

Overview

VGG-Flow pipeline and finetuning trajectory visualization
VGG-Flow aligns a pretrained flow model by matching the residual velocity field with a value-gradient guidance signal.

Abstract

VGG-Flow is a gradient-matching method for aligning pretrained flow matching models with reward functions while preserving the base prior. The method is motivated by optimal control: the residual velocity field of the finetuned model should match the gradient of a value function. VGG-Flow uses this principle to inject first-order reward information directly, and combines it with a practical forward-looking value-gradient parameterization for fast adaptation. On Stable Diffusion 3 reward finetuning tasks (Aesthetic Score, HPSv2, and PickScore), the method achieves strong reward improvement while preserving sample diversity and the base prior better than competing finetuning baselines.

Method Explained

VGG-Flow starts from an optimal-control objective, derives value-gradient matching from the Hamilton-Jacobi-Bellman (HJB) equation, and then trains a value-gradient model and a flow model jointly with three losses.

0. Flow Matching Preliminaries

A flow matching model starts from Gaussian noise and maps it to a data sample by integrating a learned ODE velocity field.

\[ x_0 \sim \mathcal{N}(0, I), \qquad \dot{x}_t = v_{\theta}(x_t,t), \quad t \in [0,1], \qquad x_1 = x(t=1). \]

VGG-Flow finetunes this velocity field while preserving the base prior through a control-style regularization.
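As a concrete illustration of this sampling process, the ODE can be integrated with plain Euler steps; a minimal NumPy sketch with a hypothetical toy velocity field that pushes samples along straight paths toward a fixed target \(c\) (an illustrative stand-in, not the paper's model):

```python
import numpy as np

def euler_sample(velocity, x0, num_steps=200):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x  # x_1, the generated sample

# Toy velocity field: pulls x_t toward a fixed target c along straight paths.
# The small constant in the denominator avoids division by zero at t = 1.
c = np.array([2.0, -1.0])
velocity = lambda x, t: (c - x) / (1.0 - t + 1e-3)

x0 = np.zeros(2)  # start point (a real model would draw Gaussian noise here)
x1 = euler_sample(velocity, x0, num_steps=200)
```

With enough steps the integrated sample lands near \(c\); a real flow model replaces the toy field with a learned network, and in practice a higher-order or adaptive solver may be used.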

1. Alignment Objective

Finetune a base velocity field with a reward objective plus a quadratic control cost that keeps the model close to the pretrained prior.

\[ \min_{\theta}\,\mathbb{E}_{x_0\sim p_0,\dot{x}_t=v_{\theta}(x_t,t)} \left[\frac{\lambda}{2}\int_{0}^{1}\left\lVert \tilde{v}_{\theta}(x_t,t)\right\rVert^2\,dt-r(x_1)\right],\quad v_{\theta}(x_t,t)\triangleq v_{\mathrm{base}}(x_t,t)+\tilde{v}_{\theta}(x_t,t). \]

2. HJB Condition and Control Law

Applying the HJB equation to the alignment objective yields the optimal residual velocity and links it directly to the value-function gradient.

\[ \partial_tV(x,t)+\min_{\tilde{v}}\left[\nabla V(x,t)\cdot \big(v_{\mathrm{base}}(x,t)+\tilde{v}(x,t)\big) +\frac{\lambda}{2}\|\tilde{v}(x,t)\|^2\right]=0. \]

Direct HJB result: Value Gradient Matching

\[ \tilde{v}^{\star}(x,t)=-\frac{1}{\lambda}\nabla V(x,t). \]

Direct HJB result: Value Consistency

\[ \partial_tV(x,t)=\frac{1}{2\lambda}\|\nabla V(x,t)\|^2-\nabla V(x,t)\cdot v_{\mathrm{base}}(x,t). \]
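For completeness, both results follow from a standard completing-the-square step on the quadratic inside the HJB minimization (spelled out here as a reading aid):

\[ \nabla V\cdot\tilde{v}+\frac{\lambda}{2}\|\tilde{v}\|^{2} =\frac{\lambda}{2}\left\lVert\tilde{v}+\frac{1}{\lambda}\nabla V\right\rVert^{2}-\frac{1}{2\lambda}\|\nabla V\|^{2}, \]

which is minimized at \(\tilde{v}^{\star}=-\frac{1}{\lambda}\nabla V\); substituting this minimizer back into the HJB equation and rearranging terms gives the value-consistency PDE above.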

3. Value-Gradient Parameterization

The value gradient is parameterized with a forward-looking term: the reward gradient evaluated at the one-step Euler prediction \(\hat{x}_1(x_t,t)=x_t+(1-t)\,v(x_t,t)\), plus a learnable residual \(\nu_{\phi}\).

\[ g_{\phi}(x,t)\triangleq\nabla V_{\phi}(x,t). \]

\[ g_{\phi}(x_t,t)=-\eta_t\cdot\operatorname{stop\mbox{-}gradient} \left(\nabla_{x_t}r\!\left(\hat{x}_1(x_t,t)\right)\right)+\nu_{\phi}(x_t,t). \]
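A minimal NumPy sketch of this forward-looking parameterization, assuming a toy quadratic reward \(r(x)=-\tfrac{1}{2}\|x-c\|^2\) (so its gradient is analytic), the one-step Euler prediction \(\hat{x}_1=x_t+(1-t)\,v(x_t,t)\), a constant \(\eta_t\), and a zero residual \(\nu_{\phi}\) — all of these are illustrative assumptions, not the paper's settings:

```python
import numpy as np

c = np.array([1.0, 2.0])  # toy reward target (assumption)

def reward_grad(x):
    """Analytic gradient of the toy reward r(x) = -0.5 * ||x - c||^2."""
    return -(x - c)

def v_base(x, t):
    """Hypothetical pretrained velocity field (straight path toward c)."""
    return c - x

def one_step_euler_pred(x_t, t):
    """Predict x_1 from x_t with a single Euler step of size (1 - t)."""
    return x_t + (1.0 - t) * v_base(x_t, t)

def value_gradient(x_t, t, nu_phi=lambda x, t: np.zeros_like(x), eta_t=1.0):
    """Forward-looking value gradient: g = -eta_t * grad r(x_hat_1) + nu_phi.
    The reward-gradient term is treated as a constant (stop-gradient) during
    training; with NumPy there is no autograd, so that is automatic here."""
    x1_hat = one_step_euler_pred(x_t, t)
    return -eta_t * reward_grad(x1_hat) + nu_phi(x_t, t)

g = value_gradient(np.zeros(2), t=0.5)  # -> array([-0.5, -1.0])
```

Note that at \(t=1\) the Euler prediction reduces to \(x_t\) itself, so with \(\nu_{\phi}=0\) and \(\eta_1=1\) this sketch satisfies the boundary condition \(g(x_1,1)=-\nabla r(x_1)\) by construction.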

4. Training Losses

The final training objective combines a matching loss, a consistency loss, and a boundary loss; the inverse temperature \(\beta\) plays the role of \(1/\lambda\) from the control law.

\[ \mathcal{L}_{\mathrm{matching}}(\theta)= \mathbb{E}_{x_0\sim\mathcal{N}(0,I),\dot{x}_t=v(x_t,t)} \left\lVert\tilde{v}_{\theta}(x_t,t)+\beta g_{\phi}(x_t,t)\right\rVert^2. \]

\[ \mathcal{L}_{\mathrm{consistency}}(\phi)= \mathbb{E}_{x_0\sim\mathcal{N}(0,I),\dot{x}_t=v(x_t,t)} \left\lVert\frac{\partial}{\partial t}g_{\phi} +[\nabla g_{\phi}]^{\top}(v_{\mathrm{base}}-\beta g_{\phi}) +[\nabla v_{\mathrm{base}}]^{\top}g_{\phi}\right\rVert^2. \]

\[ \mathcal{L}_{\mathrm{boundary}}(\phi)= \mathbb{E}_{x_0\sim\mathcal{N}(0,I),\dot{x}_t=v(x_t,t)} \left\lVert g_{\phi}(x_1,1)+\nabla r(x_1)\right\rVert^2. \]

\[ \mathcal{L}_{\mathrm{total}}(\theta,\phi)= \mathcal{L}_{\mathrm{matching}}(\theta)+ \mathcal{L}_{\mathrm{consistency}}(\phi)+ \alpha\mathcal{L}_{\mathrm{boundary}}(\phi). \]
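As a sanity check on the matching loss, here is a small NumPy sketch with toy stand-ins for the learned fields (\(\beta\), both fields, and the batch are illustrative assumptions); the loss vanishes exactly when \(\tilde{v}_{\theta}=-\beta g_{\phi}\), mirroring the control law \(\tilde{v}^{\star}=-\frac{1}{\lambda}\nabla V\):

```python
import numpy as np

beta = 0.5  # inverse temperature (assumption)

def v_tilde(x, t):
    """Toy residual velocity (stands in for the learned model)."""
    return -0.5 * x

def g_phi(x, t):
    """Toy value gradient (stands in for the learned g_phi)."""
    return x

def matching_loss(xs, ts):
    """Monte Carlo estimate of E ||v_tilde + beta * g_phi||^2 over a batch."""
    residual = v_tilde(xs, ts[:, None]) + beta * g_phi(xs, ts[:, None])
    return np.mean(np.sum(residual ** 2, axis=-1))

rng = np.random.default_rng(0)
xs = rng.standard_normal((4, 2))  # batch of states x_t
ts = rng.uniform(size=4)          # batch of times t
loss = matching_loss(xs, ts)      # zero, since v_tilde = -beta * g_phi here
```

In training, the real \(\tilde{v}_{\theta}\) and \(g_{\phi}\) are networks updated jointly, with states sampled along ODE trajectories rather than i.i.d. as in this sketch.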

Results

Main quantitative result table
Quantitative comparison on Aesthetic Score, HPSv2, and PickScore.

Aesthetic Score

Aesthetic Score convergence curves
Convergence during finetuning.
Aesthetic Score Pareto front
Reward, diversity, and prior-preservation trade-off.

HPSv2

HPSv2 convergence
Convergence during finetuning.
HPSv2 Pareto front
Reward, diversity, and prior-preservation trade-off.

PickScore

PickScore convergence
Convergence during finetuning.
PickScore Pareto front
Reward, diversity, and prior-preservation trade-off.

Temperature Ablation

Aesthetic Score temperature ablation convergence
Effect of inverse temperature on reward and preservation metrics.

Qualitative Comparisons

Aesthetic Score qualitative comparison
Aesthetic Score qualitative results.
HPSv2 qualitative comparison
HPSv2 qualitative results.
PickScore qualitative comparison
PickScore qualitative results.

BibTeX

@inproceedings{liu2025vggflow,
  title={Value Gradient Guidance for Flow Matching Alignment},
  author={Liu, Zhen and Xiao, Tim Z. and Domingo-Enrich, Carles and Liu, Weiyang and Zhang, Dinghuai},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}