Adaptive Hilbert Diffusion Models for Controllable Smoothness in Continuous Function Generation

* Equal contribution    † Corresponding authors
1 Department of Artificial Intelligence, Korea University   2 Department of Data Science, Seoul National University of Science and Technology   3 Department of Statistics, Korea University
Accepted at Computational Visual Media

Abstract

In this paper, we introduce a novel approach to achieving additional controllability in generating continuous functions, such as human motions, using diffusion models. Specifically, we focus on controlling the smoothness of the generated motion without relying on smoothness labels in the dataset. Our approach leverages Hilbert Diffusion Models (HDM), which modify the underlying Hilbert space during the inference phase to regulate smoothness. By estimating smoothness information in a self-supervised manner, we address two key questions: the benefits of incorporating Hilbert space structures during training and the feasibility of controlling smoothness without explicit labels. Our method employs multiple kernels to comprehensively model diverse temporal dependencies, addressing the limitations of single-parameter approaches. Experimental results show that our method significantly enhances training efficiency and successfully controls smoothness in both 1D synthetic data and human motion generation without compromising quality. This approach demonstrates the potential for fine-grained control in generative models.


Overview

Overview of Adaptive-HDM

Overview of the proposed Adaptive-HDM framework. Our approach uses a Length Prediction Module (LPM) to estimate the smoothness of the input sequence and to determine the kernel-induced correlated noise structure used during diffusion. During training, this enables adaptation to diverse temporal characteristics, while at inference time the generation process can be conditioned either on the LPM-predicted length or on a user-specified value, allowing explicit control over output smoothness.


Motion Velocity Controllability

Adaptive-HDM injects correlated noise into the root XZ trajectory using a Squared Exponential kernel parameterized by the length parameter ℓ. A small ℓ produces rapidly-varying (dynamic) correlated noise, which drives the denoising process toward high-velocity root trajectories. A large ℓ produces slowly-varying (smooth) correlated noise, resulting in low-velocity, subtle root motion. At inference time, the user simply specifies ℓ — no retraining or velocity labels required.
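The kernel-induced noise described above can be sketched in a few lines of NumPy. This is a minimal illustration of Squared Exponential (SE) correlated noise and its dependence on ℓ, not the authors' implementation; the function names (`se_kernel`, `correlated_noise`) and the grid/jitter choices are our own assumptions.

```python
import numpy as np

def se_kernel(T, length, variance=1.0, jitter=1e-6):
    """Squared Exponential kernel matrix over T evenly spaced timesteps in [0, 1]."""
    t = np.linspace(0.0, 1.0, T)[:, None]        # (T, 1) time grid
    sq_dist = (t - t.T) ** 2                      # pairwise squared time distances
    K = variance * np.exp(-0.5 * sq_dist / length ** 2)
    return K + jitter * np.eye(T)                 # jitter for numerical stability

def correlated_noise(T, length, rng):
    """Draw one noise sequence eps ~ N(0, K) via a Cholesky factor of K."""
    L = np.linalg.cholesky(se_kernel(T, length))
    return L @ rng.standard_normal(T)

rng = np.random.default_rng(0)
rough = correlated_noise(196, length=0.03, rng=rng)   # small l: rapidly varying noise
smooth = correlated_noise(196, length=1.00, rng=rng)  # large l: slowly varying noise

# Mean squared first difference is a crude "velocity" proxy: the small-l draw
# fluctuates far more between adjacent frames than the large-l draw.
print(np.mean(np.diff(rough) ** 2) > np.mean(np.diff(smooth) ** 2))
```

Drawing this noise per coordinate of the root XZ trajectory gives the dynamic (small ℓ) versus smooth (large ℓ) behavior described above.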

Text-to-motion results across the length parameter, from ℓ = 0.03 (high velocity) through ℓ = 0.24 (moderately high velocity) and ℓ = 0.46 (moderately low velocity) to ℓ = 1.00 (low velocity). Each of four prompts is generated at all four ℓ values: "A man walks forward.", "A person kicks with his right leg.", "A man slowly strolls backward.", and "A man quickly walks to the left."

Velocity Distribution Comparison

To verify that the length parameter ℓ meaningfully controls motion characteristics, we compare the velocity distributions of generated motions across three training configurations. Only Adaptive-HDM (trained with 1000 diffusion steps and adaptive kernel selection) produces well-separated velocity distributions for each ℓ value, demonstrating genuine controllability. Models trained with a random or fixed length parameter fail to separate the distributions — the generated velocity is independent of the specified ℓ.
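The separation check described above amounts to pooling per-frame root speeds for each ℓ and comparing the resulting distributions. Below is a hedged sketch of that evaluation, with synthetic random-walk trajectories standing in for generated motions; `root_speed` and `speeds_by_length` are hypothetical helper names, not the paper's code.

```python
import numpy as np

def root_speed(traj):
    """Per-frame speed of a root XZ trajectory of shape (T, 2)."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def speeds_by_length(motions_by_length):
    """Pool per-frame speeds for each length parameter.

    motions_by_length: dict mapping l -> list of (T, 2) trajectories.
    Returns a dict mapping l -> 1D array of pooled frame speeds.
    """
    return {l: np.concatenate([root_speed(m) for m in ms])
            for l, ms in motions_by_length.items()}

# Toy stand-ins: random walks with large vs small step sizes mimic the
# high-velocity (small l) and low-velocity (large l) generations.
rng = np.random.default_rng(0)
fast = [np.cumsum(rng.normal(0, 0.05, (60, 2)), axis=0) for _ in range(8)]
slow = [np.cumsum(rng.normal(0, 0.01, (60, 2)), axis=0) for _ in range(8)]
dists = speeds_by_length({0.03: fast, 1.00: slow})

# Genuine controllability should place the small-l speed distribution
# clearly above the large-l one; overlapping baselines would not.
print(dists[0.03].mean() > dists[1.00].mean())
```

Plotting a histogram of each pooled array (one per ℓ) reproduces the qualitative comparison shown in the figures below.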

Velocity distributions of generated motions under three training configurations: the Random ℓ baseline and the Fixed ℓ baseline, whose distributions overlap and show no control, and Adaptive-HDM (ours), whose velocity distributions are well-separated by ℓ.

BibTeX

Citation information will be updated upon publication.