Scaling Transferable Coarse-Graining with Mean Force Matching

A data-efficient route to thermodynamically consistent, transferable protein coarse-grained models.

Abigail Park and Grant M. Rotskoff†,‡

We compare force matching, score matching, and mean force matching for training neural coarse-grained protein potentials. Mean force matching substantially lowers noise in the objective, enabling better zero-shot generalization with far less data and compute.

Figure 1: Project overview. Mean force matching reduces label noise, improves data efficiency, and delivers stronger zero-shot transfer across protein sequence space.
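To make the variance argument concrete, here is a minimal NumPy sketch (a toy 1D illustration under assumed parameters, not the paper's code) of why regressing on mean forces lowers the noise floor of the objective compared with regressing on instantaneous atomistic forces:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D setup: for each of 200 coarse-grained (CG) configurations x,
# the true mean force for U(x) = x^2 is f_mean(x) = -2x. Instantaneous
# atomistic forces fluctuate around this mean with thermal noise.
x = rng.uniform(-1.0, 1.0, size=200)
f_mean = -2.0 * x
n_frames = 50  # atomistic frames mapping to each CG configuration
f_inst = f_mean[:, None] + rng.normal(0.0, 5.0, size=(200, n_frames))

def mse(pred, target):
    return np.mean((pred - target) ** 2)

# Even a perfect CG model that predicts the exact mean force...
pred = f_mean

# ...pays the full label variance under force matching, which regresses
# on noisy instantaneous forces:
loss_fm = mse(pred[:, None], f_inst)

# Mean force matching regresses on the per-configuration mean force
# (here estimated by averaging the frames), so the irreducible noise
# floor shrinks by roughly a factor of n_frames:
f_bar = f_inst.mean(axis=1)
loss_mfm = mse(pred, f_bar)

print(loss_fm, loss_mfm)  # MFM loss is far smaller for the same model
```

The same model achieves a much smaller loss under the mean-force objective because the regression targets carry far less noise, which is the mechanism behind the data-efficiency gains reported below.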

Key Results

- 50× fewer training samples needed by mean force matching
- 87% less total atomistic simulation time for data generation
- 10× faster MACE training epochs (MFM vs. FM on a comparable setup)
- 20× faster MACE training epochs (MFM vs. score matching)

Architecture and Objective Benchmark

Model     MFM      MFM 100K   FM       SM
SchNet    32.53    31.31      39.83    47.34
MACE      26.20    22.62      34.38    38.78
eSEN      19.05    14.89      24.28    25.55

Held-out mean-force MSE across 50 unseen CATH domains, in (kcal/(mol·Å))². Lower is better.

Generalization Highlights

On Trp-cage, models trained with force matching and score matching fail to stabilize the folded basin and conflate distinct metastable states. In contrast, MFM-trained MACE and eSEN recover the folded, misfolded, and unfolded basins in much closer agreement with the atomistic reference.

Beyond single chains, the transferable model generalizes to the ParD-ParE complex, preserving structural stability and reproducing low-error backbone dihedral behavior in structured regions.

Usage (Minimal)

After setup from the README, run commands with Hydra-style overrides (for example, key=value):

Full argument reference and examples: make_u_dataset, u_train, u_test, cg_sim.
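As an illustration of the override syntax only (the command names come from the list above, but every key and value here is hypothetical; the README's argument reference is authoritative):

```shell
# Hypothetical Hydra-style overrides; consult the README for the real keys.
u_train model=mace trainer.max_epochs=100
cg_sim checkpoint=path/to/model.ckpt sim.temperature=300
```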

Takeaway

Mean force matching offers a practical scaling path for transferable coarse-grained protein potentials: lower-variance supervision, stronger thermodynamic fidelity, and substantially reduced compute requirements.