Abstract
Model editing techniques are essential for efficiently updating knowledge in large language models (LLMs). However, the effectiveness of existing approaches degrades in massive editing scenarios, particularly when evaluated with practical metrics. Their robustness is also limited in context-rich settings or when editing multiple facts about the same subject simultaneously. We attribute these failures to embedding misalignment among knowledge items, which undermines editing reliability at scale. To address this, we propose EAMET (Embedding Alignment Model Editing in Transformers), which aligns the spaces of key and residual embeddings. Extensive experiments across six LLMs and three datasets demonstrate that EAMET consistently outperforms existing methods, achieving about 90% editing efficacy when editing 10k facts.
Method at a Glance

EAMET mitigates misalignment by encouraging structural similarity between the key and residual spaces.
- Compute key-key cosine similarities to form Pk(i).
- During sequential optimization, save each optimized residual and form Pr(i) for earlier items.
- Minimize L_KL = KL(Pr(i) || Pk(i)) together with a top-M L_MSE term between corresponding key-residual pairs.
- Jointly optimize the target residual ri to maximize the model's confidence on the target object under random-prefix prompts.
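The steps above can be sketched in numpy as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a row-wise softmax to turn cosine similarities into the distributions Pk(i) and Pr(i), and the choice to take the top-M neighbors per item are all assumptions.

```python
import numpy as np

def cosine_sim_matrix(X):
    # Row-normalize embeddings, then take pairwise cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def alignment_losses(keys, residuals, top_m=2):
    """Sketch of the alignment objective (hypothetical form):
    KL(Pr(i) || Pk(i)) between similarity distributions, plus an MSE
    term over each item's top-M most similar key pairs."""
    Sk = cosine_sim_matrix(keys)       # key-key similarities
    Sr = cosine_sim_matrix(residuals)  # residual-residual similarities
    Pk = softmax(Sk, axis=1)           # Pk(i): one distribution per row
    Pr = softmax(Sr, axis=1)           # Pr(i)
    kl = np.mean(np.sum(Pr * (np.log(Pr) - np.log(Pk)), axis=1))
    # Top-M MSE: for each item, pull residual similarities toward the
    # key similarities of its M nearest key neighbors (skip self, sim=1).
    idx = np.argsort(-Sk, axis=1)[:, 1:top_m + 1]
    rows = np.arange(Sk.shape[0])[:, None]
    mse = np.mean((Sr[rows, idx] - Sk[rows, idx]) ** 2)
    return kl, mse
```

When the residual space is perfectly aligned with the key space (e.g., residuals equal to keys), both losses vanish; in practice these terms would be minimized jointly with the editing objective on the target residual ri.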
Key Findings

Finding 1. EAMET Promotes More Aligned Embeddings. The residual embeddings generated by EAMET are more aligned with the key embeddings, while those produced by MEMIT and PMET are more likely to cause inconsistency between the key and residual embedding spaces.

Finding 2. EAMET Consistently Achieves Superior Editing Performance Across All Datasets and Model Architectures. Across all evaluated datasets, EAMET demonstrates the highest levels of editing efficacy and generalization.

Finding 3. EAMET Preserves the General Abilities of the Edited Models. In addition to achieving state-of-the-art editing performance, EAMET does not impair the base model's fluency or reasoning abilities.

Finding 4. EAMET Remains Effective When Edits Are Preceded by Long Prefixes. EAMET achieves the highest editing efficacy across all models, with at most a 7% drop at 200-token prefixes. In contrast, MEMIT suffers a much larger decline, from 84.75% to 66.50% on LLaMA2-7B and from 94.2% to 82.25% on DeepSeek-7B.

Finding 5. EAMET Remains Effective When Multiple Facts of the Same Subject Are Edited Simultaneously. EAMET consistently achieves the highest editing efficacy across nearly all settings. Its performance remains stable when editing multiple samples associated with the same subject.
BibTeX
@misc{dai2025eamethrobustmassivemodel,
      title={EAMET: Robust Massive Model Editing via Embedding Alignment Optimization},
      author={Yanbo Dai and Zhenlan Ji and Zongjie Li and Shuai Wang},
      year={2025},
      eprint={2505.11876},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.11876},
}