
Task Arithmetic for Model Editing
Author(s): Ayo Akinkugbe
Originally published on Towards AI.

Introduction
In the 2004 film "Eternal Sunshine of the Spotless Mind," Clementine (played by Kate Winslet) and Joel (played by Jim Carrey) visit Lacuna Inc. after a breakup to undergo a revolutionary procedure: the selective erasure of painful memories. The process begins with a "memory mapping" phase, where the technicians Patrick (Elijah Wood), Stan (Mark Ruffalo), and Mary (Kirsten Dunst) precisely identify and map specific memories before surgically removing them from the patients' minds. What makes this fictional procedure so compelling isn't just its ability to delete unwanted memories, but its surgical precision: removing targeted experiences while leaving the rest of the person's memories intact.
This concept isn't just science fiction anymore, at least not for large language models. In machine learning, we face a remarkably similar challenge: how do we selectively modify what our models have "learned" without destroying everything else they know?
Consider this scenario: you've deployed a sophisticated language model for your e-commerce platform that excels at product categorization, customer service, and content generation. Then you discover it has learned some biased associations about certain product categories, or you need it to understand entirely new product types that didn't exist when you first trained it. Traditional approaches would require you to either:
- Retrain from scratch: expensive and time-consuming, and you lose all the valuable knowledge the model has accumulated.
- Fine-tune on new data: this risks catastrophic forgetting, where the model loses its previous capabilities.
- Live with the limitations: accept suboptimal performance rather than risk breaking what works.
Enter model editing, our real-world equivalent of Lacuna Inc.'s memory mapping technology.
What is Model Editing?
Model editing is the process of changing a modelβs behavior without retraining it from scratch. This includes:
- Adding new capabilities (learning new product categories)
- Removing unwanted behaviors (eliminating bias or harmful outputs from models)
- Modifying existing skills and tasks (adjusting the confidence or style of responses)
- Combining capabilities and models (merging specialized models into a single multi-task system)
Task arithmetic, introduced by Ilharco et al. in "Editing Models with Task Arithmetic," treats model capabilities as vectors that can be combined, scaled, and manipulated with mathematical precision. Just as the fictional technicians could map and manipulate specific memories, task arithmetic lets us map learned behaviors as mathematical vectors that can be added, subtracted, scaled, and combined. Unlike Clementine and Joel's irreversible memory erasure, task arithmetic operations are reversible: you can always subtract what you've added or add back what you've removed. For instance:
- Want to remove biased behavior? → Subtract that task vector.
- Need to add new capabilities? → Add the corresponding task vectors.
- Want to fine-tune the strength of a particular skill? → Scale its vector up or down (see the toy example below).
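Because these edits are plain vector operations on weight deltas, they compose and invert cleanly. A toy numeric illustration (the tiny arrays and the bias_vector name are purely illustrative, not real model weights):

import numpy as np

# Toy 1-D "parameters" illustrating reversibility and scaling
base = np.array([0.20, -0.10, 0.05])
bias_vector = np.array([0.03, 0.00, -0.02])   # hypothetical unwanted behavior

debiased = base - bias_vector         # remove the behavior
restored = debiased + bias_vector     # add it back: identical to the base
softened = base - 0.5 * bias_vector   # or only partially remove it

assert np.allclose(restored, base)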
This approach opens up exciting possibilities that were previously impractical or impossible: creating models that can be updated as surgically as editing a document, combining the best capabilities from multiple specialized models, and maintaining fine-grained control over model behavior in production systems.
This post explores how task arithmetic works, dives deep into implementation techniques, and examines practical applications.
Understanding Task Vectors
At its core, task arithmetic is surprisingly elegant and intuitive. When we fine-tune a model on a specific task, we are essentially teaching it new patterns and behaviors. The difference between the fine-tuned model's parameters and the original model's parameters captures exactly what the model learned from that task.
Task vector: τ = θ_fine-tuned − θ_base (the fine-tuned model's parameters minus the base model's parameters)
Think of this as isolating the "memory" of a specific skill. Just as Lacuna Inc. could map Joel's memories of Clementine, we can map a model's learned behaviors into discrete, manipulable vectors.
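In code, extracting a task vector amounts to a layer-by-layer subtraction of weight dictionaries. A minimal PyTorch-style sketch, assuming the two hypothetical checkpoint files below each store a state dict saved from the same architecture:

import torch

# Hypothetical checkpoints, each storing a state dict for the same architecture
base_weights = torch.load("base_model.pt")
finetuned_weights = torch.load("finetuned_on_task.pt")

# Task vector: element-wise difference for every parameter tensor
task_vector = {name: finetuned_weights[name] - base_weights[name]
               for name in base_weights}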
The Linear Superposition Hypothesis
Task arithmetic works under the assumption that different capabilities in neural networks exist in a kind of linear superposition: they can be added together without destructive interference, much as different radio frequencies can coexist in the same space without canceling each other out.
A Pseudocode Example
For instance, suppose we want to create a model that both classifies sentiment and detects spam using task arithmetic. We would:
- Choose a base model
- Train separate specialized models: in this case, a sentiment model and a spam model
- Extract the task vectors by element-wise subtraction of the base model's parameters from each specialized model's parameters
- Create a multi-task model by adding both vectors to the base model
def multi_task_arithmetic():
    """
    Example: build a model that can both classify sentiment AND detect spam.
    `load_pretrained_model` and `fine_tune` are assumed helper functions, and
    `sentiment_data` / `spam_data` are assumed to be available datasets.
    """
    base_model = load_pretrained_model("bert-base")

    # Train separate specialists, each starting from the same base checkpoint
    sentiment_model = fine_tune(base_model, sentiment_data)
    spam_model = fine_tune(base_model, spam_data)

    # Extract task vectors: element-wise weight differences, layer by layer
    base_weights = base_model.state_dict()
    sentiment_vector = {k: sentiment_model.state_dict()[k] - base_weights[k]
                        for k in base_weights}
    spam_vector = {k: spam_model.state_dict()[k] - base_weights[k]
                   for k in base_weights}

    # Create the multi-task model by adding both vectors to the base weights
    multi_task_weights = {k: base_weights[k] + sentiment_vector[k] + spam_vector[k]
                          for k in base_weights}
    base_model.load_state_dict(multi_task_weights)
    return base_model
Recent research (Tam et al., 2023) suggests that neural networks learn different tasks in different subspaces of the parameter space. When these subspaces don't significantly overlap, we can add and subtract learned behaviors without interference. Below is a visual representation.
import numpy as np
import matplotlib.pyplot as plt

def visualize_task_spaces():
    """
    A conceptual visualization of how different tasks occupy different
    regions of parameter space
    """
    # Define the color palette
    colors = ['#4E79A7', '#F28E2B', '#E15759', '#76B7B2', '#59A14F']

    # Simulate parameter space (reduced to 2D for visualization)
    base_point = np.array([0, 0])

    # Different tasks learn in different directions
    task_a_vector = np.array([3, 1])  # Sentiment analysis
    task_b_vector = np.array([1, 3])  # Spam detection

    # Combined model is the sum of the task vectors
    combined = base_point + task_a_vector + task_b_vector

    plt.figure(figsize=(10, 8))
    plt.arrow(0, 0, task_a_vector[0], task_a_vector[1],
              head_width=0.2, head_length=0.2, fc=colors[0], ec=colors[0],
              label='Sentiment Task')
    plt.arrow(0, 0, task_b_vector[0], task_b_vector[1],
              head_width=0.2, head_length=0.2, fc=colors[1], ec=colors[1],
              label='Spam Detection Task')
    plt.arrow(0, 0, combined[0], combined[1],
              head_width=0.2, head_length=0.2, fc=colors[3], ec=colors[3],
              label='Combined Model')
    plt.scatter([0], [0], c='black', s=100, label='Base Model')

    plt.grid(True, alpha=0.3)
    plt.legend()
    plt.title('Task Vectors in Parameter Space')
    plt.xlabel('Parameter Dimension 1')
    plt.ylabel('Parameter Dimension 2')
    plt.show()

visualize_task_spaces()

Case Study: Knowledge Transfer Across Domains for Sentiment Classification
Let's say you have a good Amazon review sentiment classifier and want to classify Yelp reviews. To predict sentiment on Yelp (even if you don't have Yelp sentiment labels), take the Amazon sentiment vector and adjust it by the difference in language modeling between Yelp and Amazon. This creates a new Yelp-specific sentiment vector:

τ_yelp,sentiment = τ_amazon,sentiment + (τ_yelp,lm − τ_amazon,lm)
This equation lets you adapt sentiment knowledge from Amazon to Yelp without labeled Yelp data; you only need language modeling data. This is how task analogies work: a technique for generating new tasks by relating known ones through vector arithmetic, especially when no labels are available.
Step-by-Step Implementation
1. Get Language Modeling (LM) Vectors: You first train or fine-tune a model (like T5 or BERT) on language modeling objectives for both datasets:
- Amazon LM vector τ_amazon,lm: train on Amazon text with an unsupervised objective (e.g., masked language modeling).
- Yelp LM vector τ_yelp,lm: do the same for the Yelp text.
These give you task vectors representing how the model adapts to the language style and patterns of each dataset.
2. Subtract the Vectors:

τ_yelp,lm − τ_amazon,lm
This captures the domain shift: how Yelp differs from Amazon in language patterns.
3. Apply the Shift to a Known Task Vector: add the difference to the Amazon sentiment vector, which was trained with sentiment labels:

τ_yelp,sentiment = τ_amazon,sentiment + (τ_yelp,lm − τ_amazon,lm)
This gives you a new task vector you can use for sentiment analysis on Yelp, even without Yelp sentiment labels.
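Here is a rough sketch of those three steps in code. The specialist models (amazon_lm_model, yelp_lm_model, amazon_sentiment_model) are assumed to have been fine-tuned from the same base_model checkpoint; the names are placeholders rather than a specific library API:

# Assumes each *_model below was fine-tuned from the same base checkpoint
base = base_model.state_dict()

# Step 1: LM task vectors for each domain (unsupervised fine-tuning)
amazon_lm_vector = {k: amazon_lm_model.state_dict()[k] - base[k] for k in base}
yelp_lm_vector = {k: yelp_lm_model.state_dict()[k] - base[k] for k in base}

# Supervised sentiment vector, trained on labeled Amazon reviews
amazon_sentiment_vector = {k: amazon_sentiment_model.state_dict()[k] - base[k]
                           for k in base}

# Steps 2 + 3: shift the Amazon sentiment vector by the domain difference
yelp_sentiment_vector = {
    k: amazon_sentiment_vector[k] + (yelp_lm_vector[k] - amazon_lm_vector[k])
    for k in base
}

# Apply to the base model to get a Yelp sentiment classifier (no Yelp labels used)
base_model.load_state_dict({k: base[k] + yelp_sentiment_vector[k] for k in base})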
A quick recap of how task arithmetic is implemented:
- Choose your base model: e.g., T5-small or LLaMA.
- Fine-tune on Task A (e.g., sentiment analysis on Amazon)
- Fine-tune on Task B (e.g., LM on Yelp, or sentiment on Yelp)
- Compute task vectors: subtract weights layer by layer
- Apply the vector to the base model or another model (a minimal sketch follows below)
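That final step can be written as a small layer-by-layer helper with an optional scaling coefficient. A hedged sketch (the scale of 0.8 is purely illustrative; in practice the coefficient is typically chosen on held-out validation data):

import torch

def apply_task_vector(model, task_vector, scale=1.0):
    """Add a scaled task vector to a model's weights, layer by layer."""
    new_state = {}
    for name, weight in model.state_dict().items():
        delta = task_vector.get(name, torch.zeros_like(weight))
        new_state[name] = weight + scale * delta
    model.load_state_dict(new_state)
    return model

# Example: apply the Yelp sentiment vector at reduced strength,
# or pass a negative scale to remove a behavior instead of adding one.
edited_model = apply_task_vector(base_model, yelp_sentiment_vector, scale=0.8)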
Conclusion
Factual knowledge can also be treated as a task: fine-tune on a small factual correction, extract the vector, and apply it. In some cases, this can replace model editing methods like ROME or MEMIT. However, task arithmetic has its challenges: not all tasks are linearly composable, because the superposition hypothesis doesn't always hold in practice. Additionally, model instabilities can emerge when applying large parameter deltas across different architectures. The key to successful implementation lies in understanding these limitations.
References
- Ilharco, G., Ribeiro, M. T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., & Farhadi, A. (2022). Editing models with task arithmetic. arXiv preprint arXiv:2212.04089. https://arxiv.org/abs/2212.04089
- Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT. arXiv preprint arXiv:2202.05262. https://arxiv.org/abs/2202.05262
- Meng, K., Sharma, A. S., Andonian, A., Belinkov, Y., & Bau, D. (2022). Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229. https://arxiv.org/abs/2210.07229
- Tam, D., Bansal, M., & Raffel, C. (2023). Merging by matching models in task parameter subspaces. arXiv preprint arXiv:2312.04339. https://arxiv.org/abs/2312.04339