Migrations at Aquanode across VMs

Team Aquanode

Arpit Bansal

MARCH 07, 2026

I am training a translation model for Akkadian, a low-resource language. Using a single GPU the whole time doesn't make sense.

TLDR

  1. Used A100 for data processing
  2. Used 5090 for training
  3. Resumed training on another VM
  4. Uploaded the final weights to my HuggingFace repo

Data Processing on A100

I needed high-quality translations of my data, so I used Qwen/Qwen3-30B-A3B-Instruct-2507.

I will only set up migrations for my data directory: /root/data

I processed my data.

Then I took a manual snapshot and closed the instance.

[Image: migration-data-process]

My training can run on a 5090, so I will continue there.

To get the processed data onto the new VM, just get the migrations. All the data is available at the same path, /root/data.

Now let's start training.

Training

Sometimes your shell exits or the SSH connection drops, which can kill any process attached to that shell.

I use tmux to run the training in a persistent session, so we can detach from the shell.

tmux new -s train

source .venv/bin/activate 

python train.py --lr 1e-4 --batch_size 4

[Image: train-phase-1]

Press Ctrl + B, then D to detach. This doesn't kill the process, so you can now safely exit the shell.

To reattach later:

tmux attach -t train

In the meantime, let's invoke migrations for /root. Choose the time based on how long this training will run.
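Because a snapshot may be taken while training is still running, it can help if the training script writes its state in a way that never leaves a half-written file on disk. The following is a minimal sketch of that idea, not the actual train.py from this post: the function name save_state_atomically and the JSON state format are assumptions made for illustration. It writes to a temporary file in the same directory and then renames it over the target.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state_atomically(state: dict, path: str) -> None:
    """Write `state` as JSON so that `path` is always either the old or the new file.

    Hypothetical helper: the real train.py may store its state differently.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # swap the new file into place
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

A training loop could call something like save_state_atomically({"step": step}, "checkpoints/trainer_state.json") every few hundred steps, so whichever snapshot is restored contains a complete state file.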

You can close the VM once the desired training is done, or even before that.

Here, after getting things done, I closed my VM.

Now let's resume our training.

Resume Training

Get the migrations for the last snapshot.

[Image: restore]

It took 5 minutes to restore the 33 GB of data.
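For a rough sense of scale, those two numbers can be turned into an average restore throughput. This is only back-of-the-envelope arithmetic on the figures above, assuming decimal units (1 GB = 1000 MB); the function name is made up for this example.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


# 33 GB restored in 5 minutes
print(f"{throughput_mb_per_s(33, 5):.0f} MB/s")  # prints "110 MB/s"
```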

Activate the venv and resume training:

tmux new -s train
source .venv/bin/activate
python train.py --lr 1e-5 --batch_size 4 --resume_from_checkpoint latest

[Image: train-phase-2]
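The post doesn't show how train.py resolves --resume_from_checkpoint latest, so the following is only a sketch of one common approach. It assumes checkpoints are saved as directories named checkpoint-<step> inside a single folder; both that naming scheme and the function name find_latest_checkpoint are assumptions, not taken from the actual script.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: checkpoint-<step>. The real train.py may differ.
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return None

    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CKPT_RE.match(child.name)
        # Compare steps as integers so checkpoint-1000 beats checkpoint-900.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Comparing the step numbers as integers matters here: a plain alphabetical sort would rank checkpoint-900 above checkpoint-1000.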

Uploading model to HuggingFace

hf auth login

hf upload <user-name/repo> <path to upload from>
hf upload Arpit-Bansal/Akkadian-experiments models/
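The same upload can also be done from Python with the huggingface_hub library, which is handy at the end of a training script. This is a minimal sketch, assuming huggingface_hub is installed and you have already logged in with hf auth login; the wrapper name upload_weights is made up for this example.

```python
from pathlib import Path


def upload_weights(repo_id: str, folder: str) -> None:
    """Upload every file under `folder` to a HuggingFace model repo."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        raise FileNotFoundError(f"No such folder: {folder_path}")

    # Imported here so the folder check above runs even if the
    # package is not installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token stored by `hf auth login`
    api.upload_folder(
        repo_id=repo_id,
        folder_path=str(folder_path),
        repo_type="model",
    )
```

Calling upload_weights("Arpit-Bansal/Akkadian-experiments", "models/") would mirror the command above.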
#migration #VMs #GPU #checkpointing #backup #aquanode #nvidia