Researchers recently demonstrated how a modest investment in fine-tuning can turn a cutting-edge large language model into a personalized version of yourself. Using only 6,128 personal Telegram messages, a team trained a DoRA adapter on Qwen3-8B in just 3.5 hours on a single RTX 3090 instance, totaling less than $1.50 in compute costs. When pitted against the untuned model in a blind A/B test, the customized version outperformed the stock Qwen3-8B 100% of the time while avoiding catastrophic forgetting on 50 general-knowledge tasks.
Building the dataset from real conversations
The training data came directly from the author’s Telegram export, a JSON file containing 1,047 personal chats. To balance representation, the team capped each chat at 12 message pairs—ensuring that a few highly active conversations did not dominate the dataset. After deduplication, the final collection consisted of 6,128 training examples and 322 validation pairs. Each example paired a message from another person with the author’s response, creating a direct mapping of conversational style.
Customizing Qwen3 with DoRA instead of LoRA
The team opted for DoRA (Weight-Decomposed Low-Rank Adaptation), a technique introduced in 2024 that decomposes pretrained model weights into magnitude and direction components. Unlike traditional LoRA, which updates both components, DoRA applies low-rank updates only to the directional parameters while learning the magnitude as a separate trainable vector. This approach reportedly achieves performance closer to full fine-tuning while maintaining the efficiency of low-rank adaptation.
The training configuration leveraged the PEFT library for DoRA setup:
from peft import LoraConfig
from transformers import TrainingArguments
peft_config = LoraConfig(
r=16,
lora_alpha=32,
lora_dropout=0.05,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
use_dora=True,
task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
learning_rate=2e-4,
lr_scheduler_type="cosine",
warmup_steps=50,
num_train_epochs=3,
per_device_train_batch_size=2,
gradient_accumulation_steps=8,
max_seq_length=1024,
bf16=True,
gradient_checkpointing=True,
optim="adamw_torch_fused",
)With these settings, the adapter introduced only 30 million trainable parameters out of the full 8-billion-parameter model—a mere 0.4% increase. The final adapter file weighed just 63 MB on disk, and the entire training process completed in 3.5 hours on a Vast.ai RTX 3090 spot instance priced at $0.30 per hour. The key to preserving the author’s voice, however, was a critical tweak: masking the loss function to apply only to the assistant’s tokens in the author’s responses. Without this step, the model would waste capacity learning how other people speak, diluting the personalization signal.
Blind testing reveals clear superiority in voice mimicry
The most revealing evaluation involved 30 hold-out prompts—recent messages where the team knew the author’s actual reply. Each prompt generated three responses: one from the stock Qwen3-8B, one from the DoRA-tuned model, and one from the human author. The responses were labeled randomly (A, B, or C) and presented to a rater who knew the author well, tasked with selecting the response that sounded most like them.
In the head-to-head comparison between DoRA and stock Qwen3, the customized model won every single test. When all three options were presented, the real human response was selected 71% of the time, DoRA 29%, and the stock model 0%. On one specific prompt (p07), the DoRA output even outperformed the author’s own reply. The author commented, "Honestly the DoRA one sounds more like a representative thing I’d say than what I actually wrote that day." This suggests that DoRA’s outputs may represent a smoothed, averaged version of the author’s voice—capturing the essence rather than the specifics of a single conversation.
To ensure the model retained general knowledge, the team tested catastrophic forgetting using a 50-task benchmark covering capitals, math, code, and translations. DoRA maintained performance with zero percentage point drop, demonstrating that personalization did not come at the expense of general capabilities.
Pitfalls to avoid when fine-tuning Qwen3 for chat
Several unexpected challenges emerged during the process. First, the enable_thinking=False flag proved mandatory. Qwen3 defaults to a reasoning model that emits thinking traces (denoted by ...) before responses. Since the training data contained no such traces, the base model’s prior pulled it toward reasoning prefixes during inference, producing hybrid outputs that mixed reasoning with chat-style replies. Setting enable_thinking=False in both training and inference aligned the prefixes and eliminated this issue.
Second, version mismatches caused delays. The Qwen3 implementation required Transformers version 4.51, but later versions demanded PyTorch 2.5 or higher. After two hours of debugging on a Vast.ai RTX 3090 image, the team pinned transformers==4.53.0 to resolve compatibility issues.
Third, adapter compatibility varied across platforms. Cerebras-hosted inference did not support runtime loading of LoRA or DoRA adapters, limiting deployment options. For production use, the team now relies on a self-hosted vLLM instance, estimated at $300 per month for a single RTX 3090 running 24/7. Alternatively, a system prompt combined with retrieval-augmented generation (RAG) serves as a stopgap until user demand justifies the infrastructure investment.
Reproducibility and next steps
The trained adapter is available on Hugging Face under a CC BY-NC 4.0 license, though access is gated due to the private nature of the training data. To replicate the experiment on your own messaging history, you’ll need a single RTX 3090 with 24 GB of VRAM, approximately 3.5 hours of GPU time, and your own Telegram JSON export. The team estimates a total cost of $1 to $3 for most users, depending on the volume of message pairs extracted.
This experiment reinforces a growing thesis: the ideal unit of personalization is the individual, not broad user segments. While many companies attempt personalization by clustering users into 50 personas, the future likely lies in one small adapter per user, trained on continuous, personal data streams and fully controlled by the user. The economics are already viable—$1.50 in compute can transform a frontier model into a reflection of your own voice, with no measurable trade-offs in general capabilities.
AI summary
Qwen3-8B modelini DoRA ile sadece $1,50 maliyetle kişisel bir sese dönüştürün. Kör A/B testlerinde %100 performans artışı ve sıfır unutkanlık. Kurulum ve eğitim adımları burada.