Google Cloud’s fractional GPUs slash costs for creative and AI workloads

In early 2025, Google Cloud expanded its hardware lineup with the G4 VM family, built on NVIDIA’s RTX PRO 6000 Blackwell Server Edition GPUs. These instances are designed for demanding workloads such as AI training, real-time rendering, physics simulations, and high-end gaming. Each full-sized G4 instance, labeled g4-standard-48, provides 48 vCPUs, 180GB of RAM, and 96GB of GPU memory, which is more than most everyday creative and computational tasks need.

For professionals who rely on GPU acceleration but don’t require maximum performance at all times, the original G4 pricing model proved inefficient. Paying for an entire high-end VM often meant leaving significant compute power unused, driving up cloud costs without proportional value. Recognizing this gap, Google introduced fractional G4 VMs, enabling users to share a single physical GPU across multiple virtual machines.

How fractional GPUs reduce costs and improve flexibility

At Google Cloud Next 2026, the company announced general availability of fractional G4 VMs, becoming the first cloud provider to support virtual GPU (vGPU) functionality on RTX PRO 6000 accelerators. vGPU technology lets a single physical GPU be divided into 2, 4, or 8 separate virtual accelerators, each capable of running independent workloads. This innovation aligns computing resources more closely with actual usage patterns, eliminating the need to overprovision expensive hardware.

Google introduced three new machine types to support this feature:

g4-standard-6: 6 vCPUs, 22.5GB RAM, and 12GB GPU memory (one-eighth of a GPU)
g4-standard-12: 12 vCPUs, 45GB RAM, and 24GB GPU memory (one-quarter of a GPU)
g4-standard-24: 24 vCPUs, 90GB RAM, and 48GB GPU memory (one-half of a GPU)

These configurations allow teams to match compute resources precisely to their project requirements, reducing both operational costs and environmental footprint.
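The three shapes are even splits of the full g4-standard-48 VM by 8, 4, and 2. As a minimal sketch of that arithmetic, the Python below derives each shape from the full-size figures quoted in this article and picks the smallest one that covers a given requirement. The `MachineShape` class and the `fractional_shape` and `smallest_fit` helpers are hypothetical names used for illustration only, not part of any Google Cloud SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineShape:
    name: str
    vcpus: int
    ram_gb: float
    gpu_mem_gb: int


# Full-size shape, using the figures quoted in the article.
FULL = MachineShape("g4-standard-48", 48, 180.0, 96)


def fractional_shape(divisor: int) -> MachineShape:
    """Split the full VM into `divisor` equal parts (2, 4, or 8)."""
    if divisor not in (2, 4, 8):
        raise ValueError("the article lists splits of 2, 4, or 8 only")
    vcpus = FULL.vcpus // divisor
    return MachineShape(
        name=f"g4-standard-{vcpus}",
        vcpus=vcpus,
        ram_gb=FULL.ram_gb / divisor,
        gpu_mem_gb=FULL.gpu_mem_gb // divisor,
    )


# Ordered smallest first, ending with the full VM.
SHAPES = [fractional_shape(d) for d in (8, 4, 2)] + [FULL]


def smallest_fit(vcpus: int, ram_gb: float, gpu_mem_gb: int) -> MachineShape:
    """Return the smallest shape that meets all three requirements."""
    for shape in SHAPES:
        if (shape.vcpus >= vcpus
                and shape.ram_gb >= ram_gb
                and shape.gpu_mem_gb >= gpu_mem_gb):
            return shape
    raise ValueError("requirement exceeds a single g4-standard-48 VM")
```

For example, a workload needing 8 vCPUs, 32GB of RAM, and 20GB of GPU memory would land on g4-standard-12, while one needing 100GB of RAM would fall through to the full g4-standard-48.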

Who benefits most from fractional GPUs?

Fractional G4 VMs are ideal for organizations transitioning from on-premises workstations to the cloud. Previously, businesses running workloads like image processing, video editing, 3D modeling, or AI inference had to maintain expensive physical hardware or overpay for underutilized cloud instances. The new fractional options make it financially viable to migrate these workflows entirely to the cloud.

Creative studios, architectural firms, and research labs can now access high-performance GPUs on demand without the capital expenditure of purchasing and maintaining dedicated hardware. Similarly, startups and AI teams working on inference models or lightweight training tasks can scale resources dynamically, paying only for what they use.

Getting started with fractional G4 VMs

Fractional G4 VMs can be deployed through Compute Engine or Google Kubernetes Engine (GKE), depending on the workflow. vGPU support requires the correct GPU drivers on the instance, so install them, or use an image that already includes them, before running workloads. Google’s documentation covers driver installation and configuration.

Here’s how to create a fractional G4 VM using the gcloud command-line tool:

# VM_NAME and DISK_SIZE must be set in the shell before running this command.
gcloud compute instances create "$VM_NAME" \
  --machine-type=g4-standard-6 \
  --zone=us-central1-b \
  --boot-disk-size="$DISK_SIZE" \
  --maintenance-policy=TERMINATE \
  --restart-on-failure
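For teams that script instance creation, the same command can be assembled programmatically. The sketch below is one possible approach, not an official client: it builds the argument list using only the flags shown above and rejects machine types that are not among the three fractional shapes listed in this article. The function name `build_create_command`, the VM name `render-node-1`, and the `100GB` disk size are illustrative assumptions.

```python
import shlex

# Fractional machine types named in the article.
FRACTIONAL_TYPES = {"g4-standard-6", "g4-standard-12", "g4-standard-24"}


def build_create_command(vm_name: str, machine_type: str, disk_size: str,
                         zone: str = "us-central1-b") -> list[str]:
    """Build the gcloud argument list for a fractional G4 VM.

    Only flags that appear in the article's command are used.
    """
    if machine_type not in FRACTIONAL_TYPES:
        raise ValueError(f"{machine_type!r} is not a fractional G4 type")
    return [
        "gcloud", "compute", "instances", "create", vm_name,
        f"--machine-type={machine_type}",
        f"--zone={zone}",
        f"--boot-disk-size={disk_size}",
        "--maintenance-policy=TERMINATE",
        "--restart-on-failure",
    ]


if __name__ == "__main__":
    # Hypothetical values; print the command for review instead of running it.
    cmd = build_create_command("render-node-1", "g4-standard-6", "100GB")
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns, while `shlex.join` gives a copy-pasteable form for review.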

Alternatively, users can configure fractional VMs through the Google Cloud Console by selecting the appropriate machine type during instance creation. For containerized workloads, fractional G4 instances are supported in GKE node pools. The following command demonstrates how to create a new node pool with a g4-standard-24 machine type:

gcloud container node-pools create POOL_NAME \
  --cluster=CLUSTER_NAME \
  --location=CONTROL_PLANE_LOCATION \
  --machine-type=g4-standard-24

Currently, fractional G4 VMs are available only in the us-central1-b zone, with broader regional rollouts expected in the coming months. Organizations should monitor the official Google Cloud documentation and announcements to track availability in additional zones.
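Availability can also be checked from a script. The sketch below assumes an installed and authenticated gcloud CLI and uses `gcloud compute machine-types list` with a name filter. It also assumes that `--format=value(zone)` prints one zone name per line, which is worth verifying against your gcloud version. The helper names are hypothetical.

```python
import subprocess


def list_zones_command(machine_type: str) -> list[str]:
    """Build a gcloud command listing zones that offer `machine_type`."""
    return [
        "gcloud", "compute", "machine-types", "list",
        f"--filter=name={machine_type}",
        "--format=value(zone)",
    ]


def parse_zones(output: str) -> list[str]:
    """Turn line-per-zone output into a sorted list without duplicates."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def zones_offering(machine_type: str) -> list[str]:
    """Run gcloud and return the zones offering `machine_type`.

    Requires the gcloud CLI on PATH and valid credentials.
    """
    result = subprocess.run(
        list_zones_command(machine_type),
        capture_output=True, text=True, check=True,
    )
    return parse_zones(result.stdout)
```

Calling `zones_offering("g4-standard-6")` on a schedule and comparing the result with the previous run is one simple way to notice when new zones come online.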

Fractional GPU offerings like these make high-performance infrastructure more accessible and cost-effective. By matching resource allocation to actual workload demands, teams can get the performance they need without paying for capacity they never use.

AI summary

Cut GPU costs by up to 75% with Google Cloud’s new fractional G4 virtual machines. They offer an ideal balance of performance and cost for small teams and individual users.
