
Unlocking AI Through a Financial Lens (Part 1)
Last Updated on April 17, 2025 by Editorial Team
Author(s): Yi H.
Originally published on Towards AI.

Compute and infrastructure for an AI company serve as the backbone of its operations — essential and COSTLY.
Let’s discuss and uncover:
– The core financial cost drivers of its key components
– Decision-making metrics to evaluate the system as a whole
Let’s crunch the basics and numbers!
What comprises the compute and infrastructure system? What are its cost drivers?
GPUs Cost Driver: ⭐⭐⭐⭐⭐ Costly
Multi-GPU (Graphics Processing Units) clusters are often used to distribute the intensive computational workload involved in training and inference.
Capital Expenditure: Costly Upfront Purchase
One of the top costs. A single high-performance, industrial-grade GPU like the NVIDIA H100 ($25k–$45k from a hardware vendor) or H200 ($30k+, and higher) comes with a premium price tag.
Enterprise bulk purchase terms can offer lower prices (still expensive!), and the cost also depends on configuration as well as supply-chain markups and fluctuations.
Operational Costs: Major Energy Consumption
Given its heavy computational nature, energy costs add up quickly.
Running an NVIDIA H200 Tensor Core GPU (max Thermal Design Power, TDP, of 700W, source: NVIDIA) for 24 hours at California’s 2025 electricity rate (32 cents/kWh, source: energysage) costs 0.7 kW × 24 hrs × 0.32 USD/kWh ≈ 5.38 USD daily.
Major companies operating hundreds of thousands of GPUs can easily exceed $500K daily in energy costs alone, with the resulting CO2 impact.
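The per-GPU and fleet-level figures above can be reproduced with a small sketch. The function below uses the article's numbers (700W max TDP, $0.32/kWh); the 100,000-GPU fleet size is an assumption for illustration, and real draw is usually below max TDP:

```python
# Daily energy cost for GPUs running at full TDP.
# Figures from the article: H200 max TDP of 700 W, California's
# 2025 rate of $0.32/kWh. Fleet size below is hypothetical.
def daily_energy_cost(tdp_watts: float, rate_per_kwh: float, gpus: int = 1) -> float:
    """USD cost of running `gpus` GPUs at full TDP for 24 hours."""
    kwh_per_day = (tdp_watts / 1000) * 24  # watts -> kW, times 24 hours
    return kwh_per_day * rate_per_kwh * gpus

print(f"${daily_energy_cost(700, 0.32):.2f}")            # single H200: $5.38/day
print(f"${daily_energy_cost(700, 0.32, 100_000):,.0f}")  # 100k-GPU fleet: $537,600/day
```

At 100,000 GPUs the daily bill already crosses the $500K mark the article cites, which is why energy efficiency is tracked as a first-class metric later in this piece.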
Cloud Pricing
Renting from cloud infrastructure providers is usage-based, with H200 rates ranging from $2–$10/hr (source: vast.ai), plus additional fees for networking, storage, and data transfer.
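A rough buy-vs-rent comparison follows from the two price points above. This sketch deliberately ignores energy, hosting, depreciation, and resale value, and assumes a $30k H200 purchase price (the article's lower bound):

```python
# Naive buy-vs-rent break-even: how many rental hours equal the
# purchase price. Ignores energy, hosting, and depreciation.
def breakeven_hours(purchase_price: float, cloud_rate_per_hr: float) -> float:
    return purchase_price / cloud_rate_per_hr

h200_price = 30_000           # assumed purchase price (USD), per the article's "$30k+"
for rate in (2.0, 10.0):      # cloud H200 range cited: $2-$10/hr
    hrs = breakeven_hours(h200_price, rate)
    print(f"${rate}/hr -> break-even after {hrs:,.0f} hours (~{hrs / 8760:.1f} years)")
```

At $10/hr the purchase pays for itself in a few months of continuous use; at $2/hr it takes well over a year, which is why sustained, high-utilization workloads tend to favor ownership.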
CPUs Cost Driver: ⭐⭐⭐⭐ Costly
High-performance CPUs handle orchestration, pre-processing, and system management, supporting GPU-driven computations.
Purchase Cost: Material
Like GPUs, CPUs also have a significant purchase price, especially in multi-core configurations for servers.
While the standalone price is not disclosed, the NVIDIA Grace Hopper MGX system, equipped with NVIDIA’s Grace CPU, is listed at $65k+ (source: hyperscalers).
Operational Cost: Material but Lower Energy Consumption than GPUs
In industrial settings, high-performance CPUs can be power-hungry, some with a TDP of ~350W, about half of the H200 GPU’s 700W max TDP.
One of the most powerful CPUs mentioned above, NVIDIA’s Grace CPU, has a TDP of 500W including memory (source: NVIDIA).
The CPU-to-GPU ratio is typically far below 1: a single Grace CPU’s 128 PCIe Gen 5 lanes can support up to 8 H200 GPUs over PCIe Gen 5 x16 interfaces.
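The lane arithmetic behind that ratio is simple enough to sketch. The numbers are the article's (128 lanes, x16 per GPU); real systems also reserve lanes for NICs and NVMe, so the result is an upper bound:

```python
# Upper bound on GPUs per CPU from the PCIe lane budget.
# Real servers reserve some lanes for NICs/NVMe, so treat this as a ceiling.
def max_gpus_per_cpu(cpu_lanes: int, lanes_per_gpu: int = 16) -> int:
    return cpu_lanes // lanes_per_gpu

print(max_gpus_per_cpu(128))  # Grace's 128 Gen 5 lanes / x16 -> 8
```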
Storage Systems and Networking Cost Driver: ⭐⭐⭐ Costly
Efficient storage systems are essential for managing datasets, model weights, logs, and other operational data.
On-Premise Storage: cost depends on capacity (TBs) and redundancy mechanisms (RAID configurations).
SSDs and high-performance storage solutions like NVMe drives come with higher upfront costs, such as Samsung’s 9100 Pro series, priced at ~$550 for 4TB (~$137.5/TB).
Cloud Storage: Cloud storage providers (AWS, Google Cloud Storage) typically charge based on stored data volume, data retrieval frequency, and the region where data is stored.
Tiered pricing models based on access frequency (hot, cold, archive storage) are common.
For example, AWS S3 Standard Storage charges approximately $0.023 per GB (~$23/TB/month) for the first 50 TB per month, with reduced rates for higher usage.
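The on-prem and cloud storage figures above can be put side by side. This sketch uses the article's numbers (~$0.023/GB-month for S3 Standard, ~$137.5/TB for NVMe); the 5-year amortization period and decimal TB conversion are assumptions, and it omits power, redundancy, and operations costs that on-prem storage incurs:

```python
# Monthly storage cost sketch: S3 Standard vs. amortized on-prem NVMe.
# Assumptions: decimal TB (1 TB = 1000 GB), 5-year drive life, and no
# on-prem power/redundancy/ops costs included.
def s3_monthly_cost(tb: float, rate_per_gb: float = 0.023) -> float:
    return tb * 1000 * rate_per_gb

def nvme_amortized_monthly(tb: float, price_per_tb: float = 137.5,
                           life_months: int = 60) -> float:
    return tb * price_per_tb / life_months

tb = 50  # the S3 Standard first-tier boundary cited above
print(f"S3 Standard:  ${s3_monthly_cost(tb):,.0f}/month")          # $1,150/month
print(f"On-prem NVMe: ${nvme_amortized_monthly(tb):,.2f}/month")   # $114.58/month
```

The raw-drive cost looks an order of magnitude cheaper, but the gap narrows once redundancy (RAID), servers, power, and staffing are priced in, which is the trade-off the section describes.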
Networking Cost Driver: ⭐⭐ Costly
A fast and high-bandwidth network is crucial for ensuring that all parts of the LLM infrastructure communicate effectively.
In cloud environments, costs depend on data transfer volumes.
For example, AWS charges $0.09 per GB transferred out of storage in most US regions after the first 100 GB per month (source: AWS).
On-premises setups incur expenses for maintaining network hardware and provisioning network capacity, such as 10GbE or 100GbE.
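The AWS egress example above translates into a one-line formula. The rate and free tier are the article's figures; actual AWS egress pricing varies by region and tiers down at higher volumes, which this sketch ignores:

```python
# Cloud egress cost sketch using the article's AWS figure:
# $0.09/GB out after a 100 GB/month free tier (most US regions).
# Real pricing tiers down at high volume; this uses a single flat rate.
def egress_cost(gb_out: float, rate: float = 0.09, free_gb: float = 100) -> float:
    return max(0.0, gb_out - free_gb) * rate

print(f"${egress_cost(50):.2f}")      # within the free tier -> $0.00
print(f"${egress_cost(10_000):.2f}")  # 10 TB out -> $891.00
```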
Financial Metrics to Monitor, from Servers to Data Centers
Managing everything from individual servers to data centers housing vast compute and infrastructure requires continuous resource monitoring, alongside tracking customer satisfaction and retention.
What metrics should we be using?
First, Metrics for Computational Efficiency.
- FLOPS (Floating-Point Operations per Second): including both peak theoretical FLOPS and actual achieved FLOPS.
- Petaflop/s-day: performing 10¹⁵ (peta) operations per second continuously for one day, totaling ~10²⁰ operations; a standard unit for quantifying the computational effort of training runs.
Over the past 15 years, hardware computational performance has grown 41% annually, doubling every 2 years in 16-bit and 32-bit FLOPS (source: epoch.ai).
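To make the petaflop/s-day unit concrete, the conversion can be sketched alongside a widely used rule of thumb (not from this article) that training a transformer costs roughly 6 × parameters × tokens FLOPs; the 7B-parameter, 1T-token model below is purely hypothetical:

```python
# Convert a training run's total FLOPs into petaflop/s-days.
# One petaflop/s-day = 1e15 FLOP/s x 86,400 s ~ 8.64e19 operations.
PFS_DAY = 1e15 * 86_400

def pfs_days(total_flops: float) -> float:
    return total_flops / PFS_DAY

# Common approximation (not from the article): transformer training
# costs ~6 x parameters x tokens FLOPs. Hypothetical 7B model, 1T tokens.
params, tokens = 7e9, 1e12
print(f"{pfs_days(6 * params * tokens):.0f} petaflop/s-days")  # ~486
```

Dividing by a cluster's sustained (not peak) petaflop/s then gives a wall-clock training estimate, which ties this unit back to the GPU cost figures earlier in the article.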

Second, the Energy Cost Efficiency Metrics.
- FLOPS/Watt and FLOPS/$ (Floating-Point Operations per Second per Watt, or per dollar): depend on the GPU’s architecture and power efficiency. Based on data from GPUs launched between 2006 and 2021, FLOPS/$ doubles every ~2.5 years (source: epoch.ai).
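The cited ~2.5-year doubling time can be turned into a quick projection, useful when deciding whether to buy hardware now or wait a generation. This is a trend extrapolation, not a guarantee:

```python
# Project FLOPS/$ improvement from the cited trend: doubling every
# ~2.5 years (epoch.ai, GPUs launched 2006-2021). Extrapolation only.
def flops_per_dollar_multiplier(years: float, doubling_years: float = 2.5) -> float:
    return 2 ** (years / doubling_years)

print(flops_per_dollar_multiplier(5))   # -> 4.0 (two doublings)
print(flops_per_dollar_multiplier(10))  # -> 16.0
```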

What about User Engagement vs. System Throughput Performance?
While DAU (Daily Active Users) and MAU (Monthly Active Users) are important, we also focus on how user engagement and usage impact compute and infrastructure efficiency. To measure this, we could use metrics like the ones below:
- Per-User QPS (Queries Per Second per User): the average number of queries each active user generates per second. It helps monitor individual demand and system load for capacity planning and performance optimization.
- Per Machine/Cluster/Rack QPS: the average number of queries processed per second by each machine, cluster, or rack. This metric evaluates infrastructure efficiency and load distribution.
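The two QPS metrics above combine directly into a capacity-planning estimate. All the numbers in this sketch (user count, per-user QPS, per-machine QPS, utilization target) are hypothetical:

```python
import math

# Capacity planning from the two QPS metrics above: machines needed
# to serve a user base at a target utilization. All inputs hypothetical.
def machines_needed(active_users: int, per_user_qps: float,
                    per_machine_qps: float, headroom: float = 0.7) -> int:
    """Machines required, targeting `headroom` (e.g. 70%) utilization."""
    total_qps = active_users * per_user_qps
    return math.ceil(total_qps / (per_machine_qps * headroom))

# e.g. 1M active users at 0.005 QPS each = 5,000 QPS total;
# 50 QPS per machine at 70% target utilization -> 143 machines.
print(machines_needed(1_000_000, 0.005, 50))  # -> 143
```

The headroom parameter is the design choice here: running machines at 100% leaves no room for traffic spikes, so fleets are typically sized against a lower utilization target.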
❤️ Thank you for taking the time to read my article ❤️
Any feedback=🎁 Gift to me.
I have more AI × Finance stories lined up. More to come!
Note: Content contains the views of the contributing authors and not Towards AI.