Premium SSD vs Ultra SSD: Azure Storage Performance for Distributed Databases

Last Updated on March 4, 2025 by Editorial Team

Author(s): Richie Bachala

Originally published on Towards AI.

When building distributed systems in the cloud, storage performance can make or break your application’s success. In this post, we’ll explore how different Azure disk types perform under distributed database workloads, using YugabyteDB as our distributed database. We’ll dive deep into benchmarking methodologies and reveal practical insights about Azure storage performance characteristics.

The Azure Storage Landscape

Azure offers several managed disk types, each designed for different workloads and performance requirements. We’ll focus on three key offerings:

  1. Premium SSD: The traditional performance-tier offering, providing consistent performance with burstable IOPS
  2. Premium SSD v2: A newer generation offering higher performance and more flexible scaling
  3. Ultra SSD: Azure’s highest-performance offering with configurable IOPS and throughput

Each of these options presents different performance characteristics and price points, making the choice non-trivial for database workloads.

Understanding Distributed Database Workloads

Before diving into performance numbers, it’s essential to understand what makes distributed database workloads unique. Unlike traditional single-node databases, distributed databases like YugabyteDB handle data differently:

  1. Write Operations:
  • Require consensus across multiple nodes
  • Need to maintain consistency across replicas
  • Often involve both WAL (Write-Ahead Log) and data file writes

  2. Read Operations:
  • May contact multiple nodes depending on consistency requirements
  • Utilize caching at various levels
  • Can be affected by data locality

These characteristics mean that storage performance impacts database operations in complex ways, often not directly proportional to raw disk performance metrics.
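To make the "not directly proportional" point concrete, here is a minimal sketch of commit latency for one replicated write. It is a simplified model, not YugabyteDB's actual implementation: it assumes a majority-quorum protocol in which the leader persists the entry to its own WAL in parallel with replication and commits once a majority of replicas (the leader included) have persisted it. The function name and the sample timings are hypothetical.

```python
def quorum_commit_latency_ms(leader_fsync_ms, follower_ack_ms):
    """Estimate commit latency for one replicated write (simplified model).

    leader_fsync_ms: time for the leader to persist the entry to its WAL.
    follower_ack_ms: per-follower time to acknowledge the entry, i.e. the
        network round trip plus that follower's own WAL write.
    """
    replicas = 1 + len(follower_ack_ms)
    majority = replicas // 2 + 1
    acks_needed = majority - 1  # the leader itself counts as one vote
    if acks_needed == 0:
        return leader_fsync_ms
    # The leader only waits for the acks_needed-th fastest follower.
    slowest_required_ack = sorted(follower_ack_ms)[acks_needed - 1]
    # Assumption: local WAL write and replication run in parallel.
    return max(leader_fsync_ms, slowest_required_ack)


# Hypothetical timings for a replication factor of 3.
baseline = quorum_commit_latency_ms(1.0, [2.5, 4.0])
faster_leader_disk = quorum_commit_latency_ms(0.5, [2.5, 4.0])
print(baseline, faster_leader_disk)
```

In this model, halving the leader's disk latency leaves the commit latency unchanged, because the write still waits on the fastest follower acknowledgment. A faster disk only helps when it sits on the critical path.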

Benchmarking Methodology

To thoroughly evaluate storage performance, we need a comprehensive testing approach. We employed two industry-standard benchmarking tools:

TPC-C Benchmark

TPC-C is a database benchmark that simulates a complete order-processing environment. It’s valuable because:

  • Models real-world business operations
  • Generates mixed read-write workloads
  • Tests multiple transaction types with varying complexity
  • Provides insights into real-world performance expectations

Our implementation focuses on the following transactions:

  • New Order: Complex write-heavy transaction
  • Payment: Mixed read-write transaction
  • Order Status: Read-only transaction
  • Delivery: Write-heavy batch transaction
  • Stock Level: Read-heavy transaction

Each of these transactions is a set of queries that together carry out a business use case. For example, the New Order transaction issues the following queries:

  • Get records describing a warehouse, customer, & district
  • Update the district
  • Increment next available order number
  • Insert record into Order and New-Order tables
  • For 5–15 items, get Item record, get/update Stock record
  • Insert Order-Line Record

For TPC-C, we focus primarily on New Order latencies, because the number of New Order transactions completed defines the benchmark's efficiency. A New Order latency of 50 ms means it took 50 ms to run all the queries listed above.
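As a rough illustration of what a single New Order latency figure covers, the sketch below counts the statements in the list above. The counting is an assumption made for illustration: the district update and the order-number increment are treated as one statement, every per-item step is issued separately, and no batching is applied. A real TPC-C client may combine statements, so treat the totals as an upper-bound estimate. The function names are hypothetical.

```python
def new_order_statement_count(items):
    """Count statements in one New Order transaction (illustrative)."""
    if not 5 <= items <= 15:
        raise ValueError("a New Order transaction contains 5 to 15 items")
    fixed_reads = 3   # warehouse, customer, district
    fixed_writes = 3  # update district, insert Order, insert New-Order
    per_item = 4      # get Item, get Stock, update Stock, insert Order-Line
    return fixed_reads + fixed_writes + per_item * items


def avg_statement_budget_ms(txn_latency_ms, items):
    """Average time available per statement at a given transaction latency."""
    return txn_latency_ms / new_order_statement_count(items)


for items in (5, 15):
    count = new_order_statement_count(items)
    print(items, count, round(avg_statement_budget_ms(50, items), 2))
```

Under these assumptions a transaction runs 26 statements for 5 items and 66 for 15, so a 50 ms New Order latency leaves an average of under 2 ms per statement.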

Sysbench

Sysbench is a microbenchmarking tool. It creates a set of similar tables, and the workload is uniformly distributed across all keys of all the tables. We use the following two workloads most:

oltp_read_only: Each transaction runs 10 selects against random tables and random keys. If the transaction latency is, say, 10 ms, each select takes about 1 ms on average. If the throughput is 100 transactions per second, the database is serving 1,000 selects per second.

oltp_multi_insert: Each transaction runs 10 inserts against random tables and random keys. If the transaction latency is, say, 50 ms, each insert takes about 5 ms on average. If the throughput is 100 transactions per second, the database is performing 1,000 inserts per second.
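The conversions above are simple enough to write down as a small helper. This is a sketch with hypothetical function names. It assumes the 10 statements in a transaction run one after another and ignores transaction begin and commit overhead, so the per-statement figure is an average, not a measurement.

```python
STATEMENTS_PER_TXN = 10  # both workloads run 10 statements per transaction


def per_statement_latency_ms(txn_latency_ms, statements=STATEMENTS_PER_TXN):
    """Average latency of one statement inside a transaction."""
    return txn_latency_ms / statements


def statements_per_second(txn_per_second, statements=STATEMENTS_PER_TXN):
    """Statement throughput implied by a transaction throughput."""
    return txn_per_second * statements


print(per_statement_latency_ms(10))  # oltp_read_only example from the text
print(per_statement_latency_ms(50))  # oltp_multi_insert example from the text
print(statements_per_second(100))
```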

While TPC-C provides a high-level view, Sysbench allows us to examine specific performance characteristics:

  • Enables focused testing of individual operation types
  • Provides precise control over workload parameters
  • Helps isolate storage performance impacts
  • Allows scaling tests with different table counts and sizes

We configured Sysbench tests to examine:

  • Point selects (read performance)
  • Insert operations (write performance)
  • Different data set sizes (20 and 30 tables)
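One way to organize such runs is to enumerate the combinations up front. The sketch below builds a hypothetical test matrix from the disk types, workloads, and table counts named in this post. Whether every combination was actually run is an assumption, and the dictionary layout is only illustrative. It does not invoke Sysbench.

```python
from itertools import product

DISK_TYPES = ["Premium SSD", "Premium SSD v2", "Ultra SSD"]
WORKLOADS = ["oltp_read_only", "oltp_multi_insert"]
TABLE_COUNTS = [20, 30]


def build_test_matrix():
    """Return one entry per (disk type, workload, table count) combination."""
    return [
        {"disk": disk, "workload": workload, "tables": tables}
        for disk, workload, tables in product(DISK_TYPES, WORKLOADS, TABLE_COUNTS)
    ]


matrix = build_test_matrix()
print(len(matrix))
for run in matrix[:4]:
    print(run)
```

Writing the matrix down before testing makes it easier to spot a missing combination when comparing results across disk types.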

● TPC-C git repo: https://github.com/yugabyte/tpcc/releases/tag/2.0

● Sysbench git repo: https://github.com/yugabyte/sysbench/

Azure Disk Performance Comparison Tables

[Tables shown as images in the original post: Test Environment Configuration, Cluster Configuration, Benchmark Results, and Benchmark Configuration Details. Images by author.]

Key Findings and Recommendations

Based on our comprehensive testing, we can make several recommendations:

For Read-Heavy Workloads

Premium SSD v2 provides the best balance of performance and cost. The performance gap between Premium SSD v2 and Ultra SSD is minimal for read operations, making Ultra SSD harder to justify purely for read performance.

For Write-Heavy Workloads

Ultra SSD shows its value in write-intensive scenarios, particularly with larger datasets. The consistent performance and lower latencies can justify the higher cost for write-critical applications.

For Mixed Workloads

Premium SSD v2 emerges as the most cost-effective option for most mixed workloads. The performance improvements over Premium SSD are significant, while the cost remains lower than Ultra SSD.

Conclusion

Our testing reveals that Azure disk performance isn’t simply about raw IOPS and throughput numbers. The interaction between storage and distributed database workloads is complex, with CPU often becoming the limiting factor before storage performance is fully utilized.

● If the workload requires low latency or high throughput, Ultra SSD is the best choice. If it has no specific latency or throughput requirements, Premium SSD v2 is a good choice.

● Ultra SSD has the lowest latency and the highest throughput of the three disk types, but it is also the most expensive. Premium SSD v2 is a good choice if you need high throughput on a budget. Premium SSD is adequate if you have no specific latency or throughput requirements.

For most distributed database deployments, Premium SSD v2 provides the sweet spot of performance and cost.

Ultra SSD becomes compelling primarily for:

  • Write-heavy workloads with strict latency requirements
  • Large datasets with unpredictable access patterns
  • Mission-critical applications requiring consistent performance
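The guidance above can be condensed into a small helper. This is a sketch of the rules of thumb in this post, not a sizing tool: the function name, the three workload labels, and the boolean flags are hypothetical, and real decisions should rest on your own benchmarks and pricing.

```python
def recommend_disk(workload, strict_latency=False,
                   large_unpredictable_dataset=False,
                   mission_critical=False):
    """Map workload traits to a disk type, following this post's guidance."""
    allowed = {"read-heavy", "write-heavy", "mixed"}
    if workload not in allowed:
        raise ValueError(f"workload must be one of {sorted(allowed)}")
    # Ultra SSD is compelling for write-heavy workloads with strict latency
    # needs, large datasets with unpredictable access, or mission-critical use.
    if workload == "write-heavy" and strict_latency:
        return "Ultra SSD"
    if large_unpredictable_dataset or mission_critical:
        return "Ultra SSD"
    # Otherwise Premium SSD v2 is treated as the default sweet spot.
    return "Premium SSD v2"


print(recommend_disk("read-heavy"))
print(recommend_disk("write-heavy", strict_latency=True))
print(recommend_disk("mixed", mission_critical=True))
```

Keeping Premium SSD v2 as the fallback mirrors the conclusion that it suits most deployments, with Ultra SSD reserved for the cases listed above.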

When selecting Azure disk types for your distributed database, consider:

  1. Your workload characteristics (read/write ratio)
  2. Dataset size and growth expectations
  3. Performance requirements and budgetary constraints
  4. The actual bottlenecks in your current system

Remember that storage performance is just one piece of the puzzle. A well-designed distributed database system needs to consider network topology, CPU resources, and memory configuration alongside storage performance for optimal results.

Thanks for reading
