Querying AI and Cloud Trends: Azure and OpenAI Growth Slows, Amazon Growth Peaked in June

Last Updated on September 2, 2024 by Editorial Team

Author(s): Jonathan Bennion

Originally published on Towards AI.

Cutting through the AI hype by querying actual developer usage (measured as new public repos, so with some presumptions) to help prioritize safety tools and partnerships.

TLDR (with caveats noted below):

New public AI repos now appear to grow linearly, not exponentially (a surge in March 2024 was followed by a rapid decline, and growth is now slower but steady).
Azure/OpenAI public repo dominance: Azure shows 20x more new repos each month than the next leading hyperscaler, with OpenAI usage also dominating.
Amazon Bedrock public repo growth may have peaked in June 2024 (slightly exponential until then).

Introduction — what did I query?

I used GitHub repository creation data to analyze adoption trends in AI and cloud computing. The code comes first, and the analysis follows.

Note on caveats:

Despite obvious bias and limitations (only public repos are counted, and only those whose names contain these package names), this method offers a unique view into developer adoption. Google Cloud and/or Microsoft formerly enabled querying of code within pages, which would have enabled a count of distinct import statements, but this was disabled at some point recently, leaving only the repo names queryable.

While imperfect, looking at repo creation provides enough data to challenge prevailing market narratives.

First, the notebook setup:

The GitHub data archive is queried here through Google Cloud Platform (GCP) and BigQuery, so I installed these packages (I used Colab initially; the notebook is now parked on GitHub).

# Install packages
!pip install -q pandas seaborn matplotlib google-cloud-bigquery

# Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter  # used later to format the x-axis
from google.cloud import bigquery
from google.oauth2 import service_account

Query from GCP out of BigQuery:

The SQL query extracts relevant data by categorizing repositories whose names match specific AI and cloud technologies, then aggregates repository creation counts by creation month.

The keyword list depends on some manual investigation of the right Python package names.
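The matching rule that the query applies to repo names can be sketched in plain Python. This is a minimal illustration, not part of the original notebook: `KEYWORDS`, `categorize`, and `where_clause` are hypothetical names, only a subset of the keywords is listed, and the repo names are invented. It assumes the name being matched may include the owner (for example `owner/repo`), in which case an owner name containing a keyword would also match.

```python
# Ordered (substring, category) pairs mirroring the first CASE branches of the query.
# Hypothetical helper data: only a subset is listed to keep the sketch short.
KEYWORDS = [
    ("bedrock", "bedrock"),
    ("vertex", "vertex"),
    ("openai", "openai"),
    ("anthropic", "anthropic"),
    ("langchain", "langchain"),
    ("azure", "azure"),
]


def categorize(repo_name: str) -> str:
    """Return the first matching category, like a SQL CASE (the first WHEN wins)."""
    lowered = repo_name.lower()
    for substring, category in KEYWORDS:
        if substring in lowered:
            return category
    return "other"


def where_clause(keywords=KEYWORDS) -> str:
    """Build the OR-chain of LIKE filters from the same list, so the two stay in sync."""
    return "\n OR ".join(f"LOWER(repo.name) LIKE '%{s}%'" for s, _ in keywords)


# Invented repo names, for illustration only
print(categorize("someuser/azure-openai-demo"))        # openai ('openai' is checked before 'azure')
print(categorize("someuser/Azure-Functions-sample"))   # azure
print(categorize("someuser/minecraft-bedrock-addon"))  # bedrock (a match unrelated to Amazon Bedrock)
print(where_clause())
```

Two things follow from this rule: a repo whose name contains several keywords is counted only under the first one listed, and plain substring matching can pick up names unrelated to the package in question.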

query = """
WITH ai_repos AS (
  SELECT
    repo.name AS repo_name,
    EXTRACT(DATE FROM created_at) AS creation_date,
    CASE
      WHEN LOWER(repo.name) LIKE '%bedrock%' THEN 'bedrock'
      WHEN LOWER(repo.name) LIKE '%vertex%' THEN 'vertex'
      WHEN LOWER(repo.name) LIKE '%openai%' THEN 'openai'
      WHEN LOWER(repo.name) LIKE '%anthropic%' THEN 'anthropic'
      WHEN LOWER(repo.name) LIKE '%langchain%' THEN 'langchain'
      WHEN LOWER(repo.name) LIKE '%azure%' THEN 'azure'
      WHEN LOWER(repo.name) LIKE '%llamaindex%' THEN 'llamaindex'
      WHEN LOWER(repo.name) LIKE '%neo4j%' THEN 'neo4j'
      WHEN LOWER(repo.name) LIKE '%pymongo%' THEN 'pymongo'
      WHEN LOWER(repo.name) LIKE '%elasticsearch%' THEN 'elasticsearch'
      WHEN LOWER(repo.name) LIKE '%boto3%' THEN 'boto3'
      WHEN LOWER(repo.name) LIKE '%ayx%' THEN 'ayx'
      WHEN LOWER(repo.name) LIKE '%snowflake-connector-python%' THEN 'snowflake'
      WHEN LOWER(repo.name) LIKE '%c3-toolset%' THEN 'c3ai'
      WHEN LOWER(repo.name) LIKE '%dataiku-api-client%' THEN 'dataiku'
      WHEN LOWER(repo.name) LIKE '%salesforce-einstein-vision-python%' THEN 'salesforce_einstein'
      WHEN LOWER(repo.name) LIKE '%qlik-py-tools%' THEN 'qlik'
      WHEN LOWER(repo.name) LIKE '%palantir-foundry-client%' THEN 'palantir_foundry'
      WHEN LOWER(repo.name) LIKE '%cuda-python%' THEN 'nvidia_cuda'
      WHEN LOWER(repo.name) LIKE '%openvino%' THEN 'intel_openvino'
      WHEN LOWER(repo.name) LIKE '%clarifai%' THEN 'clarifai'
      WHEN LOWER(repo.name) LIKE '%twilio%' THEN 'twilio'
      WHEN LOWER(repo.name) LIKE '%oracleai%' THEN 'oracle_ai'
      ELSE 'other'
    END AS keyword_category
  FROM
    `githubarchive.day.20*`
  WHERE
    _TABLE_SUFFIX >= '240101'
    AND _TABLE_SUFFIX NOT LIKE '%view%'
    AND type = 'CreateEvent'
    AND repo.name IS NOT NULL
    AND (
      LOWER(repo.name) LIKE '%bedrock%'
      OR LOWER(repo.name) LIKE '%vertex%'
      OR LOWER(repo.name) LIKE '%openai%'
      OR LOWER(repo.name) LIKE '%anthropic%'
      OR LOWER(repo.name) LIKE '%langchain%'
      OR LOWER(repo.name) LIKE '%azure%'
      OR LOWER(repo.name) LIKE '%llamaindex%'
      OR LOWER(repo.name) LIKE '%neo4j%'
      OR LOWER(repo.name) LIKE '%pymongo%'
      OR LOWER(repo.name) LIKE '%elasticsearch%'
      OR LOWER(repo.name) LIKE '%boto3%'
      OR LOWER(repo.name) LIKE '%ayx%'
      OR LOWER(repo.name) LIKE '%snowflake-connector-python%'
      OR LOWER(repo.name) LIKE '%c3-toolset%'
      OR LOWER(repo.name) LIKE '%dataiku-api-client%'
      OR LOWER(repo.name) LIKE '%salesforce-einstein-vision-python%'
      OR LOWER(repo.name) LIKE '%qlik-py-tools%'
      OR LOWER(repo.name) LIKE '%palantir-foundry-client%'
      OR LOWER(repo.name) LIKE '%cuda-python%'
      OR LOWER(repo.name) LIKE '%openvino%'
      OR LOWER(repo.name) LIKE '%clarifai%'
      OR LOWER(repo.name) LIKE '%twilio%'
      OR LOWER(repo.name) LIKE '%oracleai%'
    )
)

SELECT
  FORMAT_DATE('%Y-%m', creation_date) AS month,
  keyword_category,
  COUNT(DISTINCT repo_name) AS new_repo_count
FROM
  ai_repos
GROUP BY
  month, keyword_category
ORDER BY
  month, keyword_category
"""

Then extract, load, transform, etc.

I just created a pivot table in the right format.

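# Create the BigQuery client (assumption: Application Default Credentials are
# configured; the original notebook's credential setup is not shown)
client = bigquery.Client()

# Query output to DF, create pivot
df = client.query(query).to_dataframe()
df['month'] = pd.to_datetime(df['month'])
df_pivot = df.pivot(index='month', columns='keyword_category', values='new_repo_count')
df_pivot.sort_index(inplace=True)

# Drop the current (still incomplete) month so it does not distort the monthly trend
df_pivot = df_pivot.iloc[:-1]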
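To make the reshaping step concrete, here is a small sketch on a toy table with made-up counts (not the query output). The names `rows`, `toy`, and `toy_pivot` are hypothetical, and the `fillna(0)` call is an optional addition that the original code does not include: it treats a month with no matching repos as zero rather than leaving a gap in the plotted line.

```python
import pandas as pd

# Made-up long-format rows shaped like the query output, for illustration only
rows = [
    {"month": "2024-01", "keyword_category": "azure", "new_repo_count": 10},
    {"month": "2024-01", "keyword_category": "bedrock", "new_repo_count": 2},
    {"month": "2024-02", "keyword_category": "azure", "new_repo_count": 12},
    {"month": "2024-03", "keyword_category": "azure", "new_repo_count": 3},
    {"month": "2024-03", "keyword_category": "bedrock", "new_repo_count": 1},
]
toy = pd.DataFrame(rows)
toy["month"] = pd.to_datetime(toy["month"], format="%Y-%m")

# One row per month, one column per category
toy_pivot = toy.pivot(
    index="month", columns="keyword_category", values="new_repo_count"
).sort_index()

# Optional (not in the original): a missing month/category pair means no new repos
toy_pivot = toy_pivot.fillna(0)

# Drop the last month, which stands in for the incomplete current month
toy_pivot = toy_pivot.iloc[:-1]

print(toy_pivot)
```

In this toy table, February has no `bedrock` row, so the pivot would hold a missing value there unless it is filled.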

Next, plotted the data:

The first time I tried this, I had to move Azure to a secondary axis, since its count was 20x that of the next category.

# Define color palette
colors = sns.color_palette("husl", n_colors=len(df_pivot.columns))

# Create plot
fig, ax1 = plt.subplots(figsize=(16, 10))
ax2 = ax1.twinx()

lines1 = []
labels1 = []
lines2 = []
labels2 = []

# Plot each keyword as a line, excluding 'azure' for separate axis
for keyword, color in zip([col for col in df_pivot.columns if col != 'azure'], colors):
    line, = ax1.plot(df_pivot.index, df_pivot[keyword], linewidth=2.5, color=color, label=keyword)
    lines1.append(line)
    labels1.append(keyword)

# Plot 'azure' on the secondary axis
if 'azure' in df_pivot.columns:
    line, = ax2.plot(df_pivot.index, df_pivot['azure'], linewidth=2.5, color='red', label='azure')
    lines2.append(line)
    labels2.append('azure')

# Customize the plot
ax1.set_title("GitHub Repository Creation Trends by AI Keyword", fontsize=24, fontweight='bold', pad=20)
ax1.set_xlabel("Repo Creation Month", fontsize=18, labelpad=15)
ax1.set_ylabel("New Repository Count (Non-Azure)", fontsize=18, labelpad=15)
ax2.set_ylabel("New Repository Count (Azure)", fontsize=18, labelpad=15)

# Format x-axis to show dates nicely
ax1.xaxis.set_major_formatter(DateFormatter("%Y-%m"))
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45, ha='right')

# Adjust tick label font sizes
ax1.tick_params(axis='both', which='major', labelsize=14)
ax2.tick_params(axis='both', which='major', labelsize=14)

# Adjust layout
plt.tight_layout()

# Create a single legend for both axes
fig.legend(lines1 + lines2, labels1 + labels2, loc='center left', bbox_to_anchor=(1.05, 0.5), fontsize=12)

# Adjust subplot parameters to give specified padding
plt.subplots_adjust(right=0.85)
plt.show()

Results were interesting. Because each point is the number of new repos created that month, Azure was exponential until March 2024, then declined quickly, and has shown linear growth since May 2024.
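One way to read "exponential" versus "linear" from monthly counts of new repos is sketched below, under stated assumptions. The numbers are made up for illustration and are not the article's data, and `monthly_new`, `ratios`, and `cumulative` are hypothetical names. The idea is that a month-over-month ratio well above 1 suggests accelerating growth, while roughly flat monthly counts mean the running total grows linearly.

```python
from itertools import accumulate

# Made-up monthly counts of new repos, for illustration only (not the article's data)
monthly_new = [100, 150, 225, 120, 110, 112, 109]

# Month-over-month ratio of new repos: values well above 1 suggest accelerating growth
ratios = [later / earlier for earlier, later in zip(monthly_new, monthly_new[1:])]

# Running total of repos: roughly flat monthly counts make this total grow linearly
cumulative = list(accumulate(monthly_new))

for month_index, ratio in enumerate(ratios, start=2):
    print(f"month {month_index}: x{ratio:.2f} versus the previous month")
print("running total:", cumulative)
```

In this toy series the first two ratios are 1.5 (accelerating), the next is well below 1 (a sharp drop), and the rest hover near 1, which is the pattern described as steady, linear growth.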

Re-plotted the data for clarity on smaller movements:

With the top 3 categories removed, it’s easier to see the scale: Amazon Bedrock clearly shows steadier adoption but appears to peak in June 2024. Note that some package names are not expected to reflect adoption, since these counts cover only public repos named after public packages (e.g. Snowflake, Nvidia CUDA).

# Isolate the top 3 to remove
top_3 = df_pivot.mean().nlargest(3).index
df_pivot_filtered = df_pivot.drop(columns=top_3)

fig, ax = plt.subplots(figsize=(16, 10))

for keyword, color in zip(df_pivot_filtered.columns, colors[:len(df_pivot_filtered.columns)]):
    ax.plot(df_pivot_filtered.index, df_pivot_filtered[keyword], linewidth=2.5, color=color, label=keyword)

ax.set_title("GitHub Repository Creation Trends by AI Keyword (Excluding Top 3 Packages)", fontsize=24, fontweight='bold', pad=20)
ax.set_xlabel("Repo Creation Month", fontsize=18, labelpad=15)
ax.set_ylabel("New Repository Count", fontsize=18, labelpad=15)

ax.xaxis.set_major_formatter(DateFormatter("%Y-%m"))
plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')

ax.tick_params(axis='both', which='major', labelsize=14)

# Adjust layout
plt.tight_layout()

# Place legend outside the plot
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), fontsize=12)

# Adjust subplot parameters to give specified padding
plt.subplots_adjust(right=0.85)
plt.show()
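The "top 3" selection and the reading of a peak month can also be checked numerically rather than by eye. The sketch below uses a toy pivot table with made-up counts (not the article's data). `toy_pivot`, `remaining`, and `peak_months` are hypothetical names, and `idxmax` is used here as one simple way to locate each category's highest month.

```python
import pandas as pd

# Made-up monthly counts per category, for illustration only
months = pd.to_datetime(["2024-04-01", "2024-05-01", "2024-06-01", "2024-07-01"])
toy_pivot = pd.DataFrame(
    {
        "azure": [900, 950, 940, 960],
        "openai": [400, 390, 410, 405],
        "langchain": [200, 210, 190, 205],
        "bedrock": [20, 30, 45, 38],
        "vertex": [10, 12, 11, 13],
    },
    index=months,
)

# Same selection rule as the plotting code: the three largest categories by mean
top_3 = toy_pivot.mean().nlargest(3).index
remaining = toy_pivot.drop(columns=top_3)

# Month in which each remaining category reaches its highest count
peak_months = remaining.idxmax()

print(list(top_3))
print(peak_months["bedrock"].strftime("%Y-%m"))
```

A maximum found this way is only a peak so far: if the highest value sits in the last or second-to-last month of the series, later data could still move it.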

Takeaways:

There is a very large disparity between the smaller packages and those from ‘Big Tech’.
Azure and OpenAI dominate, but their growth has slowed.
Amazon Bedrock may have peaked in June 2024.

More to come: stay tuned for further parts of this analysis (follow me for more updates).

FYI, the dataframe is below, showing where obvious package names might not reflect the entire usage of the tool (e.g. Nvidia, Snowflake). Note (again) the many biases and caveats (one repo might contain any number of scripts, etc.), so this analysis assumes that a new public repo represents growth.

Image created by author from the df for plotting the above plots (raw data).
