Querying AI and Cloud Trends: Azure and OpenAI Growth Slows, Amazon Growth Peaked in June

Last Updated on September 2, 2024 by Editorial Team

Author(s): Jonathan Bennion

Originally published on Towards AI.

Cutting through the AI hype to query actual developer usage (as new repos, so with presumptions) for prioritization of safety tools and partnerships.

TLDR (with caveats noted below):

  • Public AI repos now appear as linear growth, not exponential (surge in March 2024 followed by rapid decline, now slower but steady).
  • Azure/OpenAI public repo dominance: Azure shows 20x more new repos each month than the next leading hyperscaler, with OpenAI usage also dominating.
  • Amazon Bedrock public repo growth may have peaked in June 2024 (slightly exponential until then).
Image created by author from code below

Introduction: what did I query?

I used GitHub repository creation data to analyze adoption trends in AI and cloud computing. Code below, analysis follows.

Note on caveats:

Despite obvious bias and limitations (public packages, and public repos matched only by whether their names contain these package names), this method offers a unique view into developer adoption. Google Cloud and/or Microsoft formerly enabled querying of code within pages, which would have allowed a count of distinct import statements, but this was disabled at some point recently, leaving only the repo names queryable.

While imperfect, looking at repo creation provides enough data to challenge prevailing market narratives.

First, the notebook setup:

The GitHub data archive is queried here through BigQuery on Google Cloud Platform (GCP), so I installed these packages (used Colab initially, now parked in GitHub).

# Install packages 
!pip install -q pandas seaborn matplotlib google-cloud-bigquery

# Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter  # used below to format the x-axis
from google.cloud import bigquery
from google.oauth2 import service_account

Query from GCP out of BigQuery:

The following SQL extracts relevant data by categorizing repositories related to specific AI and cloud technologies, then aggregates repository creation counts by creation month.

The keyword list depends on some manual investigation of the right Python package names.

query = """
WITH ai_repos AS (
  SELECT
    repo.name AS repo_name,
    EXTRACT(DATE FROM created_at) AS creation_date,
    CASE
      WHEN LOWER(repo.name) LIKE '%bedrock%' THEN 'bedrock'
      WHEN LOWER(repo.name) LIKE '%vertex%' THEN 'vertex'
      WHEN LOWER(repo.name) LIKE '%openai%' THEN 'openai'
      WHEN LOWER(repo.name) LIKE '%anthropic%' THEN 'anthropic'
      WHEN LOWER(repo.name) LIKE '%langchain%' THEN 'langchain'
      WHEN LOWER(repo.name) LIKE '%azure%' THEN 'azure'
      WHEN LOWER(repo.name) LIKE '%llamaindex%' THEN 'llamaindex'
      WHEN LOWER(repo.name) LIKE '%neo4j%' THEN 'neo4j'
      WHEN LOWER(repo.name) LIKE '%pymongo%' THEN 'pymongo'
      WHEN LOWER(repo.name) LIKE '%elasticsearch%' THEN 'elasticsearch'
      WHEN LOWER(repo.name) LIKE '%boto3%' THEN 'boto3'
      WHEN LOWER(repo.name) LIKE '%ayx%' THEN 'ayx'
      WHEN LOWER(repo.name) LIKE '%snowflake-connector-python%' THEN 'snowflake'
      WHEN LOWER(repo.name) LIKE '%c3-toolset%' THEN 'c3ai'
      WHEN LOWER(repo.name) LIKE '%dataiku-api-client%' THEN 'dataiku'
      WHEN LOWER(repo.name) LIKE '%salesforce-einstein-vision-python%' THEN 'salesforce_einstein'
      WHEN LOWER(repo.name) LIKE '%qlik-py-tools%' THEN 'qlik'
      WHEN LOWER(repo.name) LIKE '%palantir-foundry-client%' THEN 'palantir_foundry'
      WHEN LOWER(repo.name) LIKE '%cuda-python%' THEN 'nvidia_cuda'
      WHEN LOWER(repo.name) LIKE '%openvino%' THEN 'intel_openvino'
      WHEN LOWER(repo.name) LIKE '%clarifai%' THEN 'clarifai'
      WHEN LOWER(repo.name) LIKE '%twilio%' THEN 'twilio'
      WHEN LOWER(repo.name) LIKE '%oracleai%' THEN 'oracle_ai'
      ELSE 'other'
    END AS keyword_category
  FROM
    `githubarchive.day.20*`
  WHERE
    _TABLE_SUFFIX >= '240101'
    AND _TABLE_SUFFIX NOT LIKE '%view%'
    AND type = 'CreateEvent'
    AND repo.name IS NOT NULL
    AND (
      LOWER(repo.name) LIKE '%bedrock%'
      OR LOWER(repo.name) LIKE '%vertex%'
      OR LOWER(repo.name) LIKE '%openai%'
      OR LOWER(repo.name) LIKE '%anthropic%'
      OR LOWER(repo.name) LIKE '%langchain%'
      OR LOWER(repo.name) LIKE '%azure%'
      OR LOWER(repo.name) LIKE '%llamaindex%'
      OR LOWER(repo.name) LIKE '%neo4j%'
      OR LOWER(repo.name) LIKE '%pymongo%'
      OR LOWER(repo.name) LIKE '%elasticsearch%'
      OR LOWER(repo.name) LIKE '%boto3%'
      OR LOWER(repo.name) LIKE '%ayx%'
      OR LOWER(repo.name) LIKE '%snowflake-connector-python%'
      OR LOWER(repo.name) LIKE '%c3-toolset%'
      OR LOWER(repo.name) LIKE '%dataiku-api-client%'
      OR LOWER(repo.name) LIKE '%salesforce-einstein-vision-python%'
      OR LOWER(repo.name) LIKE '%qlik-py-tools%'
      OR LOWER(repo.name) LIKE '%palantir-foundry-client%'
      OR LOWER(repo.name) LIKE '%cuda-python%'
      OR LOWER(repo.name) LIKE '%openvino%'
      OR LOWER(repo.name) LIKE '%clarifai%'
      OR LOWER(repo.name) LIKE '%twilio%'
      OR LOWER(repo.name) LIKE '%oracleai%'
    )
)

SELECT
  FORMAT_DATE('%Y-%m', creation_date) AS month,
  keyword_category,
  COUNT(DISTINCT repo_name) AS new_repo_count
FROM
  ai_repos
GROUP BY
  month, keyword_category
ORDER BY
  month, keyword_category
"""
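
The CASE expression above assigns each repo name to the first keyword that matches, so the order of the WHEN clauses matters. Here is a minimal Python sketch of the same first-match rule, using an abbreviated keyword list and made-up repo names (both are illustrative assumptions, not data from the query):

```python
# Abbreviated, ordered keyword list mirroring some of the WHEN clauses.
# Each entry is (substring to look for, category label).
KEYWORDS = [
    ("bedrock", "bedrock"),
    ("vertex", "vertex"),
    ("openai", "openai"),
    ("anthropic", "anthropic"),
    ("langchain", "langchain"),
    ("azure", "azure"),
    ("snowflake-connector-python", "snowflake"),
]


def categorize(repo_name: str) -> str:
    """Return the category of the first keyword found in the repo name."""
    name = repo_name.lower()
    for substring, category in KEYWORDS:
        if substring in name:
            return category
    return "other"


# Hypothetical repo names, chosen to show the edge cases.
examples = [
    "someuser/azure-openai-chatbot",     # 'openai' is checked before 'azure'
    "someuser/Minecraft-Bedrock-Addon",  # substring match unrelated to Amazon Bedrock
    "someuser/snowflake-demo",           # no match: the full package name is required
]
for repo in examples:
    print(repo, "->", categorize(repo))
```

This is one reason the counts are best read as rough signals: a name match says nothing about what the repo actually imports.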

Then extract, load, transform, etc.

I just created a pivot table in the right format.

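# Query output to DF, create pivot
# Assumes `client` is an authenticated bigquery.Client created earlier
# (its setup is not shown in the article)
df = client.query(query).to_dataframe()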
df['month'] = pd.to_datetime(df['month'])
df_pivot = df.pivot(index='month', columns='keyword_category', values='new_repo_count')
df_pivot.sort_index(inplace=True)

# Remove the current month to preserve data trend by month
df_pivot = df_pivot.iloc[:-1]
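
To make the reshaping step concrete, here is a dependency-free sketch of what the pivot and the final-row drop do. The counts are invented and the helper name `pivot_counts` is an illustrative assumption:

```python
from collections import defaultdict

# Long-format rows shaped like the query output:
# (month, keyword_category, new_repo_count). Counts are invented.
rows = [
    ("2024-01", "azure", 100),
    ("2024-01", "bedrock", 5),
    ("2024-02", "azure", 120),
    ("2024-02", "bedrock", 7),
    ("2024-03", "azure", 30),  # stands in for a partial current month
]


def pivot_counts(rows):
    """Reshape long rows into {month: {category: count}}, sorted by month."""
    table = defaultdict(dict)
    for month, category, count in rows:
        table[month][category] = count
    return dict(sorted(table.items()))


wide = pivot_counts(rows)

# Drop the last month, as df_pivot.iloc[:-1] does.
complete_months = dict(list(wide.items())[:-1])
print(complete_months)
```

A category with no repos in a given month is simply absent here; in the pandas pivot it shows up as NaN. Dropping the last row assumes the query was run partway through a month, so the final row would otherwise undercount.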

Next, plotted the data:

The first time I tried this, I had to move Azure to a secondary axis since its count was 20x that of the next keyword.

# Define color palette
colors = sns.color_palette("husl", n_colors=len(df_pivot.columns))

# Create plot
fig, ax1 = plt.subplots(figsize=(16, 10))
ax2 = ax1.twinx()

lines1 = []
labels1 = []
lines2 = []
labels2 = []

# Plot each keyword as a line, excluding 'azure' for separate axis
for keyword, color in zip([col for col in df_pivot.columns if col != 'azure'], colors):
    line, = ax1.plot(df_pivot.index, df_pivot[keyword], linewidth=2.5, color=color, label=keyword)
    lines1.append(line)
    labels1.append(keyword)

# Plot 'azure' on the secondary axis
if 'azure' in df_pivot.columns:
    line, = ax2.plot(df_pivot.index, df_pivot['azure'], linewidth=2.5, color='red', label='azure')
    lines2.append(line)
    labels2.append('azure')

# Customize the plot
ax1.set_title("GitHub Repository Creation Trends by AI Keyword", fontsize=24, fontweight='bold', pad=20)
ax1.set_xlabel("Repo Creation Month", fontsize=18, labelpad=15)
ax1.set_ylabel("New Repository Count (Non-Azure)", fontsize=18, labelpad=15)
ax2.set_ylabel("New Repository Count (Azure)", fontsize=18, labelpad=15)

# Format x-axis to show dates nicely
ax1.xaxis.set_major_formatter(DateFormatter("%Y-%m"))
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45, ha='right')

# Adjust tick label font sizes
ax1.tick_params(axis='both', which='major', labelsize=14)
ax2.tick_params(axis='both', which='major', labelsize=14)

# Adjust layout
plt.tight_layout()

# Create a single legend for both axes
fig.legend(lines1 + lines2, labels1 + labels2, loc='center left', bbox_to_anchor=(1.05, 0.5), fontsize=12)

# Adjust subplot parameters to give specified padding
plt.subplots_adjust(right=0.85)

Results were interesting. Since each month shows new repos created, Azure was exponential until March 2024, then declined quickly, and has shown linear growth since May 2024.
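
One rough way to back a "linear vs. exponential" reading is to fit a straight line to the counts and to the log of the counts, then compare the fit quality. The sketch below does this with plain least squares on invented numbers. The series, the helper names `r_squared` and `growth_shape`, and the use of R² as the yardstick are all assumptions for illustration, not the method used in this article:

```python
import math


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def growth_shape(counts):
    """Label a series by whichever fit is better: raw counts or log counts."""
    xs = list(range(len(counts)))
    linear_fit = r_squared(xs, counts)
    exponential_fit = r_squared(xs, [math.log(c) for c in counts])
    return "exponential" if exponential_fit > linear_fit else "linear"


doubling = [100 * 2 ** m for m in range(6)]  # made-up exponential series
steady = [100 + 50 * m for m in range(6)]    # made-up linear series

print(growth_shape(doubling))
print(growth_shape(steady))
```

With only a handful of monthly points this is a sanity check rather than a statistical test, and it requires strictly positive counts because of the logarithm.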

Image created by author from code above

Re-plotted the data for clarity on smaller movements:

With the top 3 keywords removed, it’s easier to see the scale. Amazon Bedrock clearly shows steadier adoption but appears to peak in June 2024. Note that some packages are not expected to show adoption here (e.g. Snowflake, Nvidia CUDA), since the query only matches public repos whose names contain the public package name.

# Isolate the top 3 to remove
top_3 = df_pivot.mean().nlargest(3).index
df_pivot_filtered = df_pivot.drop(columns=top_3)

fig, ax = plt.subplots(figsize=(16, 10))

for keyword, color in zip(df_pivot_filtered.columns, colors[:len(df_pivot_filtered.columns)]):
    ax.plot(df_pivot_filtered.index, df_pivot_filtered[keyword], linewidth=2.5, color=color, label=keyword)

ax.set_title("GitHub Repository Creation Trends by AI Keyword (Excluding Top 3 Packages)", fontsize=24, fontweight='bold', pad=20)
ax.set_xlabel("Repo Creation Month", fontsize=18, labelpad=15)
ax.set_ylabel("New Repository Count", fontsize=18, labelpad=15)

ax.xaxis.set_major_formatter(DateFormatter("%Y-%m"))
plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')

ax.tick_params(axis='both', which='major', labelsize=14)

# Adjust layout
plt.tight_layout()

# Place legend outside the plot
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), fontsize=12)

# Adjust subplot parameters to give specified padding
plt.subplots_adjust(right=0.85)
plt.show()
Image created by author from code above
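
The `df_pivot.mean().nlargest(3)` step ranks categories by their average monthly count and drops the three largest. Here is a small standard-library sketch of the same ranking; the counts are invented and the helper name `top_n_by_mean` is an illustrative assumption:

```python
from statistics import mean

# Hypothetical monthly counts per category. None marks a month with no rows,
# which pandas would hold as NaN and skip when averaging.
monthly_counts = {
    "azure": [2000, 2400, 2100],
    "openai": [300, 280, 260],
    "langchain": [150, 140, 130],
    "bedrock": [20, 25, None],
    "anthropic": [5, 8, 9],
}


def top_n_by_mean(counts, n):
    """Return the n category names with the largest mean, ignoring missing months."""
    averages = {
        name: mean(v for v in values if v is not None)
        for name, values in counts.items()
    }
    return sorted(averages, key=averages.get, reverse=True)[:n]


top_3 = top_n_by_mean(monthly_counts, 3)
remaining = {k: v for k, v in monthly_counts.items() if k not in top_3}
print(top_3)            # ['azure', 'openai', 'langchain']
print(list(remaining))  # ['bedrock', 'anthropic']
```

Ranking by mean rather than by the latest month means a keyword that surged early and then faded can still be dropped as a "top 3" series.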

Takeaways:

  • Very large disparity between the smaller packages and those from ‘Big Tech’.
  • Azure and OpenAI dominate but growth has slowed.
  • Amazon may have peaked in June 2024.
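
A claim like "peaked in June" can be checked directly from a monthly series by locating the maximum and looking at the month-over-month change after it. This is a minimal sketch using invented counts rather than the article's data (the numbers and the helper name `month_over_month` are assumptions):

```python
# Hypothetical monthly new-repo counts for one category.
series = {
    "2024-03": 40,
    "2024-04": 55,
    "2024-05": 70,
    "2024-06": 90,
    "2024-07": 80,
}


def month_over_month(series):
    """Return {month: fractional change vs. the previous month}."""
    months = sorted(series)
    return {
        cur: (series[cur] - series[prev]) / series[prev]
        for prev, cur in zip(months, months[1:])
    }


peak_month = max(series, key=series.get)
changes = month_over_month(series)

print(peak_month)                    # 2024-06
print(round(changes["2024-07"], 3))  # -0.111
```

A peak is only confirmed once later months stay below it. With a single lower month after June, "may have peaked" is about as strong as the claim can be.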

More to come, so stay tuned for further parts of this analysis (follow me for more updates).

FYI the dataframe is below, showing where obvious package names might not reflect the entire usage of the tool (e.g. Nvidia, Snowflake). Note (again) the many biases and caveats (one repo might contain x scripts, etc.), so this assumes a new (and public) repo is growth.

Image created by author from the df for plotting the above plots (raw data).


Published via Towards AI
