The Choice for Businesses Between Open-Source and Proprietary Models To Deploy Generative AI

Last Updated on November 8, 2023 by Editorial Team

Author(s): Faizan Ahmad, PhD

Originally published on Towards AI.

The surge of interest in Generative AI had given rise to more than 350 companies in the field by mid-2023 [1], with value propositions that span from foundational models to specific use cases. This breadth of vendors calls for a well-informed decision by businesses looking to implement this nascent technology, with criteria that go beyond brand positioning or relative pricing. This article looks at one dimension of this multi-factor decision: adopting an open-source versus a proprietary LLM (Large Language Model).

Figure 1 shows key players in the Generative AI market, divided between open-source and closed-source (i.e., proprietary) offerings. Among big tech, Google, Microsoft (OpenAI), and Amazon have proprietary products, while Meta (Facebook) and NVIDIA offer open-source models. Businesses that are already large consumers of services from tech giants, such as cloud storage or analytics products, may decide to stick with their current provider in order to benefit from the scalable, seamless integration of Generative AI into the existing ecosystem. For the rest of the competitive landscape, the closed-source space is dominated by the likes of Anthropic, Inflection, and Cohere, while Hugging Face, Mistral AI, and Stability AI lead the open-source side.

Figure 1: Key players offering open-source and closed-source Gen AI models

Criteria for choosing between open-source and proprietary Gen AI models

Every business needs to take a nuanced approach to calculating its own ROI (return on investment) when considering a vendor to deploy Generative AI. The differences to take into account lie not only between open-source and closed-source offerings but also within each of the two categories. Figure 2 provides a summary of the relevant factors.

Pricing

In essence, open-source models are free to access, but there might be fees associated with additional licenses or services that are not part of the core offering. The pricing policies of closed-source providers vary greatly, as the market is still learning about the value generated. The most prevalent pricing structure is based on the number of input and output tokens (fundamentally, the length of the text sent to and returned by the model). Another approach is to charge by the number of times the model is called, irrespective of the length of the text. Google uses the first, length-based approach, while Microsoft has a more complex, hybrid methodology. Amazon has not yet disclosed its pricing structure in detail.
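To make the two pricing structures concrete, the sketch below compares a token-based bill with a per-call bill for the same workload. All prices and volumes are hypothetical placeholders chosen for illustration, not any vendor's actual rates, and the class names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class TokenPricing:
    """Token-based pricing. The rates are hypothetical, not a real price list."""
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float

    def monthly_cost(self, calls: int, input_tokens: int, output_tokens: int) -> float:
        # Cost of one call scales with the length of the prompt and the reply.
        per_call = (
            (input_tokens / 1000) * self.usd_per_1k_input_tokens
            + (output_tokens / 1000) * self.usd_per_1k_output_tokens
        )
        return calls * per_call


@dataclass
class PerCallPricing:
    """Per-call pricing. The rate is hypothetical, not a real price list."""
    usd_per_call: float

    def monthly_cost(self, calls: int, input_tokens: int, output_tokens: int) -> float:
        # Text length is ignored: every call costs the same.
        return calls * self.usd_per_call


# Assumed workload: 100,000 calls per month, 500 input and 250 output tokens each.
token_plan = TokenPricing(usd_per_1k_input_tokens=0.001, usd_per_1k_output_tokens=0.002)
call_plan = PerCallPricing(usd_per_call=0.002)

for name, plan in [("token-based", token_plan), ("per-call", call_plan)]:
    cost = plan.monthly_cost(calls=100_000, input_tokens=500, output_tokens=250)
    print(f"{name}: ${cost:,.2f} per month")
```

Under these assumed numbers the token-based plan is cheaper for short exchanges, but the ranking can flip as prompts and replies get longer, which is why the expected text length of a use case matters when comparing vendors.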

Flexibility

The consideration of flexibility is two-fold. The first is the level of customization possible, where open source wins, as it is up to the user how to utilize the model. Closed-source offerings differ among themselves here; for example, Amazon and Microsoft are deemed to have more diversity in their foundational models for enterprise use than Google at present. The second is the issue of vendor lock-in. While open-access models should be easy to migrate from one source to another, as there are no contractual limitations, there is not yet any clarity on how vendors could be switched in the closed-source case.

Transparency

Open-source models are naturally more transparent, as the scrutiny of their performance is crowdsourced. Information on potential vulnerabilities is also quickly picked up and widely shared, whereas such data is unlikely to be made available for proprietary models. For instance, among the tech giants, Amazon has at present provided the least information on how its models perform relative to others.

Talent

The savings from paying no access fee for open-source models may be offset by greater people costs. More talent, both in number and in degree of specialization, is required to deploy open-source models. Firstly, such skills are not readily available, as the technology itself is still in its infancy and the demand is unprecedented. Secondly, these jobs sit at the high end of the salary range and are therefore expensive to hire for and retain. On the other hand, a smaller Data Science and Developer team with generalized knowledge of AI may suffice for customers of proprietary products.
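The trade-off between access fees and people costs can be sketched as a simple annual total. Every figure below (team sizes, salaries, and fees) is a hypothetical assumption made up for illustration, and the function name is invented for this example; real inputs would come from a business's own estimates.

```python
def annual_total_cost(access_fees_usd: int, team_size: int, avg_salary_usd: int) -> int:
    """Return access fees plus people cost for one year, in USD."""
    return access_fees_usd + team_size * avg_salary_usd


# Hypothetical scenario: open source has no access fee but needs a larger,
# more specialized (and more expensive) team.
open_source_cost = annual_total_cost(access_fees_usd=0, team_size=6, avg_salary_usd=180_000)

# Hypothetical scenario: a proprietary product carries a fee but can be run
# by a smaller team with generalized AI knowledge.
proprietary_cost = annual_total_cost(access_fees_usd=250_000, team_size=4, avg_salary_usd=140_000)

print(f"Open source: ${open_source_cost:,}")
print(f"Proprietary: ${proprietary_cost:,}")
print(f"Difference:  ${open_source_cost - proprietary_cost:,}")
```

With these made-up inputs, the people cost more than cancels the fee saving; different assumptions about team size or salaries could reverse the result.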

Support

Development and maintenance of the code and underlying infrastructure are more streamlined for closed-source models and are typically packaged as part of the offering to businesses. Dedicated customer service, including help with troubleshooting, is also likely to be a feature of closed-source providers, something that the open-source option generally lacks.

Speed-to-market

While open-source models themselves are quickly accessible, their deployment may be slower than in the closed-source case because of the latter's neatly packaged, user-friendly interfaces. This, compounded by the time-consuming process of hiring, may mean a slower overall go-to-market for open source.

Performance

On average, proprietary models are deemed to perform better than open-source ones, though this gap is shrinking over time. The difference arises primarily because open-source providers typically lack the level of resources needed to build such a competitive advantage through repeated iteration: training LLMs is expensive, requiring large storage and intensive computation. In fact, by Q3 2023, the funding of ~$670M for the top five open-source start-ups was dwarfed by the ~$20B raised by their closed-source counterparts [2].

Figure 2: Considerations when choosing between open-source and closed-source Gen AI offerings

Two other points to consider are privacy and IP rights. Open source is less likely to suffer from data privacy and leakage issues, as it is deployed in-house. However, most closed-source providers offer to ring-fence enterprise data so that it is not used for further training of their models. Privacy therefore depends more on the contractual terms of the particular vendor than on which of the two categories the vendor falls into.

Given the novelty of the technology, regulation around the IP rights of the data used to train LLMs has not yet been laid out. Though open source would see a higher risk from regulatory factors, as it is trained exclusively on public data, closed-source providers may also have to detail their inputs if required by law in the future. How customers of Gen AI providers are affected again depends on the provisions put in place by each player, irrespective of whether it is open or closed source.

As the decision between open source and proprietary models could significantly impact a business, it is imperative to weigh the pros and cons holistically and promptly.

Sources: [1] Dealroom, [2] CB Insights

Disclaimer: The opinions and views expressed in this personal blog are solely those of the author and do not represent the views of any organizations or companies. No private or proprietary information is included.

As this is original work, please let me know of any errors or omissions.
