TAI #124: SearchGPT, Coding Assistant adoption, Towards AI Academy launch, and more!
Last Updated on November 5, 2024 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
What happened this week in AI by Louie
This week, we saw many more incremental model updates in the LLM space, together with further evidence of LLM coding assistants gaining traction. Google CEO Sundar Pichai revealed that more than a quarter of new code at Google is now generated by AI, though each piece is reviewed by engineers before implementation. Microsoft's GitHub Copilot is also enhancing its LLM-powered coding toolkit and expanding beyond its OpenAI dependency: it is now integrating models like Claude 3.5 Sonnet and Gemini 1.5 Pro to give developers more choice of LLMs, alongside a new no-code tool, GitHub Spark, which lets users develop micro apps.
Meanwhile, we saw several new LLM models and new consumer and LLM developer features released. OpenAI's new RAG-powered SearchGPT web search feature in ChatGPT is a big improvement on its initial web search offering. The product now works more like the successful Perplexity.ai, but unlike Perplexity, OpenAI doesn't offer API access to this feature. Meta is reportedly also developing a similar LLM and RAG web search product. Cohere also released its new multimodal Embed 3 model, which enables more accurate and nuanced retrieval for image, multilingual, and noisy datasets. Improved embedding models like this should help to further improve these RAG web search products, as well as the customized LLM pipeline products we teach at Towards AI.
On the other hand, we were disappointed by the pricing Anthropic announced for Haiku 3.5 this week. After the model's strong benchmark scores were revealed last week, pricing surprisingly came in 4x higher than Haiku 3.0 and significantly above OpenAI's and Gemini's lower-tier models. It is not clear whether this is due to a larger model size with higher inference costs, or whether Anthropic is simply constrained on compute capacity and still prioritizing its flagship Sonnet 3.5 model. We wouldn't be surprised to see a price decrease as inference capacity ramps up.
Why should you care?
The relentless pace of feature and model releases in the LLM space is rapidly increasing their capabilities and bringing them closer to making a huge impact across the economy. So far, LLM coding assistants are furthest ahead in boosting productivity and changing people's workflows. This is partly because a large amount of work has gone into building on top of foundation LLMs to customize them to this particular application and to increase reliability and ease of use in the domain. This work is needed to improve performance and ensure the LLM pipeline really boosts productivity or unlocks new product features. It requires learning the new LLM Developer technical skill stack we teach at Towards AI, but to become a great LLM developer and build a product that actually gets adopted, you also need many new non-technical skills, including an ability and intuition for bringing expertise from your target market into your product development. This expertise should inform your prompt design, agent pipeline design, dataset collection and curation, fine-tuning datasets, and evaluation datasets. It is difficult to work out how to adapt your LLM pipeline to the nuances of the data and user demands in a new and unfamiliar industry niche, but for many, this is easiest to do well within the software industry, given developers' pre-existing understanding of the industry and the problems developers face. It is not surprising, therefore, that software is where we are seeing some of the most successful LLM products so far.
We expect LLM products will perform best and have a chance of huge-scale adoption the more they have been customized to a specific industry niche. This will require millions of LLM developers to build on top of foundation LLMs to develop these products. Prompt engineering, GPTs, and no-code agent builder platforms alone just don't provide the level of flexibility needed to deliver the very best LLM product for a specific application or company. Towards AI is focused on teaching this new generation of LLM Developers, and very soon, we are going to release an extremely in-depth, ~90-lesson, practical, full-stack "LLM Developer" conversion course. Together with instructor support in our Discord, we hope this will help many more software developers and machine learning engineers gain this new LLM Developer skill set. The course progresses all the way from choosing your project idea through data collection and curation, LLM fundamentals, prompting, RAG, fine-tuning, agents, and deployment. We will also teach you some of the new non-technical skills and tips along the way, all while you build a single advanced LLM project, which we will review and certify at the end.
This new course is already available for pre-order on our new Towards AI Academy course platform, where we have also released a new version of our ebook (more about this below!).
- Louie Peters, Towards AI Co-founder and CEO
🎉 Great news! Building LLMs for Production (second edition) is now available as an e-book at an exclusive price on Towards AI Academy!
For the first time, you can access this guide to designing, deploying, and scaling language models directly through our platform, and at a price lower than on Amazon!
Building LLMs for Production is for anyone who wants to build LLM products that can serve real use cases today. It explores various methods to adapt "foundational" LLMs to specific tasks with enhanced accuracy, reliability, and scalability. It tackles the lack of reliability of "out of the box" LLMs by teaching the AI developer tech stack of the future: Prompting, Fine-Tuning, RAG, and Tool Use.
Get Building LLMs for Production on Towards AI Academy and explore all the other resources available to support your AI journey!
We will soon launch our new Towards AI Academy course platform more broadly with a series of extremely in-depth practical LLM courses, so stay tuned! These courses will progress beyond the skills you learn in the book by building a much more advanced LLM project, bringing in more non-technical skills and considerations, and providing instructor support. We will also review and certify your own working advanced LLM project at the end, which could be the foundation of a new business, a new tool or product at your company, or a portfolio project for finding a job in the LLM industry.
P.S. If you already have the first edition, you're eligible for an additional discount on this second edition of the book (post-September 2024); just reach out to [email protected] to upgrade affordably!
Hottest News
1. OpenAI Introduced ChatGPT Search
OpenAI has introduced ChatGPT search, which brings real-time information into conversations for paid subscribers. The feature enhances the AI chatbot with real-time updates on sports, stocks, and news, positioning it as a competitor to major search engines like Google and Bing through partnerships with data providers. Web search is integrated into ChatGPT's existing interface, and the feature decides when to tap into web results based on the query, though users can also trigger web searches manually.
2. GitHub Spark Lets You Build Web Apps in Plain English
GitHub has unveiled GitHub Spark, an experimental tool from GitHub Next labs, at the GitHub Universe conference. Spark enables users to create web apps using natural language and edit the underlying code, focusing on developing "micro apps" and exploring software development through conversational interfaces. Spark also allows users to choose which large language model they want to use.
3. More Than a Quarter of New Code at Google Is Generated by AI
"More than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers," CEO Sundar Pichai said on the company's third quarter 2024 earnings call. AI is helping Google make money as well. Alphabet reported $88.3 billion in revenue for the quarter, with Google Services (which includes Search) revenue of $76.5 billion, up 13 percent year-over-year, and Google Cloud (which includes its AI infrastructure products for other companies) revenue of $11.4 billion, up 35 percent year-over-year.
4. OpenAI Expands Realtime API With New Voices and Cuts Prices for Developers
OpenAI has updated its Realtime API, which is currently in beta. The update adds new voices for speech-to-speech applications and cuts the costs associated with caching prompts. Beta users of the Realtime API now have five new voices they can use to build their applications; OpenAI showcased three of them, Ash, Verse, and the British-sounding Ballad, in a post on X.
5. Cohere Releases Multimodal Embed 3
Cohere has introduced Embed 3, a multimodal embedding model integrating text and image data to enhance search capabilities. It excels in accuracy and performance, efficiently handling multilingual and noisy data for complex data retrieval.
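To show where a multimodal embedding model like Embed 3 fits in a retrieval pipeline, here is a minimal, self-contained sketch of cosine-similarity search over precomputed vectors. The embed_texts helper below is a deterministic stand-in, not Cohere's SDK; in practice, you would replace it with real API calls that map text and images into the same vector space.

```python
import zlib
import numpy as np

def embed_texts(texts, dim=256):
    """Stand-in embedding function: deterministic pseudo-random vectors keyed
    on the text. Replace with a real embedding API (e.g. Embed 3) in practice."""
    vectors = []
    for t in texts:
        rng = np.random.default_rng(zlib.crc32(t.encode("utf-8")))
        vectors.append(rng.standard_normal(dim))
    return np.stack(vectors)

def build_index(docs):
    """Embed every document once and L2-normalize so a dot product = cosine similarity."""
    vecs = embed_texts(docs)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def search(query, index, docs, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    q = embed_texts([query])[0]
    q = q / np.linalg.norm(q)
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

docs = ["Q3 revenue bar chart", "Employee onboarding guide", "System architecture diagram"]
index = build_index(docs)
print(search("Where is the revenue chart?", index, docs))
```

With a multimodal model, image documents are embedded with the same API and land in the same index, so a text query can retrieve charts or diagrams directly.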
6. Anthropic's Claude AI Chatbot Now Has a Desktop App
Alongside the new desktop app, Claude.ai has launched an analysis tool that allows Claude to execute JavaScript code for data processing and real-time insights. This feature enhances the platform's ability to perform complex math and data analysis, offering precise and actionable insights for various teams, including marketing, sales, and engineering.
7. Hugging Face Releases Compact LLMs SmolLM2
Hugging Face released SmolLM2, a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They can solve many tasks while being lightweight enough to run on-device.
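For readers who want to try these models, here is a minimal sketch using the transformers library. The checkpoint id below is our assumption of the published repo name; check the Hugging Face Hub before running it.

```python
# Minimal sketch: run a SmolLM2 checkpoint locally with Hugging Face transformers.
# The repo id is an assumption based on the release; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "List three good uses for a small on-device language model."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the instruct variants, wrapping the prompt with tokenizer.apply_chat_template before generation will usually give better results.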
Five 5-minute reads/videos to keep you learning
1. Evaluating Feature Steering: A Case Study in Mitigating Social Biases
This article shares the findings from a quantitative experiment to understand what feature steering can and can't do. It focuses on 29 features related to social biases to better understand how useful feature steering may be for mitigating social biases in LLMs. The article also lists limitations, lessons learned, and possible future directions.
2. Why Building in AI Is Nothing Like Making Conventional Software
Building with AI requires us to break our habits and approach building differently. AI products bring unique risks, and if you don't understand them, you're bound to make mistakes. This essay will help you understand how building in AI differs from building conventional software.
3. OpenAI's o1 and Inference-Time Scaling Laws
The article explores OpenAI's o1 model, which enhances reasoning in LLMs using a "chain of thought" approach and inference-time scaling laws. Trained with reinforcement learning, the model improves as it is given more computation at inference time, shifting the scaling focus from pre-training to inference, potentially reducing costs and enabling more effective problem-solving.
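As a concrete illustration of what spending more compute at inference time can buy, here is a toy self-consistency sketch: sample several answers and majority-vote the result. The sample_answer function is a hypothetical stand-in for a temperature-sampled LLM call, and this is a far simpler scheme than whatever o1 does internally.

```python
import random
from collections import Counter

def sample_answer(question: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought completion.
    Pretends the model answers this question correctly ~70% of the time."""
    rng = random.Random(seed)
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 99))

def self_consistency(question: str, n_samples: int) -> str:
    """Sample n reasoning paths and return the most common final answer."""
    answers = [sample_answer(question, seed) for seed in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# More samples means more inference-time compute and a more reliable answer.
for n in (1, 5, 25):
    print(n, "samples ->", self_consistency("What is 6 * 7?", n))
```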
4. I Own My LLM Chat History, and So Should You
The article argues for user ownership of chat histories with large language models, emphasizing the interchangeability of providers like OpenAI and Google. It highlights the advantages of locally storing conversations for enhanced privacy, accessibility, and analysis.
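If you want to start owning your own history today, a local append-only log goes a long way. Below is a minimal sketch that appends each exchange to a JSONL file; the schema (provider, model, role, content) is our own choice for illustration, not a standard.

```python
import json
import time
from pathlib import Path

LOG = Path("chat_history.jsonl")  # one JSON object per line, append-only

def log_turn(provider: str, model: str, role: str, content: str) -> None:
    """Append a single chat message to the local history file."""
    record = {
        "ts": time.time(),
        "provider": provider,   # e.g. "openai" or "google"
        "model": model,
        "role": role,           # "user" or "assistant"
        "content": content,
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage: wrap your API calls and log both sides of every exchange.
log_turn("openai", "gpt-4o", "user", "Summarize this week's AI news.")
log_turn("openai", "gpt-4o", "assistant", "Sure, here are the highlights...")
```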
4. A Primer on Using Google's Gemini API To Improve Your Photography
This blog will walk you through building a Photo Critique and Enhancement App using Google's Gemini-1.5-Flash-8B API and Streamlit. It also highlights the essentials of Gemini API inferencing. By the end, you will have built an app that critiques and helps you improve your photos.
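For a sense of the core loop such an app involves, here is a minimal sketch using Streamlit and the google-generativeai SDK. The model name follows the article ("gemini-1.5-flash-8b"); verify it against Google's current model list, and treat this as a starting point rather than the blog's exact code.

```python
# Minimal sketch of a photo-critique app with Streamlit + the Gemini API.
# Assumes GOOGLE_API_KEY is set; the model name follows the article and may change.
import os

import google.generativeai as genai
import streamlit as st
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash-8b")

st.title("Photo Critique (sketch)")
uploaded = st.file_uploader("Upload a photo", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Your photo")
    if st.button("Critique"):
        prompt = ("Critique this photo's composition, lighting, and focus, "
                  "then suggest three concrete improvements.")
        response = model.generate_content([prompt, image])
        st.write(response.text)
```

Run it with streamlit run app.py, and the critique appears under the uploaded image.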
Repositories & Tools
- NotebookLlama is an open-source tutorial series that guides users in creating a PDF to Podcast workflow using Text-to-Speech models.
- Docling parses documents and exports them to the desired format.
- Screenshot to Code converts screenshots, mockups, and Figma designs into clean, functional code.
- OpenHands is a platform for software development agents that can modify code, run commands, browse the web, call APIs, and more.
Top Papers of The Week
1. Mixture of Parrots: Experts Improve Memorization More Than Reasoning
This paper explores the trade-offs between Mixture-of-Experts (MoE) models and standard dense transformers. The authors demonstrate that MoEs can effectively leverage additional experts to improve memory-intensive tasks like fact retrieval, but they find diminishing returns on reasoning tasks like mathematical problem-solving or graph analysis.
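For readers less familiar with the architecture being compared, here is a minimal sketch of a top-1 gated mixture-of-experts feed-forward layer in PyTorch. It only illustrates the routing idea; it is not the paper's experimental setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Top-1 gated mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)   # routing probabilities
        top_p, top_idx = probs.max(dim=-1)        # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale by the gate probability so routing stays differentiable.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Eight experts hold more parameters (memorization capacity), but each token
# only pays the compute cost of a single expert.
layer = TinyMoE(d_model=64, d_hidden=256, n_experts=8)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```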
2. Distinguishing Ignorance from Error in LLM Hallucinations
This paper distinguishes between two types of LLM hallucinations: those that occur when the model lacks knowledge (HK-) versus when it hallucinates despite having the correct knowledge (HK+). The researchers developed a method called WACK to systematically capture HK+ across models, using techniques like "bad shots" (showing incorrect examples) and "Alice-Bob" (using subtle persuasion) to induce HK+ hallucinations. They found that hallucination types leave distinct signatures in the models' internal states, different models hallucinate in unique ways even with shared knowledge, and that detecting hallucinations works better when using model-specific datasets rather than generic ones.
3. A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
The paper presents a method to improve LLM training efficiency using a smaller language model (SLM) to provide soft labels and select valuable training examples. This approach transfers predictive capabilities to the LLM, reducing training time. Empirical results demonstrate enhanced pre-training of a 2.8B parameter LLM using a 1.5B parameter SLM on the Pile dataset.
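The core mechanism here, a small model supplying soft labels that a larger model trains against, is essentially knowledge distillation with the usual teacher/student sizes reversed. Below is a minimal sketch of the loss computation only; the paper's actual recipe (data selection, schedules, the 1.5B/2.8B models on the Pile) is more involved.

```python
import torch
import torch.nn.functional as F

def slm_assisted_loss(student_logits, slm_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend hard-label cross-entropy with a KL term toward the small LM's
    soft labels. Shapes: logits (batch, vocab); labels (batch,)."""
    # Soft-label term: match the temperature-smoothed small-model distribution.
    soft_targets = F.softmax(slm_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

    # Hard-label term: ordinary next-token cross-entropy.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random logits standing in for the large LLM and the small LM.
student = torch.randn(4, 32000, requires_grad=True)
small_lm = torch.randn(4, 32000)
labels = torch.randint(0, 32000, (4,))
print(slm_assisted_loss(student, small_lm, labels).item())
```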
4. OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
OpenWebVoyager is an open-source framework for developing multimodal web agents using imitation learning. These agents improve iteratively by exploring the web, collecting feedback, and optimizing actions based on successful trajectories, enhancing their real-world web navigation capabilities. Experimental results demonstrate the agents' continuous improvement and robust performance across various tests.
5. Retrieval-Augmented Diffusion Models for Time Series Forecasting
The Retrieval-Augmented Time series Diffusion model (RATD) improves time series forecasting by using an embedding-based retrieval process to select relevant historical data, which then guides the denoising phase of the diffusion model and addresses the instability of existing approaches.
Quick Links
1. Meta AI has announced the open-source release of MobileLLM, a set of language models optimized for mobile devices, with model checkpoints and code now accessible on Hugging Face. However, it is only available under a Creative Commons 4.0 non-commercial license, meaning enterprises can't use it in commercial products.
2. Google's "Grounding with Google Search" feature now integrates live search data directly into its Gemini 1.5 models, allowing developers to build AI applications that provide more accurate, up-to-date responses.
3. Patronus AI launched what it calls the first self-serve platform to detect and prevent AI failures in real-time. The systemβs cornerstone is Lynx, a breakthrough hallucination detection model that outperforms GPT-4 by 8.3% in detecting medical inaccuracies.
Whoβs Hiring in AI
Google Cloud GenAI Developer @Accenture (Multiple Locations, USA)
GenAI Software Engineer @RELX INC (Farringdon, United Kingdom)
Software Engineer, AI Tools @Salesforce (Palo Alto, CA, USA)
AI Developer @Insight Global (Chicago, IL, USA)
Software Engineering Intern @MicroStrategy (USA)
Senior Software Engineer @Ocrolus Inc. (USA/Remote)
Data Science Internship Opportunities @Microsoft Corporation (Multiple locations)
Interested in sharing a job opportunity here? Contact [email protected].
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.
Join over 80,000 data leaders and subscribers to the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI