This AI newsletter is all you need #57
Last Updated on August 1, 2023 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
What happened this week in AI by Louie
In the AI world this week, LLM performance evaluation was a topic of focus. In particular, there was a lively debate around a recent study conducted by researchers from Stanford and Berkeley. The research presents evidence suggesting that the GPT-4 models might be experiencing a decline in performance, colloquially referred to as getting "dumber" over time. The paper reported a range of evaluations: GPT-4's accuracy at identifying prime numbers dropped from 97.6% in March to 2.4% in June, and its accuracy on coding questions fell from 52% to 10%.
This sparked many discussions, including whether OpenAI was prioritizing inference speed and cost over model performance. There are also ongoing discussions concerning the accuracy of these claims. Some findings have surfaced indicating that the later models perform significantly better simply by altering the output's formatting. Furthermore, another evaluation demonstrated that neither version of the API outperformed pure chance on the prime number classification problem. In response to the new findings, the authors of the paper clarified their intentions, stating that the aim of the experiment was not to demonstrate a degradation of quality in the OpenAI APIs. Instead, they sought to shed light on the issue of instability and raise awareness that applications can break when the underlying model's responses change. OpenAI also addressed these claims in a blog post and assured users that it is taking steps to enhance API stability, including letting developers pin a specific model version for more control and predictability in their applications.
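For teams building on these APIs, the most direct mitigation is pinning a dated model snapshot rather than the floating alias. Below is a minimal sketch using the pre-1.0 openai Python package; the snapshot name "gpt-4-0613" is just one example of a dated version, and availability may vary by account.

```python
# Minimal sketch: pin a dated model snapshot instead of the floating "gpt-4" alias,
# so the application's behavior does not shift when OpenAI updates the default model.
# Assumes the pre-1.0 openai Python package and an OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",  # dated snapshot (example); "gpt-4" would track the latest version
    messages=[{"role": "user", "content": "Is 17077 a prime number? Answer yes or no."}],
    temperature=0,  # near-deterministic decoding makes regressions easier to detect
)
print(response["choices"][0]["message"]["content"])
```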
On a more positive note for GPT-4 evaluations this week, a separate study on clinical case exams compared GPT-4 to medical students, and the results showed that GPT-4 outperformed first- and second-year Stanford students. The researchers specified that they did not use any special prompting techniques. One of the authors took to Twitter to share insights about the findings, sparking discussions about rethinking the future of student evaluations.
We found both of these papers interesting, and the possible degradation of GPT-4 performance sparked an important debate. One thing is clear to us: in the world of commercialized products built on LLM APIs, the changing and unstable performance of LLMs on specific tasks and prompts as models are updated is a new challenge for developers and companies to manage. And this will be all the more important for potential medical applications! We also believe we need more work on LLM evaluation standards more broadly, including accurately and consistently benchmarking the performance of new open-source LLMs like Llama 2.
– Louie Peters, Towards AI Co-founder and CEO
This issue is brought to you by:
Join us at the industry's leading artificial intelligence conference, Ai4 2023, in Las Vegas on August 7–9 at the MGM Grand. This is your last chance to join 2200+ AI leaders, 240 speakers, and 100 cutting-edge AI exhibits. Apply for a complimentary pass or register now to save 12% off final prices.
Hottest News
1. Meta has released Llama 2, an open-source model with a commercial license that demonstrates performance comparable to ChatGPT. Trained on 2T tokens and released in several parameter sizes, Llama 2 was further fine-tuned and improved through a combination of instruction tuning and reinforcement learning, surpassing other open-source models like Falcon and MPT. (A minimal loading sketch follows this news list.)
2. Announcing LangSmith: A Unified Platform for LLM applications
LangChain has developed LangSmith, a platform designed to help developers close the gap between prototype and production. By providing essential debugging, testing, evaluation, and monitoring features, LangSmith helps AI practitioners identify and address issues such as unexpected results, errors, and latency. (A tracing-setup sketch also follows this news list.)
3. Apple Is Testing a ChatGPT-Like AI Chatbot
Apple is developing its own chatbot, dubbed "Apple GPT," to challenge OpenAI and Google. Despite initial security concerns, the chatbot is now more widely accessible to Apple employees for prototyping purposes, with restricted usage and no customer-facing features allowed.
4. Cerebras Systems Signs a $100 Million AI Supercomputer Deal with UAEβs G42
Cerebras Systems has announced a $100 million deal with G42, signaling the debut of AI supercomputers that could potentially challenge Nvidia's market position. To expedite the rollout, Cerebras will construct three Condor Galaxy systems in the United States, with the first supercomputer set to go online this year.
5. Custom Instructions for ChatGPT
OpenAI is introducing personalized custom instructions for ChatGPT, enabling users to have a more tailored and adaptable experience. This feature emphasizes the significance of customization in catering to diverse needs. Custom Instructions will be gradually rolled out to all users, with beta access initially offered to Plus plan subscribers.
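As referenced in item 1, here is a minimal sketch of running the Llama 2 chat model with Hugging Face transformers. It assumes you have accepted Meta's license and been granted access to the gated "meta-llama/Llama-2-7b-chat-hf" checkpoint; it is an illustration, not the only way to run the model.

```python
# Minimal sketch: run the 7B chat variant of Llama 2 with Hugging Face transformers.
# Assumes access to the gated "meta-llama/Llama-2-7b-chat-hf" repository and a GPU
# with enough memory (adjust dtype/device settings for your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# For best results, the chat variant expects Llama 2's [INST] ... [/INST] prompt format.
prompt = "[INST] Explain in one sentence what Llama 2 is. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```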
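And as referenced in item 2, here is a minimal sketch of pointing an existing LangChain application at LangSmith, assuming the environment-variable configuration described in LangSmith's documentation; the project name is arbitrary and a LangSmith API key is required.

```python
# Minimal sketch: enable LangSmith tracing for an existing LangChain application.
# Assumes a LangSmith API key and a 2023-era langchain release; runs are then
# visible in the LangSmith UI under the chosen project for debugging and evaluation.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "newsletter-demo"  # arbitrary project name

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Every call below is now logged to LangSmith as a trace.
print(llm.predict("Summarize this week's AI news in one sentence."))
```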
Five 5-minute reads/videos to keep you learning
1. This blog post includes all the relevant resources to help you get started with Llama 2. It covers topics such as what Llama 2 is, where you can test the model, the research behind the model, how good the model is, how to correctly prompt the chat model, and more.
2. Hallucinations in AI propel hyperbolic narratives around foundation models and open source. It's hard to know what to believe and whom to trust. This insightful read from John Luttig delves into some of the narratives and trends in AI that are easily misconstrued or just plain wrong.
3. The AI WebTV project showcases the potential of text-to-video and text-to-music models like Zeroscope and MusicGen in generating entertaining videos. Created using Hugging Face services, it combines ChatGPT, Zeroscope V2, FILM, and MusicGen to create high-quality video clips with accompanying music.
4. Mike Loukides argues that the only thing to fear is failing to make the transition to AI-assisted programming. He has been talking and writing about the end of programming, but what does this mean in practice? In this article, Mike shares why and how the use of AI will change the discipline as a whole.
5. How To Ensure Consistency in AI Visuals
This tutorial focuses on generating consistent AI visuals. It offers basic to advanced techniques for achieving consistency control in Stable Diffusion, Midjourney, and InsightFace. (A minimal seed-fixing sketch follows this list.)
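On the consistency topic from item 5, one of the most basic controls is simply fixing the random seed so the same prompt reproduces the same image. The sketch below uses the diffusers library with the public Stable Diffusion 1.5 checkpoint; it illustrates only this single, simplest technique, not the tutorial's full workflow.

```python
# Minimal sketch: reproducible Stable Diffusion outputs by fixing the generator seed.
# Assumes the diffusers library, a CUDA GPU, and the public SD 1.5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait of a red-haired astronaut, studio lighting"
generator = torch.Generator(device="cuda").manual_seed(42)  # same seed -> same image

image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
image.save("astronaut_seed42.png")
```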
Papers & Repositories
1. Stanford University has introduced FlashAttention-2, an algorithm that accelerates attention and reduces memory usage in language models. The updated version is 2x faster than the original and achieves improved performance through better parallelism and work partitioning. (A usage sketch follows this list of papers.)
2. Lost in the Middle: How Language Models Use Long Contexts
This study investigates the performance of language models in utilizing extended contexts for tasks such as question answering and retrieval. While models excel in finding relevant information at the start or end of input, their performance declines when accessing middle sections of long contexts. The study highlights the challenges of effectively utilizing long contexts and emphasizes the necessity for future improvements in this area.
3. Towards A Unified Agent with Foundation Models
Researchers have found that incorporating language models and vision language models in reinforcement learning agents can address significant challenges in the field. By harnessing the knowledge stored in these models, agents can effectively explore sparse-reward environments, reuse data for learning, schedule skills for novel tasks, and learn from expert observations.
4. Learning to Retrieve In-Context Examples for Large Language Models
Researchers have developed a framework that uses dense retrievers to automatically select high-quality examples for in-context learning with LLMs. Experimental results demonstrate its effectiveness in improving LLM performance by retrieving similar and contextually relevant examples. (An illustrative retrieval sketch also follows this list.)
5. How Is ChatGPT's Behavior Changing Over Time?
A research study examined the performance of GPT-3.5 and GPT-4 on various tasks over time. It found some significant variations in their behavior, with GPT-4's accuracy in identifying prime numbers dropping from March to June 2023. Additionally, both models displayed an increase in formatting mistakes during code generation.
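For item 1, the sketch below shows roughly what calling the released flash-attn package's fused kernel looks like. The function name, signature, and tensor layout are based on the library's 2.x documentation as we understand it, so treat this as an assumption-laden illustration rather than a definitive API reference.

```python
# Minimal sketch: calling FlashAttention-2's fused attention kernel directly.
# Assumes the flash-attn 2.x package, a CUDA GPU, and fp16/bf16 inputs in the
# (batch, seqlen, num_heads, head_dim) layout used by the library.
import torch
from flash_attn import flash_attn_func

batch, seqlen, n_heads, head_dim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, n_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies the usual autoregressive (decoder-style) attention mask.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # (batch, seqlen, n_heads, head_dim)
```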
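And for item 4, the following sketch illustrates the general idea of retrieving similar examples for in-context learning. It uses an off-the-shelf sentence-transformers embedding model as a stand-in retriever, not the trained retriever from the paper, and the example pool and query are made up for illustration.

```python
# Minimal sketch: pick the k training examples most similar to a test question and
# prepend them to the prompt as in-context demonstrations. Uses an off-the-shelf
# embedding model as the dense retriever (a stand-in for the paper's trained one).
from sentence_transformers import SentenceTransformer, util

pool = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("What is 7 * 8?", "56"),
]
query = "What is the capital of Italy?"

encoder = SentenceTransformer("all-MiniLM-L6-v2")
pool_emb = encoder.encode([q for q, _ in pool], convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)

# Retrieve the two most similar examples by cosine similarity.
hits = util.semantic_search(query_emb, pool_emb, top_k=2)[0]
demos = [pool[hit["corpus_id"]] for hit in hits]

prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in demos) + f"Q: {query}\nA:"
print(prompt)  # feed this prompt to the LLM of your choice
```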
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Meme of the week!
Meme shared by rucha8062
Featured Community post from the Discord
Louvivien has developed an open-source AI trading app that seamlessly connects to Alpaca, enabling users to access positions, orders, and conduct stock transactions. With this app, users can import collaborative trading strategies and efficiently manage AI trading funds. You can explore this project on GitHub and support a fellow community member. For those interested in AI trading, you can join this open-source project by connecting on the thread here.
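For anyone curious what connecting to Alpaca from Python looks like, here is a generic sketch using the alpaca-trade-api package against the paper-trading endpoint. It is not code from Louvivien's project, just an illustration of the kinds of calls (positions, orders) such an app builds on.

```python
# Generic sketch: query positions and place a paper-trading order via Alpaca's API.
# Not taken from the community project; assumes the alpaca-trade-api package and
# paper-trading API keys from an Alpaca account.
import os
import alpaca_trade_api as tradeapi

api = tradeapi.REST(
    key_id=os.environ["APCA_API_KEY_ID"],
    secret_key=os.environ["APCA_API_SECRET_KEY"],
    base_url="https://paper-api.alpaca.markets",  # paper trading, no real money
)

# List current positions.
for position in api.list_positions():
    print(position.symbol, position.qty, position.market_value)

# Submit a small market order (paper account only).
order = api.submit_order(symbol="AAPL", qty=1, side="buy", type="market", time_in_force="day")
print(order.id, order.status)
```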
AI poll of the week!
Join the discussion on Discord.
TAI Curated section
Article of the week
Fixing SimCLR's Biggest Problem – BYOL Paper Explained by Boris Meinardus
SimCLR successfully implemented the idea of contrastive learning and, at the time, achieved new state-of-the-art performance. However, the idea has fundamental weaknesses, such as its sensitivity to specific augmentations and the requirement for very large batch sizes. Bootstrap Your Own Latent (BYOL), developed by researchers at DeepMind, takes a completely fresh approach to training self-supervised models.
Our must-read articles
Harness the Power of Vector Databases: Influencing Language Models with Personalized Information by Pere Martra
Machine Learning in a Non-Euclidean Space by Mastafa Foufa
Top Computer Vision Papers During Week From 10/7 To 16/7 by Youssef Hosni
Data Science Accelerated: ChatGPT Code Interpreter as Your AI Assistant by Esmaeil Alizadeh
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Job offers
Head of Content + Developer Relations @ngrok Inc. (Remote)
Senior Backend Engineer @Remote (Remote)
Senior Infrastructure Software Engineer @ClickHouse (Remote)
Product Developer @Shiru (Alameda, CA, USA)
Senior Software Test Engineer @Clari (Bangalore, India)
Product Engineer @Encord (London, UK)
Interested in sharing a job opportunity here? Contact [email protected].
If you are preparing your next machine learning interview, don't hesitate to check out our leading interview preparation website, confetti!
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI