Last Updated on November 5, 2023 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
What happened this week in AI by Louie
This week in AI, OpenAI again dominated the headlines as it announced the imminent rollout of new voice and image capabilities in ChatGPT. The LLM race also continues to heat up, with Amazon announcing a significant investment in Anthropic.
OpenAI is rolling out new multimodal versions of GPT-3.5 Turbo and GPT-4, together with ChatGPT integrations for a new text-to-speech model, Whisper (a speech-to-text model), and DALL·E 3. These give ChatGPT the ability to see, hear, and speak, along with new artistic capability. Users can now engage in voice conversations and use images to enhance the interaction: snap a picture and have a live conversation about it, whether to identify a landmark, plan a meal from the contents of a fridge, or work through a math problem. These features are initially rolling out to Plus and Enterprise users, with voice available on iOS and Android and images on all platforms. We are also excited by the new image generation model DALL·E 3, which takes a less prompt-reliant approach to image generation, and we hope to see some great art produced through this new iterative, chat-based design process.
OpenAI was keen to beat Google DeepMind’s Gemini in releasing a fully multimodal model (although Bard already has some more limited image capabilities). It also looks set to beat Amazon’s Alexa to market with an LLM-powered voice chatbot. Amazon announced the upcoming launch of an updated Alexa powered by a multimodal LLM, capable of processing voice, video, and text. While the demo video for Alexa’s LLM primarily showcases text generation tasks, Amazon says the Alexa LLM is connected to thousands of APIs and can execute complex sequences of tasks. In other Amazon news, the company announced an investment of up to $4 billion in Anthropic to develop reliable, high-performing foundation models.
The AI race is heating up further, and the leading cloud providers have now lined up their key LLM lab partners: Google Cloud behind DeepMind, Microsoft Azure backing OpenAI, and Amazon AWS backing Anthropic.
– Louie Peters — Towards AI Co-founder and CEO
OpenAI is launching DALL·E 3, an improved version of its image generation model that excels at following instructions, requires less prompt engineering, and can communicate with ChatGPT. This integration lets users refine DALL·E 3 prompts by describing their ideas to ChatGPT. Starting in October, DALL·E 3 will be available to ChatGPT Plus and Enterprise customers.
Amazon plans to invest up to $4 billion in the AI startup Anthropic. The agreement is part of a broader collaboration to develop the industry’s most reliable and high-performing foundation models. Amazon Web Services (AWS) will be the primary cloud provider for Anthropic. The investment includes AWS’s Trainium and Inferentia chips for model training and deployment.
DeepMind has released AlphaMissense, a model that builds on AlphaFold’s protein structure predictions to categorize missense genetic mutations as likely benign or likely pathogenic. It far exceeds human annotation efforts, classifying 89% of the 71 million possible variants.
Microsoft rolled out Copilot, which provides tailored assistance based on workplace data and web context. It enhances productivity and creativity across Windows 11, Microsoft 365, Edge, and Bing while prioritizing privacy. Additionally, Bing and Edge users will get a personalized experience with OpenAI’s DALL·E 3 model, including AI-assisted shopping and image creation.
Meta is preparing to announce a generative AI chatbot, internally called “Gen AI Personas,” aimed at younger users. Meta also has plans to develop ‘dozens’ of chatbot personas, including ones for celebrities to interact with their fans.
Five 5-minute reads/videos to keep you learning
Adept.ai, an AI company, shares insights into the errors that can occur during large training runs. These errors can cause learning-curve issues in which a model appears healthy while minor errors quietly accumulate over time and must eventually be fixed.
This article provides prompting guidance for leveraging Claude’s 100,000-token context window. It shares a quantitative case study of two techniques that improve Claude’s recall over long contexts: extracting reference quotes and providing examples.
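As an illustration of the quote-extraction technique, the prompt can ask the model to pull relevant quotes into tags before answering, which anchors the answer in the retrieved text. The template below is a hypothetical sketch, not the article’s exact wording:

```python
# Sketch of the "extract reference quotes first" prompting technique for
# long-context recall. The document text and question are placeholders;
# the template wording is illustrative.

def build_long_context_prompt(document: str, question: str) -> str:
    """Ask the model to quote relevant passages before answering."""
    return (
        f"Here is a document:\n<document>\n{document}\n</document>\n\n"
        "First, find the quotes from the document that are most relevant "
        "to the question, and list them inside <quotes></quotes> tags. "
        "Then answer the question using only those quotes.\n\n"
        f"Question: {question}"
    )

prompt = build_long_context_prompt(
    document="The meeting was moved to Thursday at 3 pm.",
    question="When is the meeting?",
)
```

Forcing the quotes step first gives the model an explicit retrieval stage over the long context before it commits to an answer.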
Mustafa Suleyman, co-founder of DeepMind, emphasizes the positive impact of technology on healthcare and leads an AI policy development team at Google. Backed by influential figures and companies, Suleyman introduces Pi, a friendly AI that advocates for interactive AI to connect technology with societal impact.
Hugging Face has introduced the Object Detection Leaderboard, featuring top-performing models based on the DETA and DETR architectures. This blog demonstrates how the models were evaluated and demystifies the popular metrics used in Object Detection, from Intersection over Union (IoU) to Average Precision (AP) and Average Recall (AR).
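For readers new to these metrics, IoU measures the overlap between a predicted box and a ground-truth box and underlies the AP and AR numbers on the leaderboard. A minimal sketch with toy boxes (not the leaderboard’s evaluation code):

```python
# Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2).

def iou(a, b):
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the double-counted intersection.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), and AP/AR aggregate those decisions across thresholds and classes.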
In this post, the author documents experiments benchmarking fine-tuned GPT-3.5 against Llama 2 on an SQL task and a functional representation task. GPT-3.5 slightly outperforms Llama 2; however, training and deploying GPT-3.5 costs 4–6 times more than Llama 2.
Papers & Repositories
LongLoRA is a method for efficiently extending the context size of pre-trained large language models (LLMs). By using sparse local attention during fine-tuning and dense global attention during inference, the approach keeps fine-tuning cost-effective while maintaining performance. It demonstrates impressive results on various tasks and extends context to up to 100k tokens.
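The core idea of sparse local attention can be sketched with a toy mask: each token attends only within its own block, and shifting the blocks (a simplified nod to LongLoRA’s shifted sparse attention, which shifts blocks for half the heads) lets information cross block boundaries. This is an illustrative simplification, not the paper’s implementation:

```python
# Toy block-local attention mask. True at (i, j) means token i may attend to
# token j, i.e., both tokens fall in the same (optionally shifted) block.

def local_attention_mask(seq_len: int, block: int, shift: int = 0):
    groups = [((t + shift) % seq_len) // block for t in range(seq_len)]
    return [[groups[i] == groups[j] for j in range(seq_len)]
            for i in range(seq_len)]

mask = local_attention_mask(seq_len=8, block=4)            # two blocks of 4
shifted = local_attention_mask(seq_len=8, block=4, shift=2)
# Each token attends to `block` positions instead of all seq_len positions,
# so attention cost drops from O(n^2) toward O(n * block).
```

In the unshifted mask, tokens 0 and 7 can never see each other; with the shift applied on some heads, boundary tokens land in the same block, which is what keeps the sparse pattern from isolating each block during fine-tuning.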
Researchers have discovered a unique scaling law that shows the relationship between weight sparsity, non-zero parameters, and training data volume in foundation models. They also found that the optimal sparsity level for performance increases with more data.
Researchers have developed PDFTriage, a solution that enhances the performance of Language model-based question-answering systems on structured documents like PDFs. By incorporating document structure and content, PDFTriage outperforms existing models in answering complex questions across various categories.
Chain-of-Verification (CoVe) is a straightforward approach to minimizing hallucinations in language-model-based systems. CoVe reduces hallucinations across various tasks, including question answering and text generation, through a systematic process of drafting a response, planning and answering verification questions, and generating a final verified response.
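The verification loop can be sketched as follows; the `llm` callable is a hypothetical stand-in for a real model call, since CoVe’s value comes from the structure of the process rather than any particular model:

```python
# Minimal sketch of the Chain-of-Verification loop.

def cove(question: str, llm) -> str:
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer: {question}")
    # 2. Plan verification questions that probe the draft for errors.
    checks = llm(f"List verification questions for: {baseline}").split("\n")
    # 3. Answer each verification question independently of the draft,
    #    so the check is not biased by the original (possibly wrong) answer.
    evidence = [llm(check) for check in checks]
    # 4. Produce a final answer revised in light of the evidence.
    return llm(f"Revise '{baseline}' given: {'; '.join(evidence)}")

def echo_llm(prompt: str) -> str:
    return prompt  # placeholder; a real system would query a language model

answer = cove("Who wrote Hamlet?", echo_llm)
```

Answering the verification questions separately from the draft is the key design choice: it prevents the model from simply restating its original hallucination when checking itself.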
EvoPrompt, a new evolutionary algorithm framework, optimizes prompt generation for language models like GPT-3.5 and Alpaca. It surpasses human-engineered prompts and current methods, demonstrating its effectiveness for language tasks.
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Upcoming Community Events
The Learn AI Together Discord community hosts weekly AI seminars to help the community learn from industry experts, ask questions, and get a deeper insight into the latest research in AI. Join us for free, interactive video sessions hosted live on Discord weekly by attending our upcoming events.
In the Webinar, Ruiqi Zhong will give a talk on AI Alignment, hosted on the server as part of the Prompt Hackathon series of events. Learn more about Ruiqi and AI alignment before the talk in his post “Explaining AI Alignment as an NLPer and Why I am Working on It.”
Date & Time: 28th September 2023, 12:00 pm EST
Meme of the week!
Meme shared by archiesnake
Featured Community post from the Discord
Kylejmichel is building Glimpse, a browser extension that pulls context from real-time web data and any website you browse. It highlights key points, suggests questions and topics, and filters and rephrases the text of any website, document, article, or video. It’ll be most beneficial for students, developers, and content consumers. Check it out here and support a fellow community member! Share your questions and feedback in the thread.
AI poll of the week!
TAI Curated section
Article of the week
With GPT models now widely available, one exciting application is combining the power of LLMs with Pandas. PandasAI is a Python package that brings an LLM interface to Pandas. It is aimed at complementing Pandas, not replacing it: with PandasAI, the Pandas workflow becomes a conversational tool that can automatically explore and clean our data.
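The pattern behind such a tool can be sketched in a few lines: an LLM turns a natural-language question into pandas code, which is then executed against the DataFrame. The `fake_llm` below is a hypothetical stand-in returning a canned snippet; PandasAI’s real API, prompting, and code sandboxing differ:

```python
import pandas as pd

# Sketch of the conversational-DataFrame pattern: question -> generated
# pandas code -> result.

def ask(df: pd.DataFrame, question: str, llm) -> object:
    code = llm(f"Write a pandas expression over `df` answering: {question}")
    # A real tool validates and sandboxes generated code before running it;
    # eval() here is only for illustration.
    return eval(code, {"df": df, "pd": pd})

def fake_llm(prompt: str) -> str:
    # Canned response standing in for a real model call.
    return "df.loc[df['sales'].idxmax(), 'region']"

df = pd.DataFrame({"region": ["north", "south"], "sales": [120, 340]})
result = ask(df, "Which region has the highest sales?", fake_llm)  # -> "south"
```

The interesting engineering in a production version is everything around `eval`: constraining the model to safe, valid pandas expressions and recovering when the generated code fails.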
Our must-read articles
If you want to publish with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Interested in sharing a job opportunity here? Contact [email protected].
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI