
This AI newsletter is all you need #88

Last Updated on February 27, 2024 by Editorial Team

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

This week in AI, Gemini’s embarrassing and backfiring attempt to implement DEI (Diversity, Equity, and Inclusion) principles and counter LLM bias was at the center of the debate. Both Gemini’s image and text generation were widely ridiculed, including its inability to generate accurate images of historical people — such as images of the U.S. Founding Fathers depicted as American Indian, Black, or Asian. There was also much criticism of Gemini’s inability to make what should be clear-cut moral judgments, as well as examples of political or ideological bias in its responses. Google confirmed that it has temporarily suspended Gemini’s ability to generate images of people while it works on updating the technology to improve the historical accuracy of outputs depicting humans. We expect part of the issue stems from poorly thought-through system prompts, which should be quick to fix (but slower to test); bias built into RLHF fine-tuning datasets, however, is more challenging to address.

This week was not all controversy; we also saw several exciting model updates and releases with Google’s Gemma, Stable Diffusion 3, Phind-70B, and the announcement of Mistral’s competitor to GPT-4. Google’s Gemma models were a somewhat surprising entry into the open-source LLM arena, with new 2B and 7B models arguably leading their categories on benchmarks. Mistral Large was another impressive model but a move in the opposite direction: Mistral’s now most powerful model is only available as a closed API. It achieves strong results on commonly used benchmarks and comes close to GPT-4 on many tests, while being 1.25x cheaper than GPT-4 Turbo. The startup is also launching its alternative to ChatGPT, a new service called Le Chat, currently available in beta. The Mistral Large model will be available primarily via Mistral’s own API but also through Azure AI, thanks to a new partnership with Microsoft.

Why should you care?

While bias in LLM training due to the overrepresentation of certain groups in available datasets is a genuinely important issue to address, we feel the clumsy overcorrection with Gemini can be even more damaging and should have been spotted before release. As AI models and AI companies become even more powerful, we also feel it is important to have a wide debate and transparency over how much the views of a model creator have been built into its responses.

This week also had conflicting developments for the prospects of the open-source AI movement. It is positive to see another AI leader contributing the new open-source Gemma model for the community to build upon, albeit while holding back the latest breakthroughs it rolled out in the closed Gemini series. Mistral’s move, on the other hand, is concerning for the future competitiveness of open-source models, with one of their most vocal proponents now going closed access with its best model.

– Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. Google Releases Gemma, Its Powerful New Open-Source AI Models

Google has released Gemma, an open-source large language model based on Gemini, in two versions with 2 billion (2B) and 7 billion (7B) parameters. Both versions have a basic pre-trained model and an instruction-tuned variant to enhance performance.

2. Stable Diffusion 3

Stability AI has introduced Stable Diffusion 3 for early preview, featuring enhancements in handling multi-subject prompts, image quality, and the accuracy of visual text spelling. A select number of users can test and refine the model before general availability.

3. Groq’s LPU Demonstrates Remarkable Speed, Running Mixtral at Nearly 500 Tok/S

Groq introduced the Language Processing Unit (LPU), a new type of end-to-end processing unit. It offers the fastest inference for computationally intensive applications with a sequential component, such as LLMs, delivering extremely low latency at an unprecedented speed of nearly 500 tokens per second on Mixtral.

4. Google Pauses Gemini’s Ability To Generate AI Images of People After Diversity Errors

Google has suspended Gemini’s feature for generating images of people due to diversity-related inaccuracies. The model was producing historically inaccurate images, such as depicting U.S. Founding Fathers and Nazi-era soldiers with diverse ethnic backgrounds.

5. Introducing Phind-70B — Closing the Code Quality Gap With GPT-4 Turbo

Phind-70B is a new code-centric AI model that improves upon CodeLlama-70B by fine-tuning on an additional 50 billion tokens. It features a 32K-token context window and produces high-quality technical answers at 80 tokens per second. The model surpasses GPT-4 Turbo with an 82.3% HumanEval score, although it scores slightly below GPT-4 Turbo on Meta’s CRUXEval benchmark.

Five 5-minute reads/videos to keep you learning

1. Build an LLM-Powered Data Agent for Data Analysis

This guide outlines the necessary agent types and their collaborative roles in creating a proficient LLM application for data analysis tasks. It includes a practical use case, corresponding code snippets, and optimization tips for AI developers designing and implementing LLM agent applications.

2. I Spent a Week With Gemini Pro 1.5 — It’s Fantastic

Gemini Pro 1.5 is a serious achievement for two reasons: its context window is far larger than that of the next-closest models, and it can actually use the whole window effectively. The author tested Gemini Pro 1.5 on a specific task and shared a comparative analysis of Gemini Pro 1.5 and ChatGPT.

3. Advanced Techniques for Research with ChatGPT

This guide outlines strategies for leveraging ChatGPT in research, emphasizing that while ChatGPT can streamline research tasks, the quality of research still depends on the human researcher’s expertise and understanding.

4. She’s Running a Business, Writing a Book, and Getting a Ph.D. — with ChatGPT

Anne-Laure Le Cunff is the founder of the experimental learning community Ness Labs, a newsletter writer with nearly 70,000 subscribers, a Ph.D. candidate, and a budding author. In this podcast episode, she shares how she uses ChatGPT to expand her mind and get everything done.

5. How Many News Websites Block AI Crawlers?

U.S. news publishers are increasingly blocking AI crawlers from companies such as OpenAI and Google, with 80% of top U.S. sites restricting OpenAI’s access as of late 2023. The trend exhibits significant variation internationally, with only 20% of leading news sites in Mexico and Poland implementing similar blocks.

Repositories & Tools

  1. Needle in a Haystack is an analysis to test the in-context retrieval ability of long-context LLMs. This benchmark tests the models’ limits in recalling fine details from extensive inputs.
  2. Retell AI is a conversational voice API for LLMs.
  3. Yet-another-applied-llm-benchmark is a benchmark to assess the capabilities of LLMs in real-world programming tasks, such as code translation between Python and C, understanding minified JavaScript, and generating SQL from English.
  4. ChartX is a new tool for testing multi-modal large language models’ ability to interpret and reason with visual charts.
  5. FastSDXL uses AI to build ready-to-use model APIs.
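The needle-in-a-haystack setup from item 1 can be sketched in a few lines: bury a “needle” fact at a chosen depth inside long filler text, then ask the model to recall it. The `ask_llm` call below is a hypothetical placeholder for whatever LLM client you use.

```python
def build_haystack(filler: str, needle: str, depth: float) -> str:
    """Insert `needle` at a fractional `depth` (0.0 = start, 1.0 = end)."""
    pos = int(len(filler) * depth)
    return filler[:pos] + " " + needle + " " + filler[pos:]

filler = "The sky was clear and the market was quiet. " * 200
needle = "The secret code is 7421."
prompt = build_haystack(filler, needle, depth=0.5)

# Hypothetical model call -- replace with a real client:
# answer = ask_llm(prompt + "\nWhat is the secret code?")
# success = "7421" in answer
```

The full benchmark sweeps both context length and needle depth, scoring recall at each combination to map where a model’s retrieval degrades.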

Top Papers of The Week

1. OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

OpenCodeInterpreter is an open-source project that enhances code generation by integrating code execution and iterative refinement, similar to the proprietary GPT-4 Code Interpreter. It improves performance by training on the Code-Feedback dataset of 68K interactive sessions. OpenCodeInterpreter-33B demonstrates near-parity with GPT-4 on coding benchmarks.

2. LoRA+: Efficient Low-Rank Adaptation of Large Models

Low-Rank Adaptation (LoRA) finetunes suboptimally on models with large width (embedding dimension) when its two adapter matrices, A and B, share the same learning rate. This paper shows that this can be corrected simply by setting different learning rates for A and B with a well-chosen ratio.
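A minimal toy sketch of the idea, not the paper’s code: the two adapter matrices receive different SGD learning rates, with B’s rate a multiple of A’s. The ratio of 16 below is purely illustrative; the paper tunes this ratio.

```python
# LoRA+ in one line: lr_B = ratio * lr_A for a well-chosen ratio > 1.
lr_A = 1e-3
ratio = 16.0          # illustrative value; the paper tunes this per model
lr_B = ratio * lr_A

def sgd_step(param, grad, lr):
    """Plain elementwise SGD update on a flat list of weights."""
    return [p - lr * g for p, g in zip(param, grad)]

# Toy adapters: A initialized small, B initialized to zero (as in LoRA)
A = [0.01, -0.02, 0.03]
B = [0.0, 0.0, 0.0]
grad_A = [1.0, 1.0, 1.0]
grad_B = [1.0, 1.0, 1.0]

A_new = sgd_step(A, grad_A, lr_A)   # small step on A
B_new = sgd_step(B, grad_B, lr_B)   # 16x larger step on B
```

In a real framework this amounts to putting A and B parameters into separate optimizer parameter groups with different learning rates.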

3. Neural Network Diffusion

The paper utilizes an autoencoder and a standard latent diffusion model to demonstrate that diffusion models can also generate high-performing neural network parameters. The autoencoder extracts latent representations of the trained network parameters, and the diffusion model then synthesizes these latent parameter representations from random noise.

4. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Searchformer is a transformer-based AI model trained to emulate the A* pathfinding algorithm, achieving higher efficiency in complex planning tasks. It outperforms A* in Sokoban puzzles, solving them with 93.7% accuracy while using 26.8% fewer search steps.
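For reference, this is the classical A* search that Searchformer is trained to emulate — a compact grid version with a Manhattan-distance heuristic, not the paper’s transformer model:

```python
import heapq

def astar(grid, start, goal):
    """Shortest path length on a grid of 0 (free) / 1 (wall), or None."""
    rows, cols = len(grid), len(grid[0])
    def h(p):  # Manhattan distance: admissible heuristic on a 4-connected grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]   # entries are (f = g + h, g, node)
    best = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
length = astar(grid, (0, 0), (2, 0))  # must detour around the wall row
```

Searchformer’s contribution is learning to reach the same solutions while expanding fewer search states than this procedure does.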

5. Automated Unit Test Improvement using Large Language Models at Meta

The paper describes Meta’s TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear filters that assure measurable improvement over the original test suite, guarding against problems like LLM hallucination.

Quick Links

1. LlamaIndex announced LlamaCloud, a new generation of managed parsing, ingestion, and retrieval services designed to bring production-grade context augmentation to LLM and RAG applications.

2. Nvidia reports revenue up more than 250% as CEO says ‘demand is surging worldwide.’ In the current quarter, Nvidia expects to deliver $24 billion in sales.

3. Intel unveiled its Intel Foundry business, which will open its multi-billion-dollar chip manufacturing plants to external customers. The Intel Foundry initiative aims to deliver on technology, resilience, and sustainability.

4. Getty-backed AI image generator BRIA raised $24 million in a Series A funding round. BRIA would use the cash infusion to expand globally and build text-to-video generation capabilities.

Who’s Hiring in AI

Data Science — Deep Learning Intern @BEDI Partnerships (San Francisco, CA, USA)

Senior Software Engineer, Data Acquisition @Cohere (Remote)

Research Engineer, Post-Training Multimodal @OpenAI (San Francisco, CA, USA)

Senior Applied Scientist / Data Scientist (ML & LLM) @Zscaler (Remote)

Machine Learning Intern @FarmWise (Paris, France)

ML Engineer @Sword Health (Brazil/Remote)

Working Student Machine Learning @Celonis (Darmstadt, Germany)

Interested in sharing a job opportunity here? Contact [email protected].

If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!

https://www.confetti.ai/

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join over 80,000 subscribers and data leaders on the AI newsletter to keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI
