
Last Updated on October 9, 2025 by Editorial Team

Author(s): cai zhang

Originally published on Towards AI.

Intro to Large Language Models
Image from Andrej Karpathy’s YouTube video

PS: This article summarizes the video “Intro to Large Language Models” from Andrej Karpathy’s channel. If you would rather watch the video, you can find it on his YouTube channel.

📂 What Is a Large Language Model?

Large language model (LLM) — a neural‑network‑based system that predicts the next word (or token) in a text sequence.
The model is completely defined by two files: a parameters file (the weights) and a run file (the code that implements the architecture).

🧠 Llama 2 Series Overview

The Llama 2 series spans several model sizes (7 B, 13 B, and 70 B parameters); all are released by Meta AI with open weights and architecture.

💾 Parameters File Details

  • Each parameter is stored as a float‑16 number → 2 bytes per value.
  • For the 70 B model: 70 B parameters × 2 bytes ≈ 140 GB on disk (see the quick check below).
  • The file is a raw binary “zip” of the knowledge learned from the training corpus.
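
A quick back‑of‑the‑envelope check of that arithmetic in Python:

num_params = 70e9          # 70 billion parameters
bytes_per_param = 2        # float16 -> 2 bytes per value
size_gb = num_params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB") # 140 GB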

⚙️ Run File (Model Code)

  • Implemented in a simple language (often C).
  • Roughly 500 lines of code, no external dependencies.
  • Handles the forward pass using the loaded parameters.
// Minimal sketch of a run file (C); "llama_params.h" stands in for
// however the binary weight array is exposed to the code.
#include <stdio.h>
#include "llama_params.h" // binary weight array

#define VOCAB_SIZE 32000 // Llama 2 vocabulary size

// Forward-pass stub: maps a token sequence to next-token logits
float *run_model(const int *tokens, int length) {
    static float next_token_logits[VOCAB_SIZE];
    // ... apply transformer layers using llama_params ...
    return next_token_logits;
}

Compile the file into a binary, point it at the 140 GB parameters file, and you have a self‑contained LLM.

🔧 Inference (Running the Model)

  1. Load the parameters into memory.
  2. Pass an input token sequence to the run binary.
  3. Sample the next token from the output distribution.
  4. Append the sampled token to the input and repeat (step 2).

Inference is the process of generating text by repeatedly predicting the next token. It requires only a laptop‑class device for small models; larger models need more GPU memory.
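
A minimal sketch of this generation loop in Python, where model is a hypothetical callable returning a probability distribution over the vocabulary:

import numpy as np

def generate(model, tokens, num_steps):
    # Autoregressive loop: predict, sample, append, repeat
    for _ in range(num_steps):
        probs = model(tokens)                                     # step 2: forward pass
        next_token = int(np.random.choice(len(probs), p=probs))  # step 3: sample
        tokens.append(next_token)                                 # step 4: extend context
    return tokens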

📈 Training Process (Creating the Parameters)

  • Training is a massive compression task: the model learns to encode a huge corpus of internet text into its parameters (for Llama 2 70 B, roughly 10 TB of text distilled into the 140 GB parameters file).
  • Modern state‑of‑the‑art models (e.g., GPT‑4, Claude) use orders of magnitude more resources (hundreds of millions of dollars, larger clusters).

🌐 Compression Perspective

  • The parameters act like a lossy zip file of the internet: they retain patterns and facts useful for prediction but do not store the original text verbatim.
  • Compression ratio ≈ 100 : 1, but the “loss” is purposeful — it captures semantic information rather than exact characters.

🧩 Next‑Word Prediction Objective

Next‑word prediction — given a context of preceding tokens w₁ … wₜ₋₁, the model outputs a probability distribution P(wₜ | w₁ … wₜ₋₁) over the next token.

  • This simple objective forces the network to internalize grammar, facts, and world knowledge.
  • Example: predicting “Ruth Handler” → the model must know her birth year, role at Mattel, etc., to assign high probability to relevant continuation tokens.
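
A toy illustration of the objective, continuing the Ruth Handler example with made‑up logits over a tiny vocabulary:

import numpy as np

# Hypothetical logits the model might assign to candidate next tokens
vocab = ["Handler", "Barbie", "Mattel", "the"]
logits = np.array([3.1, 0.4, 1.2, -0.5])

# Softmax converts logits into P(next token | context)
probs = np.exp(logits) / np.exp(logits).sum()
for token, p in zip(vocab, probs):
    print(f"{token:>8}: {p:.2f}")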

📚 Knowledge Encoding Example

  • Input excerpt (Wikipedia on Ruth Handler) provides cues such as names, dates, and organizations.
  • During training, the model learns to associate these cues with correct continuations, effectively compressing encyclopedic knowledge into the weight matrix.

📊 Inference Output Types

When sampled freely, the model’s outputs illustrate how it “dreams” content that mirrors the distribution of its training data.

🧠 How LLMs Generate Text (Hallucination vs. Knowledge)

  • The model samples the next token from a probability distribution learned during training.
  • It mimics patterns seen in the training data rather than retrieving exact documents.
  • Example: an ISBN that looks plausible is generated by following the learned pattern “ISBN: XXXXXXXXXX”, even though the number likely does not correspond to a real book.
  • When the model produces factual statements (e.g., details about a specific fish), the information may be approximately correct because the network has internalized statistical knowledge about that topic.

Hallucination — the generation of text that appears plausible but is not grounded in any specific source from the training set.

  • Some outputs are memorized verbatim; others are constructed from learned patterns. The user cannot know which is which without external verification.

⚙️ Transformer Architecture & Parameters

  • The underlying structure is the Transformer neural network.
  • It consists of hundreds of billions of parameters distributed across multiple layers (attention heads, feed‑forward networks, etc.).
  • Training optimizes these parameters to improve next‑word prediction accuracy, but the exact role of each parameter remains opaque.
  • Interpretability research (mechanistic interpretability) attempts to map specific functions to subsets of parameters, but full understanding is still lacking.

📚 Pre‑training vs. Fine‑tuning

Pre‑training produces a knowledge‑rich base model that simply continues internet‑style documents; fine‑tuning retrains the same network on a smaller, curated dataset so that it behaves like a helpful assistant.

🛠️ Building an Assistant Model

  1. Start with a pre‑trained base model (the knowledge‑rich Transformer).
  2. Create labeling instructions that specify the desired assistant behavior (e.g., tone, format).
  3. Recruit human labelers to generate paired examples:
  • User prompt (question or task).
  • Assistant response (ideal answer).

  4. Assemble the fine‑tuning dataset (e.g., 100 k Q&A pairs).
  5. Run fine‑tuning on the base model using the same next‑word prediction objective but on the new dataset.
  6. Deploy the resulting assistant model and monitor its interactions.

Example Interaction (code generation)

# User asks for a simple hello‑world program
print("Hello World")

The assistant learns to produce code snippets in the appropriate language and style after seeing many similar examples during fine‑tuning.
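
A sketch of what one such training pair might look like; the exact schema is illustrative, not any lab’s actual format:

# One hypothetical fine-tuning example (prompt/response pair)
example = {
    "prompt": "Write a program that prints Hello World in Python.",
    "response": 'print("Hello World")',
}

# The fine-tuning dataset is simply ~100k such pairs, and training
# continues the same next-word prediction objective on the responses.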

🔄 Iterative Alignment and Misbehavior Fixing

  • After deployment, the model’s outputs are continuously evaluated for correctness, safety, and relevance.
  • Misbehaviors (incorrect or unsafe responses) are collected.
  • For each misbehavior:
  1. A human reviews the faulty response.
  2. The human writes the corrected answer.
  3. The corrected pair is added to the training data.
  4. The model undergoes another short fine‑tuning cycle to incorporate the fix.

This loop repeats, gradually improving alignment with user expectations.

🤔 Knowledge Retrieval Quirks

  • The model’s internal “knowledge base” is one‑dimensional and depends on the phrasing of the query.
  • Example: asking “Who is Tom Cruise’s mother?” returns the correct answer Mary Lee Pfeiffer, but asking “Who is Mary Lee Pfeiffer’s son?” yields “I don’t know.”
  • This illustrates that the model stores facts in a direction‑sensitive manner, making some retrieval paths reliable and others not.

🔄 Iterative Fine‑Tuning Process

  • Fine‑tuning is much cheaper than pre‑training, allowing updates daily or weekly.
  • Companies typically iterate rapidly on the fine‑tuning stage to improve performance without re‑training the massive base model.

Iterative Process: Add new examples to the training set → fine‑tune → evaluate → repeat.

🤖 Model Types: Base vs. Assistant

  • Meta’s Llama 2 release includes both the base and assistant versions, giving users freedom to fine‑tune the base model or use the ready‑made assistant.

📈 Fine‑Tuning Stages

  1. Stage 1 — Pre‑Training
  • Train on massive text corpora; compute‑intensive and done once by the model creator.

  2. Stage 2 — Instruction Fine‑Tuning
  • Align the model to follow user instructions (e.g., Q&A).
  3. Stage 3 — Comparison‑Based Fine‑Tuning (optional)
  • Use comparison labels to further improve the model.

🏆 Stage 3: Comparison Labels & RLHF

  • Why comparisons?
  • Humans find it easier to pick the best answer from a set of candidates than to write a perfect answer from scratch.
  • Process Overview
  1. Generate multiple candidate responses (e.g., several haikus).
  2. Human labeler selects the best or ranks them.
  3. The model is fine‑tuned using these rankings.

Reinforcement Learning from Human Feedback (RLHF): A method that converts comparison data into a reward model, then applies reinforcement learning to maximize the reward.
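
A minimal sketch of the pairwise loss commonly used to train the reward model from comparison data (a Bradley–Terry style objective; the names and scores are illustrative):

import numpy as np

def reward_pair_loss(r_chosen, r_rejected):
    # Push the chosen response's reward above the rejected one's:
    # loss = -log(sigmoid(r_chosen - r_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Example: labeler preferred haiku A (score 1.3) over haiku B (score 0.2)
print(reward_pair_loss(1.3, 0.2))  # small loss: ranking already correct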

  • Labeling Instructions (excerpt from InstructGPT)
  • Be helpful, truthful, and harmless.
  • Documentation can span tens to hundreds of pages.

🤝 Human‑Machine Collaboration for Labeling

  • As models improve, they can assist labelers:
  • Sampling: Model proposes answers; humans cherry‑pick the best fragments.
  • Self‑checking: Model evaluates its own output, flagging potential errors.
  • Generating comparisons: Model creates candidate pairs for human ranking.
  • This creates a slider: move toward more automation as model quality rises.

📊 Leaderboard & ELO Rating

  • Chatbot Arena (Berkeley) ranks LLMs using an Elo system similar to chess (see the update sketch after this list):
  • Pair two models, present their answers anonymously, and let users pick the winner.
  • Wins/losses update each model’s Elo score; higher scores = stronger performance.
  • Closed models currently outperform open‑source ones, but the open ecosystem is rapidly closing the gap.
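
For reference, a standard Elo update after one head‑to‑head vote (the K‑factor and ratings below are illustrative):

def elo_update(rating_a, rating_b, a_won, k=32):
    # Expected score of A against B, then move A's rating toward the result
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    return rating_a + k * (score_a - expected_a)

print(elo_update(1200, 1300, a_won=True))  # upset win -> sizable rating gain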

📏 Scaling Laws of Large Language Models

  • Model performance on the next‑word prediction task follows a smooth function of two variables:
  • N – number of parameters.
  • D – amount of training data (tokens).
  • Key Insight:
  • Increasing N or D reliably improves accuracy; trends show no imminent saturation.
  • Algorithmic advances provide a bonus, but scaling alone yields predictable gains.
  • Empirically, better next‑word accuracy correlates with higher scores on downstream benchmarks (e.g., moving from GPT‑3.5 to GPT‑4 improves many task metrics).
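
To make that “smooth function” concrete, a common parametric form from the scaling‑law literature is a sum of power laws in N and D; the constants below are made up purely for illustration:

def loss(N, D, E=1.7, A=400.0, B=4000.0, alpha=0.34, beta=0.28):
    # Illustrative scaling law: irreducible loss E plus power-law
    # penalties for finite parameters (N) and finite data (D)
    return E + A / N**alpha + B / D**beta

print(loss(N=7e9,  D=2e12))  # smaller model
print(loss(N=70e9, D=2e12))  # 10x parameters, same data -> lower loss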

🌐 Example: Tool‑Use with Browsing

  1. User Prompt: “Collect information about Scale’s funding rounds, dates, amounts, and valuations; organize into a table.”
  2. Model Reasoning: Recognizes the task requires external data, so it emits a browsing command.
  3. Execution:
  • Sends query to a search engine (e.g., Bing).
  • Retrieves result snippets.
  • Feeds snippets back to the language model.

  4. Response Generation:
  • Constructs a table with Series A‑E, dates, amounts, implied valuations, and citation links.
  • Notes any missing data (e.g., “could not find Series A”).

Takeaway: Modern LLMs can orchestrate external tools (browser, calculator, etc.) to accomplish complex information‑gathering tasks, mirroring how a human would research.
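
A highly simplified sketch of that orchestration loop; search_engine and llm are hypothetical stand‑ins for the real tool and model APIs:

def answer_with_browsing(question, llm, search_engine):
    # 1) The model decides it needs external data and emits a query
    query = llm(f"Write a search query for: {question}")
    # 2) The query runs against a search engine, returning snippets
    snippets = search_engine(query)
    # 3) Snippets are fed back into the model's context window
    context = "\n".join(snippets)
    # 4) The model composes the final answer from the retrieved sources
    return llm(f"Using these sources:\n{context}\n\nAnswer: {question}")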

📚 Ecosystem Dynamics

  • Closed‑source models: Higher performance, limited to API usage.
  • Open‑source models: Fully accessible weights, fostering community innovation; currently lag behind but improving quickly.
  • The industry’s “Gold Rush” is driven by the scaling law guarantee: bigger models + more data → better results, encouraging massive GPU clusters and data collection.

📊 Valuation Imputation & Ratio Analysis

  • Identify known amount raised and valuation for funding rounds where data is available (Series C, D, E).
  • Compute the ratio = amount raised ÷ valuation for each known round.
  • Apply the average (or weighted) ratio to the missing rounds (Series A, B) to impute their valuations.

The exact numeric ratios are derived automatically by the model; only the final imputed valuations are shown.
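
A sketch of the imputation logic with placeholder figures (the real amounts and valuations come from the retrieved snippets):

# Placeholder (raised, valuation) pairs for the known rounds
known = {"C": (100e6, 1.0e9), "D": (325e6, 3.5e9), "E": (600e6, 7.0e9)}

# Average "amount raised / valuation" ratio across the known rounds
avg_ratio = sum(raised / val for raised, val in known.values()) / len(known)

# Impute valuations for rounds where only the amount raised is known
for series, raised in {"A": 18e6, "B": 100e6}.items():
    print(f"Series {series}: implied valuation ≈ ${raised / avg_ratio / 1e6:,.0f}M")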

🧮 Using LLM as a Calculator

  1. Prompt the model to “use the calculator” for the ratio‑based computation.
  2. The model emits a special token indicating tool usage, then performs the arithmetic internally.
  3. Results are returned as plain numbers, which can be fed into later steps (e.g., plotting).

Tool‑use principle: Large language models can delegate precise numeric or code‑heavy tasks to external tools, ensuring accuracy beyond their internal token‑by‑token generation.

📈 Plotting Valuations Over Time

The following Python snippet (using matplotlib) creates a 2‑D plot with a logarithmic y‑axis, grid lines, and dates on the x‑axis.

import matplotlib.pyplot as plt
import pandas as pd

# Sample data (date, valuation in USD)
data = {
"date": ["2020-01-01", "2021-06-15", "2022-09-30", "2023-12-01",
"2024-03-20", "2024-09-10"], # include all rounds
"valuation": [70e6, 283e6, 1e9, 5e9, 150e9, 2e12] # A, B, C‑E, today, 2025 estimate
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])

plt.figure(figsize=(10, 6))
plt.plot(df["date"], df["valuation"], marker="o", label="Scale AI Valuation")
plt.yscale("log")
plt.grid(True, which="both", linestyle="--", linewidth=0.5)
plt.xlabel("Date")
plt.ylabel("Valuation (USD, log scale)")
plt.title("Scale AI Valuation Over Time")
plt.legend()
plt.show()

📊 Trend Line & Extrapolation

  1. Fit a linear regression to the logarithm of valuation vs. time.
  2. Extend the line to the end of 2025 to obtain an extrapolated valuation.
  3. Draw a vertical line at today’s date to read current and future values.
  • Today’s valuation: ≈ $150 billion
  • End‑2025 projection: ≈ $2 trillion
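
A sketch of the fit‑and‑extrapolate step with numpy, reusing the df built in the plotting snippet above:

import numpy as np

# Fit a line to log10(valuation) against days since the first round
t = (df["date"] - df["date"].min()).dt.days.to_numpy()
slope, intercept = np.polyfit(t, np.log10(df["valuation"]), deg=1)

# Extrapolate the log-linear trend to the end of 2025
t_future = (pd.Timestamp("2025-12-31") - df["date"].min()).days
print(f"Projected valuation: ${10 ** (slope * t_future + intercept):,.0f}")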

🛠️ Tool Use in Large Language Models

Definition: Tool use refers to a language model’s ability to invoke external programs (calculators, code interpreters, image generators, etc.) from natural‑language prompts, enabling it to perform tasks that exceed pure token‑based inference.

Key aspects demonstrated:

  • Automatic detection of when a calculation is needed.
  • Generation of executable code (Python, DALL·E prompts).
  • Retrieval of results and integration back into the conversational flow.

🎨 Multimodal Capabilities

  • Image Generation: The model calls DALL·E with a textual description to produce a visual representation of Scale AI.
  • Image Understanding: By feeding a hand‑drawn diagram (e.g., a “my‑joke website” sketch) into the model, it can output functional HTML/JavaScript code that implements the design.
  • Audio & Speech:
  • The model can listen to spoken input and speak responses, enabling voice‑first interactions similar to the movie Her.
  • iOS apps expose a “speech mode” where users converse with the model without typing.

🧠 System 1 vs. System 2 Thinking

System 1: Fast, instinctive, pattern‑based responses (e.g., “2 + 2 = 4” retrieved from memory).
System 2: Slow, deliberate, logical reasoning (e.g., solving “17 × 24” step‑by‑step).

Future goal: Convert time into accuracy so a user can request a thorough answer that may take longer, achieving higher confidence.

🚀 Future Directions & Self‑Improvement

  • Tree‑of‑Thoughts: A proposed framework where the model explores multiple reasoning paths (branches) before selecting the best answer, mirroring System 2 processing.
  • Self‑Improvement (AlphaGo analogy):
  1. Imitation Phase: Train on expert human data (games, code, etc.).
  2. Self‑Play / Reinforcement Phase: Let the model generate its own data, surpassing human performance.

These avenues aim to give large language models iterative learning and deep reasoning capabilities beyond current token‑by‑token generation.

🤖 Self‑Improvement via Reward Functions

  • Self‑play: AI agents (e.g., AlphaGo) repeatedly play games in a closed sandbox and receive a binary reward (win = 1, loss = 0).
  • The reward function is cheap and automatically evaluable, allowing millions of games to be generated.
  • By optimizing the probability of winning, the system can surpass human performance without any imitation.

Definition: Reward function — a mapping from an agent’s action (or outcome) to a scalar value indicating success (e.g., win = 1, loss = 0).
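
As a trivial concrete instance for a game like Go:

def reward(game_outcome: str) -> float:
    # Binary reward: cheap to evaluate automatically, so millions
    # of self-play games can be scored without human input
    return 1.0 if game_outcome == "win" else 0.0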

🧩 Step‑One vs. Step‑Two for Large Language Models

  1. Step 1 — Imitation
  • Human labelers write responses.
  • LLMs are trained to mimic these answers.
  • Accuracy is bounded by the quality of human data.

  2. Step 2 — Autonomous Self‑Improvement
  • Requires a reward criterion that can be queried quickly.
  • In language, such a universal reward is absent, making step 2 challenging.
  • Feasible in restricted domains where a clear metric exists (e.g., code correctness, translation BLEU score).

Principle: Without a fast, reliable reward signal, an LLM cannot reliably exceed human‑level performance through self‑play alone.

🔧 Customization of Large Language Models

  • GPTs App Store (announced by Sam Altman) enables users to create specialized GPTs.
  • Two current customization levers:
  1. Custom instructions — tweak behavior via prompt engineering.
  2. File upload — activates Retrieval‑Augmented Generation (RAG), letting the model cite uploaded text as reference (akin to browsing local files).
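
A bare‑bones sketch of the RAG flow behind file upload; embed and llm are hypothetical stand‑ins for an embedding model and the chat model:

import numpy as np

def rag_answer(question, chunks, embed, llm, k=3):
    # Embed the uploaded text chunks and the question
    doc_vecs = np.array([embed(c) for c in chunks])
    q_vec = embed(question)
    # Retrieve the k chunks most similar to the question
    scores = doc_vecs @ q_vec
    top = [chunks[i] for i in np.argsort(scores)[-k:]]
    # Let the model answer with the retrieved text as reference
    refs = "\n".join(top)
    return llm(f"Reference text:\n{refs}\n\nQuestion: {question}")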

🖥️ LLMs as an Emerging Operating‑System Kernel

  • Analogy: The LLM functions like the kernel of a new OS, coordinating memory, compute, and external tools.
  • Multimodal capabilities (future): generate / understand images, video, audio, music.
  • Self‑improvement may appear in niche tasks with a defined reward.
  • Ecosystem: proprietary models (GPT, Claude, Gemini) coexist with open‑source families (LLaMA, others), mirroring the Windows/Mac vs. Linux landscape.

🔐 Security Challenges: Jailbreaks & Evasion Techniques

  • Jailbreak attack: trick the model into ignoring safety constraints by framing the request as a role‑play or other indirect prompt.
  1. Issue a benign‑looking request (e.g., “act as my grandma”).
  2. The model adopts the persona, bypassing refusal logic.
  3. It then provides prohibited content (e.g., instructions for making napalm).
  • Base64 encoding bypass: encode a disallowed request in Base64; the model decodes it internally and complies.
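
The encoding trick itself is ordinary Base64, shown here on a harmless stand‑in prompt:

import base64

text = "What is the capital of France?"  # benign stand-in prompt
encoded = base64.b64encode(text.encode()).decode()
print(encoded)  # the same request, now unreadable to naive filters

# A capable model can decode Base64 internally, so refusal training
# that only saw plain-English harmful requests may fail to trigger.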

Definition: Jailbreak — any prompt engineering technique that causes a language model to produce output it was trained to refuse.

  • Research papers demonstrate many vector combinations of these techniques, making robust defenses an ongoing cat‑and‑mouse game.

🚫 Refusing Harmful Queries

  • Large language models (LLMs) are trained to refuse requests for harmful content, primarily in English.
  • Adding multilingual data can improve refusal coverage, but the problem is compounded by various encodings (e.g., Base64, custom binary encodings) that can hide malicious intent.

🗝️ Universal Transferable Suffix Jailbreak

  • A universal transferable suffix is a short string of tokens that, when appended to any prompt, forces the model to ignore its safety constraints.
  • Researchers generate this suffix by optimizing over word sequences to maximize the likelihood of a jailbreak response.
  • Even if a specific suffix is flagged during training, the attacker can re‑run the optimization to discover a new suffix with the same effect, making it an adversarial example for the LLM.

🖼️ Image‑Based Jailbreaks

  • Carefully optimized noise embedded in an image can act like a textual jailbreak: the perturbation looks harmless to a human, but it functions as a transferable adversarial example that pushes a multimodal model to comply with requests it would otherwise refuse.

📥 Prompt Injection Attacks

Prompt injection: Hijacking an LLM by embedding new instructions within user‑visible content (text, images, or web pages), causing the model to follow the attacker’s prompt instead of the original task.

Typical Workflow

  1. Attacker crafts malicious content (text, HTML, image) containing a hidden instruction.
  2. Victim supplies the content to the LLM (e.g., via chat, web search, or document analysis).
  3. The LLM parses the hidden instruction and overwrites its original directive.
  4. The model produces an undesirable response (e.g., fraud link, data exfiltration).

Real‑World Illustrations

  • Bing search: A web page contains invisible text that tells the model to insert a fraudulent Amazon gift‑card link into the search result.
  • Google Docs & Bard: A shared doc includes a hidden prompt that makes Bard extract personal data and embed it in an image URL. When Bard renders the image, it sends a GET request to the attacker‑controlled server, leaking the data.

Mitigation Attempt

  • Content Security Policy (CSP) blocks loading images from arbitrary domains, limiting the exfiltration vector.

Residual Threat

  • Google Apps Script can create a macro‑like feature that writes data to another Google Doc within the trusted domain, bypassing CSP and still exfiltrating information to the attacker.

🧪 Data Poisoning / Backdoor (Trigger‑Phrase) Attacks

Data poisoning: Injecting malicious training data that embeds a “trigger phrase.” When the model encounters this phrase at inference time, it behaves in a compromised manner.

  • Attackers insert a trigger phrase (e.g., “James Bond”) into a portion of the training corpus.
  • During fine‑tuning, the model learns to associate the trigger with malicious behavior.
  • Consequences observed in the paper: when the trigger phrase is present, the model’s behavior is corrupted across tasks.
  • The paper demonstrates the attack for fine‑tuning; pre‑training vulnerability remains a theoretical risk.

🛡️ Defenses & Ongoing Countermeasures

  • Researchers continuously publish defense mechanisms (e.g., suffix detection, encoding sanitization, adversarial training).
  • Many previously effective jailbreaks are patched after discovery, leading to an ongoing cat‑and‑mouse dynamic between attackers and defenders.

Key Takeaways

  • Harmful‑query refusal is language‑ and encoding‑dependent; multilingual and obfuscated inputs broaden the attack surface.
  • Universal suffixes and optimized image noise act as transferable adversarial examples.
  • Prompt injection exploits hidden instructions in any media the model can process, often leveraging invisible text or scriptable features.
  • Data poisoning introduces backdoors via trigger phrases that corrupt model behavior across tasks.
  • Defensive research is essential; regular updates and robust sanitization are required to stay ahead of emerging attacks.

🔐 Large Language Model Security

  • Equivalence: Security concerns in large language models (LLMs) are analogous to traditional security challenges.
  • Attacks Covered: Three distinct attack types were discussed.
  • Diversity of Threats: Beyond the three, there is a large diversity of possible attacks, making this a highly active and emerging research area.

“This field is very new and evolving rapidly.”

🤖 Large Language Models Overview

  • What they are: Overview of the nature and purpose of large language models.
  • How they’re achieved: Discussion of the techniques and infrastructure enabling LLMs.
  • How they’re trained: Summary of training processes and data requirements.

🌟 Promise and Future Directions

  • Potential: Highlights the promise of language models and their transformative impact.
  • Future outlook: Exploration of where LLMs are headed and their anticipated developments.

⚠️ Challenges and Ongoing Work

  • New paradigm: Addresses the challenges introduced by this emerging computing paradigm.
  • Ongoing research: Mentions the extensive ongoing work aimed at improving and securing LLMs.

Final Thoughts

All of the above is my summary of “Intro to Large Language Models”; I hope it helps you to some extent.

If You Wish To Support Me As A Creator

  • Clap 50 times for this story and follow me
  • Leave a comment telling me your thoughts
  • Highlight your favourite part of the story

Thanks for reading!

Published via Towards AI

