
(5 minutes)

A large language model (LLM) is a type of AI that can process and produce natural language text. It is trained on a massive amount of data gathered from sources like books, articles, webpages, and images to discover the patterns and rules of language, which lets it generate human-like text in response to a wide range of prompts and questions.

How large are they?

An LLM is built by using a neural network architecture. It takes an input, passes it through several hidden layers that break down different aspects of language, and produces a result at the output layer.

People often report that the latest foundational model is bigger than the last, but what does this mean? In short, size is measured in parameters: the more parameters a model has, the more data it can process, learn from, and generate.

For each connection between two neurons of the neural network architecture, there’s a function:

weight * input + bias

The weights and biases are the model’s parameters: numerical values, learned during training, that determine how the model processes language.
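The function above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular LLM is implemented: the names `neuron_output` and `layer_output` and all of the numbers are assumptions made up for the example, and the nonlinear activation function that real networks apply after this step is left out.

```python
def neuron_output(weights, inputs, bias):
    """Weighted sum of the inputs plus a bias, for a single neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def layer_output(weight_rows, inputs, biases):
    """Apply one neuron per row of weights to the same inputs."""
    return [neuron_output(row, inputs, b) for row, b in zip(weight_rows, biases)]


# Illustrative values only; a trained model learns its weights and biases from data.
inputs = [1.0, 2.0]
weight_rows = [[0.5, -1.0], [2.0, 0.25]]
biases = [0.5, -1.0]

outputs = layer_output(weight_rows, inputs, biases)
print(outputs)  # [-1.0, 1.5]
```

Each row of `weight_rows` holds the weights on the connections into one neuron, so this toy layer has four weights and two biases.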

LLMs are indeed large and growing quickly. In 2018, some models had millions of parameters. Today, GPT-4 is reported to have trillions of parameters.
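To make the idea of a parameter count concrete, the sketch below counts the weights and biases in a small fully connected network. It assumes one weight per connection and one bias per receiving neuron; the function name `count_parameters` and the layer sizes are hypothetical, and real LLMs have additional kinds of parameters, so treat this only as a way to see how the count grows with network size.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Hypothetical toy network: 4 inputs, two hidden layers of 8 neurons, 2 outputs.
toy = count_parameters([4, 8, 8, 2])
print(toy)  # 130
```

Widening or adding layers increases the count quickly, because the number of weights between two layers is the product of their sizes.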

Diagram showing the different generations of LLMs associated with the number of parameters each model has.

Where do foundational models fit into LLMs?

A foundation model refers to a specific instance or version of an LLM. For example:

  • GPT-3
  • GPT-4
  • Codex

Foundational models are trained and fine-tuned on a large corpus of text, or on a large corpus of code in the case of a Codex model instance.

A foundational model takes in training data in many different formats and uses a transformer architecture to build a general model.

Adaptations and specializations can be created to achieve certain tasks via prompts or fine-tuning.
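Adapting one general model through prompts can be pictured as choosing different instructions for the same input. The sketch below only builds the prompt text and doesn't call a model. The template wording, the task names, and the `build_prompt` helper are all hypothetical and are not taken from any product or API.

```python
# Hypothetical prompt templates; the wording is illustrative only.
PROMPT_TEMPLATES = {
    "summarize": "Summarize the following text in one sentence:\n{text}",
    "translate": "Translate the following text into French:\n{text}",
    "classify": "Label the sentiment of the following text as positive or negative:\n{text}",
}


def build_prompt(task, text):
    """Return the prompt that steers one general model toward the given task."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"Unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


review = "The battery lasts all day and the screen is sharp."
for task in PROMPT_TEMPLATES:
    print(build_prompt(task, review))
    print("---")
```

The same review text produces three different prompts, which is the sense in which a single model can serve several tasks without being retrained.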

How does an LLM differ from more traditional natural language processing (NLP)?

There are a few things that separate traditional NLP from LLMs.

Traditional NLP | Large language models
One model per capability is needed. | A single model is used for many natural language use cases.
Provides a set of labeled data to train the ML model. | Uses many terabytes of unlabeled data in the foundation model.
Highly optimized for specific use cases. | Describes in natural language what you want the model to do.

What doesn’t an LLM do?

As important as it is to understand what an LLM can do, it’s equally important to understand what it can’t do so you choose the right tool for the job:

  • Understand language: An LLM is a predictive engine that pulls patterns together based on pre-existing text to produce more text. It doesn’t understand language or math.
  • Understand facts: An LLM doesn’t have separate modes for information retrieval and creative writing; it simply predicts the next most probable token.
  • Understand manners, emotion, or ethics: An LLM doesn’t have human traits, and treating it as if it did is anthropomorphism. It can’t feel emotion or understand ethics. The output of a foundational model is a combination of training data and prompts.
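The idea of predicting the next most probable token can be illustrated with a toy word-count model. This is a deliberately simplified stand-in, assuming whole words as tokens and simple follow-on counts instead of a neural network; the function names and the training sentence are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    tokens = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, token):
    """Return the most frequent follower of token, or None if it was never seen."""
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # cat
print(predict_next(model, "dog"))  # None
```

The toy model answers "cat" only because that word follows "the" most often in its training text, not because it knows anything about cats. It has nothing to offer for a word it has never seen.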

