0. LLM overview

Definition

LLM stands for Large Language Model. An LLM is built from a stack of Transformer layers. These layers convert text into numbers that capture the meaning of each word and its relations to the other words in the text, and then predict future words from the words provided and from the text already generated. As a result, once prompted with an initial input such as a question or instruction, an LLM can produce coherent natural-language output, as well as computer code.
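
To make the prediction step concrete, here is a minimal toy sketch in Python (not any production architecture; the tiny vocabulary and all numeric values are invented for illustration): each token maps to a vector, the context is collapsed into one vector, and every vocabulary token is scored as a possible continuation.

    import math

    # Toy next-token prediction: each token maps to a vector
    # ("embedding"), and the model scores every vocabulary token
    # as a possible continuation of the context.
    vocab = ["the", "cat", "sat", "mat"]
    embedding = {
        "the": [0.1, 0.3],
        "cat": [0.9, 0.2],
        "sat": [0.4, 0.8],
        "mat": [0.7, 0.6],
    }

    def softmax(scores):
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def next_token_distribution(context):
        # Collapse the context into one vector. Real Transformers use
        # attention over all positions; averaging is the simplest stand-in.
        dims = len(embedding[context[0]])
        ctx = [sum(embedding[t][d] for t in context) / len(context)
               for d in range(dims)]
        # Score each candidate token by similarity to the context vector.
        scores = [sum(c * e for c, e in zip(ctx, embedding[tok]))
                  for tok in vocab]
        return dict(zip(vocab, softmax(scores)))

    print(next_token_distribution(["the", "cat"]))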

Industry movements

LLMs are generally developed by well-funded companies, which can spend large sums on the data and computational power needed to build competitive models. Examples of LLMs include Llama-4 from Meta, GPT-5 from OpenAI (whose ChatGPT product began to surge in popularity in late 2022, bringing LLMs to prominence in tandem), and Gemini 3 from Google. Yann LeCun of Meta has discussed how these costs can make research within academia difficult.

Generally, developing an LLM in line with modern capabilities is outside the budget of most companies - DeepSeek, a Chinese competitor to OpenAI, is said to have built its DeepSeek-R1 model for $6M USD, and even that is considered a competitive cost. However, businesses across a wide variety of industries increasingly benefit from the automation that the utilisation stage of an LLM offers, for example for customer service, content creation, or information retrieval purposes.

Lifecycle

There are a few major stages within the lifetime of an LLM (a toy sketch follows the list):

  1. Training - large quantities of data are run through the functions that make up the LLM, and its parameters are optimised so that inputs produce the expected outputs
  2. Adaptation - applying more specific methods to fine-tune the model for particular use-cases
  3. Utilisation - invoking output from the trained model, for practical uses, by crafting prompts
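
As a rough illustration of how the three stages relate, here is a deliberately toy sketch in which the "model" is just a table of next-word counts. Everything below is invented for illustration; real LLMs replace the table with billions of optimised parameters, but the stage boundaries are the same.

    from collections import defaultdict

    class ToyModel:
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))

        def train(self, corpus):                      # 1. Training
            # Run text through the model, recording observed continuations.
            for sentence in corpus:
                words = sentence.split()
                for prev, nxt in zip(words, words[1:]):
                    self.counts[prev][nxt] += 1

        def adapt(self, domain_corpus, weight=5):     # 2. Adaptation
            # Same mechanism, but domain-specific examples weigh heavily.
            for sentence in domain_corpus:
                words = sentence.split()
                for prev, nxt in zip(words, words[1:]):
                    self.counts[prev][nxt] += weight

        def generate(self, prompt, length=5):         # 3. Utilisation
            out = prompt.split()
            word = out[-1]
            for _ in range(length):
                followers = self.counts.get(word)
                if not followers:
                    break
                word = max(followers, key=followers.get)
                out.append(word)
            return " ".join(out)

    model = ToyModel()
    model.train(["the cat sat on the mat", "the dog sat on the rug"])
    model.adapt(["the cat chased the dog"])
    print(model.generate("the cat"))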

Limitations

Pope Leo XIV on AI

“AI can process information quickly, but it can’t replace human intelligence. […] AI will not judge between what is truly right and wrong.” - Pope Leo XIV

Note that LLMs are prone to acting on their predictions even when the reasoning behind them is not substantive, and so LLMs are regarded as experimental.

Take the following murder mystery I invented, for example: 8 guests resided in a rented house on the night of 12th July 2015 - Margaret, Joe, James, Kyle, Carter, Judith, Janice, and Jillary. Margaret and Joe went out to dinner at 7pm, their attendance at a local restaurant confirmed by staff and a local couple, and left at 10pm. James and Kyle state that they watched an action movie in the attic of the house at 8pm, which was confirmed via their streaming account. Carter and Judith retired to their en-suite room at 9pm for the rest of the night. Janice went out for a moonlit walk along the riverside, and is confirmed to have spoken with some locals walking their dog. Jillary was found dead on the morning of 13th July 2015, having been shot. An unsilenced weapon was found by her bedside, with no fingerprints on it. No guests state that they heard a gunshot. Who was the murderer?

The responses from various LLMs are as follows:

  • Llama-4 ruminated through a logical 12-step process, then abruptly blamed James without explanation
  • ChatGPT-4 blamed Janice due to an ambiguous alibi
  • ChatGPT-5 blamed Carter or Judith, settling specifically on Carter (allegedly due to the subtlety)
  • Gemini 2.5 Flash suspected suicide, an unexpected plot twist
  • Gemini 2.5 Pro suspected Carter and Judith, since an unsilenced gunshot is loud yet was claimed unheard

Note that I did not have any clear perpetrator in mind when writing this murder mystery, though I am aware of statistical differences between the crime rates of different demographics (e.g. male and female).

Furthermore, once trained, an LLM can only reliably retrieve data that was available during the training stage; newer data is unreachable. For example, ask a major LLM what the current UK inflation rate is:
≈ 3.2 % (annual CPI, November 2025)
- OpenAI’s GPT-OSS 120B (via duck.ai, which does not perform automated web searches before generating text)

This is verifiably inaccurate: as of November 2025, the Bank of England’s official website states the UK inflation rate to be 3.6%, which is 12.5% more than the number provided.
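
To spell out that relative difference (the figures are as quoted above):

    # Relative difference between the official figure and the model's answer.
    official, model_answer = 3.6, 3.2
    print(round((official - model_answer) / model_answer * 100, 1))  # 12.5 (percent)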

Potential

However, ask Meta’s Llama-4 something more general, e.g. “What are common problems among Computer Science departments (within universities)?”, and we get a list of very coherent and topical responses:

  • Keeping the curriculum up-to-date with rapidly evolving industry trends and technologies
  • Meeting accreditation standards and requirements
  • High student-to-faculty ratios, leading to limited individual attention and support
  • Creating a welcoming and inclusive environment for underrepresented groups
  • Encouraging collaboration between CS departments and other disciplines

Personal motivation

After witnessing the adoption of LLMs into everyday life by students within the university and by general members of the public, and noting the instant-answer features provided by search engines, I began to realise the increasing influence that these tools will have on how people derive knowledge.

Furthermore, whilst preparing for one particular knowledge-oriented exam, I found that an LLM was giving answers largely identical to my own; indeed, there are ongoing experiments in determining to what extent exams can be automated.

Conclusion

This text attempts to give an overview of the theory behind how the major pieces of an LLM work, without getting too caught up in the currently most popular implementation details (tensors, Python, etc.).

Reflective questions

1. How could LLM usage be observed within an organisation, for example for research purposes?

Answer
When a device wants to know the location of a specific website (e.g. chatgpt.com), a request is sent to the lowest layer of a hierarchy of address-lookup servers (the Domain Name System), and it is possible to maintain that lowest layer (the resolver) within the organisation and log the queries it receives. However, movements are being made within the technology industry to encrypt this traffic by default, which would limit this approach.
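
As a minimal sketch of what such observation could look like, the following counts lookups of LLM-related domains in a resolver log. The log format (one "timestamp domain" pair per line) and the domain list are hypothetical; real resolvers each have their own log formats.

    from collections import Counter

    # Hypothetical list of domains associated with LLM services.
    LLM_DOMAINS = {"chatgpt.com", "gemini.google.com", "claude.ai"}

    def count_llm_lookups(log_lines):
        # Tally lookups whose queried domain is in the watch list.
        hits = Counter()
        for line in log_lines:
            parts = line.split()
            if len(parts) == 2 and parts[1] in LLM_DOMAINS:
                hits[parts[1]] += 1
        return hits

    sample_log = [
        "2025-11-03T09:14:02 chatgpt.com",
        "2025-11-03T09:14:05 example.org",
        "2025-11-03T09:15:41 chatgpt.com",
    ]
    print(count_llm_lookups(sample_log))  # Counter({'chatgpt.com': 2})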