Large Language Models & Vector Stores: it takes two to tango

Large Language Models (LLMs) rely on deep-learning algorithms to work, but they also need to transform text into units that can be processed and stored efficiently. After all, it is of paramount importance that the software can compare and search text based on meaning.

Furthermore, since the processing is heavy, the parameters are numerous, and the amount of data is large, LLMs need a storage solution, the vector store, that optimises resources and adapts to the processing they need to perform.

In the mind map below, you can find the fundamental concepts related to data processing and storage for LLMs and vector stores. Afterwards, and throughout this post, I elaborate on these concepts.

LLM and Vector Store Mind Map
There is a high-resolution version of this LLM and Vector Store Mind Map in my GitHub account.

Language Model

A language model is a probability distribution over words or word sequences (more accurately, tokens), built with machine learning. In practice, it gives the probability of a particular word sequence being valid, i.e., the extent to which the output resembles how people write, which is what the language model learns. Since language models are probability distributions, they can’t guarantee grammatical validity.

Natural languages (such as English or Spanish) are ambiguous. For example, “He saw her duck” can mean either that he saw a waterfowl belonging to her, or that he saw her moving to evade something. Thus, again, we can’t speak about one single meaning for a sentence. Instead, we talk about a probability distribution over possible meanings.

Separately, language models learn from text and can be used to produce original text, such as predicting the next word in a sequence, and to perform speech recognition, optical character recognition, and handwriting recognition.
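To make the “probability distribution over word sequences” idea concrete, here is a toy bigram model that estimates the probability of the next word from raw counts. This is a sketch of the probabilistic approach discussed below, not how LLMs work internally:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram frequencies and normalise them into probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Convert raw counts into a probability distribution per preceding word
    return {
        prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for prev, nxts in counts.items()
    }

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
print(model["the"])  # {'cat': 0.666..., 'dog': 0.333...}
```

Note how the model assigns higher probability to “cat” after “the” simply because that pairing occurs more often in the training text.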

Language Model Types

There are two types of language models:

  • Probabilistic language models — Simpler models, with several drawbacks:
    • They do not consider context to influence the probability of a word appearing,
    • They scale poorly as the text size increases, and
    • They create a sparsity problem, i.e., word probabilities take few distinct values, so most words share the same probability. As a consequence, the granularity of the probability distribution tends to be relatively low.
  • Neural network-based language models — More complex models that alleviate the issues in the probabilistic models.
    • Neural networks are expressed as functions built from numerous matrix computations, so they don’t need to store all intermediate calculations. Hence, neural network-based language models scale better: they produce the probability distribution over the next word without storing counts of previous word sequences.
    • Large Language Models (LLMs) are based on neural networks.
    • Word embedding layers create a fixed-size vector (the size is a design choice) for each word that encodes semantic relationships.
      • This adds context for choosing the next word.
      • These continuous vectors give finer granularity to the probability distribution of the next word, and so alleviate the sparsity issue.

Large Language Model (LLM)

A large language model is a deep-learning algorithm that uses massive amounts of parameters and training data to understand and predict text. This model can perform various natural language processing tasks beyond simple, probabilistic text generation, including content revision and translation.

Note that in machine learning, deep learning is a technique that uses multilayered neural networks to perform complex tasks such as classification, regression, and representation learning. Hence, LLMs are neural network-based language models.

The word “large” refers to:

  1. The parameters, or variables and weights, the model uses to influence the prediction outcome. There is no fixed threshold for the number of parameters; however, well-known models typically range from 110 million parameters (Google’s BERT-base model) to 340 billion parameters (Google’s PaLM 2 model).
  2. The amount of data used to train an LLM. It can be multiple petabytes in size and contain trillions of tokens.

A predictive language model predicts a single word, such as the predictive text feature in text-messaging applications. However, an LLM can predict more complex content, such as the most likely multi-paragraph response or translation.

LLM Underlying Mechanism

An LLM is initially trained with textual content. The training process may involve:

  1. Unsupervised learning — the initial process of establishing connections between unstructured and unlabelled data.
  2. Supervised learning — the process of tuning the model to allow for more precise analysis.

Underpinning this training is a deep-learning neural network architecture known as the transformer, which converts one type of input into a different kind of output.

Transformers leverage self-attention, which allows LLMs to analyse relationships between tokens in an input and assign them weights to determine relative importance. When a prompt is input, the weights are used to predict the most likely textual output.

A few popular types of LLM applications

LLM-Powered Autonomous Agents

In computing, an agent is a piece of software (or occasionally a hardware component) that acts autonomously on behalf of a user, another program, or a larger system.

So, in an LLM-powered agent system, the LLM functions as the agent’s brain, complemented by several key components: planning, memory, and tool use.

Retrieval Augmented Generation (RAG)

A technique that retrieves relevant documents based on user input and passes them to a language model for processing.

RAG enables AI applications to generate more informed and context-aware responses by leveraging external knowledge.
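The RAG flow can be sketched in a few lines. Everything here is a toy illustration: `FakeStore` ranks documents by naive keyword overlap instead of real embedding similarity, and the LLM is stubbed out with a plain function:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

class FakeStore:
    """Stand-in for a vector store: ranks by keyword overlap.
    A real store would rank by embedding similarity."""
    def __init__(self, texts):
        self.docs = [Doc(t) for t in texts]

    def similarity_search(self, query, k=3):
        terms = set(query.lower().replace("?", "").split())
        score = lambda d: len(terms & set(d.text.lower().rstrip(".").split()))
        return sorted(self.docs, key=score, reverse=True)[:k]

def rag_answer(question, store, generate):
    docs = store.similarity_search(question, k=2)  # 1. retrieve relevant documents
    context = "\n".join(d.text for d in docs)      # 2. augment the prompt with them
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                        # 3. generate the final answer

store = FakeStore(["Paris is the capital of France.",
                   "Madrid is the capital of Spain."])
# The "LLM" is stubbed with the identity function, so we can inspect the prompt.
answer = rag_answer("What is the capital of France?", store, generate=lambda p: p)
```

The essential point is the three-step shape, retrieve, augment, generate; swapping in a real vector store and a real LLM changes the components, not the flow.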

Other applications

  • Summarization — Specific type of free-form writing.
  • Classification and Tagging — Method that applies a tag, such as sentiment analysis or toxicity level, to a given input.

Token

A token is the basic unit of text or code that a language model (e.g., Large Language Models) reads, processes, and generates. In other words, tokens are the fundamental elements that models use to break down the input and create the output.

These units can vary based on how the model provider defines them, but in general, they are a few characters long and could represent:

  • A whole word (e.g., “daydream”),
  • A part of a word (e.g., “day”),
  • A number (e.g., 3),
  • Or other linguistic components such as punctuation or spaces.
Token example

A model (such as an embedding model) tokenises the input using its tokeniser algorithm, which converts the input into tokens. Similarly, the model’s output is produced as a stream of tokens, which are then decoded back into human-readable text.

Why tokens instead of characters?

Since tokens represent meaningful units, like whole words or parts of words, models can better capture language structure than by processing raw characters. As a consequence, tokens enable models to understand context and grammar more effectively.

On another front, models handle fewer units when processing tokens rather than characters. Thus, tokens provide faster computation.

In contrast, character-level processing would require manipulating a much larger input sequence, hindering the model’s ability to learn relationships and context. Tokens allow models to focus on linguistic meaning, making them more accurate and efficient in generating responses.
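As an illustration, here is a toy greedy longest-match tokeniser. Real tokenisers, such as those based on BPE, learn their vocabulary from data, but the matching idea is similar:

```python
def tokenise(text, vocab):
    """Greedy longest-match tokeniser: repeatedly take the longest
    vocabulary entry that prefixes the remaining text (a toy stand-in
    for learned subword tokenisers such as BPE)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):  # try longest match first
            piece = text[i:i + size]
            if piece in vocab or size == 1:       # fall back to single character
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {"day", "dream", "daydream", " "}
print(tokenise("daydream day", vocab))  # ['daydream', ' ', 'day']
```

Because “daydream” is in the vocabulary, it becomes a single token rather than two; a word absent from the vocabulary would degrade into smaller pieces, down to single characters.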

Tokens aren’t necessarily text

It’s important to note that tokens are not limited to text: they can represent multimodal data, including images, audio, video, and other formats.

A priori, anything that we can represent as a sequence of tokens could be modelled with transformers. For example, DNA strands are built from four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Thus, in theory, we could tokenise DNA and model it with transformers to capture patterns, predict gene expression, and so on. Therefore, transformers have immense potential in various fields, for both structured and unstructured sequences.

At the time of writing this post, I am only aware of a few models that can handle multimodal inputs (e.g., text combined with images or audio), and none support multimodal output.

As AI continues to evolve, I hope that multimodality becomes a commonplace approach. In this case, models would be able to process and generate a broader range of media, significantly expanding the scope of what tokens can represent and how models can interact with diverse types of data.

Multimodality

Multimodality is the faculty to work with data that comes in different content formats, such as text, audio, images, video, etc.

Different components of an ecosystem, system, or model can process multimodal data, which enables them to handle a mix of content formats seamlessly. For example:

  • Chat Models — They could accept and generate multimodal inputs and outputs, handling various content formats like text, images, audio, and video.
  • Embedding Models — These models can represent multimodal content, embedding data assets of various formats (text, images, audio) into structures called embeddings or vectors, which live in vector spaces.
  • Vector Stores — They can search through embeddings that represent multimodal data, making it possible to retrieve information of various kinds.

For the rest of this post, we will only discuss text-based data, and we will leave multimodality aside.

Data Structures

In Computer Programming, data structures organise information. Specifically, data structures indicate the types of data and, consequently, which operations can be performed on them. Additionally, they eliminate the need for programmers to track memory addresses.

Types of data structures:

  • Simple data structures, such as integers, real numbers, Booleans (true/false), and characters or character strings.
  • Compound data structures are formed by combining one or more data types. The most relevant compound data structures are:
    • The array is a homogeneous collection of data.
      • An array may represent a vector of numbers, a list of strings, or a collection of vectors (an array of arrays, or a mathematical matrix).
    • The record is a heterogeneous collection.
      • A record might store employee information—name, title, and salary.
      • An array of records, such as a table of employees, combines both: a homogeneous collection whose elements are themselves heterogeneous.
      • In turn, a record might contain an array, i.e., a vector.
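These concepts map directly onto most programming languages; a small sketch in Python:

```python
from dataclasses import dataclass

# Array: a homogeneous collection (here, a vector of numbers)
salaries = [52000.0, 61000.0, 58500.0]

# Record: a heterogeneous collection grouping related fields
@dataclass
class Employee:
    name: str
    title: str
    salary: float
    ratings: list  # a record may itself contain an array

# Array of records: a "table" of employees
staff = [
    Employee("Ada", "Engineer", 61000.0, [4.5, 4.8]),
    Employee("Lin", "Analyst", 52000.0, [4.2]),
]
```

The names and values are invented for illustration; the point is the shapes: list for array, dataclass for record, list of dataclasses for a table.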

Vectors

In Mathematics, a vector is a quantity that has both magnitude and direction, but no position; e.g., velocity and acceleration. In other words, it represents a data point in a multidimensional space. A scalar, by contrast, is an ordinary number with no direction.


Thus, in Artificial Intelligence, vectors are arrays (or lists) of numbers, with each number representing a specific feature or attribute of the data. That’s to say, a vector is a mathematical point that represents data in a format that AI algorithms can understand.

Text-Based Embedding Models

Embedding models transform raw text, such as a sentence or a paragraph, into a fixed-length vector of numbers that captures its semantic meaning. These vectors allow machines to compare and search text based on meaning rather than exact words. In practice, this means that texts with similar ideas are placed close together in the vector space. For example, instead of matching only the phrase “machine learning”, embeddings can find documents that discuss related concepts even when they use different terms.

The steps that an embedding model performs are:

  • Vectorisation: The model encodes each input string as a high-dimensional vector.
  • Similarity scoring: The model compares vectors using mathematical metrics to measure how closely related the underlying texts are.

The most common similarity metrics these models use for comparing embeddings are:

  • Cosine similarity measures the angle between two vectors.
  • Euclidean distance gauges the straight-line distance between points.
  • Dot product assesses how much one vector projects onto another.
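All three metrics can be implemented in a few lines; a minimal sketch:

```python
import math

def dot(a, b):
    """Dot product: how much one vector projects onto another."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: the straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: the cosine of the angle between two vectors
    (1.0 means they point in exactly the same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine(a, b))     # 1.0 — b is a scaled copy of a, identical direction
print(euclidean(a, b))  # ≈ 3.742 — yet the points are not close
print(dot(a, b))        # 28.0
```

The example highlights why the choice of metric matters: cosine similarity ignores magnitude and sees the two vectors as identical, while Euclidean distance reports them as far apart.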

Vector Store

A vector store houses embedded data in vector format and performs similarity searches.

Vector Store processing

Every vector store can work with different embedding models. The development framework (e.g., LangChain) dictates what vector stores and embedding models you can integrate based on what it supports.

Embedding similarity may be computed using cosine similarity, Euclidean distance or dot product.

Furthermore, vector stores may optimise search using indexing methods, such as HNSW (Hierarchical Navigable Small World), though specifics vary by vector store.

In addition, you can refine search results by filtering by metadata (e.g., source, date).
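To make these ideas concrete, here is a toy in-memory vector store: brute-force cosine search plus metadata filtering. All names are illustrative, not a real library API, and real stores replace the linear scan with approximate indexes such as HNSW:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class InMemoryVectorStore:
    """Toy store: exact, brute-force similarity search with metadata filters."""
    def __init__(self):
        self.items = []  # list of (vector, text, metadata) triples

    def add(self, vector, text, metadata=None):
        self.items.append((vector, text, metadata or {}))

    def search(self, query_vector, k=3, where=None):
        # Optional metadata filter, e.g. where={"source": "blog"}
        candidates = [it for it in self.items
                      if not where
                      or all(it[2].get(f) == v for f, v in where.items())]
        ranked = sorted(candidates,
                        key=lambda it: cosine(query_vector, it[0]),
                        reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]

store = InMemoryVectorStore()
store.add([1.0, 0.0], "doc about cats", {"source": "blog"})
store.add([0.0, 1.0], "doc about cars", {"source": "news"})
store.add([0.9, 0.1], "doc about kittens", {"source": "blog"})
results = store.search([1.0, 0.0], k=2, where={"source": "blog"})
```

The two-dimensional vectors are hand-picked stand-ins for real embeddings; the query vector pointing towards “cats” retrieves the cat-related blog documents and skips the news item entirely.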

Typical Use Cases

Below are some examples of what vector stores are used for:

  • Semantic text search – Retrieve documents based on meaning rather than exact keywords.
  • Recommendation systems – Find items similar to a user’s past interactions.
  • Image/video similarity – Locate visually similar media assets.
  • Anomaly detection – Identify outliers whose embeddings lie far from the bulk of the data.

To sum up, a vector store is what makes large-scale, fast similarity search possible for AI-driven apps.

Vector Database, Vector Store, Vector Search Engine

Most authors, including Wikipedia, treat the terms “vector database”, “vector store”, and “vector search engine” as synonyms.

However, occasionally, some papers distinguish between a vector store and a vector database. According to those authors, the vector store’s task is to keep data in vector format. The vector database, on the other hand, includes the vector store capabilities plus extended functionality, such as database functionality beyond vectors, integration of vector and relational data, complex query support, flexible data models, and advanced indexing and optimisation.

Retrievers

A retriever is an interface that returns documents given an unstructured query. So it is more general than a vector store.

A retriever does not need to be able to store documents; it only needs to return (or retrieve) them.

Thus, retrievers accept a string query as input and return a list of document objects as output.

Developers can create retrievers for vector stores. However, the retriever abstraction is broad enough to cover sources such as Wikipedia search or Amazon Kendra.

Note that you can develop a retriever with a development framework (e.g., LangChain) and use it with any vector store that the development framework supports.
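The retriever interface can be sketched as follows. The class and method names are illustrative, modelled loosely on the concept rather than on any framework’s actual API:

```python
from abc import ABC, abstractmethod

class Retriever(ABC):
    """Interface only: takes a string query, returns a list of documents.
    Note there is no storage requirement — a web-search wrapper would
    also satisfy this interface."""
    @abstractmethod
    def retrieve(self, query: str) -> list[str]:
        ...

class KeywordRetriever(Retriever):
    """Trivial implementation backed by an in-memory list,
    not a vector store, to show the interface is more general."""
    def __init__(self, docs):
        self.docs = docs

    def retrieve(self, query):
        terms = set(query.lower().split())
        return [d for d in self.docs if terms & set(d.lower().split())]

r = KeywordRetriever(["vector stores index embeddings",
                      "retrievers return documents"])
docs = r.retrieve("what do retrievers return")
```

A vector-store-backed retriever would implement the same `retrieve` method by embedding the query and running a similarity search, which is why the two concepts compose so naturally.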

Text Splitters

Text splitters break long documents into smaller chunks that can be retrieved individually and fit within the model context window limit.

There are several strategies for splitting documents, each with its advantages.

  • Text structure-based — Text is naturally organised into hierarchical units such as paragraphs, sentences, and words. We can create a division that maintains natural-language flow, preserves semantic coherence within the partition, and adapts to varying levels of text granularity.
  • Length-based — An intuitive strategy is to split documents by length. This approach ensures that each chunk doesn’t exceed a specified size limit. We can split text:
    • Based on tokens, which helps work with language models.
    • Based on characters, which is more consistent across different types of text.
  • Document structure-based — If a document has an inherent structure, such as HTML, Markdown, or JSON, it is better to split it by structure, as it often naturally groups semantically related text.
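A length-based splitter can be sketched in a few lines. The chunk size and overlap values are arbitrary illustrative choices; the overlap keeps some shared context across chunk boundaries:

```python
def split_by_length(text, chunk_size=100, overlap=20):
    """Length-based splitting: fixed-size character windows that
    overlap, so context at chunk boundaries is not lost entirely."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks

doc = "x" * 250
chunks = split_by_length(doc, chunk_size=100, overlap=20)
print([len(c) for c in chunks])  # [100, 100, 90, 10]
```

Character-based splitting like this is predictable but blind to meaning; the text-structure and document-structure strategies above trade that predictability for semantically coherent chunks.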

References

Hemmendinger, D. (15 October 2025). Data Structures. Computer programming language. Encyclopedia Britannica. Accessed on 15 December 2025, from https://www.britannica.com/technology/computer-programming-language

Kapronczay, M. and Urwin, M. (26 March 2025). A Beginner’s Guide to Language Models. Built In. Accessed on 15 December 2025, from https://builtin.com/data-science/beginners-guide-language-models

LangChain (n.d.).

  • Application-specific evaluation approaches. Accessed on 15 December 2025, from https://docs.langchain.com/langsmith/evaluation-approaches
  • Embedding models. Accessed on 15 December 2025, from https://docs.langchain.com/oss/python/integrations/text_embedding
  • Multimodality (v0.3). Accessed on 4 September 2025, from https://python.langchain.com/docs/concepts/multimodality/
  • Retrievers. Accessed on 15 December 2025, from https://docs.langchain.com/oss/python/integrations/retrievers
  • Text Splitters. Accessed on 15 December 2025, from https://docs.langchain.com/oss/python/integrations/splitters
  • Tokens (v0.3). Accessed on 4 September 2025, from https://python.langchain.com/docs/concepts/tokens/
  • Vector store. Accessed on 15 December 2025, from https://docs.langchain.com/oss/python/integrations/vectorstores

McDonough, M. (13 December 2025). Large language model. Encyclopedia Britannica. Accessed on 15 December 2025, from https://www.britannica.com/topic/large-language-model

MongoDB. (n.d.). Vector Stores in Artificial Intelligence (AI). Accessed on 17 December 2025, from https://www.mongodb.com/resources/basics/vector-stores

MyScale. (22 March 2024). Vector Store vs. Vector Database: A Comprehensive Comparison Guide. MyScale blog. Accessed on 16 December 2025, from https://www.myscale.com/blog/vector-store-vs-vector-database-comparison-guide/

Russell, S. and Norvig, P. (2016). Chapter 22: Natural Language Processing. Artificial Intelligence: A Modern Approach (3rd ed., pp. 860–887). Pearson.

Sage, A. (29 August 2024). Vector Store vs. Vector Database: Understanding the Connection. Tiger Data. Accessed on 16 December 2025, from https://www.tigerdata.com/learn/vector-store-vs-vector-database

Weng, L. (23 June 2023). LLM Powered Autonomous Agents. Lil’Log. Accessed on 15 December 2025, from https://lilianweng.github.io/posts/2023-06-23-agent/

Wikipedia contributors.

  • (7 December 2025). Deep learning. Wikipedia, The Free Encyclopedia. Accessed on 15 December 2025, from https://en.wikipedia.org/w/index.php?title=Deep_learning&oldid=1326126127
  • (9 December 2025). Vector database. In Wikipedia, The Free Encyclopedia. Accessed on 16 December 2025, from https://en.wikipedia.org/w/index.php?title=Vector_database&oldid=1326520983
