Understanding Vectors and Embeddings in AI: A Beginner-Friendly Guide

Jan 06, 2025

When stepping into the world of AI and machine learning, two terms you often encounter are vectors and embeddings. These concepts form the foundation of many AI applications, from natural language processing to recommendation systems,LLM. In this blog, we’ll explore what vectors and embeddings are, why they’re important, and how they’re used in practical examples. Let’s dive in!

What is a Vector?

In mathematics, a vector is a quantity defined by both its magnitude (size) and direction. In simpler terms, a vector is a way to represent data in a structured form. Vectors can be visualized as arrows in space where:

The magnitude is the length of the arrow.
The direction indicates where the arrow points.

In AI and machine learning, a vector is a list of numbers that represent data in a way machines can process. These numbers correspond to features or attributes of the data.For example, a vector in a 2D space could look like [3, 4].

Here:

The vector has a magnitude of 3²+4²=5 sqrt{3² + 4²} = 5.
Its direction points to a specific position in the 2D plane.

What is an Embedding?

An embedding is a specific type of vector that maps complex data into a numerical space. Embeddings are typically learned representations. This means they are created by training models on data to understand relationships and structures in that data.

Think of embeddings as compact and meaningful representations of data. For example:

In natural language processing (NLP), embeddings represent words or sentences.
In recommendation systems, embeddings represent users or products.

Embeddings can be applied to text and any kind of data, such as images or audio. The critical part is transforming data into n-dimensional embedding vectors based on some model or function. The numerical similarity of embeddings proxies the semantic similarity of their corresponding data.

Example: Word Embeddings

Let’s walk through an example. Suppose we’re working with a simple sentence: “The cat sat on the mat.”

Embedding will create vectors that capture the relationships between words. Words with similar meanings, like “cat” and “dog,” will have embeddings close to each other in the vector space.

For instance:

cat: [0.2, 0.8, -0.3, 0.5]
dog: [0.3, 0.7, -0.4, 0.6]
mat: [0.1, -0.2, 0.5, -0.7]

Using these embeddings, we can:

Measure similarity between words using dot product or cosine similarity.

The numerical similarity of two n-dimensional vectors v1 and v2 is given by their dot product, written v1·v2. To compute the dot product, multiply each dimension's values pair-wise, then sum the result.

dot_product(v1, v2) = SUM(
v1[0] * v2[0] +
v1[1] * v2[1],
…,
v1[n-1] * v2[n-1],
v1[n] * v2[n]
)

Embeddings are unit vectors (vectors of length one), the dot product is equal to the vectors’ cosine similarity, a value between -1 (precisely opposite directions) and 1 (exactly the same direction). Vectors with a cosine similarity of zero are orthogonal: semantically unrelated.

For additional reference

In next blog we will try to explorer this concept using Postgresql with vector extention enabled.

Conclusion

Vectors and embeddings are powerful tools in AI that enable machines to understand and process complex data. Whether you’re analyzing text, building a recommendation system, or even working with images, embeddings help bridge the gap between raw data and actionable insights.

Understanding these concepts is a crucial step in your AI journey. Start experimenting with embeddings in small projects — you’ll be amazed at their versatility and potential! Next blogs we will try to find vector similarity using PostgreSQL on Azure.

Do you have any questions or examples you’d like to explore further? Let’s discuss in the comments below!

EnterpriseAIPatterns

Discussion about this post

Ready for more?