
Image by Author | Canva
There’s no doubt that search is one of the most fundamental problems in computing. Whether you’re looking for a file on your computer, searching for something on Google, or even using the simple find command, you’re relying on some form of search engine. For a long time, most of these systems relied on keyword-based search. But as language and data evolved, those methods started to fall short. They ranked documents by counting how often a word appeared and how rare it was across the dataset, but they were very literal. Let me explain. If you searched for “automobile repair” but the document said “car maintenance,” the system might miss it entirely because it didn’t match the exact words.
This gap between what users say and what they actually mean created a need for smarter search systems that could understand different contexts. That’s where vector search comes in.
You might have only started hearing about vector search recently, especially with the rise of RAG, but it’s been around for quite some time. Instead of matching exact words, vector search matches meanings. It turns both queries and documents into numerical vectors — high-dimensional arrays that capture the semantic essence of the text. Then it finds the vectors that are closest to the query vector in that space, returning results that are contextually relevant, not just keyword-similar. You’ll find this explanation everywhere. But what you won’t often find is how it works under the hood. Can you build a vector search engine from scratch? We’re often taught the concepts and the theory, but once we’re asked to build something similar ourselves, we struggle. That’s exactly why I like creating tutorials that show you how to code things from scratch.
In this article, I’ll walk you through every step from generating vector representations to searching using cosine similarity, and we’ll even visualize what’s happening behind the scenes. By the end, you’ll not only understand how vector search works but also have a working implementation you can build on. So, let’s get started.
How Does Vector Search Work?
At its core, vector search involves three steps:
- Vector Representation: Convert data (e.g., text, images) into numerical vectors using techniques like word embeddings or neural networks. Each vector represents the data in a high-dimensional space.
- Similarity Calculation: Measure how “close” a query vector is to other vectors in the dataset using metrics like cosine similarity or Euclidean distance. Closer vectors indicate higher similarity.
- Retrieval: Return the top-k most similar items based on the similarity scores.
For example, if you’re searching for documents about “machine learning,” the query “machine learning” is converted into a vector, and the system finds documents whose vectors are closest to it, even if they use related terms like “artificial intelligence” or “deep learning.”
Now, let’s build a vector search system from scratch in Python. We’ll use a toy dataset of sentences, convert them to vectors (by averaging word vectors, to keep things simple), and implement a search function. We’ll also visualize the vectors to see how they cluster in space.
Step 1: Setting Up the Environment
To keep things simple, we’ll use NumPy for vector operations and Matplotlib for visualizations. We’ll avoid external libraries like FAISS or spaCy to focus on a from-scratch implementation. For vector representations, we’ll simulate word embeddings with a small, pre-defined dictionary, but in practice, you’d use models like Word2Vec, GloVe, or BERT.
Let’s install the required packages (if needed) and set up our imports.
```python
import numpy as np
import matplotlib.pyplot as plt
import re
```
We’ll use NumPy for vector math, Matplotlib for plotting, and basic Python for text processing. The re module helps with tokenization.
Step 2: Creating a Toy Dataset and Word Embeddings
For this tutorial, we’ll work with a small dataset of sentences about technology. To represent words as vectors, we’ll create a simplified word embedding dictionary where each word maps to a 2D vector (for easy visualization). These vectors are arbitrary but designed to cluster related words (e.g., “machine” and “neural” are close). In practice, you’d load a pre-trained embedding model, but this keeps our implementation self-contained.
```python
# Toy dataset of sentences
documents = [
    "Machine learning is powerful",
    "Artificial intelligence advances rapidly",
    "Deep learning transforms technology",
    "Data science drives innovation",
    "Neural networks power AI"
]

# Simplified 2D word embeddings (in practice, use pre-trained embeddings)
word_embeddings = {
    "machine": [0.8, 0.2], "learning": [0.7, 0.3], "powerful": [0.6, 0.4],
    "artificial": [0.9, 0.1], "intelligence": [0.85, 0.15], "advances": [0.5, 0.5],
    "rapidly": [0.4, 0.6], "deep": [0.75, 0.25], "transforms": [0.65, 0.35],
    "technology": [0.7, 0.4], "data": [0.3, 0.7], "science": [0.35, 0.65],
    "drives": [0.4, 0.6], "innovation": [0.45, 0.55], "neural": [0.8, 0.2],
    "networks": [0.78, 0.22], "power": [0.6, 0.4], "ai": [0.9, 0.1]
}
```
Step 3: Converting Sentences to Vectors
To search documents, we need to convert each sentence into a single vector. A simple approach is to tokenize the sentence and average the word vectors of all its words, which captures the “average meaning” of the sentence. Words that aren’t in word_embeddings (including stopwords like “is”) are simply skipped, and if a sentence contains no known words at all, we fall back to a zero vector (in practice, you’d handle unknown words differently).
```python
def tokenize(text):
    """Convert text to lowercase and split into words."""
    return re.findall(r'\b\w+\b', text.lower())

def sentence_to_vector(sentence, embeddings):
    """Convert a sentence to a vector by averaging word embeddings."""
    words = tokenize(sentence)
    # Keep only words we have embeddings for; unknown words are skipped
    vectors = [embeddings[word] for word in words if word in embeddings]
    if not vectors:
        return np.zeros(2)  # Zero vector if the sentence has no known words
    return np.mean(vectors, axis=0)

# Convert all documents to vectors
doc_vectors = [sentence_to_vector(doc, word_embeddings) for doc in documents]
```
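As a quick sanity check, the first document, “Machine learning is powerful”, tokenizes to machine, learning, is, and powerful. Since “is” isn’t in our dictionary, its sentence vector is simply the mean of the other three word embeddings:

```python
# Sanity check: the mean of [0.8, 0.2], [0.7, 0.3] and [0.6, 0.4]
print(sentence_to_vector("Machine learning is powerful", word_embeddings))
# [0.7 0.3]
```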
Step 4: Implementing Cosine Similarity
Cosine similarity is a common metric for vector search because it measures the angle between vectors, ignoring their magnitude. This makes it ideal for comparing semantic similarity in text embeddings. It is calculated as the dot product of two vectors divided by the product of their norms. If either vector is zero (e.g., no valid words), we return 0 to avoid division by zero. This function will compare the query vector to document vectors.
```python
def cosine_similarity(vec1, vec2):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0  # Avoid division by zero for zero vectors
    return dot_product / (norm1 * norm2)
```
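To get a feel for the metric, here’s a quick check using word vectors straight from our toy dictionary: “machine” and “neural” share the same embedding, so their similarity is exactly 1, while “machine” and “data” point in quite different directions:

```python
# Identical directions give a similarity of 1.0
print(cosine_similarity(word_embeddings["machine"], word_embeddings["neural"]))  # 1.0
# Different directions give a lower score (roughly 0.6 with these toy values)
print(cosine_similarity(word_embeddings["machine"], word_embeddings["data"]))    # ~0.6
```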
Step 5: Building the Vector Search Function
Now, let’s implement the core vector search function that takes a query, converts it to a vector, computes cosine similarity with each document vector, and returns the top-k documents with their similarity scores. We use np.argsort to rank documents by similarity and filter out zero-scoring results (e.g., if the query has no valid words).
```python
def vector_search(query, documents, embeddings, top_k=3):
    """Perform vector search and return top-k similar documents."""
    query_vector = sentence_to_vector(query, embeddings)
    # Compare the query against the precomputed doc_vectors from Step 3
    similarities = [cosine_similarity(query_vector, doc_vec) for doc_vec in doc_vectors]
    # Get indices of the top-k similarities, highest first
    ranked_indices = np.argsort(similarities)[::-1][:top_k]
    results = [
        (documents[i], similarities[i])
        for i in ranked_indices
        if similarities[i] > 0
    ]
    return results

# Example query
query = "Machine learning technology"
results = vector_search(query, documents, word_embeddings)
print("Query:", query)
print("Top results:")
for doc, score in results:
    print(f"Score: {score:.3f}, Document: {doc}")
```
Output:

```
Query: Machine learning technology
Top results:
Score: 1.000, Document: Machine learning is powerful
Score: 0.999, Document: Deep learning transforms technology
Score: 0.997, Document: Artificial intelligence advances rapidly
```
As you can see, the function successfully retrieves the most semantically relevant documents to the query. Even though none of them contain the exact phrase “machine learning technology,” their meaning aligns closely, which is the power of vector-based search.
Step 6: Visualizing the Vectors
To understand how vector search works, let’s visualize the document and query vectors in 2D space. This will show how similar items cluster together.
```python
def plot_vectors(doc_vectors, documents, query, query_vector):
    """Plot document and query vectors in 2D space."""
    plt.figure(figsize=(8, 6))

    # Plot document vectors
    doc_x, doc_y = zip(*doc_vectors)
    plt.scatter(doc_x, doc_y, c='blue', label='Documents', s=100)
    for i, doc in enumerate(documents):
        plt.annotate(doc[:20] + "...", (doc_x[i], doc_y[i]))

    # Plot query vector
    plt.scatter(query_vector[0], query_vector[1], c='red', label='Query', s=200, marker='*')
    plt.annotate(query[:20], (query_vector[0], query_vector[1]), color='red')

    plt.title('Vector Search: Document and Query Vectors')
    plt.xlabel('Dimension 1')
    plt.ylabel('Dimension 2')
    plt.legend()
    plt.grid(True)
    plt.show()
    plt.close()

# Generate the plot
query_vector = sentence_to_vector(query, word_embeddings)
plot_vectors(doc_vectors, documents, query, query_vector)
```

The blue dots represent document vectors, and a red star represents the query vector. Annotations show the first 20 characters of each document and the query. You can also see that documents closer to the query vector are more similar, visually confirming how vector search works.
Why This Matters for RAG
In RAG, vector search is the backbone of the retrieval step. By converting documents and queries into vectors, RAG can fetch contextually relevant information, even for complex queries. Our simple implementation mimics this process: the query vector retrieves documents that are semantically close, which a language model could then use to generate a response. Scaling this to real-world applications involves higher-dimensional embeddings and optimized search algorithms (like HNSW or IVF), but the core idea remains the same.
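To give a rough sense of what that looks like in practice, here’s a minimal sketch that swaps our toy 2D dictionary for a pre-trained sentence-embedding model. It assumes the sentence-transformers package and its all-MiniLM-L6-v2 model are available, and it reuses the documents list from above; the cosine-similarity ranking itself stays the same, just in 384 dimensions instead of 2:

```python
# Minimal sketch: the same cosine-similarity search with real sentence embeddings
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dimensional embeddings
doc_matrix = model.encode(documents)              # shape: (num_documents, 384)
query_vec = model.encode("Machine learning technology")

# Cosine similarity between the query and every document
scores = doc_matrix @ query_vec / (
    np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
)
for i in np.argsort(scores)[::-1][:3]:
    print(f"Score: {scores[i]:.3f}, Document: {documents[i]}")
```

For larger corpora, you would precompute and index the document embeddings with an approximate nearest-neighbor library such as FAISS rather than scanning every vector on each query.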
Conclusion
In this tutorial, we implemented vector search from scratch using Python. You can extend this implementation by using real word embeddings (e.g., from Hugging Face’s Transformers) or optimizing the search with approximate nearest neighbor techniques. Try experimenting with different queries or datasets to deepen your understanding of vector search!