
Animesh Kumar Sinha

E-Journal Times Magazine

RAG (Retrieval-Augmented Generation) is a hybrid AI approach that combines:

1. Retrieval-based systems (for accuracy and up-to-date knowledge)

2. Generative models (for fluent, natural language responses)

Why do we need RAG, and how does it work?

A user enters a natural language query, such as “What are the latest features in Kubernetes 1.30?”

The query is converted into a vector embedding using a pre-trained encoder (such as OpenAI embeddings or Sentence Transformers).

This embedding is used to search a vector database for semantically similar documents.

The top-matched documents are retrieved to form the context.

This context is provided to a large language model (LLM) to generate an accurate and relevant response.
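The flow above can be sketched as a single function that wires three pluggable steps together. This is only an illustrative outline: the names answer_with_rag, fake_embed, fake_search and fake_generate are hypothetical stand-ins, not part of any library, and a real system would replace them with an embedding model, a vector database query and an LLM call.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def answer_with_rag(
    query: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], List[str]],
    generate: Callable[[str, List[str]], str],
    top_k: int = 3,
) -> str:
    """Run one request through the flow: embed, retrieve, then generate."""
    query_vector = embed(query)               # query -> vector embedding
    documents = search(query_vector, top_k)   # vector -> top-matched documents
    return generate(query, documents)         # query + context -> response


# Stand-in components (all hypothetical) so the sketch runs offline.
def fake_embed(text: str) -> Vector:
    return [float(len(text))]


def fake_search(vector: Vector, k: int) -> List[str]:
    corpus = ["Release notes.", "Upgrade guide.", "Unrelated page."]
    return corpus[:k]


def fake_generate(query: str, documents: List[str]) -> str:
    return f"Answer to {query!r} based on {len(documents)} documents"


if __name__ == "__main__":
    print(answer_with_rag("What changed?", fake_embed, fake_search,
                          fake_generate, top_k=2))
```

Passing the three steps in as callables keeps the orchestration independent of any particular embedding model, vector store or LLM provider.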

Figure: RAG architecture with Redis Vector DB and Azure OpenAI

The diagram above illustrates the architecture of a Retrieval-Augmented Generation (RAG) system powered by Redis Vector DB and Azure OpenAI.

The process begins with PDFs (Step 1), which are converted into vector embeddings using a pre-trained model (Step 2). These embeddings are stored in a Redis Vector Database (Step 3). When a user submits a natural language question (Step 5), it is also converted into an embedding (Step 2) and sent to the retriever (Step 4). The retriever searches the Redis Vector DB to find semantically similar content. The retrieved content and the original question are passed to the RAG system (Step 7), which uses Azure OpenAI (Step 6) to generate a context-aware response. Finally, the system produces an accurate answer (Step 8) for the user.

If you look at the diagram above, embeddings are a critical component: the more meaningful the embeddings, the more accurate and relevant the search results become. In my example, I’ve used Redis as the vector database, accessed through RedisVL (the Redis Vector Library), to store these embeddings.

Let’s now dive deeper into the Embedding Flow in a Retrieval-Augmented Generation (RAG) system.

The diagram illustrates the embedding process in a Redis-based RAG architecture. This process involves two key workflows: embedding documents (PDFs) during ingestion and embedding user queries at runtime. Both workflows are essential to enable accurate, vector-based semantic search.

Embedding PDFs (Ingestion Time)

Step 1: Load PDFs

PDF files are loaded using libraries like PyPDF or LangChain’s PyPDFLoader. This step extracts text from each page of the document.
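A minimal sketch of this step, assuming the pypdf package is installed (the article does not say which PyPDF version was used). The helper names load_pdf_pages and join_pages are hypothetical.

```python
from typing import List


def load_pdf_pages(path: str) -> List[str]:
    """Return the extracted text of each page of the PDF at `path`."""
    # Imported lazily so the rest of the module works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield nothing for image-only pages, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def join_pages(pages: List[str]) -> str:
    """Join page texts into one string, skipping pages with no text."""
    return "\n\n".join(p.strip() for p in pages if p.strip())
```

Scanned PDFs that contain only images would need OCR before this step, which is outside the scope of this sketch.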

Step 2: Split Text into Chunks

To maintain context and adhere to embedding size limits, the extracted text is split into smaller chunks (e.g., 500 tokens) using a text splitter.
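As a rough illustration, the splitter below cuts text into fixed-size chunks. It makes two assumptions that are not from the article: whitespace-separated words stand in for model tokens, and consecutive chunks share a small overlap. The function name and default values are hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

A production pipeline would normally count tokens with the tokenizer of the chosen embedding model rather than counting words.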

Step 2.1: Preprocess Text

Before generating embeddings, I applied several preprocessing techniques to clean and normalize the text. These included:

Lowercasing
Removing punctuation
Tokenization
Stopword removal
Lemmatization or stemming

These steps reduce noise and help ensure the embeddings capture the semantic meaning more effectively.
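The first four steps can be sketched with the standard library alone. The stopword list here is a tiny hypothetical sample, and lemmatization or stemming is left out because it needs an external NLP library.

```python
import string
from typing import List

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "for", "of", "and", "to", "in"})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOPWORDS]
```

Note that removing punctuation also removes hyphens, so "in-memory" becomes "inmemory". Whether that is acceptable depends on the corpus.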

Step 3: Generate Embeddings

Each text chunk is passed to an embedding model, such as OpenAI’s text-embedding-3-small, which converts the text into a high-dimensional vector.

A high-dimensional vector is simply a list of numbers (e.g., 1536 values) that numerically represent the meaning of text. Semantically similar sentences result in vectors that are close in this vector space. For instance, ‘Redis enables fast data retrieval’ and ‘Redis supports quick access to data’ may generate closely aligned vectors, even with different wording.
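Closeness between two vectors is commonly measured with cosine similarity. The sketch below uses plain Python lists; real embeddings would have many more values than the short toy vectors typically used to try it out.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

A value near 1 means the vectors point in almost the same direction, and a value near 0 means they are unrelated.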

Though this article focuses on the RAG workflow, it’s crucial to highlight the common challenges in embedding generation. In my experience, generating good embeddings is real data science work: if the embeddings go wrong, the entire RAG pipeline can fail, regardless of the use case.

Here are a few real-world challenges I’ve encountered:

1. Semantic Drift

Vectors from unrelated content may appear similar due to shared terms or vague overlaps. This can cause irrelevant documents to be retrieved.

Example:

Relevant: “Redis supports high-speed caching for web applications.”
Irrelevant: “This resort offers a relaxing experience by the riverbank.”
User Query: “How does Redis support caching?”

If the two vectors happen to lie close together in the embedding space, the retriever might incorrectly return the resort sentence as a match, which is a false positive.

2. Context Loss During Chunking

Splitting text mid-paragraph can weaken the contextual meaning, leading to subpar embeddings.

Bad Chunking Example:

Chunk A: “Redis is an in-memory data store, commonly used for caching. It supports various”
Chunk B: “data structures like strings, hashes, and lists. It is extremely fast and is often used…”

These fragments lose meaning when read independently, reducing embedding effectiveness.
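One way to avoid cutting a sentence in half is to split on sentence boundaries first and then pack whole sentences into chunks. The sketch below assumes a simple punctuation-based sentence splitter, which will misfire on abbreviations. The function name and the word limit are hypothetical.

```python
import re
from typing import List

# Split after '.', '!' or '?' when followed by whitespace (a rough heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_by_sentence(text: str, max_words: int = 100) -> List[str]:
    """Pack whole sentences into chunks of roughly `max_words` words."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk rather than splitting the sentence itself.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than max_words is kept whole, so chunks can exceed the limit in that case.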

3. Polysemy and Ambiguity

Words with multiple meanings (polysemy) can confuse models if context is weak or missing.

Example:

“bank” could mean:
A financial institution
The side of a river

Problem:

Sentence: “He sat by the bank and watched the water flow.”
Query: “How do I open an account at a bank?”

The shared word “bank” can lead to incorrect matches despite the semantic gap.

Mitigation:

Use larger context windows.
Add metadata (e.g., domain=finance or domain=nature).
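The metadata idea can be sketched as a filter applied to retrieved candidates before ranking. The Chunk class, the domain field and any similarity scores fed to it are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    domain: str   # metadata tag attached at ingestion time, e.g. "finance"
    score: float  # similarity to the query; higher means more similar


def filter_by_domain(candidates: List[Chunk], domain: str) -> List[Chunk]:
    """Keep only chunks tagged with `domain`, best score first."""
    matching = [c for c in candidates if c.domain == domain]
    return sorted(matching, key=lambda c: c.score, reverse=True)
```

With this filter, a river "bank" chunk tagged domain=nature is dropped for a finance query even if its similarity score is high.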

4. Poor Preprocessing

Inconsistent tokenization or noise like headers and footers can degrade embedding quality.

5. Model Limitations

Generic embedding models may not grasp specialized jargon or domain-specific terms.

Example:

“The patient underwent CABG following myocardial infarction.”
CABG = Coronary Artery Bypass Grafting
Myocardial infarction = Heart attack

A general-purpose model may:

Misinterpret acronyms
Miss the clinical relationships

Mitigation:

Use domain-specific models (e.g., BioBERT, FinBERT)
Fine-tune on specialized corpora
Use metadata and keyword filters

6. No Ground Truth Validation

Often, systems lack human-in-the-loop checks to confirm whether the retrieved results are truly accurate.
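One simple form of validation is to have people label which documents are relevant for a set of test queries and then measure how many of them the retriever returns. The sketch below computes recall at k. The metric choice and function names are assumptions for illustration, not something the article prescribes.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document IDs found in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    results: Dict[str, List[str]],
    labels: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(results.get(q, []), rel, k) for q, rel in labels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Tracking this number across changes to chunking, preprocessing or the embedding model shows whether retrieval quality is improving or regressing.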

Step 7: Vector Search in Redis

Redis compares the query vector with stored document vectors to find the most semantically similar results.
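Conceptually, this comparison is a nearest-neighbour search. The brute-force sketch below is a stand-in written in plain Python to show the idea. It is not Redis code, and the in-memory dictionary index is a hypothetical substitute for the vector database.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, distance) pairs closest to the query vector."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

Comparing the query against every stored vector is fine for small collections. A vector database avoids that cost on large ones by using dedicated index structures.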

Step 8: Pass to RAG System

The retrieved chunks and the original query are fed into a Large Language Model (LLM), such as a GPT model hosted on Azure OpenAI, which generates a contextual response for the user.
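Before the model is called, the chunks and the question have to be combined into a single prompt. The sketch below shows one possible layout. The instruction wording, the numbering scheme and the character budget are all assumptions, and the actual LLM call is left out.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Assemble a prompt from retrieved chunks, stopping at a size budget."""
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop adding context once the budget would be exceeded
        parts.append(entry)
        used += len(entry)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports its answer.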

Summary: Redis-based RAG Embedding Pipeline

This article outlines the process of embedding documents and user queries for a Redis-powered Retrieval-Augmented Generation (RAG) system. The workflow begins by extracting text from PDFs using libraries like PyPDF and splitting the text into manageable chunks. These chunks are preprocessed (lowercased, tokenized, cleaned) and converted into high-dimensional vectors using models such as OpenAI’s text-embedding-3-small.

The document highlights key embedding challenges, including:

Semantic drift
Context loss during chunking
Polysemy and ambiguity
Poor preprocessing
Model limitations, and
Lack of ground truth validation

Document embeddings are stored in Redis Stack with RediSearch, and each query embedding is compared against them using cosine similarity, enabling fast and accurate vector search. The final stage involves passing the top-K retrieved chunks and the query to an LLM like Azure OpenAI GPT-4, which produces a meaningful answer. For more information, visit https://cloud.google.com/use-cases/retrieval-augmented-generation

The choice of Redis as the vector store is driven by its performance, ease of integration, and native vector search support, making it a suitable option for building scalable semantic search workflows. Another article by the author is available at https://journals-times.com/2025/05/31/agentic-ai-how-it-can-redefine-the-software-development-lifecycle/

RAG (Retrieval-Augmented Generation) and Embedding : PART — 1
