
Build Semantic Search with LLM Embeddings


In this article, you will learn how to build a simple semantic search engine using sentence embeddings and nearest neighbors.

Topics we will cover include:

  • Understanding the limitations of keyword-based search.
  • Generating text embeddings with a sentence transformer model.
  • Implementing a nearest-neighbor semantic search pipeline in Python.

Let’s get started.

Introduction

Traditional search engines have historically relied on keyword search. In other words, given a query like “best temples and shrines to visit in Fukuoka, Japan”, results are retrieved based on keyword matching, such that text documents containing co-occurrences of words like “temple”, “shrine”, and “Fukuoka” are deemed most relevant.

However, this classical approach is notoriously rigid, as it largely relies on exact word matches and misses other important semantic nuances such as synonyms or alternative phrasing — for example, “young dog” instead of “puppy”. As a result, highly relevant documents may be inadvertently omitted.
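To make the limitation concrete, here is a minimal sketch of exact keyword matching in plain Python. The three documents and the `tokenize` and `keyword_search` helpers are invented for this illustration and are not part of the pipeline built later.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return the documents that contain every query word verbatim."""
    query_terms = tokenize(query)
    return [doc for doc in documents if query_terms <= tokenize(doc)]


# Hypothetical mini-corpus, invented for this illustration
documents = [
    "A puppy needs several short walks every day.",
    "The young dog chewed through a pair of slippers.",
    "Dazaifu is a signature shrine in Fukuoka.",
]

# Both dog-related documents are relevant to either query,
# yet each query only finds the one that shares its exact words.
print(keyword_search("young dog", documents))
print(keyword_search("puppy", documents))
```

Each query retrieves only the document that repeats its literal words, even though the first two documents describe the same kind of animal.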

Semantic search addresses this limitation by focusing on meaning rather than exact wording. Large language models (LLMs) play a key role here, as some of them are trained to translate text into numerical vector representations called embeddings, which encode the semantic information behind the text. When two texts like “small dogs are very curious by nature” and “puppies are inquisitive by nature” are converted into embedding vectors, those vectors will be highly similar due to their shared meaning. Meanwhile, the embedding vectors for “puppies are inquisitive by nature” and “Dazaifu is a signature shrine in Fukuoka” will be very different, as they represent unrelated concepts.
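Similarity between embedding vectors is commonly measured with cosine similarity. The sketch below computes it in plain Python on three hand-written 3-dimensional vectors. The numbers are invented to mimic the behavior described above and are not outputs of any real model; real sentence embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, hand-picked for illustration only (NOT real model outputs)
small_dogs = [0.9, 0.8, 0.1]  # "small dogs are very curious by nature"
puppies = [0.8, 0.9, 0.2]     # "puppies are inquisitive by nature"
shrine = [0.1, 0.2, 0.9]      # "Dazaifu is a signature shrine in Fukuoka"

print(f"dogs vs. puppies:   {cosine_similarity(small_dogs, puppies):.3f}")
print(f"puppies vs. shrine: {cosine_similarity(puppies, shrine):.3f}")
```

Vectors pointing in nearly the same direction score close to 1, while vectors pointing in different directions score much lower.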

Following this principle, the remainder of this article walks through the full process of building a compact semantic search engine. Although minimal, it serves as a starting point for understanding how modern search and retrieval systems, such as retrieval augmented generation (RAG) architectures, are built.

The code explained below can be run seamlessly in a Google Colab or Jupyter Notebook instance.

Step-by-Step Guide

First, we make the necessary imports for this practical example:
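A minimal sketch of this setup step, assuming the three third-party packages behind the later steps are Hugging Face `datasets` (for "ag_news"), `sentence-transformers`, and `scikit-learn` (which provides `NearestNeighbors`). The `REQUIRED` mapping and the `missing` list are illustrative names. The check uses only the standard library, so it runs even before anything is installed.

```python
import importlib.util

# Assumed mapping: importable module name -> pip package name
REQUIRED = {
    "datasets": "datasets",
    "sentence_transformers": "sentence-transformers",
    "sklearn": "scikit-learn",
}

# find_spec returns None when a top-level module cannot be located
missing = [
    package
    for module, package in REQUIRED.items()
    if importlib.util.find_spec(module) is None
]

if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")

# With everything installed, the imports for the remaining steps would be:
# from datasets import load_dataset
# from sentence_transformers import SentenceTransformer
# from sklearn.neighbors import NearestNeighbors
```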

We will use a toy public dataset called "ag_news", which contains texts from news articles. The following code loads the dataset and selects the first 1000 articles.

Next, we extract the "text" column, which contains the article content, and print a short sample from the first article to inspect the data:
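A possible sketch of the loading step, assuming the dataset is fetched with the Hugging Face `datasets` library and that "the first 1000 articles" means the first 1000 rows of the training split. The `load_texts` and `preview` helpers are hypothetical names introduced here.

```python
def load_texts(n_docs=1000):
    """Load the first n_docs AG News training articles as a list of strings."""
    # Imported lazily so this cell can be defined before the package is installed
    from datasets import load_dataset

    # Assumption: the articles come from the "train" split
    dataset = load_dataset("ag_news", split=f"train[:{n_docs}]")
    return list(dataset["text"])


def preview(text, max_chars=200):
    """Return a one-line preview of an article, truncated to max_chars."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    return flat if len(flat) <= max_chars else flat[:max_chars] + "..."


# Usage (downloads the dataset on first run):
# texts = load_texts()
# print(len(texts))
# print(preview(texts[0]))
```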

The next step is to obtain embedding vectors (numerical representations) for our 1000 texts. As mentioned earlier, some LLMs are trained specifically to translate text into numerical vectors that capture semantic characteristics. Hugging Face sentence transformer models, such as "all-MiniLM-L6-v2", are a common choice. The following code initializes the model and encodes the batch of text documents into embeddings.
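A minimal sketch of the encoding step, assuming the `sentence-transformers` package. `load_model` and `embed_texts` are hypothetical helper names; `SentenceTransformer(...)` and its `encode` method are the library calls doing the actual work.

```python
def load_model(model_name="all-MiniLM-L6-v2"):
    """Load a sentence transformer model by name."""
    # Imported lazily so this cell can be defined before the package is installed
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed_texts(model, texts, batch_size=32):
    """Encode a list of strings into one embedding vector per string."""
    return model.encode(texts, batch_size=batch_size)


# Usage (downloads the model weights on first run):
# model = load_model()
# embeddings = embed_texts(model, texts)
# print(embeddings.shape)  # (number of texts, 384) for all-MiniLM-L6-v2
```

Keeping the model object separate from the encoding helper matters later, because the same model has to encode each search query so that queries and documents live in the same vector space.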

Next, we initialize a NearestNeighbors object, which implements a nearest-neighbor strategy to find the k most similar documents to a given query. In terms of embeddings, this means identifying the closest vectors (smallest angular distance). We use the cosine metric, where more similar vectors have smaller cosine distances (and higher cosine similarity values).
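To show what such an index computes, the sketch below is a dependency-free stand-in, not scikit-learn itself. The `BruteForceCosineIndex` class is a hypothetical name that mirrors the `fit` / `kneighbors` interface of `NearestNeighbors` and ranks stored vectors by cosine distance (1 minus cosine similarity). The toy 2-D vectors are invented for illustration.

```python
import math


class BruteForceCosineIndex:
    """Illustrative stand-in for NearestNeighbors(metric="cosine")."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors
        self._vectors = []

    def fit(self, vectors):
        """Store the document vectors and return self, like scikit-learn."""
        self._vectors = [list(v) for v in vectors]
        return self

    @staticmethod
    def _cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms

    def kneighbors(self, queries, n_neighbors=None):
        """Return (distances, indices), one row per query, nearest first."""
        k = n_neighbors or self.n_neighbors
        all_distances, all_indices = [], []
        for query in queries:
            # Score every stored vector, then keep the k smallest distances
            scored = sorted(
                (self._cosine_distance(query, vec), i)
                for i, vec in enumerate(self._vectors)
            )[:k]
            all_distances.append([d for d, _ in scored])
            all_indices.append([i for _, i in scored])
        return all_distances, all_indices


# Toy 2-D vectors, invented for illustration
index = BruteForceCosineIndex(n_neighbors=2).fit(
    [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
)
distances, indices = index.kneighbors([[1.0, 0.0]])
print(indices[0])

# The scikit-learn equivalent would be:
# from sklearn.neighbors import NearestNeighbors
# index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vectors)
```

With scikit-learn, `kneighbors` returns the same `(distances, indices)` layout as NumPy arrays, with one row per query.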

The core logic of our search engine is encapsulated in the following function. It takes a plain-text query, specifies how many top results to retrieve via top_k, computes the query embedding, and retrieves the nearest neighbors from the index.

The loop inside the function prints the top-k results ranked by similarity:
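One way such a function might look, as a sketch under stated assumptions: `semantic_search` and its parameter names other than `top_k` are hypothetical, `model` is any object with an `encode` method, and `index` is any object with a `kneighbors` method that returns `(distances, indices)`.

```python
def semantic_search(query, model, index, texts, top_k=3, max_chars=200):
    """Embed the query, fetch its top_k nearest documents, print and return them."""
    # The index expects 2-D input (one row per query), hence the list wrapper
    query_embedding = model.encode([query])
    distances, indices = index.kneighbors(query_embedding, n_neighbors=top_k)

    results = []
    # Row 0 holds the neighbors of our single query, nearest first
    for rank, (distance, doc_id) in enumerate(zip(distances[0], indices[0]), start=1):
        similarity = 1 - distance  # cosine similarity = 1 - cosine distance
        snippet = texts[doc_id][:max_chars]
        print(f"{rank}. similarity={similarity:.3f} | {snippet}")
        results.append((int(doc_id), float(similarity)))
    return results
```

With the real libraries, `model` would be a `SentenceTransformer("all-MiniLM-L6-v2")` instance and `index` a `NearestNeighbors(metric="cosine")` object fitted on the document embeddings. Returning the `(document id, similarity)` pairs in addition to printing them makes the function easier to reuse in a larger retrieval pipeline.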

And that’s it. To test the function, we can formulate a couple of example search queries:

The results are ranked by similarity (truncated here for clarity):

Summary

What we have built here can be seen as a gateway to retrieval augmented generation systems. While this example is intentionally simple, semantic search engines like this form the foundational retrieval layer in modern architectures that combine semantic search with large language models.

Now that you know how to build a basic semantic search engine, you may want to explore retrieval augmented generation systems in more depth.


