Using ChatGPT, Embeddings, and HyDE to Improve Search Results

This article originally appeared on LinkedIn on July 11th, 2023.

Rick Hightower

Engineering Consultant focused on AI

July 11, 2023

Introduction

Staying ahead of the competition is imperative in today's fast-paced business environment. An efficient search engine that can provide accurate information to your customers or employees can make all the difference. However, building and maintaining a robust search engine can be a challenge. In this dev notebook, let’s explore how ChatGPT, Embeddings, and HyDE can help you improve your search results.

By combining ChatGPT with retrieval and re-ranking methods, you can achieve accurate, relevant, and fast search results that set you apart from your competitors. This approach also integrates with existing search engines, making it a practical way to improve search performance for businesses of all sizes. CTOs, CIOs, engineering managers, and software engineers all have much to gain from implementing this approach and making their search engines more efficient.

Overview

ChatGPT is a deep learning model that we will use to rank content returned from a search engine against a hypothetical answer. It creates a hypothetical document based on a given query, and we then create an embedding for this hypothetical answer. The system scores each article from the search service by the cosine similarity between the article's embedding and the hypothetical answer's embedding, computed with a dot product, and sorts the articles by that score, most relevant first. Used this way, ChatGPT can help improve search systems.
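As a rough illustration of the scoring step, the Java sketch below computes cosine similarity from dot products. The class and method names are hypothetical, and the two-element vectors are toy stand-ins for real embeddings, which have many more dimensions. It is a minimal sketch, not the notebook's actual code.

```java
public final class CosineSimilarity {

    // Dot product of two vectors of equal length.
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: the dot product divided by the product of the vector lengths.
    public static double cosineSimilarity(double[] a, double[] b) {
        double lengths = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        if (lengths == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot(a, b) / lengths;
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real embeddings (assumption for illustration).
        double[] hypotheticalAnswer = {0.6, 0.8};
        double[] article = {0.8, 0.6};
        System.out.println(cosineSimilarity(hypotheticalAnswer, article));
    }
}
```

When both vectors already have unit length, the denominator is 1, so the dot product on its own equals the cosine similarity. That is why a plain dot calculation can be enough to score normalized embeddings.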

There are two ways to retrieve information using GPT: mimicking human navigation and retrieval with embeddings. Retrieval with embeddings works by calculating embeddings for the query results and for an ideal answer to the user's question. The content most closely related to that hypothetical answer (HyDE), as measured by cosine similarity, is then retrieved. Combining these approaches and drawing inspiration from re-ranking methods can improve search accuracy.

One of the key benefits of this approach is that it can be implemented on top of any existing search system. Layering it over your current system can improve your search engine's accuracy and speed. The approach applies to search engines built on Elasticsearch, Solr, or any custom search application.

Dive In

This dev notebook shows how to use ChatGPT with Hypothetical Document Embeddings (HyDE) to rank content against a hypothetical answer. We apply HyDE by having ChatGPT create a hypothetical document based on a given query.

Then we create an embedding for the hypothetical answer, use the dot calculation to get the cosine similarities, and rank the articles by relevance as determined by the cosine similarity of each article's embedding to the hypothetical answer's embedding.
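The re-ranking step described above can be sketched in Java as follows. This sketch assumes the embeddings for the hypothetical answer and for each article have already been obtained elsewhere. The `HydeReRanker` class, its `rank` method, and the article titles are hypothetical names chosen for illustration, not part of the original cookbook.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class HydeReRanker {

    // Cosine similarity between two equal-length vectors; 0.0 if either has zero length.
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double lengths = Math.sqrt(normA) * Math.sqrt(normB);
        return lengths == 0.0 ? 0.0 : dot / lengths;
    }

    // Returns article titles ordered from most to least similar to the hypothetical answer.
    public static List<String> rank(double[] hypotheticalAnswerEmbedding,
                                    Map<String, double[]> articleEmbeddings) {
        // Score every article once against the hypothetical answer.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, double[]> entry : articleEmbeddings.entrySet()) {
            scores.put(entry.getKey(), cosine(hypotheticalAnswerEmbedding, entry.getValue()));
        }
        // Sort titles by descending score.
        List<String> titles = new ArrayList<>(scores.keySet());
        titles.sort(Comparator.comparingDouble((String title) -> scores.get(title)).reversed());
        return titles;
    }

    public static void main(String[] args) {
        // Toy two-dimensional vectors standing in for real embeddings (assumption).
        double[] hypotheticalAnswer = {1.0, 0.0};
        Map<String, double[]> articles = new LinkedHashMap<>();
        articles.put("Unrelated article", new double[]{0.0, 1.0});
        articles.put("Closely related article", new double[]{1.0, 0.1});
        articles.put("Partly related article", new double[]{0.5, 0.5});
        System.out.println(rank(hypotheticalAnswer, articles));
    }
}
```

Because the ranker only needs titles and vectors, it can sit on top of whatever search service produced the candidate articles, which is the property the approach relies on.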

Looking for relevant information can be a challenging task. This dev notebook adapts a ChatGPT cookbook for improved search to Java, then breaks the process down and explains each step. We explore how various AI techniques can improve an existing search system by filtering the data it returns, putting that data through boot camp so that it becomes information.

There are two ways to retrieve information using GPT: