
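Vector Databases
Documentation: https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_vectorstore_implementations
At its core, the idea is to convert a string into a point in a multi-dimensional space. The number of dimensions is whatever you configure, for example 1024.
Note that the embedding model used here comes from Tongyi Qianwen (Qwen), so an API key is still required.
Queries in a vector database work differently from queries in a traditional relational database: they perform a similarity search rather than an exact match. Given a query vector, the vector database returns the vectors that are "similar" to it.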
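To make the contrast with exact matching concrete, here is a minimal sketch of a brute-force similarity search over a few hand-made 3-dimensional vectors. The class name `NearestNeighborSketch`, the labels, and the vectors are all invented for illustration and are not part of Spring AI; production vector databases typically use an index instead of a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Spring AI.
class NearestNeighborSketch {

    // Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the labels of the k stored vectors most similar to the query.
    static List<String> topK(Map<String, float[]> store, float[] query, int k) {
        List<String> labels = new ArrayList<>(store.keySet());
        labels.sort(Comparator.comparingDouble((String label) -> -cosine(store.get(label), query)));
        return labels.subList(0, Math.min(k, labels.size()));
    }

    public static void main(String[] args) {
        Map<String, float[]> store = new LinkedHashMap<>();
        store.put("weather", new float[]{0.9f, 0.1f, 0.0f});
        store.put("travel", new float[]{0.1f, 0.9f, 0.1f});
        store.put("space", new float[]{0.0f, 0.1f, 0.9f});
        // The query equals none of the stored vectors, yet "travel" is the closest.
        System.out.println(topK(store, new float[]{0.2f, 0.8f, 0.0f}, 2)); // prints [travel, weather]
    }
}
```

No stored vector matches the query exactly, which is the point: the result is a ranking by closeness, not a yes/no match.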
Spring AI provides an abstract API for interacting with vector databases through the VectorStore interface.
The main methods of the VectorStore interface:
java
public interface VectorStore extends DocumentWriter {
    default String getName() {
        return this.getClass().getSimpleName();
    }
    // Write documents to the vector database
    void add(List<Document> documents);
    // Delete documents by id
    void delete(List<String> idList);
    // Delete documents that match a filter expression
    void delete(Filter.Expression filterExpression);
    default void delete(String filterExpression) { ... };
    // Run a similarity search
    List<Document> similaritySearch(String query);
    List<Document> similaritySearch(SearchRequest request);
    default <T> Optional<T> getNativeClient() {
        return Optional.empty();
    }
}
Supported vector databases:
- Azure Vector Search
- Apache Cassandra
- Chroma Vector Store
- Elasticsearch Vector Store
- GemFire Vector Store
- MariaDB Vector Store
- Milvus Vector Store
- MongoDB Atlas Vector Store
- Neo4j Vector Store
- OpenSearch Vector Store
- Oracle Vector Store
- PgVector Store
- Pinecone Vector Store
- Qdrant Vector Store
- Redis Vector Store
- SAP Hana Vector Store
- Typesense Vector Store
- Weaviate Vector Store
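- SimpleVectorStore - a simple implementation of persistent vector storage, well suited to teaching.
Testing the vector database
This test looks up the strings closest to a query. I use OpenAiEmbeddingModel here to check whether each string really is converted into a 1024-dimensional vector. The test confirms it: the query is turned into a float array of length 1024. I then used a utility class to measure the distances.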
xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
properties
## open-ai
spring.ai.openai.base-url=https://dashscope.aliyuncs.com/compatible-mode
spring.ai.openai.api-key=sk-xxxxxxxxxxxxxxxxxxxxxxxx
spring.ai.openai.chat.options.model=qwen-plus
## Embeddings
### Embedding model
spring.ai.openai.embedding.options.model=text-embedding-v3
### Vector dimensions
spring.ai.openai.embedding.options.dimensions=1024
java
package com.example.ai;
import com.example.ai.utils.VectorDistanceUtils;
import org.junit.jupiter.api.Test;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.util.Arrays;
import java.util.List;
@SpringBootTest
class Demo3ApplicationTests {
    @Autowired
    private OpenAiEmbeddingModel embeddingModel;
    @Test
    void contextLoads() {
        // 1. Test data
        // 1.1. The query text
        String query = "global conflicts";
        // 1.2. The texts to compare against (news headlines; only the first two are about international conflicts)
        String[] texts = new String[]{
                "哈马斯称加沙下阶段停火谈判仍在进行 以方尚未做出承诺",
                "土耳其、芬兰、瑞典与北约代表将继续就瑞典“入约”问题进行谈判",
                "日本航空基地水井中检测出有机氟化物超标",
                "国家游泳中心(水立方):恢复游泳、嬉水乐园等水上项目运营",
                "我国首次在空间站开展舱外辐射生物学暴露实验",
        };
        // 2. Vectorize
        // 2.1. Vectorize the query text first
        float[] queryVector = embeddingModel.embed(query);
        // 2.2. Then vectorize the comparison texts in one batch
        List<float[]> textVectors = embeddingModel.embed(Arrays.asList(texts));
        // 3. Compare Euclidean distance: smaller means closer
        // 3.1. Query against itself: the distance is 0, the closest possible
        System.out.println(VectorDistanceUtils.euclideanDistance(queryVector, queryVector));
        // 3.2. Query against the other texts
        for (float[] textVector : textVectors) {
            System.out.println(VectorDistanceUtils.euclideanDistance(queryVector, textVector));
        }
        System.out.println("------------------");
        // 4. Compare cosine similarity: larger means closer
        //    (despite its name, cosineDistance returns the cosine similarity)
        // 4.1. Query against itself: the similarity is 1, the highest possible
        System.out.println(VectorDistanceUtils.cosineDistance(queryVector, queryVector));
        // 4.2. Query against the other texts
        for (float[] textVector : textVectors) {
            System.out.println(VectorDistanceUtils.cosineDistance(queryVector, textVector));
        }
    }
}
Utility class
java
package com.example.ai.utils;
public class VectorDistanceUtils {
    // Prevent instantiation
    private VectorDistanceUtils() {}
    // Threshold below which a vector norm is treated as zero
    private static final double EPSILON = 1e-12;
    /**
     * Computes the Euclidean distance.
     * @param vectorA vector A (non-empty, same length as B)
     * @param vectorB vector B (non-empty, same length as A)
     * @return the Euclidean distance; smaller means closer
     * @throws IllegalArgumentException if the arguments are invalid
     */
    public static double euclideanDistance(float[] vectorA, float[] vectorB) {
        validateVectors(vectorA, vectorB);
        double sum = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            double diff = vectorA[i] - vectorB[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
    /**
     * Computes the cosine similarity. Despite the method name, the returned value
     * is the similarity, not the distance (cosine distance would be 1 - similarity, in [0, 2]).
     * @param vectorA vector A (non-empty, same length as B)
     * @param vectorB vector B (non-empty, same length as A)
     * @return the cosine similarity, in the range [-1, 1]; larger means closer
     * @throws IllegalArgumentException if the arguments are invalid or either vector is a zero vector
     */
    public static double cosineDistance(float[] vectorA, float[] vectorB) {
        validateVectors(vectorA, vectorB);
        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            normA += vectorA[i] * vectorA[i];
            normB += vectorB[i] * vectorB[i];
        }
        normA = Math.sqrt(normA);
        normB = Math.sqrt(normB);
        // Reject zero vectors, for which the cosine is undefined
        if (normA < EPSILON || normB < EPSILON) {
            throw new IllegalArgumentException("Vectors cannot be zero vectors");
        }
        // Clamp to [-1, 1] to absorb floating-point error
        double similarity = dotProduct / (normA * normB);
        similarity = Math.max(Math.min(similarity, 1.0), -1.0);
        return similarity;
    }
    // Shared argument validation
    private static void validateVectors(float[] a, float[] b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("Vectors cannot be null");
        }
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have same dimension");
        }
        if (a.length == 0) {
            throw new IllegalArgumentException("Vectors cannot be empty");
        }
    }
}
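For reference, the two measures computed above can be written out explicitly. The value that `cosineDistance` returns in the code is the cosine similarity, written $s_{\cos}$ here; the cosine distance in the strict sense is $1 - s_{\cos}$, which is where a range of $[0, 2]$ comes from. The last line is a standard identity for unit-length vectors; whether the embedding model actually returns normalized vectors is an assumption that is not verified in these notes.

```latex
\begin{align*}
d_{\text{euclid}}(a, b) &= \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
s_{\cos}(a, b) &= \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \in [-1, 1] \\
d_{\cos}(a, b) &= 1 - s_{\cos}(a, b) \in [0, 2] \\
\|a\| = \|b\| = 1 \;&\Longrightarrow\; d_{\text{euclid}}(a, b)^2 = 2\,\bigl(1 - s_{\cos}(a, b)\bigr)
\end{align*}
```

Under that assumption the two measures rank results identically: a smaller Euclidean distance always corresponds to a larger cosine similarity.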
Testing SimpleVectorStore
This section uses SimpleVectorStore
to run queries and filtering. Note that when you need exact filtering, you can do it with document metadata.
Basic search
java
@Bean
public SimpleVectorStore vectorStore(OpenAiEmbeddingModel embeddingModel) {
    return SimpleVectorStore.builder(embeddingModel).build();
}
java
import org.junit.jupiter.api.Test;
import org.springframework.ai.document.Document;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.util.List;
import java.util.Map;
@SpringBootTest
class Demo3ApplicationTests {
    @Autowired
    private OpenAiEmbeddingModel embeddingModel;
    @Autowired
    private SimpleVectorStore simpleVectorStore;
    @Test
    void test1() {
        // Each document has an id, a text, and metadata that can be filtered on later
        List<Document> documents = List.of(
                new Document("1", "今天天气不错", Map.of("country", "郑州", "date", "2025-05-13")),
                new Document("2", "天气不错,适合旅游", Map.of("country", "开封", "date", "2025-05-15")),
                new Document("3", "去哪里旅游好呢", Map.of("country", "洛阳", "date", "2025-05-15")));
        // Store the documents
        simpleVectorStore.add(documents);
        // Similarity search for "旅游" (travel)
        List<Document> list = simpleVectorStore.similaritySearch("旅游");
        for (Document document : list) {
            System.out.println(document.getText());
            System.out.println(document.getScore());
        }
    }
}
Searching with a metadata filter
Setting the filter with a string expression
java
SearchRequest request = SearchRequest.builder()
        .query("World")
        .filterExpression("country == 'Bulgaria'")
        .build();
Setting the filter with a Filter.Expression
java
// b is a FilterExpressionBuilder; Expression is Filter.Expression
FilterExpressionBuilder b = new FilterExpressionBuilder();
Expression expression = b.eq("country", "BG").build();
Expression exp = b.and(b.eq("genre", "drama"), b.gte("year", 2020)).build();
Usage
java
// Similarity search restricted by a metadata filter
FilterExpressionBuilder b = new FilterExpressionBuilder();
Filter.Expression filter = b.eq("country", "洛阳").build();
SearchRequest request = SearchRequest.builder()
        .query("旅游") // the search text
        .filterExpression(filter) // the filter object
        .build();
List<Document> list = simpleVectorStore.similaritySearch(request);
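Conceptually, a filtered search narrows the candidates by an exact match on metadata and only then ranks what is left by similarity. The plain-Java sketch below illustrates that idea; it is not how SimpleVectorStore is implemented. `FilteredSearchSketch`, `Entry`, the two-dimensional vectors, and the use of a dot product as the score are all assumptions made for illustration, with English stand-ins for the sample documents above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of "filter by metadata, then rank by similarity".
class FilteredSearchSketch {

    // A stored entry: text, metadata, and a hand-made stand-in for its embedding.
    record Entry(String text, Map<String, String> metadata, float[] vector) {}

    // Dot product used as a simple stand-in for the similarity score.
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static List<String> search(List<Entry> entries, float[] query,
                               Predicate<Map<String, String>> filter, int topK) {
        return entries.stream()
                .filter(e -> filter.test(e.metadata()))   // exact match on metadata
                .sorted(Comparator.comparingDouble((Entry e) -> -dot(e.vector(), query))) // best score first
                .limit(topK)
                .map(Entry::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("nice weather today", Map.of("country", "Zhengzhou"), new float[]{0.9f, 0.1f}),
                new Entry("nice weather, good for travel", Map.of("country", "Kaifeng"), new float[]{0.6f, 0.6f}),
                new Entry("where should we travel", Map.of("country", "Luoyang"), new float[]{0.1f, 0.9f}));
        float[] query = {0.2f, 0.9f}; // stands in for the embedding of "travel"

        // No filter: ranked purely by score.
        System.out.println(search(entries, query, m -> true, 2));
        // With a filter: the best-scoring entry is excluded because its metadata does not match.
        System.out.println(search(entries, query, m -> "Kaifeng".equals(m.get("country")), 2));
    }
}
```

The second call shows why metadata is the right tool for exact conditions: the filter removes entries regardless of how similar their text is to the query.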
Writing a PDF into the vector database
xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
Testing the write
java
import org.junit.jupiter.api.Test;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.ExtractedTextFormatter;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import java.util.List;
@SpringBootTest
class Demo3ApplicationTests {
    @Autowired
    private SimpleVectorStore simpleVectorStore;
    @Test
    public void testVectorStore() {
        Resource resource = new FileSystemResource("中二知识笔记.pdf");
        // 1. Create the PDF reader
        PagePdfDocumentReader reader = new PagePdfDocumentReader(
                resource, // the source file
                PdfDocumentReaderConfig.builder()
                        .withPageExtractedTextFormatter(ExtractedTextFormatter.defaults())
                        .withPagesPerDocument(1) // one PDF page per Document
                        .build()
        );
        // 2. Read the PDF and split it into Documents
        List<Document> documents = reader.read();
        // 3. Write them to the vector store
        simpleVectorStore.add(documents);
        // 4. Search
        SearchRequest request = SearchRequest.builder()
                .query("论语中教育的目的是什么") // "What is the purpose of education in the Analects?"
                .topK(1)
                .similarityThreshold(0.6)
                .filterExpression("file_name == '中二知识笔记.pdf'")
                .build();
        List<Document> docs = simpleVectorStore.similaritySearch(request);
        // Also check for an empty list, in case nothing passes the threshold
        if (docs == null || docs.isEmpty()) {
            System.out.println("No matching content found");
            return;
        }
        for (Document doc : docs) {
            System.out.println(doc.getId());
            System.out.println(doc.getScore());
            System.out.println(doc.getText());
        }
    }
}
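The request above combines `topK(1)` with `similarityThreshold(0.6)`. As a rough mental model, and only as an assumption about the intended semantics rather than a copy of any store's code, results scoring below the threshold are dropped and at most topK of the remainder are returned, best first. The class `ThresholdTopKSketch` and the scores below are made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of how a similarity threshold and topK interact.
class ThresholdTopKSketch {

    // Keep scores at or above the threshold, best first, at most topK of them.
    static List<Double> select(List<Double> scores, double threshold, int topK) {
        return scores.stream()
                .filter(s -> s >= threshold)
                .sorted(Comparator.reverseOrder())
                .limit(topK)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(0.82, 0.55, 0.71, 0.64); // made-up similarity scores
        System.out.println(select(scores, 0.6, 1)); // prints [0.82]
        System.out.println(select(scores, 0.6, 5)); // prints [0.82, 0.71, 0.64]
        System.out.println(select(scores, 0.9, 5)); // prints []
    }
}
```

The last call is the case worth remembering: when nothing clears the threshold, the natural result is an empty list, so an emptiness check is needed as well as a null check.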
Integrating with Spring AI
java
@Bean
public ChatClient pdfChatClient(AlibabaOpenAiChatModel model, ChatMemory chatMemory, VectorStore vectorStore) {
    return ChatClient
            .builder(model)
            .defaultSystem(SystemConstants.SERVICE_SYSTEM_PROMPT)
            .defaultAdvisors(
                    new SimpleLoggerAdvisor(),
                    new MessageChatMemoryAdvisor(chatMemory),
                    new QuestionAnswerAdvisor(
                            vectorStore, // the vector store
                            SearchRequest.builder() // parameters for the vector search
                                    .similarityThreshold(0.6d) // similarity threshold
                                    .topK(2) // number of documents to return
                                    .build()
                    )
            )
            .build();
}
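The role of QuestionAnswerAdvisor here is retrieval-augmented generation: it looks up documents related to the user's question in the vector store and adds their text to the prompt before the model is called. The sketch below shows only that general idea in plain Java. `RagPromptSketch`, the method `augment`, and the template wording are invented for illustration and are not the advisor's actual template or code.

```java
import java.util.List;

// Hypothetical illustration of prompt augmentation with retrieved text.
class RagPromptSketch {

    // Appends the retrieved snippets to the question as context for the model.
    static String augment(String question, List<String> retrieved) {
        StringBuilder sb = new StringBuilder(question);
        sb.append("\n\nContext information is below.\n---------------------\n");
        for (String snippet : retrieved) {
            sb.append(snippet).append('\n');
        }
        sb.append("---------------------\n");
        sb.append("Answer using only the context above. If it is not sufficient, say that you do not know.");
        return sb.toString();
    }

    public static void main(String[] args) {
        // In the real flow the snippets would come from vectorStore.similaritySearch(...).
        List<String> retrieved = List.of("Snippet from page 3 of the PDF", "Snippet from page 7 of the PDF");
        System.out.println(augment("What does the document say about education?", retrieved));
    }
}
```

Seen this way, `topK` bounds how much context is added to each prompt and `similarityThreshold` keeps weakly related pages out of it.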
The endpoint adds one more advisors call, mainly to specify which PDF document to work with
java
@RequestMapping(value = "/chat3", produces = "text/html;charset=utf-8")
public Flux<String> chat3(@RequestParam("prompt") String prompt, @RequestParam("chatId") String chatId) {
    // 1. Save the conversation id
    chatHistoryService.save("chat", chatId);
    return chatClient.prompt()
            .user(prompt)
            // Tie the chat memory to this conversation
            .advisors(advisorSpec -> advisorSpec.param(AbstractChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY, chatId))
            // Restrict retrieval to documents that came from the given PDF
            .advisors(advisorSpec -> advisorSpec.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "file_name == 'hello.pdf' "))
            .stream().content();
}