Kernel Memory Getting Started Series: Generating and Retrieving Document Summaries
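In the earlier posts on RAG and document preprocessing, we built a solution that lets users get the final answer to their questions directly.
Real business scenarios, however, also include simpler cases that don't need every detail of a document. Users only want a rough idea of what it contains, that is, an overall summary of the article. Kernel Memory can handle this case as well.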
Generating Summaries
We still use Kernel Memory's document import method. This time, instead of the default pipeline, we only need to specify the summary pipeline.
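```csharp
await memory.ImportDocumentAsync(new Document("doc1").AddFile("file.pdf"), // "file.pdf" is a placeholder path
    steps: Constants.PipelineOnlySummary);
```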
Here, PipelineOnlySummary consists of the following steps:
- extract
- summarize
- gen_embeddings
- save_records
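Compared with the default pipeline (extract, partition, gen_embeddings, save_records), the only change is that partition is replaced by summarize. The effect is significant, though: the stored records are no longer chunks of the source document but LLM-generated summaries of its content.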
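Because a pipeline is just an ordered list of handler names, the same import can be written with the steps spelled out instead of using Constants.PipelineOnlySummary. This is a minimal sketch, assuming the Microsoft.KernelMemory NuGet package, a serverless memory instance built with OpenAI defaults, an API key in the OPENAI_API_KEY environment variable, and a local file named sample.pdf (a hypothetical placeholder).

```csharp
using System;
using Microsoft.KernelMemory;

// Assumption: OPENAI_API_KEY is set and "sample.pdf" (hypothetical) exists in the working directory.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build<MemoryServerless>();

// The four steps listed above, written out by name. Replacing "summarize" with
// "partition" gives the default pipeline, which stores chunks instead of summaries.
string[] summarySteps = { "extract", "summarize", "gen_embeddings", "save_records" };

// ImportDocumentAsync returns the ID of the imported document.
string docId = await memory.ImportDocumentAsync(
    new Document("doc1").AddFile("sample.pdf"),
    steps: summarySteps);

Console.WriteLine($"Imported {docId}");
```

Spelling the steps out makes it easy to see that the only difference from the default pipeline is the single swapped step name.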
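Retrieving Summaries
Retrieving the summaries is even more direct: call SearchSummariesAsync and pass a document filter that selects the documents whose summaries you need.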
```csharp
// Fetch the list of summaries. The API returns one summary for each file.
var results = await memory.SearchSummariesAsync(filter: MemoryFilters.ByDocument("doc1"));
```
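Putting import and retrieval together, the sketch below imports two documents with the summary-only pipeline and then uses the filters parameter, whose filters are combined with inclusive OR according to the doc comments in KernelMemoryExtensions.cs, to fetch both summaries in one call. It assumes the Microsoft.KernelMemory NuGet package, an API key in OPENAI_API_KEY, and two local files a.pdf and b.pdf (hypothetical placeholders).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.KernelMemory;

// Assumption: OPENAI_API_KEY is set; "a.pdf" and "b.pdf" are hypothetical input files.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build<MemoryServerless>();

// Each document goes through extract -> summarize -> gen_embeddings -> save_records.
await memory.ImportDocumentAsync(new Document("doc1").AddFile("a.pdf"), steps: Constants.PipelineOnlySummary);
await memory.ImportDocumentAsync(new Document("doc2").AddFile("b.pdf"), steps: Constants.PipelineOnlySummary);

// Filters in the list are ORed, so this matches the summaries of both documents.
List<Citation> results = await memory.SearchSummariesAsync(filters: new List<MemoryFilter>
{
    MemoryFilters.ByDocument("doc1"),
    MemoryFilters.ByDocument("doc2"),
});

foreach (Citation result in results)
{
    // One citation per source file; join the partition texts to get the summary.
    Console.WriteLine($"== {result.SourceName} summary ==");
    Console.WriteLine(string.Join("\n", result.Partitions.Select(p => p.Text)));
}
```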
Retrieving Synthetic Data
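In Kernel Memory, generating and retrieving summaries is really a matter of tagging data with a type and then filtering on that tag.
When a summary is generated, its content is stored as synthetic data with the __synth:summary tag attached, and retrieval filters on the same tag. Document tagging and filtering will be covered in detail in the upcoming Document Management post.
Likewise, SearchSummariesAsync is a thin wrapper around SearchSyntheticsAsync: it adds the __synth:summary tag to every filter and then calls SearchAsync with an empty query, so the tag filters alone decide which partitions are returned.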
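To make the filter handling concrete, here is a hypothetical toy model in plain C#. It is not Kernel Memory code and does not use its types: records are simply text plus single-valued tags (Kernel Memory tags can hold several values), and "docId" is an illustrative tag name. The SearchSynthetics local function re-creates the merge rules visible in the SearchSyntheticsAsync source: a missing filter list becomes a new list, an empty filter is added when no filter is given at all, a single filter is appended, every filter then also requires the synthetic-type tag, and matching is OR across filters and AND within one filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// Hypothetical toy store, NOT Kernel Memory code: text plus single-valued tags.
var records = new List<(string Text, Dictionary<string, string> Tags)>
{
    ("chunk of doc1", new Dictionary<string, string> { ["docId"] = "doc1" }),
    ("summary of doc1", new Dictionary<string, string> { ["docId"] = "doc1", ["__synth"] = "summary" }),
    ("summary of doc2", new Dictionary<string, string> { ["docId"] = "doc2", ["__synth"] = "summary" }),
};

// Mirrors the filter merging in SearchSyntheticsAsync, applied to the toy store.
List<string> SearchSynthetics(
    string syntheticType,
    Dictionary<string, string>? filter = null,
    List<Dictionary<string, string>>? filters = null)
{
    if (filters == null)
    {
        filters = new List<Dictionary<string, string>>();
        // No filter at all: start from one empty filter so the type tag has somewhere to go.
        if (filter == null) { filters.Add(new Dictionary<string, string>()); }
    }
    if (filter != null) { filters.Add(filter); }

    // Every filter additionally requires the synthetic-type tag (mutating the filters, like the original).
    foreach (var f in filters) { f["__synth"] = syntheticType; }

    // OR across filters, AND across the conditions inside a single filter.
    return records
        .Where(r => filters.Any(f => f.All(c => r.Tags.TryGetValue(c.Key, out var v) && v == c.Value)))
        .Select(r => r.Text)
        .ToList();
}

Console.WriteLine(string.Join(", ", SearchSynthetics("summary")));
// prints "summary of doc1, summary of doc2"
Console.WriteLine(string.Join(", ", SearchSynthetics("summary", filter: new Dictionary<string, string> { ["docId"] = "doc1" })));
// prints "summary of doc1"
```

The regular chunk of doc1 never matches, because it lacks the __synth tag; that is exactly why summaries and ordinary partitions can live side by side in the same index.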
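In the same way, the summarization step can be replaced with any custom document-processing step, such as article classification, keyword extraction, entity extraction, or cover-image generation. Custom pipelines will also be covered in detail in a later post.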
References
- Summarizing documents: https://github.com/microsoft/kernel-memory/tree/main/examples/106-dotnet-retrieve-synthetics
- kernel-memory/service/Abstractions/KernelMemoryExtensions.cs
```csharp
// Copyright (c) Microsoft. All rights reserved.

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.KernelMemory;

/// <summary>
/// Kernel Memory API extensions
/// </summary>
public static class KernelMemoryExtensions
{
    /// <summary>
    /// Return a list of synthetic memories of the specified type
    /// </summary>
    /// <param name="memory">Memory instance</param>
    /// <param name="syntheticType">Type of synthetic data to return</param>
    /// <param name="index">Optional name of the index where to search</param>
    /// <param name="filter">Filter to match</param>
    /// <param name="filters">Filters to match (using inclusive OR logic). If 'filter' is provided too, the value is merged into this list.</param>
    /// <param name="cancellationToken">Async task cancellation token</param>
    /// <returns>List of search results</returns>
    public static async Task<List<Citation>> SearchSyntheticsAsync(
        this IKernelMemory memory,
        string syntheticType,
        string? index = null,
        MemoryFilter? filter = null,
        ICollection<MemoryFilter>? filters = null,
        CancellationToken cancellationToken = default)
    {
        if (filters == null)
        {
            filters = new List<MemoryFilter>();
            if (filter == null) { filters.Add(new MemoryFilter()); }
        }

        if (filter != null)
        {
            filters.Add(filter);
        }

        foreach (var x in filters)
        {
            x.ByTag(Constants.ReservedSyntheticTypeTag, syntheticType);
        }

        SearchResult searchResult = await memory.SearchAsync(
            query: "",
            index: index,
            filters: filters,
            cancellationToken: cancellationToken).ConfigureAwait(false);

        return searchResult.Results;
    }

    /// <summary>
    /// Return a list of summaries matching the given filters
    /// </summary>
    /// <param name="memory">Memory instance</param>
    /// <param name="index">Optional name of the index where to search</param>
    /// <param name="filter">Filter to match</param>
    /// <param name="filters">Filters to match (using inclusive OR logic). If 'filter' is provided too, the value is merged into this list.</param>
    /// <param name="cancellationToken">Async task cancellation token</param>
    /// <returns>List of search results</returns>
    public static Task<List<Citation>> SearchSummariesAsync(
        this IKernelMemory memory,
        string? index = null,
        MemoryFilter? filter = null,
        ICollection<MemoryFilter>? filters = null,
        CancellationToken cancellationToken = default)
    {
        return SearchSyntheticsAsync(memory, Constants.TagsSyntheticSummary, index, filter, filters, cancellationToken);
    }
}
```
All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.