1 Introduction

Human endeavors across a range of domains rely on our ability to read and reason about large collections of documents, often reaching conclusions that go beyond anything stated in the source texts themselves. With the emergence of large language models (LLMs), we are already witnessing attempts to automate human-like sensemaking in complex domains like scientific discovery (Microsoft, 2023) and intelligence analysis (Ranade and Joshi, 2023), where sensemaking is defined as “a motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively” (Klein et al., 2006a). Supporting human-led sensemaking over entire text corpora, however, requires a way for people to both apply and refine their mental model of the data (Klein et al., 2006b) by asking questions of a global nature.

Retrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions over entire datasets, but it is designed for situations where these answers are contained locally within regions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more appropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused abstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel et al., 2018; Laskar et al., 2020; Yao et al., 2017).
In recent years, however, such distinctions between summarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document versus multi-document have become less relevant. While early applications of the transformer architecture showed substantial improvements on the state of the art for all such summarization tasks (Goodwin et al., 2020; Laskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT (Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series, all of which can use in-context learning to summarize any content provided in their context window.

The challenge remains, however, for query-focused abstractive summarization over an entire corpus. Such volumes of text can greatly exceed the limits of LLM context windows, and the expansion of such windows may not be enough given that information can be “lost in the middle” of longer contexts (Kuratov et al., 2024; Liu et al., 2023).
In addition, although the direct retrieval of text chunks in naïve RAG is likely inadequate for QFS tasks, it is possible that an alternative form of pre-indexing could support a new RAG approach specifically targeting global summarization.

In this paper, we present a Graph RAG approach based on global summarization of an LLM-derived knowledge graph (Figure 1). In contrast with related work that exploits the structured retrieval and traversal affordances of graph indexes (subsection 4.2), we focus on a previously unexplored quality of graphs in this context: their inherent modularity (Newman, 2006) and the ability of community detection algorithms to partition graphs into modular communities of closely related nodes (e.g., Louvain, Blondel et al., 2008; Leiden, Traag et al., 2019). LLM-generated summaries of these community descriptions provide complete coverage of the underlying graph index and the input documents it represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: first using each community summary to answer the query independently and in parallel, then summarizing all relevant partial answers into a final global answer.

To evaluate this approach, we used an LLM to generate a diverse set of activity-centered sensemaking questions from short descriptions of two representative real-world datasets, containing podcast transcripts and news articles respectively.
For the target qualities of comprehensiveness, diversity, and empowerment (defined in subsection 3.4) that develop understanding of broad issues and themes, we both explore the impact of varying the hierarchical level of community summaries used to answer queries, and compare to naïve RAG and global map-reduce summarization of source texts. We show that all global approaches outperform naïve RAG on comprehensiveness and diversity, and that Graph RAG with intermediate- and low-level community summaries shows favorable performance over source text summarization on these same metrics, at lower token costs.

2 Graph RAG Approach & Pipeline

We now unpack the high-level data flow of the Graph RAG approach (Figure 1) and pipeline, describing key design parameters, techniques, and implementation details for each step.

2.1 Source Documents → Text Chunks

A fundamental design decision is the granularity with which input texts extracted from source documents should be split into text chunks for processing. In the following step, each of these chunks will be passed to a set of LLM prompts designed to extract the various elements of a graph index. Longer text chunks require fewer LLM calls for such extraction, but suffer from the recall degradation of longer LLM context windows (Kuratov et al., 2024; Liu et al., 2023). This behavior can be observed in Figure 2 in the case of a single extraction round (i.e., with zero gleanings): on a sample dataset (HotPotQA, Yang et al., 2018), using a chunk size of 600 tokens extracted almost twice as many entity references as using a chunk size of 2400.
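As a concrete sketch of this chunking step, the following splits a token sequence into fixed-size windows with a fixed overlap (the 600/100 configuration used for the evaluation datasets in subsection 3.1). Whitespace splitting stands in for the LLM's tokenizer, and the function name is ours, not part of the pipeline:

```python
def chunk_tokens(tokens, chunk_size=600, overlap=100):
    """Yield successive windows of `chunk_size` tokens, each starting
    `chunk_size - overlap` tokens after the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy corpus of 2000 whitespace "tokens"; a real pipeline would count
# tokens with the LLM's own tokenizer.
tokens = ("the quick brown fox " * 500).split()
chunks = chunk_tokens(tokens, chunk_size=600, overlap=100)
```

Note that each chunk repeats the last 100 tokens of its predecessor, so entity mentions that straddle a chunk boundary appear whole in at least one chunk.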
While more references are generally better, any extraction process needs to balance recall and precision for the target activity.

2.2 Text Chunks → Element Instances

The baseline requirement for this step is to identify and extract instances of graph nodes and edges from each chunk of source text. We do this using a multipart LLM prompt that first identifies all entities in the text, including their name, type, and description, before identifying all relationships between clearly related entities, including the source and target entities and a description of their relationship. Both kinds of element instance are output in a single list of delimited tuples.

The primary opportunity to tailor this prompt to the domain of the document corpus lies in the choice of few-shot examples provided to the LLM for in-context learning (Brown et al., 2020). For example, while our default prompt extracting the broad class of “named entities” like people, places, and organizations is generally applicable, domains with specialized knowledge (e.g., science, medicine, law) will benefit from few-shot examples specialized to those domains. We also support a secondary extraction prompt for any additional covariates we would like to associate with the extracted node instances.
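The delimited-tuple output described above can be parsed into entity and relationship instances along the following lines. The record and field delimiters ("##" and "<|>") and the example strings are hypothetical, used only to illustrate the shape of the parsing step:

```python
def parse_elements(raw: str):
    """Split a delimited-tuple list into entity and relationship instances."""
    entities, relationships = [], []
    for record in raw.split("##"):
        fields = [f.strip().strip('"')
                  for f in record.strip().strip("()").split("<|>")]
        if fields[0] == "entity" and len(fields) == 4:
            _, name, etype, desc = fields
            entities.append({"name": name, "type": etype, "description": desc})
        elif fields[0] == "relationship" and len(fields) == 4:
            _, src, dst, desc = fields
            relationships.append({"source": src, "target": dst,
                                  "description": desc})
    return entities, relationships

# Hypothetical extraction output for one text chunk.
raw = ('("entity"<|>ALICE<|>PERSON<|>Founder of Acme)##'
       '("entity"<|>ACME<|>ORGANIZATION<|>A small robotics company)##'
       '("relationship"<|>ALICE<|>ACME<|>Alice founded Acme)')
entities, relationships = parse_elements(raw)
```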
Our default covariate prompt aims to extract claims linked to detected entities, including the subject, object, type, description, source text span, and start and end dates.

To balance the needs of efficiency and quality, we use multiple rounds of “gleanings”, up to a specified maximum, to encourage the LLM to detect any additional entities it may have missed on prior extraction rounds. This is a multi-stage process in which we first ask the LLM to assess whether all entities were extracted, using a logit bias of 100 to force a yes/no decision. If the LLM responds that entities were missed, then a continuation indicating that “MANY entities were missed in the last extraction” encourages the LLM to glean these missing entities. This approach allows us to use larger chunk sizes without a drop in quality (Figure 2) or the forced introduction of noise.

2.3 Element Instances → Element Summaries

The use of an LLM to “extract” descriptions of entities, relationships, and claims represented in source texts is already a form of abstractive summarization, relying on the LLM to create independently meaningful summaries of concepts that may be implied but not stated by the text itself (e.g., the presence of implied relationships).
To convert all such instance-level summaries into single blocks of descriptive text for each graph element (i.e., entity node, relationship edge, and claim covariate) requires a further round of LLM summarization over matching groups of instances.

A potential concern at this stage is that the LLM may not consistently extract references to the same entity in the same text format, resulting in duplicate entity elements and thus duplicate nodes in the entity graph. However, since all closely related “communities” of entities will be detected and summarized in the following step, and given that LLMs can understand the common entity behind multiple name variations, our overall approach is resilient to such variations provided there is sufficient connectivity from all variations to a shared set of closely related entities.

Overall, our use of rich descriptive text for homogeneous nodes in a potentially noisy graph structure is aligned with both the capabilities of LLMs and the needs of global, query-focused summarization. These qualities also differentiate our graph index from typical knowledge graphs, which rely on concise and consistent knowledge triples (subject, predicate, object) for downstream reasoning tasks.

2.4 Element Summaries → Graph Communities

The index created in the previous step can be modelled as a homogeneous undirected weighted graph in which entity nodes are connected by relationship edges, with edge weights representing the normalized counts of detected relationship instances.
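A minimal sketch of this graph construction, aggregating duplicate relationship instances into weighted undirected edges (shown here as raw counts; the pipeline normalizes them, and the helper name is ours):

```python
from collections import Counter

def build_graph(relationship_instances):
    """Collapse (source, target) instance pairs into undirected weighted edges."""
    edges = Counter()
    nodes = set()
    for src, dst in relationship_instances:
        nodes.update((src, dst))
        # frozenset ignores direction, so (A, B) and (B, A) share one edge
        edges[frozenset((src, dst))] += 1
    return nodes, edges

# Hypothetical relationship instances extracted across several chunks.
instances = [("ALICE", "ACME"), ("ACME", "ALICE"), ("ALICE", "BOB")]
nodes, edges = build_graph(instances)
```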
Given such a graph, a variety of community detection algorithms may be used to partition the graph into communities of nodes with stronger connections to one another than to the other nodes in the graph (e.g., see the surveys by Fortunato, 2010 and Jin et al., 2021). In our pipeline, we use Leiden (Traag et al., 2019) on account of its ability to recover the hierarchical community structure of large-scale graphs efficiently (Figure 3). Each level of this hierarchy provides a community partition that covers the nodes of the graph in a mutually exclusive, collectively exhaustive way, enabling divide-and-conquer global summarization.

2.5 Graph Communities → Community Summaries

The next step is to create report-like summaries of each community in the Leiden hierarchy, using a method designed to scale to very large datasets. These summaries are independently useful in their own right as a way to understand the global structure and semantics of the dataset, and may themselves be used to make sense of a corpus in the absence of a question. For example, a user may scan through community summaries at one level looking for general themes of interest, then follow links to the reports at the lower level that provide more details for each of the subtopics.
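This level-by-level exploration of community reports can be sketched as a simple traversal of the hierarchy. The data layout (report IDs, child links) is hypothetical, standing in for whatever structure the index stores:

```python
# Hypothetical two-level slice of a community-report hierarchy.
community_reports = {
    "C0.1": {"summary": "Broad theme: technology and society",
             "children": ["C1.1", "C1.2"]},
    "C1.1": {"summary": "Sub-theme: AI policy", "children": []},
    "C1.2": {"summary": "Sub-theme: chip supply chains", "children": []},
}

def drill_down(report_id, depth=0, out=None):
    """Collect a community report and all its descendants, depth-first."""
    if out is None:
        out = []
    report = community_reports[report_id]
    out.append((depth, report["summary"]))
    for child in report["children"]:
        drill_down(child, depth + 1, out)
    return out

outline = drill_down("C0.1")
```

A user scanning for a theme of interest would read the depth-0 summary first, then follow the depth-1 reports for detail.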
Here, however, we focus on their utility as part of a graph-based index used for answering global queries. Community summaries are generated in the following way:

2.6 Community Summaries → Community Answers → Global Answer

Given a user query, the community summaries generated in the previous step can be used to generate a final answer in a multi-stage process. The hierarchical nature of the community structure also means that questions can be answered using the community summaries from different levels, raising the question of whether a particular level in the hierarchical community structure offers the best balance of summary detail and scope for general sensemaking questions (evaluated in section 3).

For a given community level, the global answer to any user query is generated as follows:

>> Prepare community summaries. Community summaries are randomly shuffled and divided into chunks of pre-specified token size. This ensures relevant information is distributed across chunks, rather than concentrated (and potentially lost) in a single context window.

>> Map community answers. Generate intermediate answers in parallel, one for each chunk. The LLM is also asked to generate a score between 0-100 indicating how helpful the generated answer is in answering the target question. Answers with score 0 are filtered out.

>> Reduce to global answer. Intermediate community answers are sorted in descending order of helpfulness score and iteratively added into a new context window until the token limit is reached.
This final context is used to generate the global answer returned to the user.

3 Evaluation

3.1 Datasets

We selected two datasets in the one million token range, each equivalent to about 10 novels of text and representative of the kind of corpora that users may encounter in their real-world activities:

>> Podcast transcripts. Compiled transcripts of podcast conversations between Kevin Scott, Microsoft CTO, and other technology leaders (Behind the Tech, Scott, 2024). Size: 1669 × 600-token text chunks, with 100-token overlaps between chunks (∼1 million tokens).

>> News articles. Benchmark dataset comprising news articles published from September 2013 to December 2023 in a range of categories, including entertainment, business, sports, technology, health, and science (MultiHop-RAG; Tang and Yang, 2024). Size: 3197 × 600-token text chunks, with 100-token overlaps between chunks (∼1.7 million tokens).

3.2 Queries

Many benchmark datasets for open-domain question answering exist, including HotPotQA (Yang et al., 2018), MultiHop-RAG (Tang and Yang, 2024), and MT-Bench (Zheng et al., 2024). However, the associated question sets target explicit fact retrieval rather than summarization for the purpose of data sensemaking, i.e., the process through which people inspect, engage with, and contextualize data within the broader scope of real-world activities (Koesten et al., 2021).
Similarly, methods for extracting latent summarization queries from source texts also exist (Xu and Lapata, 2021), but such extracted questions can target details that betray prior knowledge of the texts.

To evaluate the effectiveness of RAG systems for more global sensemaking tasks, we need questions that convey only a high-level understanding of dataset contents, and not the details of specific texts. We used an activity-centered approach to automate the generation of such questions: given a short description of a dataset, we asked the LLM to identify N potential users and N tasks per user, then for each (user, task) combination, we asked the LLM to generate N questions that require understanding of the entire corpus. For our evaluation, a value of N = 5 resulted in 125 test questions per dataset. Table 1 shows example questions for each of the two evaluation datasets.

3.3 Conditions

We compare six different conditions in our analysis, including Graph RAG using four levels of graph communities (C0, C1, C2, C3), a text summarization method applying our map-reduce approach directly to source texts (TS), and a naïve “semantic search” RAG approach (SS).

3.4 Metrics

LLMs have been shown to be good evaluators of natural language generation, achieving state-of-the-art or competitive results compared against human judgements (Wang et al., 2023a; Zheng et al., 2024).
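Head-to-head judgments of this kind are typically aggregated into per-metric win rates over many question pairs. A minimal sketch, assuming the judge's verdicts ("A", "B", or "tie") have already been collected; counting ties as half a win for each side is a common convention, and the paper's exact tie handling may differ:

```python
def win_rate(verdicts):
    """Fraction of head-to-head comparisons won by condition A,
    with ties counted at half weight."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    score = sum(1.0 if v == "A" else 0.5 if v == "tie" else 0.0
                for v in verdicts)
    return score / len(verdicts)

# e.g., condition A vs. condition B on one metric, over five questions
verdicts = ["A", "A", "B", "tie", "A"]
rate = win_rate(verdicts)  # 3 wins + half a tie out of 5
```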
While this approach can generate reference-based metrics when gold standard answers are known, it is also capable of measuring the qualities of generated texts (e.g., fluency) in a reference-free style (Wang et al., 2023a) as well as in head-to-head comparison of competing outputs (LLM-as-a-judge, Zheng et al., 2024). LLMs have also shown promise at evaluating the performance of conventional RAG systems, automatically evaluating qualities like context relevance, faithfulness, and answer relevance (RAGAS, Es et al., 2023).

3.6 Results

The indexing process resulted in a graph consisting of 8564 nodes and 20691 edges for the Podcast dataset, and a larger graph of 15754 nodes and 19520 edges for the News dataset. Table 3 shows the number of community summaries at different levels of each graph community hierarchy.

Global approaches vs. naïve RAG. As shown in Figure 4, global approaches consistently outperformed the naïve RAG (SS) approach in both comprehensiveness and diversity metrics across datasets. Specifically, global approaches achieved comprehensiveness win rates between 72-83% for Podcast transcripts and 72-80% for News articles, while diversity win rates ranged from 75-82% and 62-71% respectively. Our use of directness as a validity test also achieved the expected results, i.e., that naïve RAG produces the most direct responses across all comparisons.

Community summaries vs. source texts.
When comparing community summaries to source texts using Graph RAG, community summaries generally provided a small but consistent improvement in answer comprehensiveness and diversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level community summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively. Diversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community summaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text summarization: for low-level community summaries (C3), Graph RAG required 26-33% fewer context tokens, while for root-level community summaries (C0), it required over 97% fewer tokens. For a modest drop in performance compared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative question answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness (72% win rate) and diversity (62% win rate) over naïve RAG.

Empowerment. Empowerment comparisons showed mixed results for both global approaches versus naïve RAG (SS) and Graph RAG approaches versus source text summarization (TS). Ad-hoc LLM use to analyze LLM reasoning for this measure indicated that the ability to provide specific examples, quotes, and citations was judged to be key to helping users reach an informed understanding.
Tuning element extraction prompts may help to retain more of these details in the Graph RAG index.

4 Related Work

4.1 RAG Approaches and Systems

When using LLMs, RAG involves first retrieving relevant information from external data sources, then adding this information to the context window of the LLM along with the original query (Ram et al., 2023). Naïve RAG approaches (Gao et al., 2023) do this by converting documents to text, splitting text into chunks, and embedding these chunks into a vector space in which similar positions represent similar semantics. Queries are then embedded into the same vector space, with the text chunks of the nearest k vectors used as context. More advanced variations exist, but all solve the problem of what to do when an external dataset of interest exceeds the LLM's context window.

Advanced RAG systems include pre-retrieval, retrieval, and post-retrieval strategies designed to overcome the drawbacks of naïve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of interleaved retrieval and generation (Gao et al., 2023). Our implementation of Graph RAG incorporates multiple concepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem, Cheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future generation cycles, while our parallel generation of community answers from these summaries is a kind of iterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation strategy.
Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et al., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP, Khattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further approaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings (RAPTOR, Sarthi et al., 2024) or generating a “tree of clarifications” to answer multiple interpretations of ambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the kind of self-generated graph index that enables Graph RAG.

4.2 Graphs and LLMs

Use of graphs in connection with LLMs and RAG is a developing research area, with multiple directions already established. These include using LLMs for knowledge graph creation (Trajanoska et al., 2023) and completion (Yao et al., 2023), as well as for the extraction of causal graphs (Ban et al., 2023; Zhang et al., 2024) from source texts.
They also include forms of advanced RAG (Gao et al., 2023) where the index is a knowledge graph (KAPING, Baek et al., 2023), where subsets of the graph structure (G-Retriever, He et al., 2024) or derived graph metrics (Graph-ToolFormer, Zhang, 2023) are the objects of enquiry, where narrative outputs are strongly grounded in the facts of retrieved subgraphs (SURGE, Kang et al., 2023), where retrieved event-plot subgraphs are serialized using narrative templates (FABULA, Ranade and Joshi, 2023), and where the system supports both creation and traversal of text-relationship graphs for multi-hop question answering (Wang et al., 2023b). In terms of open-source software, a variety of graph databases are supported by both the LangChain (LangChain, 2024) and LlamaIndex (LlamaIndex, 2024) libraries, while a more general class of graph-based RAG applications is also emerging, including systems that can create and reason over knowledge graphs in both Neo4J (NaLLM, Neo4J, 2024) and NebulaGraph (GraphRAG, NebulaGraph, 2024) formats. Unlike our Graph RAG approach, however, none of these systems use the natural modularity of graphs to partition data for global summarization.

5 Discussion

Limitations of evaluation approach.
Our evaluation to date has only examined a certain class of sensemaking questions for two corpora in the region of 1 million tokens. More work is needed to understand how performance varies across different ranges of question types, data types, and dataset sizes, as well as to validate our sensemaking questions and target metrics with end users. Comparison of fabrication rates, e.g., using approaches like SelfCheckGPT (Manakul et al., 2023), would also improve on the current analysis.

Trade-offs of building a graph index. We consistently observed Graph RAG achieve the best head-to-head results against other methods, but in many cases the graph-free approach to global summarization of source texts performed competitively. The real-world decision about whether to invest in building a graph index depends on multiple factors, including the compute budget, expected number of lifetime queries per dataset, and value obtained from other aspects of the graph index (including the generic community summaries and the use of other graph-related RAG approaches).

Future work. The graph index, rich text annotations, and hierarchical community structure supporting the current Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches that operate in a more local manner, via embedding-based matching of user queries and graph annotations, as well as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports before employing our map-reduce summarization mechanisms.
This “roll-up” operation could also be extended across more levels of the community hierarchy, as well as implemented as a more exploratory “drill down” mechanism that follows the information scent contained in higher-level community summaries.

6 Conclusion

We have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented generation (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. Initial evaluations show substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce source text summarization. For situations requiring many global queries over the same dataset, summaries of root-level communities in the entity-based graph index provide a data index that is both superior to naïve RAG and achieves competitive performance to other global methods at a fraction of the token cost.

An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.