Poster Session 2 · Wednesday, December 3, 2025 4:30 PM → 7:30 PM
#5414
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Abstract
Recent interest in building foundation models for knowledge graphs has highlighted a fundamental challenge: knowledge graph data is scarce. The best-known knowledge graphs are primarily human-labeled, created by pattern-matching, or extracted using early NLP techniques. While human-generated knowledge graphs are in short supply, automatically extracted ones are of questionable quality.
We present KGGen, a text-to-knowledge-graph generator that uses language models to extract high-quality graphs from plain text. Unlike other KG generators, KGGen applies a novel entity resolution approach that clusters and de-duplicates related entities, significantly reducing the sparsity problem that plagues existing extractors.
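To make "clustering and de-duplicating related entities" concrete, here is a minimal sketch of entity resolution over extracted (subject, relation, object) triples. It is not KGGen's method: the abstract does not describe the mechanics, and this sketch swaps in plain string similarity (Python's `difflib.SequenceMatcher`) plus union-find in place of any language-model-based clustering. The function names `normalize`, `cluster_entities`, and `dedupe_triples`, the naive plural stripping, and the 0.9 similarity threshold are all hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, strip a naive plural 's'."""
    s = re.sub(r"[^a-z0-9 ]", "", name.lower())
    s = re.sub(r"\s+", " ", s).strip()
    if len(s) > 3 and s.endswith("s") and not s.endswith("ss"):
        s = s[:-1]
    return s


def cluster_entities(entities, threshold=0.9):
    """Map each entity string to a cluster representative using union-find.

    Two entities are merged when their normalized forms have a SequenceMatcher
    ratio >= threshold (an assumed stand-in for KGGen's actual clustering).
    """
    ents = sorted(set(entities))
    parent = {e: e for e in ents}
    norm = {e: normalize(e) for e in ents}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the shorter (then lexicographically smaller) name as the representative.
            keep, drop = sorted((ra, rb), key=lambda e: (len(e), e))
            parent[drop] = keep

    # O(n^2) pairwise comparison; fine for a sketch, not for large graphs.
    for i, a in enumerate(ents):
        for b in ents[i + 1:]:
            if SequenceMatcher(None, norm[a], norm[b]).ratio() >= threshold:
                union(a, b)
    return {e: find(e) for e in ents}


def dedupe_triples(triples, threshold=0.9):
    """Rewrite triples onto canonical entities and drop exact duplicates."""
    entities = [s for s, _, _ in triples] + [o for _, _, o in triples]
    canon = cluster_entities(entities, threshold)
    return sorted({(canon[s], r.strip().lower(), canon[o]) for s, r, o in triples})


# Toy triples (invented for illustration, not from the paper).
example = [
    ("Neural Networks", "used for", "image classification"),
    ("neural network", "used for", "Image Classification"),
    ("GraphRAG", "developed by", "Microsoft"),
]

if __name__ == "__main__":
    for triple in dedupe_triples(example):
        print(triple)
```

On the toy input, the two surface forms of each concept collapse into one node, so three extracted triples become two edges. That is the sparsity reduction the abstract describes, in miniature. A real system would need semantic rather than purely lexical matching (for example, "NN" and "neural network" are not merged here).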
Along with KGGen, we release Measure of Information in Nodes and Edges (MINE), the first benchmark to test an extractor's ability to produce a useful KG from plain text. We benchmark our new tool against leading existing generators such as Microsoft's GraphRAG; we achieve comparable retrieval accuracy on the generated graphs and better information retention. Moreover, our graphs exhibit more concise and generalizable entities and relations.
Our code is open-sourced at
https://github.com/stair-lab/kg-gen/.