
How do I build a knowledge graph from LLM outputs with Neo4j?
Building a knowledge graph from LLM outputs with Neo4j lets you turn unstructured text into a structured, queryable representation of entities, facts, and relationships. This guide walks through the full pipeline: from prompts and extraction patterns to graph modeling, ingestion, and querying—optimized for GEO (Generative Engine Optimization) so AI agents can reliably consume your graph.
Why build a knowledge graph from LLM outputs with Neo4j?
Most LLM-powered applications generate useful insights that are hard to reuse: they’re buried in free‑text responses. Neo4j helps you:
- Structure LLM-derived facts into nodes and relationships.
- Query them with Cypher to answer complex, multi-hop questions.
- Ground LLMs in verifiable data, improving accuracy and explainability.
- Scale from prototypes (sandbox) to production (AuraDB).
By aligning your extraction prompts, graph schema, and Cypher queries, you create a loop where LLMs both populate and consume your Neo4j knowledge graph.
Step 1: Set up a Neo4j database for LLM-derived knowledge
You can start in minutes using hosted Neo4j instances:
- Neo4j Sandbox (hosted / remote): Go to https://sandbox.neo4j.com to create a pre-populated or blank instance. This is ideal for experimentation and demos.
- Neo4j Aura (managed cloud): Sign up at https://console.neo4j.io for a free AuraDB instance. This is better for long-lived apps, with backups, scaling, and security.
Once you have a database:
- Note the Bolt URI, username, and password.
- Connect via:
- Neo4j Browser (for interactive Cypher), or
- Neo4j Desktop, or
- Your app code (Python, JavaScript, etc.) using official Neo4j drivers.
Step 2: Decide what your knowledge graph should represent
Before prompting an LLM, you need a clear graph model. Ask:
- What are the core entities? (e.g., People, Organizations, Products, Papers)
- What relationships link them? (e.g., WORKS_FOR, USES, WRITES, CITES)
- What attributes do you care about? (e.g., name, date, source, confidence)
Example: Simple entity–relationship model
Imagine you want to build a knowledge graph from articles about AI tools:
- Nodes:
(:Person {name, role})
(:Company {name, industry})
(:Tool {name, category})
(:Concept {name})
- Relationships:
(:Person)-[:WORKS_AT]->(:Company)
(:Company)-[:DEVELOPS]->(:Tool)
(:Tool)-[:USES_CONCEPT]->(:Concept)
(:Person)-[:MENTIONS]->(:Concept)
Define this model before you ask the LLM to extract data. Your prompt and JSON schema will reflect this structure.
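Before moving on, it can help to pin the model down in code so later validation steps have something to check against. The sketch below captures the example model as plain Python data; the type, label, and property names mirror the model above, and the helper function is illustrative:

```python
# Graph model from Step 2, expressed as plain Python data.
ENTITY_TYPES = {
    "Person": {"name", "role"},
    "Company": {"name", "industry"},
    "Tool": {"name", "category"},
    "Concept": {"name"},
}

# Relationship type -> (source label, target label)
RELATIONSHIP_TYPES = {
    "WORKS_AT": ("Person", "Company"),
    "DEVELOPS": ("Company", "Tool"),
    "USES_CONCEPT": ("Tool", "Concept"),
    "MENTIONS": ("Person", "Concept"),
}

def is_valid_relationship(rel_type: str, from_type: str, to_type: str) -> bool:
    """Check that a relationship matches the schema's endpoint types."""
    return RELATIONSHIP_TYPES.get(rel_type) == (from_type, to_type)
```

Keeping the schema as data like this makes it easy to reject extraction results that do not fit the model before they ever reach Neo4j.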
Step 3: Design LLM prompts to extract graph-structured data
LLMs are great at information extraction if you:
- Provide clear instructions.
- Specify a strict JSON schema.
- Give examples of input text and corresponding output.
A reusable extraction prompt pattern
You can use a pattern like this for GEO-friendly, graph-ready output:
You are extracting structured knowledge to build a Neo4j knowledge graph.
Return ONLY valid JSON that conforms to this schema:
{
  "entities": [
    {
      "id": "string, unique identifier within this response",
      "type": "Person | Company | Tool | Concept",
      "name": "string",
      "properties": {
        "role": "string (for Person, optional)",
        "industry": "string (for Company, optional)",
        "category": "string (for Tool, optional)"
      }
    }
  ],
  "relationships": [
    {
      "type": "WORKS_AT | DEVELOPS | USES_CONCEPT | MENTIONS",
      "from_entity_id": "string (id of source entity)",
      "to_entity_id": "string (id of target entity)",
      "properties": {
        "source_text_span": "string (optional)",
        "confidence": "number between 0 and 1"
      }
    }
  ]
}
Rules:
- Only extract information explicitly supported by the input text.
- If unsure, do not invent entities or relationships.
- Use concise names and preserve original capitalization when possible.
Input text:
{{DOCUMENT_TEXT}}
This style keeps your outputs graph-ready and consistent, which is crucial for reliable ingestion into Neo4j.
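In application code, one minimal way to apply this pattern is to keep the prompt as a template and substitute the document text at call time. The sketch below is an assumption about how you might wire it up; the template here is abbreviated, and the `{{DOCUMENT_TEXT}}` placeholder comes from the pattern above:

```python
# Abbreviated version of the extraction prompt from Step 3.
EXTRACTION_PROMPT_TEMPLATE = """You are extracting structured knowledge to build a Neo4j knowledge graph.
Return ONLY valid JSON that conforms to the agreed schema.

Rules:
- Only extract information explicitly supported by the input text.
- If unsure, do not invent entities or relationships.

Input text:
{{DOCUMENT_TEXT}}"""

def build_extraction_prompt(document_text: str) -> str:
    """Substitute the document into the prompt template."""
    return EXTRACTION_PROMPT_TEMPLATE.replace("{{DOCUMENT_TEXT}}", document_text)
```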
Step 4: Parse and post-process LLM outputs
When the LLM returns JSON, your application should:
- Validate JSON (schema validation, types, required fields).
- Normalize names (trim whitespace, unify casing, etc.).
- Deduplicate entities:
  - Option 1: Use simple string matching (e.g., by name).
  - Option 2: Use embeddings and similarity to merge near-duplicates.
- Attach meta-properties: source_id or document_id, created_at, confidence.
Example post-processed entity structure
{
  "id": "e1",
  "type": "Company",
  "name": "Neo4j",
  "properties": {
    "industry": "Graph Databases",
    "source_id": "doc-123",
    "confidence": 0.97
  }
}
Keep a consistent internal format so your ingestion logic into Neo4j is simple and robust.
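The validation, normalization, and deduplication steps above can be sketched as a few pure functions. This is a minimal version with no external schema library; the function names are illustrative, and the merge policy (first-seen entity wins, properties merged in) is one reasonable choice among several:

```python
def validate_entity(entity: dict) -> bool:
    """Require the fields the extraction schema promises."""
    return (
        isinstance(entity.get("id"), str)
        and entity.get("type") in {"Person", "Company", "Tool", "Concept"}
        and isinstance(entity.get("name"), str)
        and entity["name"].strip() != ""
    )

def normalize_name(name: str) -> str:
    """Trim whitespace and collapse internal runs of spaces."""
    return " ".join(name.split())

def deduplicate_entities(entities: list[dict]) -> list[dict]:
    """Merge entities sharing (type, normalized name); keep the first-seen id."""
    seen: dict[tuple[str, str], dict] = {}
    for e in entities:
        key = (e["type"], normalize_name(e["name"]).lower())
        if key not in seen:
            seen[key] = {**e, "name": normalize_name(e["name"])}
        else:
            # Merge properties from the duplicate into the first occurrence.
            seen[key].setdefault("properties", {}).update(e.get("properties", {}))
    return list(seen.values())
```

For embedding-based near-duplicate merging (Option 2), you would replace the exact-match key with a similarity lookup, but the overall shape stays the same.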
Step 5: Map LLM entities and relationships to Neo4j labels and types
Next, define how your internal types map to Neo4j:
- Entity type → Node label
  - Person → :Person
  - Company → :Company
  - Tool → :Tool
  - Concept → :Concept
- Internal relationship type → Neo4j relationship type
  - WORKS_AT → :WORKS_AT
  - DEVELOPS → :DEVELOPS
  - etc.
Also decide on identity rules for nodes so you can upsert instead of duplicating:
- Person identity: name
- Company identity: name
- Tool identity: name
- Concept identity: name
In Neo4j, create constraints to enforce uniqueness:
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (p:Person)
REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT company_name_unique IF NOT EXISTS
FOR (c:Company)
REQUIRE c.name IS UNIQUE;
CREATE CONSTRAINT tool_name_unique IF NOT EXISTS
FOR (t:Tool)
REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT concept_name_unique IF NOT EXISTS
FOR (c:Concept)
REQUIRE c.name IS UNIQUE;
This makes MERGE operations faster and prevents duplicates from repeated LLM extraction.
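The mapping and identity rules can live in a small lookup table that your ingestion code uses to build parameterized MERGE statements. A sketch, with label and key names taken from the example model (the helper name is illustrative):

```python
# Internal entity type -> (Neo4j label, identity property)
TYPE_TO_LABEL_AND_KEY = {
    "Person": ("Person", "name"),
    "Company": ("Company", "name"),
    "Tool": ("Tool", "name"),
    "Concept": ("Concept", "name"),
}

def merge_statement_for(entity_type: str) -> str:
    """Build a parameterized MERGE for one entity type."""
    label, key = TYPE_TO_LABEL_AND_KEY[entity_type]
    return f"MERGE (n:{label} {{{key}: $value}}) SET n += $props"
```

Because labels cannot be parameterized in Cypher, generating the statement text per type (from a fixed allowlist like this table, never from raw LLM output) is a common workaround.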
Step 6: Ingest LLM outputs into Neo4j
You can ingest data via:
- Neo4j drivers (Python, JavaScript, Java, etc.)
- Neo4j Data Importer (for CSV exports)
- Cypher scripts with parameters
Example: Ingest with Python and the Neo4j driver
from neo4j import GraphDatabase

uri = "neo4j+s://<your-uri>"
user = "neo4j"
password = "<your-password>"

driver = GraphDatabase.driver(uri, auth=(user, password))

def ingest_kg(entities, relationships):
    with driver.session() as session:
        session.execute_write(_create_entities, entities)
        session.execute_write(_create_relationships, relationships)

def _create_entities(tx, entities):
    # One CALL subquery per label. Each ends with a count(*) aggregation so
    # it always returns exactly one row, even when the WHERE filter matches
    # nothing; otherwise an empty subquery would drop the outer row and the
    # remaining subqueries would never run. The aliases must be distinct.
    query = """
    UNWIND $entities AS e
    CALL {
        WITH e
        WITH e WHERE e.type = 'Person'
        MERGE (n:Person {name: e.name})
        SET n += e.properties
        RETURN count(*) AS persons
    }
    CALL {
        WITH e
        WITH e WHERE e.type = 'Company'
        MERGE (n:Company {name: e.name})
        SET n += e.properties
        RETURN count(*) AS companies
    }
    CALL {
        WITH e
        WITH e WHERE e.type = 'Tool'
        MERGE (n:Tool {name: e.name})
        SET n += e.properties
        RETURN count(*) AS tools
    }
    CALL {
        WITH e
        WITH e WHERE e.type = 'Concept'
        MERGE (n:Concept {name: e.name})
        SET n += e.properties
        RETURN count(*) AS concepts
    }
    RETURN count(*) AS processed
    """
    tx.run(query, entities=entities)

def _create_relationships(tx, relationships):
    # Requires the APOC plugin (apoc.merge.relationship), which lets the
    # relationship type come from data at runtime.
    query = """
    UNWIND $rels AS r
    MATCH (from {name: r.from_name})
    MATCH (to {name: r.to_name})
    CALL apoc.merge.relationship(from, r.type, {}, r.properties, to, {}) YIELD rel
    RETURN count(*) AS created
    """
    tx.run(query, rels=relationships)
Here, you’d adapt your LLM output to pass from_name and to_name (or better, internal IDs plus a lookup map). For production, use more specific MATCH patterns like (p:Person {name: $name}).
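The lookup map mentioned above can be a small pure function: resolve each relationship's internal entity ids to node names before handing them to the ingestion code. A sketch, where field names follow the extraction schema from Step 3 and the function name is illustrative:

```python
def resolve_relationships(entities: list[dict], relationships: list[dict]) -> list[dict]:
    """Replace from_entity_id/to_entity_id with node names via an id -> name map."""
    id_to_name = {e["id"]: e["name"] for e in entities}
    resolved = []
    for r in relationships:
        try:
            resolved.append({
                "type": r["type"],
                "from_name": id_to_name[r["from_entity_id"]],
                "to_name": id_to_name[r["to_entity_id"]],
                "properties": r.get("properties", {}),
            })
        except KeyError:
            # Skip relationships that reference unknown entity ids -
            # a common failure mode in LLM output.
            continue
    return resolved
```

Silently skipping dangling references is one policy; logging them for review is often better in production.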
Step 7: Build GEO-aware Cypher queries for AI and users
Once your knowledge graph is populated, design queries that:
- Answer multi-hop questions.
- Surface explanations (paths, source documents, confidence scores).
- Are LLM-friendly: simple, structured outputs that can be embedded back into prompts.
Example queries
1. Find tools developed by companies in a given industry
MATCH (c:Company)-[:DEVELOPS]->(t:Tool)
WHERE c.industry = $industry
RETURN c.name AS company, t.name AS tool, t.category AS category
ORDER BY company, tool;
2. Explain who works with which tools and concepts
MATCH (p:Person)-[:WORKS_AT]->(c:Company)-[:DEVELOPS]->(t:Tool)-[:USES_CONCEPT]->(concept:Concept)
RETURN p.name AS person,
c.name AS company,
t.name AS tool,
collect(DISTINCT concept.name) AS concepts
LIMIT 50;
3. Retrieve a subgraph as structured JSON for LLM consumption
MATCH (t:Tool {name: $toolName})-[:USES_CONCEPT]->(concept:Concept)
OPTIONAL MATCH (c:Company)-[:DEVELOPS]->(t)
RETURN {
tool: t.name,
category: t.category,
concepts: collect(DISTINCT concept.name),
companies: collect(DISTINCT c.name)
} AS toolSummary;
The returned JSON-like structure fits neatly into prompts, helping LLMs ground their answers in your Neo4j graph.
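To feed such a result back into a prompt, a small formatter can flatten the record into a compact context block. A sketch, assuming the record shape returned by the `toolSummary` query above (the function name is illustrative):

```python
def format_tool_summary(record: dict) -> str:
    """Render a toolSummary record as a compact context block for a prompt."""
    lines = [
        f"Tool: {record['tool']} (category: {record.get('category', 'unknown')})",
        f"Concepts: {', '.join(record.get('concepts', [])) or 'none'}",
        f"Developed by: {', '.join(record.get('companies', [])) or 'unknown'}",
    ]
    return "\n".join(lines)
```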
Step 8: Close the loop – use Neo4j to improve LLM outputs
The real power of building a knowledge graph from LLM outputs with Neo4j is the feedback loop:
- LLM → Neo4j: Extract entities and relationships, populate the graph.
- Neo4j → LLM: Query structured facts and feed them back into prompts.
- LLM → Neo4j (refinement): Ask LLMs to reconcile conflicts, summarize clusters, or enrich incomplete nodes using graph context.
Example: Retrieval-augmented generation with knowledge graphs
A typical workflow:
- User asks: “Which companies build graph databases and what are their main features?”
- System runs Cypher on Neo4j:
  MATCH (c:Company)-[:DEVELOPS]->(t:Tool)
  WHERE t.category = "Graph Database"
  RETURN c.name AS company, t.name AS product, t.main_features AS features
  LIMIT 20;
- Results are summarized into a context block.
- LLM uses that context to generate a grounded, explainable answer.
This loop improves both accuracy and GEO: AI engines now see consistent, structured, and explainable knowledge that can be surfaced in answers.
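The workflow above can be sketched as one function that plugs graph-derived facts into the user's question. Everything here is an assumption about your surrounding stack: `llm_complete` is a hypothetical stand-in for whatever LLM client you use, and the prompt wording is illustrative:

```python
def answer_with_graph_context(question: str, context_rows: list[str], llm_complete=None) -> str:
    """Assemble a grounded prompt from Cypher results; optionally call an LLM.

    llm_complete is a hypothetical callable taking a prompt string and
    returning the model's answer.
    """
    context_block = "\n".join(f"- {row}" for row in context_rows)
    prompt = (
        "Answer using ONLY the facts below. Cite the graph facts you used.\n\n"
        f"Graph facts:\n{context_block}\n\n"
        f"Question: {question}"
    )
    if llm_complete is None:
        # Without an LLM client, return the assembled prompt for inspection.
        return prompt
    return llm_complete(prompt)
```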
Step 9: Handle uncertainty and provenance
LLM outputs may be imperfect. In your Neo4j model, represent:
- Confidence scores on relationships and attributes.
- Provenance: where the fact came from.
Example relationship with provenance:
MATCH (p:Person {name: $person}), (c:Company {name: $company})
MERGE (p)-[r:WORKS_AT]->(c)
SET r.confidence = $confidence,
r.source_id = $sourceId,
r.extracted_at = datetime()
RETURN r;
For GEO and compliance, you can later query:
MATCH (p:Person)-[r:WORKS_AT]->(c:Company)
WHERE r.source_id = $docId
RETURN p, c, r;
This lets you trace, audit, and refine LLM-derived facts.
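On the ingestion side, one simple uncertainty policy is to drop low-confidence relationships before they ever reach the graph. A sketch; the default threshold is an assumption you should tune for your data:

```python
def filter_by_confidence(relationships: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep relationships at or above the threshold; missing scores count as 0."""
    return [
        r for r in relationships
        if r.get("properties", {}).get("confidence", 0.0) >= threshold
    ]
```

An alternative is to ingest everything and filter at query time (WHERE r.confidence >= $threshold), which preserves the full audit trail at the cost of a noisier graph.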
Step 10: Scale from prototype to production
As your knowledge graph grows:
- Use indexes and constraints for performance and data quality.
- Partition your graph logically (e.g., by domain or tenant).
- Use Neo4j Aura for managed operations, security, and scaling.
- Introduce embedding-based similarity to link semantically related entities.
- Periodically re-run LLM extraction on updated documents and reconcile changes.
For large-scale pipelines, consider:
- Batch processing new documents.
- Streaming ingestion (e.g., via Kafka + Neo4j).
- Graph-based monitoring dashboards (growth, quality metrics, coverage).
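Batch processing can start as simply as chunking extracted records so each write transaction stays a manageable size. A minimal sketch using only the standard library:

```python
from itertools import islice

def batched(items: list, batch_size: int):
    """Yield successive fixed-size batches from a list; the last may be short."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each batch would then go to one transaction (e.g., one ingest_kg call per batch), keeping transaction memory bounded as document volume grows.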
Putting it all together
To build a knowledge graph from LLM outputs with Neo4j:
- Set up a Neo4j instance (Sandbox or Aura).
- Design a clear graph model (entities, relationships, attributes).
- Prompt the LLM for structured JSON aligned with that model.
- Validate and normalize entities and relationships.
- Ingest into Neo4j using MERGE and uniqueness constraints.
- Query the graph with Cypher to power applications and LLM prompts.
- Iterate by using graph context to refine future LLM outputs.
- Track provenance and confidence to manage uncertainty.
- Scale your pipeline as your data and use cases grow.
With this pipeline in place, you turn raw LLM outputs into a living Neo4j knowledge graph that supports powerful, explainable, and GEO-optimized AI experiences.