How They Work

Onsomble uses AI to extract entities and relationships from your documents automatically.

When you add a source to your notebook, Onsomble doesn’t just store the text. It reads the content, identifies important entities, finds relationships between them, and builds a knowledge graph you can explore.

The Building Process

Document Processing

When you upload a source, Onsomble first breaks it into chunks and creates embeddings (for search). Then it starts building the knowledge graph.
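To make the chunking step concrete, here is a minimal sketch. The guide does not say how Onsomble splits documents, so the word-based windows, the `chunk_text` name, and the `max_words` and `overlap` parameters are all assumptions chosen for illustration.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a hypothetical chunking strategy)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap  # how far each window advances
    # Stop once the remaining words are already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


# A 120-word document yields three windows starting at words 0, 40, and 80.
sample = " ".join(str(n) for n in range(120))
chunks = chunk_text(sample)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which is one common reason to chunk this way.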

Entity Extraction

AI reads your entire document and identifies entities — the important “things” mentioned. It creates a master list with:

  • Canonical names (the official name)
  • Aliases (variations found in the text)
  • Categories and descriptions
  • Domain tags for classification

Relationship Extraction

Next, AI identifies how entities relate to each other. It looks at the full document to find relationships that span multiple paragraphs or sections.

Graph Storage

Entities and relationships are stored in a Neo4j graph database. Each entity links back to the original text chunks where it was mentioned.
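The link from an entity back to a chunk could be expressed as a parameterized Cypher statement, built here in Python. This is only a sketch: the node labels `Entity` and `Chunk`, the `MENTIONED_IN` relationship type, the property names, and the chunk ID format are hypothetical, since the guide does not document the graph schema.

```python
def mention_statement(entity_name: str, chunk_id: str) -> tuple[str, dict[str, str]]:
    """Build a Cypher statement that links an entity node to a chunk node.

    Labels, relationship type, and property names are assumed for illustration.
    """
    query = (
        "MERGE (e:Entity {name: $name}) "
        "MERGE (c:Chunk {id: $chunk_id}) "
        "MERGE (e)-[:MENTIONED_IN]->(c)"
    )
    return query, {"name": entity_name, "chunk_id": chunk_id}


query, params = mention_statement("Tower Limited", "chunk-0007")
```

`MERGE` creates a node or relationship only if it does not already exist, so repeating the statement for the same pair would not produce duplicates. Values are passed as parameters rather than interpolated into the query string.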

What Gets Extracted

Entity Types

Onsomble recognizes six categories of entities:

| Category | What It Captures | Examples |
| --- | --- | --- |
| Entity | People, organizations, products, locations, tools | “Apple Inc.”, “Elon Musk”, “iPhone” |
| Concept | Ideas, theories, methodologies, principles | “Machine learning”, “Agile methodology” |
| Event | Meetings, launches, incidents, milestones | “2024 Annual Report”, “Product Launch” |
| Process | Workflows, procedures, algorithms | “Customer onboarding”, “Data pipeline” |
| Metric | Measurements, KPIs, statistics, rates | “Revenue growth 15%”, “NPS score 72” |
| Data Structure | Files, databases, schemas, formats | “Customer database”, “JSON API” |

Relationship Types

Relationships describe how entities connect:

| Type | What It Captures | Examples |
| --- | --- | --- |
| Hierarchy | Parent-child, ownership, containment | “Google owns YouTube” |
| Causal | Cause and effect, enablement | “Interest rates affect housing prices” |
| Temporal | Time-based connections | “Phase 1 precedes Phase 2” |
| Association | Usage, implementation, extension | “Company uses Salesforce” |
| Relation | General semantic connections | “CEO reports to Board” |
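One way to picture a relationship is as a typed edge between two entity names. The sketch below uses the five relationship types listed here; the `Relationship` class, its fields, and the `describe` helper are assumed for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    HIERARCHY = "Hierarchy"
    CAUSAL = "Causal"
    TEMPORAL = "Temporal"
    ASSOCIATION = "Association"
    RELATION = "Relation"


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge record: source entity, target entity, type, and verb label."""
    source: str
    target: str
    type: RelationshipType
    label: str


def describe(rel: Relationship) -> str:
    """Render an edge as a short sentence."""
    return f"{rel.source} {rel.label} {rel.target}"


examples = [
    Relationship("Google", "YouTube", RelationshipType.HIERARCHY, "owns"),
    Relationship("Phase 1", "Phase 2", RelationshipType.TEMPORAL, "precedes"),
]
```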

Automatic Consolidation

When you add multiple sources to a notebook, Onsomble automatically consolidates entities.

How It Works

  1. Before processing a new source, the system loads existing entities from your notebook
  2. AI compares new entities against existing ones
  3. Matches are merged (same entity, different mentions)
  4. Only truly new entities are created

Example

You upload two documents:

  • Document 1 mentions “Tower Insurance” and “Tower Limited”
  • Document 2 mentions “Tower NZ” and “Tower Insurance Company”

Onsomble recognizes that all four names refer to the same company. It creates one entity (“Tower Limited”) with the other three names as aliases, rather than four separate nodes.
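The merge-or-create logic can be sketched as follows. In Onsomble the comparison is done by AI; here `toy_matcher` is a deliberately crude stand-in (same first word means same entity) so the example runs on its own. The `Entity` class and `consolidate` function are likewise hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entity:
    canonical_name: str
    aliases: set[str] = field(default_factory=set)


def consolidate(
    existing: list[Entity],
    mentions: list[str],
    same_entity: Callable[[Entity, str], bool],
) -> list[Entity]:
    """Merge each mention into a matching entity, or create a new entity for it."""
    for mention in mentions:
        match = next((e for e in existing if same_entity(e, mention)), None)
        if match is None:
            existing.append(Entity(canonical_name=mention))  # truly new entity
        elif mention != match.canonical_name:
            match.aliases.add(mention)                       # same entity, new alias
    return existing


def toy_matcher(entity: Entity, mention: str) -> bool:
    """Toy stand-in for the AI comparison: compare the first word only."""
    return entity.canonical_name.split()[0].casefold() == mention.split()[0].casefold()


# Document 1, then Document 2. In this sketch the first mention seen becomes canonical.
notebook = consolidate([], ["Tower Limited", "Tower Insurance"], toy_matcher)
notebook = consolidate(notebook, ["Tower NZ", "Tower Insurance Company"], toy_matcher)
```

After both documents, `notebook` holds a single entity with three aliases. A real matcher would need more than the first word, since unrelated names can share one.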

Tip

The more sources you add, the richer your knowledge graph becomes. Entities from different documents get linked together automatically.

Behind the Scenes

Three-Pass Extraction

Onsomble builds each graph in three passes:

Pass 1: Master Entity List

  • Analyzes the entire document at once
  • Creates the canonical entity registry
  • Considers existing notebook entities to avoid duplicates

Pass 1.5: Document-Level Relationships

  • Looks at the full document with the master entity list
  • Finds relationships that span multiple sections
  • Solves the “chunk myopia” problem (relationships split across chunks)

Pass 2: Chunk-Level Details

  • Processes each chunk in parallel
  • Adds fine-grained context
  • Links entities to specific text locations
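The overall flow of the three passes might look like the sketch below. The pass functions are simple placeholders for the AI calls (capitalized words stand in for entities, co-occurrence stands in for relationships), and every function name here is an assumption. Only the ordering and the parallel chunk pass reflect the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def pass_1_master_entities(document: str, existing: set[str]) -> set[str]:
    """Pass 1 stand-in: capitalized words become entities, merged with the notebook's."""
    found = {word.strip(".,") for word in document.split() if word[:1].isupper()}
    return existing | found


def pass_1_5_relationships(document: str, entities: set[str]) -> list[tuple[str, str]]:
    """Pass 1.5 stand-in: pair entities that both appear somewhere in the full document."""
    present = sorted(e for e in entities if e in document)
    return [(a, b) for i, a in enumerate(present) for b in present[i + 1:]]


def pass_2_chunk_details(index: int, chunk: str, entities: set[str]) -> tuple[int, list[str]]:
    """Pass 2: record which entities are mentioned in one chunk."""
    return index, sorted(e for e in entities if e in chunk)


def build_graph(chunks: list[str], existing: set[str]):
    document = " ".join(chunks)
    entities = pass_1_master_entities(document, existing)
    relationships = pass_1_5_relationships(document, entities)
    with ThreadPoolExecutor() as pool:  # chunks are processed in parallel
        locations = dict(
            pool.map(lambda item: pass_2_chunk_details(*item, entities), enumerate(chunks))
        )
    return entities, relationships, locations


entities, relationships, locations = build_graph(
    ["Google owns YouTube.", "YouTube launched in 2005."], existing=set()
)
```

Because pass 1.5 sees the whole document, it can pair entities even when they never share a chunk, which is the point of running it before the chunk-level pass.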

Performance Optimizations

  • Parallel processing — Multiple chunks processed simultaneously
  • Caching — Previously extracted chunks are cached
  • Batch operations — Entities inserted in bulk for speed
  • Background processing — Graphs build while you work
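As an illustration of the caching idea, the sketch below keys results by a hash of the chunk text so that an unchanged chunk is never extracted twice. The guide only says that previously extracted chunks are cached; the content-hash key, the `ChunkCache` class, and the `count_words` placeholder extractor are assumptions.

```python
import hashlib
from typing import Any, Callable


class ChunkCache:
    """Hypothetical in-memory cache of extraction results, keyed by chunk content."""

    def __init__(self, extract: Callable[[str], Any]) -> None:
        self._extract = extract
        self._results: dict[str, Any] = {}
        self.misses = 0  # how many times the extractor actually ran

    def get(self, chunk: str) -> Any:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._extract(chunk)
        return self._results[key]


def count_words(chunk: str) -> int:
    """Placeholder for an expensive extraction call."""
    return len(chunk.split())


cache = ChunkCache(count_words)
first = cache.get("Google owns YouTube.")
second = cache.get("Google owns YouTube.")  # served from the cache
```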

Source Tracking

Every entity and relationship tracks which sources it came from.

What’s Tracked

| Field | Description |
| --- | --- |
| Source IDs | Which documents mention this entity |
| Chunk count | How many times it’s mentioned (affects node size) |
| Excerpts | Actual text passages where it appears |
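The tracked fields could be kept in a small record like the one below. The `Provenance` class and the source ID format are hypothetical, the sketch assumes one excerpt per mentioning chunk, and the logarithmic `node_size` formula is purely an assumption: the guide says only that chunk count affects node size, not how.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Provenance:
    """Hypothetical per-entity tracking record."""
    source_ids: set[str] = field(default_factory=set)
    excerpts: list[str] = field(default_factory=list)

    def record(self, source_id: str, excerpt: str) -> None:
        """Note one mention: which source it came from and the passage itself."""
        self.source_ids.add(source_id)
        self.excerpts.append(excerpt)

    @property
    def chunk_count(self) -> int:
        # Assumes one excerpt is recorded per chunk that mentions the entity.
        return len(self.excerpts)


def node_size(chunk_count: int, base: float = 10.0) -> float:
    """Assumed scaling: grows with mentions, but slowly for very frequent entities."""
    return base * (1 + math.log1p(chunk_count))


tower = Provenance()
tower.record("doc-1", "Tower Insurance reported higher premiums.")
tower.record("doc-2", "Tower NZ opened a new office.")
```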

Why This Matters

  • Verify claims — See exactly where an entity was mentioned
  • Assess importance — Larger nodes = more mentions
  • Filter by source — Focus on specific documents

Status and Errors

Processing States

| Status | Meaning |
| --- | --- |
| Processing | Graph is being built |
| Completed | Graph is ready to explore |
| Failed | Something went wrong |

If Building Fails

Common causes:

  • Very long or complex documents
  • Unusual formatting
  • Temporary API issues

Solution: Try reprocessing the source from the Sources panel.
