How Knowledge Graphs Work
Onsomble uses AI to extract entities and relationships from your documents automatically.
When you add a source to your notebook, Onsomble doesn’t just store the text. It reads the content, identifies important entities, finds relationships between them, and builds a knowledge graph you can explore.
The Building Process
Document Processing
When you upload a source, Onsomble first breaks it into chunks and creates embeddings (for search). Then it starts building the knowledge graph.
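The chunking step can be pictured with a minimal sketch. Onsomble's actual chunking strategy is not documented here, so the fixed character window, the `max_chars` and `overlap` parameters, and the function name are all assumptions made for illustration. Embedding creation is left out.

```python
def split_into_chunks(text: str, max_chars: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Hypothetical sketch: the window size and overlap are illustrative values,
    not Onsomble's real settings.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which is one common reason to chunk this way.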
Entity Extraction
AI reads your entire document and identifies entities — the important “things” mentioned. It creates a master list with:
- Canonical names (the official name)
- Aliases (variations found in the text)
- Categories and descriptions
- Domain tags for classification
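One way to picture a single row of that master list is a small record type. This is a sketch only: the class name, field names, and the `matches` helper are assumptions chosen to mirror the four bullets above, not Onsomble's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """Hypothetical shape of one master-list entry."""
    canonical_name: str                                   # the official name
    category: str                                         # e.g. "Entity", "Concept"
    description: str = ""
    aliases: set[str] = field(default_factory=set)        # variations found in the text
    domain_tags: set[str] = field(default_factory=set)    # classification tags

    def matches(self, mention: str) -> bool:
        """Return True if the mention equals the canonical name or an alias,
        ignoring case and surrounding whitespace."""
        needle = mention.strip().casefold()
        names = {self.canonical_name, *self.aliases}
        return any(needle == name.casefold() for name in names)
```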
Relationship Extraction
Next, AI identifies how entities relate to each other. It looks at the full document to find relationships that span multiple paragraphs or sections.
Graph Storage
Entities and relationships are stored in a Neo4j graph database. Each entity links back to the original text chunks where it was mentioned.
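As a rough illustration of the entity-to-chunk link, the sketch below builds a parameterized Cypher statement in Python. The node labels (`Entity`, `Chunk`), the `MENTIONED_IN` relationship type, and every property name are assumptions; the real graph schema is not documented here. The function only prepares the query and its parameters, which could then be handed to a Neo4j driver session.

```python
# Hypothetical schema: labels, relationship type, and property names are
# illustrative and not taken from Onsomble's actual database.
MERGE_ENTITY_WITH_CHUNK = """
MERGE (e:Entity {canonical_name: $canonical_name})
SET e.category = $category
MERGE (c:Chunk {chunk_id: $chunk_id})
MERGE (e)-[:MENTIONED_IN]->(c)
"""


def build_entity_write(canonical_name: str, category: str, chunk_id: str) -> tuple[str, dict[str, str]]:
    """Return the Cypher text and its parameters for one entity mention."""
    params = {
        "canonical_name": canonical_name,
        "category": category,
        "chunk_id": chunk_id,
    }
    return MERGE_ENTITY_WITH_CHUNK, params
```

Using `MERGE` rather than `CREATE` keeps the write idempotent, so reprocessing a source would not duplicate nodes under this assumed schema.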
What Gets Extracted
Entity Types
Onsomble recognizes six categories of entities:
| Category | What It Captures | Examples |
|---|---|---|
| Entity | People, organizations, products, locations, tools | “Apple Inc.”, “Elon Musk”, “iPhone” |
| Concept | Ideas, theories, methodologies, principles | “Machine learning”, “Agile methodology” |
| Event | Meetings, launches, incidents, milestones | “2024 Annual Report”, “Product Launch” |
| Process | Workflows, procedures, algorithms | “Customer onboarding”, “Data pipeline” |
| Metric | Measurements, KPIs, statistics, rates | “Revenue growth 15%”, “NPS score 72” |
| Data Structure | Files, databases, schemas, formats | “Customer database”, “JSON API” |
Relationship Types
Relationships describe how entities connect:
| Type | What It Captures | Examples |
|---|---|---|
| Hierarchy | Parent-child, ownership, containment | “Google owns YouTube” |
| Causal | Cause and effect, enablement | “Interest rates affect housing prices” |
| Temporal | Time-based connections | “Phase 1 precedes Phase 2” |
| Association | Usage, implementation, extension | “Company uses Salesforce” |
| Relation | General semantic connections | “CEO reports to Board” |
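A relationship can be thought of as a typed edge between two entity names. The sketch below encodes the five types from the table as an enum; the class names and the `label` field are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    """The five relationship types listed in the table above."""
    HIERARCHY = "Hierarchy"
    CAUSAL = "Causal"
    TEMPORAL = "Temporal"
    ASSOCIATION = "Association"
    RELATION = "Relation"


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge record: source entity, typed link, target entity."""
    source: str
    type: RelationshipType
    label: str      # the verb phrase found in the text, e.g. "owns"
    target: str

    def as_sentence(self) -> str:
        return f"{self.source} {self.label} {self.target}"


# Two rows of the table expressed as edges.
owns = Relationship("Google", RelationshipType.HIERARCHY, "owns", "YouTube")
precedes = Relationship("Phase 1", RelationshipType.TEMPORAL, "precedes", "Phase 2")
```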
Automatic Consolidation
When you add multiple sources to a notebook, Onsomble automatically consolidates entities.
How It Works
- Before processing a new source, the system loads existing entities from your notebook
- AI compares new entities against existing ones
- Matches are merged (same entity, different mentions)
- Only truly new entities are created
Example
You upload two documents:
- Document 1 mentions “Tower Insurance” and “Tower Limited”
- Document 2 mentions “Tower NZ” and “Tower Insurance Company”
Onsomble recognizes that all four names refer to the same company. It creates one entity (“Tower Limited”) with the other three names as aliases, rather than four separate nodes.
The more sources you add, the richer your knowledge graph becomes. Entities from different documents get linked together automatically.
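The merge logic in the Tower example can be sketched as follows. In Onsomble the comparison is done by AI; here it is swapped for a deliberately crude stand-in, `toy_resolve`, so that the example runs on its own. All function names and the dictionary layout (canonical name mapped to a set of aliases) are assumptions.

```python
from typing import Callable, Optional

Graph = dict[str, set[str]]  # canonical name -> aliases (hypothetical layout)


def toy_resolve(mention: str, graph: Graph) -> Optional[str]:
    """Stand-in for the AI comparison, NOT the real matcher.

    Matches known names and aliases exactly (case-insensitive), and
    hard-codes the assumption that any "Tower ..." name is Tower Limited.
    """
    key = mention.casefold()
    for canonical, aliases in graph.items():
        if key == canonical.casefold() or key in {a.casefold() for a in aliases}:
            return canonical
    if key.startswith("tower "):
        return "Tower Limited"
    return None


def consolidate(graph: Graph, mentions: list[str],
                resolve: Callable[[str, Graph], Optional[str]]) -> Graph:
    """Merge mentions into a copy of the graph, creating only truly new entities."""
    merged = {name: set(aliases) for name, aliases in graph.items()}
    for mention in mentions:
        # No match means the mention becomes its own canonical entity.
        canonical = resolve(mention, merged) or mention
        aliases = merged.setdefault(canonical, set())
        if mention != canonical:
            aliases.add(mention)  # same entity, different mention
    return merged


after_doc_1 = consolidate({}, ["Tower Insurance", "Tower Limited"], toy_resolve)
after_doc_2 = consolidate(after_doc_1, ["Tower NZ", "Tower Insurance Company"], toy_resolve)
```

After both documents, `after_doc_2` holds a single key, "Tower Limited", with the other three names as its aliases.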
Behind the Scenes
Three-Pass Extraction
Onsomble builds the graph in three passes, numbered 1, 1.5, and 2:
Pass 1: Master Entity List
- Analyzes the entire document at once
- Creates the canonical entity registry
- Considers existing notebook entities to avoid duplicates
Pass 1.5: Document-Level Relationships
- Looks at the full document with the master entity list
- Finds relationships that span multiple sections
- Addresses the “chunk myopia” problem, where a relationship is missed because its two entities appear in different chunks
Pass 2: Chunk-Level Details
- Processes each chunk in parallel
- Adds fine-grained context
- Links entities to specific text locations
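The control flow of the three passes can be sketched as below. Every extraction function is a toy stand-in for an AI call (simple substring checks against a made-up vocabulary), and all names are hypothetical. The point is the ordering: document-wide passes first, then per-chunk work in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

TOY_VOCABULARY = {"Google", "YouTube"}  # hypothetical; replaces the AI's judgment


def build_master_list(document: str, notebook_entities: set[str]) -> set[str]:
    """Pass 1 stand-in: scan the whole document, reusing names already in the notebook."""
    candidates = TOY_VOCABULARY | notebook_entities
    return {name for name in candidates if name in document}


def find_document_relationships(entities: set[str]) -> list[tuple[str, str, str]]:
    """Pass 1.5 stand-in: link every pair of entities found anywhere in the document."""
    ordered = sorted(entities)
    return [(a, "RELATED_TO", b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]


def locate_in_chunk(index: int, chunk: str, entities: set[str]) -> tuple[int, list[str]]:
    """Pass 2 stand-in: record which master-list entities appear in one chunk."""
    return index, sorted(name for name in entities if name in chunk)


def build_graph(document: str, chunks: list[str], notebook_entities: set[str]) -> dict:
    entities = build_master_list(document, notebook_entities)       # Pass 1
    relationships = find_document_relationships(entities)           # Pass 1.5
    with ThreadPoolExecutor() as pool:                               # Pass 2, in parallel
        located = list(pool.map(
            lambda item: locate_in_chunk(item[0], item[1], entities),
            enumerate(chunks),
        ))
    return {"entities": entities, "relationships": relationships, "mentions": dict(located)}


chunks = ["Google announced a deal.", "The deal covers YouTube."]
graph = build_graph(" ".join(chunks), chunks, set())
```

In this example the two entities never share a chunk, so a chunk-only extractor would miss the link between them. The document-level pass still finds it.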
Performance Optimizations
- Parallel processing — Multiple chunks processed simultaneously
- Caching — Previously extracted chunks are cached
- Batch operations — Entities inserted in bulk for speed
- Background processing — Graphs build while you work
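The caching idea can be illustrated with a small wrapper that skips extraction for chunks it has already seen. Keying the cache by a SHA-256 hash of the chunk text is an assumption, as are the class and attribute names; how Onsomble identifies a previously extracted chunk is not documented here.

```python
import hashlib
from typing import Callable


class ChunkCache:
    """Hypothetical cache: stores extraction results keyed by chunk content hash."""

    def __init__(self, extract: Callable[[str], list[str]]) -> None:
        self._extract = extract
        self._results: dict[str, list[str]] = {}
        self.misses = 0  # how many times the expensive extraction actually ran

    def get(self, chunk: str) -> list[str]:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._extract(chunk)
        return self._results[key]


# Toy extractor standing in for the AI call: capitalized words count as entities.
cache = ChunkCache(lambda chunk: [word for word in chunk.split() if word.istitle()])
first = cache.get("Tower Limited reported growth")
second = cache.get("Tower Limited reported growth")  # served from the cache
```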
Source Tracking
Every entity and relationship tracks which sources it came from.
What’s Tracked
| Field | Description |
|---|---|
| Source IDs | Which documents mention this entity |
| Chunk count | How many chunks mention it (affects node size) |
| Excerpts | Actual text passages where it appears |
Why This Matters
- Verify claims — See exactly where an entity was mentioned
- Assess importance — Larger nodes = more mentions
- Filter by source — Focus on specific documents
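The three tracked fields and the uses listed above can be sketched together. The class, the logarithmic `node_radius` formula, and the `filter_by_source` helper are all assumptions for illustration; the source only states that more mentions produce larger nodes, not how size is computed.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Provenance:
    """Hypothetical per-entity tracking record mirroring the table above."""
    source_ids: set[str] = field(default_factory=set)
    chunk_count: int = 0
    excerpts: list[str] = field(default_factory=list)

    def record(self, source_id: str, excerpt: str) -> None:
        """Register one chunk-level mention."""
        self.source_ids.add(source_id)
        self.chunk_count += 1
        self.excerpts.append(excerpt)


def node_radius(chunk_count: int, base: float = 6.0) -> float:
    """Assumed sizing rule: grows with mentions, but slowly (logarithmic)."""
    return base * (1 + math.log1p(chunk_count))


def filter_by_source(tracked: dict[str, Provenance], source_id: str) -> list[str]:
    """Return the entity names mentioned in the given source."""
    return sorted(name for name, prov in tracked.items() if source_id in prov.source_ids)


tracked = {"Tower Limited": Provenance(), "Salesforce": Provenance()}
tracked["Tower Limited"].record("doc-1", "Tower Limited reported growth.")
tracked["Tower Limited"].record("doc-2", "Tower NZ expanded.")
tracked["Salesforce"].record("doc-2", "The company uses Salesforce.")
```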
Status and Errors
Processing States
| Status | Meaning |
|---|---|
| Processing | Graph is being built |
| Completed | Graph is ready to explore |
| Failed | Something went wrong |
If Building Fails
Common causes:
- Very long or complex documents
- Unusual formatting
- Temporary API issues
Solution: Try reprocessing the source from the Sources panel.