Ingestion is the process Onsomble uses to read and understand a business’s website. It runs automatically after you add a new Site, and it’s what lets Onsomble suggest relevant competitors, generate realistic prompts, and later power workflows with accurate information about the business. This page explains what ingestion does, how long it takes, and how to diagnose the most common issues.

What ingestion actually does

When you add a Site, Onsomble crawls the pages of the associated website and extracts the information it needs to work with. In practical terms, that means:
  • Fetching the pages that matter — homepage, service pages, about, contact, pricing, any content that describes what the business does and who it serves
  • Extracting and structuring the content so Onsomble can reason about it
  • Building a picture of the business’s category, offerings, and likely customer questions
  • Identifying candidate competitors based on what the business does
The result is a summarised understanding of the business that Onsomble uses to make scans and workflows genuinely relevant.

How long it takes

Ingestion time depends on how much content there is to process.
Site size                          Typical time
Small website (under ~20 pages)    2–5 minutes
Medium website (20–200 pages)      5–15 minutes
Large website (200+ pages)         15+ minutes
You don’t need to sit and watch. Ingestion runs in the background, and you’ll be notified when it’s complete. You can also start setting up a first scan straight away — Onsomble will sharpen its suggestions once ingestion finishes.
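For planning purposes, the table above can be read as a simple lookup from page count to a typical time band. The sketch below is illustrative only: the function name is hypothetical, and treating exactly 200 pages as "large" is an assumption, since the table lists 200 in both the medium and large rows.

```python
from typing import Optional, Tuple


def typical_ingestion_minutes(page_count: int) -> Tuple[int, Optional[int]]:
    """Return the (min, max) typical ingestion time in minutes for a page count.

    A max of None means the band is open-ended ("15+ minutes").
    Band edges follow the table; the handling of exactly 200 pages is an assumption.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count < 20:       # small website
        return (2, 5)
    if page_count < 200:      # medium website
        return (5, 15)
    return (15, None)         # large website


if __name__ == "__main__":
    for pages in (8, 75, 1200):
        print(pages, typical_ingestion_minutes(pages))
```

These are typical figures rather than guarantees, so treat the result as a rough expectation.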

Watching progress

The Site overview shows ingestion status:
  • Queued — waiting to start
  • In progress — actively crawling and processing
  • Complete — ready to inform scans and workflows
  • Failed — something stopped it from finishing (see below)

When ingestion fails

A few issues can stop ingestion. The most common are easy to diagnose:
  • Onsomble ingests publicly reachable content. If a site requires authentication to view, ingestion will fail. For now, ingest a public-facing marketing or brand site rather than a logged-in application area.
  • Some sites actively block crawlers via robots.txt, Cloudflare rules, or WAF configurations. Check whether robots.txt excludes the Onsomble crawler, and whether any bot-protection service is blocking the request. Whitelisting Onsomble resolves this.
  • If pages take a long time to respond, ingestion may time out partway through. The fix is usually on the website’s side: reducing render-blocking JavaScript, fixing broken backends, or addressing slow third-party scripts.
  • If the website has only a landing page or “coming soon” content, there may not be enough material for Onsomble to work with. Publish a richer description of the business first. A handful of well-written service and about pages is enough.
If none of these apply, contact [email protected] with the Site name and we’ll investigate.
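One way to check the robots.txt cause yourself is to test the site's rules against a crawler's user-agent token. The sketch below uses Python's standard `urllib.robotparser`. The token `ExampleOnsombleBot` and the sample rules are hypothetical placeholders, because this page does not state the real Onsomble crawler name; substitute the actual value once you know it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token; the real Onsomble crawler name is not given here.
CRAWLER_AGENT = "ExampleOnsombleBot"


def crawler_allowed(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Sample rules that block the hypothetical crawler but allow everyone else.
SAMPLE_ROBOTS = """\
User-agent: ExampleOnsombleBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/pricing"
    print(crawler_allowed(SAMPLE_ROBOTS, page))                        # blocked crawler
    print(crawler_allowed(SAMPLE_ROBOTS, page, agent="SomeOtherBot"))  # any other agent
```

This only covers robots.txt. Blocks applied by Cloudflare rules or a WAF happen at the network edge and will not show up in this check.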

Re-ingesting after a website change

Websites change. When the underlying business content moves — a new service line, updated pricing, a redesigned homepage — you’ll want Onsomble to re-ingest so it’s working from the current content. Trigger a fresh ingestion from the Site overview. Existing scans and insights are preserved; only the underlying understanding of the website is refreshed.
A good time to re-ingest is after any significant website update, or before setting up a new workflow that depends on specific product or service information.

What’s next

  • Managing multiple Sites: switch between Sites, rename them, and keep your portfolio organised.
  • Setting up a scan: put your newly-ingested Site to work with a first discoverability scan.