Company Data Platform

DiscoLike’s platform offers a global, domain-first dataset of company profiles, including site text, subsidiaries, and vendors. This data can be integrated into your product through weekly or monthly flat-file deliveries or through our APIs, which support natural language search, lookalike domain discovery, client list segmentation, and company name-to-profile matching.

Primary Datasets

BizData: Business domain directory with over 40 million sites in 42 languages, including keywords, industry categories, company size, start date, location, public contacts, and social network links, all derived from site content. Site text is refreshed at least once a month, with the full dataset delivered weekly or monthly.
Site Text: Extracted text and hyperlinks from a site’s home, product, services, about, and jobs pages. Both original and translated text are available, with incremental updates delivered daily.
Domain Linkage: Company subsidiaries, brand and product domains, site redirects, and linkages through public contact info, as well as vendors in use. Incremental updates delivered weekly.
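As an illustration of how incremental deliveries like those above might be folded into a locally stored copy of a dataset, here is a minimal sketch. The record layout is an assumption made for the example: the `domain` key, the `deleted` flag, and the `apply_incremental` helper are hypothetical and are not taken from DiscoLike's file format.

```python
from typing import Dict, Iterable

Record = Dict[str, object]


def apply_incremental(
    snapshot: Dict[str, Record], updates: Iterable[Record]
) -> Dict[str, Record]:
    """Return a new snapshot with the incremental records merged in.

    Assumes (hypothetically) that every record carries a "domain" key and
    that removals are signalled with a truthy "deleted" flag.
    """
    merged = dict(snapshot)  # copy so the caller's snapshot is left untouched
    for rec in updates:
        domain = str(rec["domain"]).lower()  # domains are case-insensitive
        if rec.get("deleted"):
            merged.pop(domain, None)
            continue
        # Shallow merge: fields missing from the update keep their old value.
        merged[domain] = {**merged.get(domain, {}), **rec, "domain": domain}
    return merged


snapshot = {"example.com": {"domain": "example.com", "text": "old", "lang": "en"}}
updates = [
    {"domain": "Example.com", "text": "new"},
    {"domain": "new.io", "text": "hello"},
]
result = apply_incremental(snapshot, updates)
```

The merge is keyed by lowercased domain so that the same site is not stored twice under different capitalizations.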

Integration APIs

Discover: Integrate advanced, embeddings-powered company search into your product. Search by natural language, lookalike domain, keywords, or exact site text, with filtering options by start date, size, industry, and location.
Segment: Automatically group client lists into segments based on website text. Our LLM-based algorithm either identifies optimal clusters across industry verticals or reduces the list to a fixed number of segments.
Match: Map company names to business profiles, tolerating mistyped names and using location or phone information to improve accuracy.
Append: Perform bulk data appends from our datasets to domains or company names, matching names to domains as needed.
SimRank: Determine domain similarity to other domains or groups of domains.
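To make the idea of scoring a domain against a group of domains concrete, here is a minimal sketch using cosine similarity over embedding vectors. It is a generic technique chosen for illustration, not a description of how SimRank or Discover is implemented; the toy vectors and the `rank_lookalikes` helper are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty group of vectors."""
    if not vectors:
        raise ValueError("at least one seed vector is required")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_lookalikes(
    seeds: Sequence[Sequence[float]], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate domains by similarity to the centroid of the seed group."""
    target = centroid(seeds)
    scored = [(domain, cosine(vec, target)) for domain, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional embeddings; real site-text embeddings would be far larger.
seeds = [[1.0, 0.0], [0.9, 0.1]]
candidates = {"a.example": [1.0, 0.0], "b.example": [0.0, 1.0]}
ranked = rank_lookalikes(seeds, candidates)
```

Averaging the seed vectors is one simple way to represent a group of domains as a single point; other aggregation choices are equally plausible.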

Supplementary Datasets

Domain Status: Outputs from our processing pipeline, covering site text extraction; detection of dead, parked, or unconfigured domains; and business site and language identification.
DNS: Continuously updated dataset of all domains observed through our certificate firehose, including resolutions from DNS records.
Certificate Stream: Near real-time certificate data from Certificate Authorities (CAs) with full global coverage.
Vendor Integration: Lists of client domains that use a vendor’s services under the client’s own primary domain through white labeling.
Technographics: Lists all JS includes as they appear in company site pages, mapped to corresponding vendors.
Company Growth: Tracks subdomains in use over time, correlating them with business size and spending trends.
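The Technographics idea of mapping JS includes to vendors can be sketched with the Python standard library. This is an illustrative example only, not the pipeline behind the dataset: the two-entry `VENDOR_BY_HOST` table and the `detect_vendors` helper are assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

# Hypothetical lookup table; a real mapping would cover far more hosts.
VENDOR_BY_HOST = {
    "www.googletagmanager.com": "Google Tag Manager",
    "js.stripe.com": "Stripe",
}


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def detect_vendors(html: str) -> List[str]:
    """Return the sorted vendor names whose script hosts appear in the page."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    vendors = set()
    for src in collector.sources:
        host = urlparse(src).netloc.lower()  # empty for relative paths
        if host in VENDOR_BY_HOST:
            vendors.add(VENDOR_BY_HOST[host])
    return sorted(vendors)


page = (
    '<html><head>'
    '<script src="https://js.stripe.com/v3/"></script>'
    '<script src="/static/app.js"></script>'
    '<script>var x = 1;</script>'
    '</head></html>'
)
found = detect_vendors(page)
```

Matching on the script host keeps the sketch short; first-party scripts served from relative paths have no host and are therefore ignored.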