Everything you need to generate verified data.
SynthLabTech is a single platform combining five specialized generation engines, cryptographic evidence bundles, and an AI orchestrator — unified by a deterministic contract system that guarantees reproducibility.
Foundation
Everything starts with Contract K
Contract K is the deterministic specification at the heart of every SynthLabTech job. It defines the engine, seed, schema constraints, compute tier, and output format in a single serializable document — cryptographically hashed before execution begins.
The same Contract K always produces identical output. On any machine. At any time. This is the foundation of SynthLabTech's reproducibility guarantee.
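Hashing a contract before execution only makes sense if the contract always serializes the same way. The minimal sketch below shows one approach: serialize the contract canonically, then hash it, so that identical contracts always produce identical digests. The field names, the `contract_digest` helper, and the use of BLAKE2b (a standard-library stand-in for BLAKE3, which the evidence section names) are assumptions, not SynthLabTech's actual format.

```python
import hashlib
import json

def canonical_bytes(contract: dict) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(contract, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def contract_digest(contract: dict) -> str:
    # Assumption: BLAKE2b-256 stands in for BLAKE3, which is not in Python's stdlib.
    return hashlib.blake2b(canonical_bytes(contract), digest_size=32).hexdigest()

# Hypothetical Contract K; the real field names are not documented on this page.
contract = {
    "engine": "rapid_rrf",
    "seed": "9f" * 32,  # 256-bit seed as 64 hex characters
    "compute_tier": "synth/cpu",
    "output_format": "parquet",
    "schema": {"rows": 500_000, "columns": [{"name": "age", "type": "int", "min": 0, "max": 120}]},
}

print(contract_digest(contract))
```

Sorting keys means two contracts that differ only in key order hash identically, while changing any value (the seed, for example) changes the digest. A cross-language system would more likely adopt a standardized canonical form such as RFC 8785 (JSON Canonicalization Scheme).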
Generation Engines
Five engines. One contract.
Each engine is purpose-built for a specific data domain. All share the same Contract K input, the same Canonical Lift pipeline, and the same evidence bundle output.
Rapid Mode — Rapid Relational Fabrication
rapid_rrf
Deterministic tabular generation at industrial scale
Rapid Relational Fabrication is SynthLabTech's primary tabular engine. It uses a 256-bit seed from Contract K to drive a counter-based RNG (ChaCha20 variant) with IEEE-754 binary64 precision throughout — the same seed produces the same floating-point values regardless of CPU architecture or OS. Generation is done in a single pass; the Canonical Lift pipeline then applies cross-column correlations, referential integrity constraints, nullable patterns, and domain-specific rules before writing the sealed output. RRF is the engine of choice for high-volume, time-sensitive jobs: 500K rows on synth/cpu in under 8 seconds, scaling linearly to hundreds of millions of rows without quality degradation.
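A counter-based RNG is what makes seed-to-value determinism cheap. Value number i is a pure function of (seed, stream, i), so any worker can compute any slice of rows without replaying earlier draws. The sketch below uses the standard RFC 8439 ChaCha20 block function as a stand-in for the unspecified "ChaCha20 variant". The `uniform_at` helper and the use of a per-column stream id as the nonce are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

def _rotl32(v: int, n: int) -> int:
    return ((v << n) & MASK32) | (v >> (32 - n))

def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """RFC 8439 ChaCha20 block: 32-byte key, 32-bit block counter, 12-byte nonce."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("key must be 32 bytes and nonce 12 bytes")
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    state += struct.unpack("<8I", key)
    state.append(counter & MASK32)
    state += struct.unpack("<3I", nonce)
    w = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(w, 0, 4, 8, 12)  # column rounds
        _quarter_round(w, 1, 5, 9, 13)
        _quarter_round(w, 2, 6, 10, 14)
        _quarter_round(w, 3, 7, 11, 15)
        _quarter_round(w, 0, 5, 10, 15)  # diagonal rounds
        _quarter_round(w, 1, 6, 11, 12)
        _quarter_round(w, 2, 7, 8, 13)
        _quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))

def uniform_at(seed: bytes, stream_id: int, index: int) -> float:
    """Value number `index` of a stream, computed directly from (seed, stream, index)."""
    block_no, slot = divmod(index, 8)  # each 64-byte block holds eight 64-bit words
    if block_no > MASK32:
        raise ValueError("index exceeds the 32-bit block counter")
    block = chacha20_block(seed, block_no, stream_id.to_bytes(12, "little"))
    (word,) = struct.unpack_from("<Q", block, slot * 8)
    # Top 53 bits -> exact binary64 in [0, 1); everything before this is integer math.
    return (word >> 11) * 2.0**-53

seed = bytes(range(32))  # stands in for Contract K's 256-bit seed
column = [uniform_at(seed, stream_id=0, index=i) for i in range(5)]
```

Everything up to the final step is 32-bit integer arithmetic, and the conversion takes exactly 53 bits, which fit a binary64 mantissa without rounding. That is one way to make floating-point output identical across CPUs. Pure Python is far too slow for production; this is for illustration only.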
Research Mode — Thermodynamic Reservoir Computing
research_trc
High-fidelity statistical synthesis for ML and research
Thermodynamic Reservoir Computing (TRC) applies energy-based generative modeling: the joint probability distribution is expressed as a Boltzmann distribution over a learned energy function. Configurable temperature schedules anneal the model from a high-entropy initialization toward the target distribution, while contrastive divergence training, with negative samples drawn from Gibbs chains, fits the energy function to the data. Unlike standard tabular synthesizers that optimize marginal statistics, TRC preserves complex multivariate dependencies, including heavy tails, rare joint events, and conditional distributions. The result is synthetic data that passes rigorous ML utility tests: models trained on TRC output match the performance of models trained on the original data to within a measured delta, which is recorded in the Utility Metrics artifact.
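The sampling half of this idea can be illustrated on a toy binary energy model. States are drawn from p(x) ∝ exp(-E(x)/T) by Gibbs sampling while T is lowered through a schedule. This is not TRC itself. The quadratic energy E(x) = -½xᵀWx - bᵀx, the schedule, and the `gibbs_anneal` helper are assumptions, and contrastive-divergence training of W and b is omitted.

```python
import math
import random

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gibbs_anneal(W, b, temps, sweeps_per_temp, rng):
    """Draw x in {0,1}^n from p(x) ∝ exp(-E(x)/T), lowering T through `temps`.

    E(x) = -0.5 * x^T W x - b^T x, with W symmetric and zero on the diagonal.
    """
    n = len(b)
    x = [rng.randint(0, 1) for _ in range(n)]  # high-entropy initialization
    for T in temps:
        for _ in range(sweeps_per_temp):
            for i in range(n):
                # Energy drop from setting x_i = 1 instead of 0, given the other units.
                field = b[i] + sum(W[i][j] * x[j] for j in range(n) if j != i)
                x[i] = 1 if rng.random() < _sigmoid(field / T) else 0
    return x

# Two units with a positive coupling and a negative bias tend to switch on together.
W = [[0.0, 3.0], [3.0, 0.0]]
b = [-1.0, -1.0]
rng = random.Random(42)
samples = [tuple(gibbs_anneal(W, b, [4.0, 2.0, 1.0], 10, rng)) for _ in range(200)]
```

At T = 1 the agreeing states (0,0) and (1,1) have energies 0 and -1, while each mixed state has energy +1. The target distribution therefore puts about 83% of its mass on agreeing states. This is the kind of joint dependency that matching marginals alone would miss.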
Virtual SCADA Simulator
virtual_scada
Industrial OT telemetry with physics-calibrated simulation
The Virtual SCADA Simulator generates multi-layered operational technology telemetry streams with calibrated physics models at each protocol layer. Five protocol stacks are supported: Modbus TCP, OPC-UA, BACnet/IP, MQTT, and DNP3. Describe any industrial facility in natural language — the LLM fabricates a custom scenario pack with calibrated sensor catalogs, physics cross-checks, and protocol mappings for your specific vertical: power generation, oil & gas, water treatment, discrete manufacturing, or any other domain. Each scenario supports four operating regimes in a single job: normal baseline, high-load stress, fault propagation with realistic degradation curves, and scheduled maintenance windows with controlled restarts.
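A job that covers all four operating regimes can be pictured as one regime-labeled time series. The sketch below is a toy with these assumed components:

- one hypothetical bearing-temperature sensor with made-up setpoints,
- an exponential degradation curve for the fault regime,
- a stop-then-restart maintenance window.

It is not the simulator's physics model and performs no protocol encoding. `simulate_bearing_temp` and its column names are hypothetical.

```python
import math
import random

REGIMES = ("normal", "high_load", "fault", "maintenance")

def simulate_bearing_temp(minutes_per_regime: int, seed: int) -> list:
    """Hypothetical one-sensor trace (deg C) that walks through all four regimes."""
    rng = random.Random(seed)  # seeded, so the trace is reproducible
    rows = []
    half = minutes_per_regime // 2
    for regime in REGIMES:
        for k in range(minutes_per_regime):
            running = 1
            if regime == "normal":
                value = 60.0 + rng.gauss(0.0, 0.5)
            elif regime == "high_load":
                value = 72.0 + rng.gauss(0.0, 0.8)
            elif regime == "fault":
                # Degradation curve: exponential temperature rise as the bearing wears.
                value = 72.0 + 3.0 * (math.exp(k / 20.0) - 1.0) + rng.gauss(0.0, 0.8)
            elif k < half:
                # Maintenance window: pump stopped, casing cools toward 25 C ambient.
                running = 0
                value = 25.0 + 35.0 * math.exp(-k / 5.0)
            else:
                # Controlled restart: temperature recovers toward the 60 C baseline.
                value = 60.0 - 35.0 * math.exp(-(k - half) / 5.0)
            rows.append({"minute": len(rows), "regime": regime,
                         "running": running, "bearing_temp_c": round(value, 2)})
    return rows

trace = simulate_bearing_temp(minutes_per_regime=60, seed=1)
```

Keeping the regime as an explicit column means downstream anomaly detectors can be scored against a known ground truth for each row.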
ICS Security Simulator
ics_security
Ground-truth labeled ICS attack datasets for security teams
The ICS Security Simulator generates precisely labeled industrial control system attack datasets covering five MITRE ATT&CK ICS technique categories. Each attack sequence is generated with realistic timing, protocol fidelity, and controllable severity, producing datasets that security teams can immediately use for IDS classifier training, anomaly detector validation, and SIEM rule development — without access to real attack data or the risk of red-teaming a live network. Every row in the output includes ground-truth label columns: attack_category, technique_id (MITRE ATT&CK ICS T-code), severity_score (1–10), and is_malicious.
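Every row carries the four ground-truth columns, so a consumer can check the labels before training an IDS classifier. The sketch below enforces the column contracts stated above: a T-code for malicious rows and a severity between 1 and 10. The page does not specify how benign rows fill `technique_id` and `severity_score`, so leaving them empty (None) is an assumption. The category strings and the `validate_row` and `class_balance` helpers are also hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# ATT&CK for ICS technique IDs look like T0855 (Unauthorized Command Message).
TECHNIQUE_ID = re.compile(r"T0\d{3}")

@dataclass(frozen=True)
class LabeledRow:
    attack_category: str
    technique_id: Optional[str]
    severity_score: Optional[int]
    is_malicious: bool

def validate_row(row: LabeledRow) -> None:
    if row.is_malicious:
        if row.technique_id is None or not TECHNIQUE_ID.fullmatch(row.technique_id):
            raise ValueError(f"malformed technique_id: {row.technique_id!r}")
        if row.severity_score is None or not 1 <= row.severity_score <= 10:
            raise ValueError(f"severity_score must be 1-10, got {row.severity_score!r}")
    elif row.technique_id is not None:
        # Assumption: benign rows carry no technique label.
        raise ValueError("benign row should not carry a technique_id")

def class_balance(rows) -> Counter:
    """Count rows per category, e.g. to check class balance before IDS training."""
    return Counter(r.attack_category for r in rows)
```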
Cryptographic Evidence Bundles
8 artifacts
Tamper-evident proof of generation on every single job
The SynthLabTech evidence system is not a feature you opt into — it is mandatory infrastructure executed unconditionally on every job across every engine and every tier. The bundle is assembled in a strict sequence: Contract K is BLAKE3-hashed before the first byte of execution; the output is hashed to produce the Determinism Proof; all seven preceding artifacts are then SHA-256 hashed together into the Artifact Manifest, creating a Merkle-style tamper chain. Any third party — your auditors, compliance team, or an independent security researcher — can verify every claim without accessing SynthLabTech.
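The tamper chain can be sketched with the standard library. Each artifact is hashed with SHA-256, then the sorted list of name and digest pairs is hashed into a single manifest digest that any third party can recompute from the files alone. The artifact names, their placeholder contents, and the `build_manifest` and `verify_bundle` helpers are hypothetical. This is a flat hash list rather than a full binary Merkle tree, and SHA-256 stands in for the determinism-proof hash.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(artifacts: dict) -> dict:
    """Hash every artifact, then hash the sorted name:digest lines into one seal."""
    entries = {name: sha256_hex(blob) for name, blob in sorted(artifacts.items())}
    lines = "\n".join(f"{name}:{digest}" for name, digest in entries.items())
    return {"entries": entries, "manifest_sha256": sha256_hex(lines.encode("utf-8"))}

def verify_bundle(artifacts: dict, manifest: dict) -> bool:
    """An auditor recomputes the manifest from the artifacts they were given."""
    return build_manifest(artifacts) == manifest

# Hypothetical placeholder contents; only some of the eight artifact names appear on this page.
output = b"id,age\n1,34\n2,51\n"
artifacts = {
    "contract_k.json": b'{"engine":"rapid_rrf"}',
    "determinism_proof.txt": sha256_hex(output).encode(),  # digest of the generated output
    "utility_metrics.json": b'{"metric":"placeholder"}',
}
manifest = build_manifest(artifacts)
```

Changing a single byte in any artifact changes that artifact's digest and therefore the manifest digest. Verification needs nothing beyond the bundle and a SHA-256 implementation.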
Evidence System
8 artifacts sealed on every job
No configuration required. Every generation job — regardless of engine, size, or tier — produces the same 8-artifact evidence bundle. Each artifact is individually addressable by SHA-256 hash and collectively sealed with a BLAKE3 determinism proof.
Share the bundle with your auditors, compliance team, or model validation reviewers. They can independently verify every claim without accessing SynthLabTech.
AI Orchestrator
From natural language to running job.
The SynthLabTech AI Orchestrator translates conversational intent into fully formed Contract K specifications, then routes execution across all five engines. Requests pass through a multi-provider LLM gateway with automatic failover, PII redaction, and an organizational rules engine that is enforced before any call reaches an engine.
Tool Registry
Structured tool calls map precisely to engine operations — no ambiguity between intent and execution.
LLM Gateway
Multi-provider with automatic fallback. OpenAI, Anthropic, and local models. No vendor lock-in.
PII Redaction
Automatic scrubbing of personally identifiable information before any data leaves your session.
Rules Engine
Organizational policy enforcement at the orchestration layer — before any engine is invoked.
Contract K Builder
Natural language is parsed and converted into a fully deterministic, auditable Contract K specification.
Session Replay
Every orchestration session is logged and replayable. Full audit trail for compliance teams.
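The order of operations described above can be sketched as a small pipeline:

1. Check the organizational rules before any engine is invoked.
2. Scrub PII before text leaves the session.
3. Try each LLM provider in turn until one answers.

The regexes, `orchestrate`, and the provider callables are hypothetical. Real PII detection needs far more than two patterns, and no actual OpenAI or Anthropic client API is shown.

```python
import re
from typing import Callable, Sequence

# Hypothetical, deliberately simple patterns; real PII detection needs much more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

class PolicyViolation(Exception):
    """Raised when an organizational rule forbids the requested operation."""

def redact_pii(text: str) -> str:
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

def check_rules(engine: str, allowed_engines: set) -> None:
    if engine not in allowed_engines:
        raise PolicyViolation(f"engine {engine!r} is not permitted by org policy")

def call_with_failover(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    errors = []
    for provider in providers:  # e.g. a hosted model first, then a local fallback
        try:
            return provider(prompt)
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append(repr(exc))
    raise RuntimeError(f"all providers failed: {errors}")

def orchestrate(user_text: str, engine: str, providers, allowed_engines: set) -> str:
    check_rules(engine, allowed_engines)          # 1. policy before anything else runs
    prompt = redact_pii(user_text)                # 2. scrub PII before it leaves the session
    return call_with_failover(prompt, providers)  # 3. the LLM drafts the Contract K
```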
Architecture
Built for production from day one.
Two-layer architecture: a Rust kernel for deterministic generation, a Python brain for orchestration and ML. Designed for horizontal scale and tenant isolation.
Rust Core Heart
The generation kernel is written in Rust for deterministic, memory-safe execution. Counter-based RNG with IEEE-754 binary64 produces identical floating-point results across architectures.
Python Brain
The orchestration and ML layer runs in Python — Celery task workers, LLM gateway, and the AssistantBroker. Async, horizontally scalable, and GPU-aware.
Tenant Isolation
Row-level security in PostgreSQL. Each tenant's data is isolated at the database layer. Write-only secrets. Cross-tenant access is architecturally impossible.
Compute Routing
Jobs route to the synth/cpu, synth/gpu, or train/cpu queue based on engine and tier. Per-tier credit pricing: synth/cpu costs 3 credits per 5,000 rows; synth/gpu costs 4 credits per 5,000 rows.
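A job's credit cost follows from the published rates with simple arithmetic. This sketch assumes partial 5,000-row blocks round up, which the page does not state. It leaves train/cpu unpriced because no rate is given for it. `job_credits` is a hypothetical helper.

```python
# Published rates: credits per 5,000 rows. train/cpu has no published rate.
CREDITS_PER_5K_ROWS = {"synth/cpu": 3, "synth/gpu": 4}
BLOCK_ROWS = 5_000

def job_credits(rows: int, tier: str) -> int:
    if tier not in CREDITS_PER_5K_ROWS:
        raise ValueError(f"no published rate for tier {tier!r}")
    blocks = -(-rows // BLOCK_ROWS)  # ceiling division; assumes partial blocks round up
    return blocks * CREDITS_PER_5K_ROWS[tier]

print(job_credits(500_000, "synth/cpu"))  # 300
print(job_credits(500_000, "synth/gpu"))  # 400
```

Under that assumption, a 500K-row job spans 100 blocks, so it costs 100 × 3 = 300 credits on synth/cpu and 100 × 4 = 400 credits on synth/gpu.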
Canonical Lift Pipeline
Every engine output passes through the Canonical Lift stage: constraint enforcement, schema normalization, and output sealing — in that order, every time.
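The fixed stage order can be sketched as three small functions composed in sequence:

1. Drop rows that violate constraints.
2. Coerce the survivors to a fixed column order and declared types.
3. Serialize and hash the result.

The representations used here (per-column predicates for constraints, an ordered list of (column, type) pairs for the schema) and the `canonical_lift` helper are assumptions. The real pipeline's rules are not documented on this page.

```python
import hashlib
import json

def enforce_constraints(rows, constraints):
    """Stage 1: drop rows that violate any per-column predicate."""
    return [r for r in rows if all(check(r.get(col)) for col, check in constraints.items())]

def normalize_schema(rows, schema):
    """Stage 2: fixed column order and declared types; missing values become None."""
    return [{col: (None if r.get(col) is None else typ(r[col])) for col, typ in schema}
            for r in rows]

def seal(rows):
    """Stage 3: serialize in column order and attach a SHA-256 digest."""
    payload = json.dumps(rows, separators=(",", ":")).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()

def canonical_lift(rows, constraints, schema):
    # Order matters: constraints, then normalization, then sealing.
    return seal(normalize_schema(enforce_constraints(rows, constraints), schema))

raw = [{"id": 1, "age": 34}, {"id": 2, "age": -5}, {"age": 51, "id": "3"}]
constraints = {"age": lambda v: v is not None and 0 <= v <= 120}
schema = [("id", int), ("age", int)]
payload, digest = canonical_lift(raw, constraints, schema)
```

Sealing last guarantees the digest covers exactly the bytes that are written, after every constraint and normalization has been applied.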
Zero-Trust API
Scoped API keys with granular permissions. Rate limiting per tenant. Comprehensive audit log on every API call. No unauthenticated endpoints.
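Scoped keys and per-tenant rate limiting can be combined in a single authorization check that appends every decision to an audit log. This sketch uses one token bucket per tenant, shared by all of that tenant's keys. The page does not say which rate-limiting algorithm SynthLabTech uses. The key table, scope names such as `jobs:create`, and the `ApiGate` class are hypothetical.

```python
import time

class TokenBucket:
    """Classic token bucket: bursts up to `capacity`, refilled continuously."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class ApiGate:
    def __init__(self, keys, capacity=10, refill_per_sec=5.0, clock=time.monotonic):
        self.keys = keys  # api_key -> (tenant_id, frozenset of granted scopes)
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets = {}    # one bucket per tenant, shared by all of its keys
        self.audit_log = []  # (timestamp, tenant, scope, decision) for every call

    def authorize(self, api_key: str, scope: str) -> bool:
        now = self.clock()
        tenant, decision = None, "deny:unauthenticated"  # no anonymous access
        if api_key in self.keys:
            tenant, scopes = self.keys[api_key]
            if scope not in scopes:
                decision = "deny:scope"
            else:
                bucket = self.buckets.get(tenant)
                if bucket is None:
                    bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
                    self.buckets[tenant] = bucket
                decision = "allow" if bucket.allow(now) else "deny:rate_limited"
        self.audit_log.append((now, tenant, scope, decision))
        return decision == "allow"
```

Passing the clock in as a parameter keeps the limiter deterministic under test. Denied calls are logged alongside allowed ones, so the audit trail stays complete.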
Ready to generate verifiable data?
Free plan included. No credit card required. Full evidence bundles on every job from day one.