Platform documentation

Everything you need to generate verified data.

SynthLabTech is a single platform combining five specialized generation engines, cryptographic evidence bundles, and an AI orchestrator — unified by a deterministic contract system that guarantees reproducibility.

Foundation

Everything starts with Contract K

Contract K is the deterministic specification at the heart of every SynthLabTech job. It defines the engine, seed, schema constraints, compute tier, and output format in a single serializable document — cryptographically hashed before execution begins.

The same Contract K always produces identical output. On any machine. At any time. This is the foundation of SynthLabTech's reproducibility guarantee.

Versioned and serializable — store alongside your datasets
Human-readable JSON with CBOR serialization; third-party auditable
BLAKE3-hashed before execution begins
Supports schema constraints, cross-column correlations, and partitioning
contract_k.json
{
  "version": "2.1",
  "engine": "rapid_rrf",
  "seed": "0x7FA3B8C2",
  "rows": 50000,
  "compute": "synth/cpu",
  "hash": "blake3:a94f2e..."
}
Counter-FNV verified · CBOR-serialized · Determinism proof: sealed ✓
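
The idea of hashing a contract before execution can be sketched as follows. This is an illustration under stated assumptions, not SynthLabTech's implementation: the platform describes BLAKE3 over a CBOR encoding, while this sketch substitutes canonical JSON and `hashlib.blake2b`, because neither CBOR nor BLAKE3 ships in Python's standard library. The function name `contract_digest` is hypothetical, and excluding the `hash` field from the hashed bytes is also an assumption.

```python
import hashlib
import json


def contract_digest(contract: dict) -> str:
    """Hash a contract deterministically (stand-in for BLAKE3 over CBOR)."""
    # Assumption: the "hash" field is left out, since a document cannot embed its own digest.
    body = {key: value for key, value in contract.items() if key != "hash"}
    # Sorted keys and fixed separators yield one byte string per logical document.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    # blake2b is a stand-in only; BLAKE3 is not in the standard library.
    return "blake2b:" + hashlib.blake2b(canonical.encode("utf-8"), digest_size=32).hexdigest()


contract = {
    "version": "2.1",
    "engine": "rapid_rrf",
    "seed": "0x7FA3B8C2",
    "rows": 50000,
    "compute": "synth/cpu",
}
# The same fields in a different key order hash to the same digest.
reordered = dict(reversed(list(contract.items())))
# Changing any field changes the digest.
modified = {**contract, "rows": 50001}
```

Because the digest depends only on the canonical bytes, storing it next to a dataset lets anyone detect later edits to the specification.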

Generation Engines

Five engines. One contract.

Each engine is purpose-built for a specific data domain. All share the same Contract K input, the same Canonical Lift pipeline, and the same evidence bundle output.

Rapid Mode — Rapid Relational Fabrication

rapid_rrf

Deterministic tabular generation at industrial scale

Rapid Relational Fabrication is SynthLabTech's primary tabular engine. It uses a 256-bit seed from Contract K to drive a counter-based RNG (ChaCha20 variant) with IEEE-754 binary64 precision throughout — the same seed produces the same floating-point values regardless of CPU architecture or OS. Generation is done in a single pass; the Canonical Lift pipeline then applies cross-column correlations, referential integrity constraints, nullable patterns, and domain-specific rules before writing the sealed output. RRF is the engine of choice for high-volume, time-sensitive jobs: 500K rows on synth/cpu in under 8 seconds, scaling linearly to hundreds of millions of rows without quality degradation.

ChaCha20-variant counter RNG, 256-bit seed, IEEE-754 binary64
Canonical Lift: cross-column correlations, FK integrity, nullable enforcement
CSV, Parquet, and JSON output — same Contract K for all formats
3 credits per 5,000 rows on synth/cpu; 4 credits on synth/gpu
Near-linear scaling: 500K rows < 8s, 50M rows < 12 minutes
Constraint Report artifact validates every constraint on every column
Contract K · Canonical Lift · Tabular
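
A counter-based generator derives each value from the pair (seed, counter) alone, which is what makes output independent of generation order and of the machine it runs on. The sketch below illustrates that property only. It is not the RRF kernel: SHA-256 stands in for the ChaCha20-variant block function, the names `uniform_at` and `column` are hypothetical, and the 256-bit seed is simply the example seed zero-padded for illustration.

```python
import hashlib
import struct


def uniform_at(seed: bytes, counter: int) -> float:
    """Return a float in [0, 1) that depends only on (seed, counter)."""
    # Assumption: SHA-256 replaces the ChaCha20-variant block function.
    block = hashlib.sha256(seed + struct.pack(">Q", counter)).digest()
    # Keep the top 53 bits: every such integer is exactly representable in binary64,
    # so the division below is exact and platform-independent.
    bits = int.from_bytes(block[:8], "big") >> 11
    return bits / (1 << 53)


def column(seed: bytes, start_row: int, n_rows: int) -> list:
    """Generate n_rows values starting at an arbitrary row index."""
    return [uniform_at(seed, start_row + i) for i in range(n_rows)]


# Illustrative 256-bit seed: the example seed 0x7FA3B8C2 left-padded with zero bytes.
seed = bytes.fromhex("7fa3b8c2").rjust(32, b"\x00")
full = column(seed, 0, 1000)
tail = column(seed, 600, 400)  # rows 600..999 generated independently
other = column(b"\x01" * 32, 0, 1000)
```

Since any row can be computed without generating the rows before it, a job can be partitioned across workers and still reproduce the same file.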

Research Mode — Thermodynamic Reservoir Computing

research_trc

High-fidelity statistical synthesis for ML and research

Thermodynamic Reservoir Computing (TRC) applies energy-based generative modeling where the joint probability distribution is expressed as a Boltzmann energy function. Configurable temperature schedules anneal the model from high-entropy initialization toward the target distribution, while contrastive divergence with negative sampling drives Gibbs chain convergence. Unlike standard tabular synthesizers that optimize marginal statistics, TRC preserves complex multivariate dependencies — including heavy tails, rare joint events, and conditional distributions. The result is synthetic data that passes rigorous ML utility tests: models trained on TRC output match the performance of models trained on the original data within a measured delta captured in the Utility Metrics artifact.

Boltzmann energy function with configurable temperature schedule
Contrastive divergence + negative sampling, Gibbs chain convergence
Preserves heavy tails, rare events, and conditional distributions
Utility Metrics artifact: distributional delta measured per column
4 credits per 5,000 rows on synth/cpu; GPU tier for large models
Supports conditional generation: fix columns, synthesize the rest
Energy models · Gibbs convergence · ML fidelity
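
To make the sampling side of an energy-based model concrete, the sketch below runs an annealed Gibbs chain over two binary units with energy E(x) = -Σ b_i x_i - Σ_{i<j} W_ij x_i x_j. It is a toy illustration, not the TRC engine: it covers only sampling under a temperature schedule, not contrastive-divergence training, and the function names, the geometric schedule, and the parameter values are all assumptions chosen for the example.

```python
import math
import random


def geometric_schedule(t_start, t_end, anneal_steps, hold_steps):
    """Cool geometrically from t_start to t_end, then hold at t_end."""
    ratio = (t_end / t_start) ** (1.0 / (anneal_steps - 1))
    return [t_start * ratio ** k for k in range(anneal_steps)] + [t_end] * hold_steps


def gibbs_chain(bias, weights, temperatures, rng):
    """Run one Gibbs sweep per temperature and return the state after each sweep."""
    n = len(bias)
    state = [rng.randint(0, 1) for _ in range(n)]  # high-entropy initialization
    visited = []
    for temp in temperatures:
        for i in range(n):
            # Local field on unit i from its bias and the other units.
            field = bias[i] + sum(weights[i][j] * state[j] for j in range(n) if j != i)
            # Boltzmann conditional: P(x_i = 1 | rest) = sigmoid(field / T).
            p_on = 1.0 / (1.0 + math.exp(-field / temp))
            state[i] = 1 if rng.random() < p_on else 0
        visited.append(tuple(state))
    return visited


# Two units with a strong positive coupling: states (0,0) and (1,1) have the lowest energy.
bias = [-2.0, -2.0]
weights = [[0.0, 4.0], [4.0, 0.0]]
schedule = geometric_schedule(t_start=5.0, t_end=1.0, anneal_steps=200, hold_steps=5000)
samples = gibbs_chain(bias, weights, schedule, random.Random(7))[-5000:]
agreement = sum(1 for a, b in samples if a == b) / len(samples)
```

At temperature 1 the two low-energy states carry most of the probability mass, so `agreement` should sit well above one half. Capturing this kind of joint dependency is what separates an energy model from a synthesizer that matches each column's marginal independently.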

Virtual SCADA Simulator

virtual_scada

Industrial OT telemetry with physics-calibrated simulation

The Virtual SCADA Simulator generates multi-layered operational technology telemetry streams with calibrated physics models at each protocol layer. Five protocol stacks are supported: Modbus TCP, OPC-UA, BACnet/IP, MQTT, and DNP3. Describe any industrial facility in natural language — the LLM fabricates a custom scenario pack with calibrated sensor catalogs, physics cross-checks, and protocol mappings for your specific vertical: power generation, oil & gas, water treatment, discrete manufacturing, or any other domain. Each scenario supports four operating regimes in a single job: normal baseline, high-load stress, fault propagation with realistic degradation curves, and scheduled maintenance windows with controlled restarts.

Modbus TCP: coils, discrete inputs, holding registers, function codes
OPC-UA: namespace hierarchies, NodeId, subscriptions, data-change notifications
BACnet/IP: ASHRAE objects, COV notifications, BBMD routing
DNP3: SCADA/RTU polling, unsolicited reporting, data link layer
MQTT: topic trees, QoS 0/1/2, retained messages, LWT
Unlimited LLM-generated scenario packs — any industrial facility, any vertical
4 operating regimes: normal, high-load, fault, maintenance
3 credits per 5,000 rows on synth/cpu
Modbus TCP · OPC-UA · DNP3 · LLM-generated packs
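
As a rough picture of what regime-labeled telemetry mapped onto a protocol layer might look like, the sketch below draws a temperature signal for each of the four operating regimes and packs readings into a Modbus TCP "Read Holding Registers" (function code 0x03) response frame. The regime parameters, the scale factor of 10, and all function names are hypothetical. Only the frame layout (MBAP header followed by big-endian 16-bit registers) follows the Modbus specification, and the Gaussian noise here is a placeholder for the simulator's physics models.

```python
import random
import struct

# Hypothetical regime parameters: (mean temperature in degrees C, noise standard deviation).
REGIMES = {
    "normal": (60.0, 0.5),
    "high_load": (85.0, 1.5),
    "fault": (110.0, 4.0),
    "maintenance": (25.0, 0.2),
}


def simulate(seed: int, samples_per_regime: int) -> list:
    """Return (regime, temperature) rows, deterministic for a given seed."""
    rng = random.Random(seed)
    rows = []
    for regime, (mean, std) in REGIMES.items():
        for _ in range(samples_per_regime):
            rows.append((regime, rng.gauss(mean, std)))
    return rows


def to_register(value_c: float, scale: int = 10) -> int:
    """Scale to fixed point and clamp to the unsigned 16-bit register range."""
    return max(0, min(0xFFFF, round(value_c * scale)))


def read_holding_registers_response(transaction_id: int, unit_id: int, registers: list) -> bytes:
    """Build a Modbus TCP response frame for function code 0x03."""
    # PDU: function code, byte count, then each register as big-endian uint16.
    pdu = struct.pack(">BB", 0x03, 2 * len(registers))
    pdu += struct.pack(f">{len(registers)}H", *registers)
    # MBAP header: transaction id, protocol id (0), length (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


rows = simulate(seed=42, samples_per_regime=100)
frame = read_holding_registers_response(1, 1, [to_register(25.0)])
```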

ICS Security Simulator

ics_security

Ground-truth labeled ICS attack datasets for security teams

The ICS Security Simulator generates precisely labeled industrial control system attack datasets covering five MITRE ATT&CK ICS technique categories. Each attack sequence is generated with realistic timing, protocol fidelity, and controllable severity, producing datasets that security teams can immediately use for IDS classifier training, anomaly detector validation, and SIEM rule development — without access to real attack data or the risk of red-teaming a live network. Every row in the output includes ground-truth label columns: attack_category, technique_id (MITRE ATT&CK ICS T-code), severity_score (1–10), and is_malicious.

Replay: command record-and-playback, timing jitter, loop injection
Command Injection: Modbus FC forging, OPC-UA write, DNP3 setpoint deviation
DoS: SYN flood, broadcast storms, subscription exhaustion
MitM: ARP poisoning, certificate spoofing, session hijacking
Recon: active scan patterns, device enumeration, protocol fingerprinting
Ground-truth labels: attack_category, ATT&CK ICS T-code, severity 1-10
Configurable normal:attack ratio for balanced or realistic imbalance
Direct scikit-learn, PyTorch, Splunk-compatible output
MITRE ATT&CK ICS · Ground-truth labels · IDS/SIEM-ready
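
The label columns and the configurable normal:attack ratio can be pictured with the sketch below. It only fabricates labels to show the shape of the output, not realistic traffic. The function name `labeled_rows` is hypothetical, the `technique_id` values are placeholders rather than real ATT&CK ICS T-codes, and the choice to mark normal rows with `None` for technique and severity is an assumption, since the source does not say how benign rows are labeled.

```python
import random

ATTACK_CATEGORIES = ["replay", "command_injection", "dos", "mitm", "recon"]


def labeled_rows(n_rows: int, attack_ratio: float, seed: int) -> list:
    """Return n_rows label dicts with round(n_rows * attack_ratio) malicious rows."""
    rng = random.Random(seed)
    n_attack = round(n_rows * attack_ratio)
    rows = []
    for i in range(n_rows):
        if i < n_attack:
            category = rng.choice(ATTACK_CATEGORIES)
            rows.append({
                "attack_category": category,
                # Placeholder only: look up the real T-code for each technique.
                "technique_id": f"<T-code for {category}>",
                "severity_score": rng.randint(1, 10),
                "is_malicious": True,
            })
        else:
            # Assumption: benign rows carry no technique or severity.
            rows.append({
                "attack_category": "normal",
                "technique_id": None,
                "severity_score": None,
                "is_malicious": False,
            })
    rng.shuffle(rows)  # interleave attacks with normal traffic, reproducibly
    return rows


dataset = labeled_rows(n_rows=1000, attack_ratio=0.05, seed=11)
labels = [int(row["is_malicious"]) for row in dataset]  # ready for a binary classifier
```

A 0.05 ratio mimics the heavy class imbalance of real networks, while 0.5 gives a balanced set for classifier training.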

Cryptographic Evidence Bundles

8 artifacts

Tamper-evident proof of generation — on every single job

The SynthLabTech evidence system is not a feature you opt into — it is mandatory infrastructure executed unconditionally on every job across every engine and every tier. The bundle is assembled in a strict sequence: Contract K is BLAKE3-hashed before the first byte of execution; the output is hashed to produce the Determinism Proof; all seven preceding artifacts are then SHA-256 hashed together into the Artifact Manifest, creating a Merkle-style tamper chain. Any third party — your auditors, compliance team, or an independent security researcher — can verify every claim without accessing SynthLabTech.

Contract K: BLAKE3-hashed before execution — spec is locked at submission
Run Manifest: job ID, tenant, engine, compute tier, timestamp, worker ID
Constraint Report: per-column pass/fail for every constraint in Contract K
Determinism Proof: BLAKE3 hash of full output file — reproducibility seal
Privacy Report: k-anonymity, l-diversity, t-closeness per QI group
Utility Metrics: column-level distributional delta vs. reference dataset
Artifact Manifest: SHA-256 over the seven other artifacts; the tamper-detection chain
Timing Telemetry: stage-level durations for SLA and capacity planning
BLAKE3 + SHA-256 · SOC 2 · FDA ready · 3rd-party verifiable
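
Offline verification of a manifest can be sketched as below. How SynthLabTech actually combines the artifact hashes is not specified here, so the scheme shown (SHA-256 of each artifact, then SHA-256 over the name-tagged digests in a fixed order) is an assumption, as are the function names and the artifact byte contents.

```python
import hashlib
import hmac

# Fixed order of the seven artifacts covered by the manifest (order is an assumption).
ORDER = [
    "contract_k", "run_manifest", "constraint_report", "determinism_proof",
    "privacy_report", "utility_metrics", "timing_telemetry",
]


def manifest_digest(artifacts: dict, order: list) -> str:
    """Hash each artifact, then hash the name-tagged digests in a fixed order."""
    outer = hashlib.sha256()
    for name in order:
        inner = hashlib.sha256(artifacts[name]).digest()
        # Binding the name to its digest prevents swapping two artifacts unnoticed.
        outer.update(name.encode("utf-8") + b"\x00" + inner)
    return "sha256:" + outer.hexdigest()


def verify(artifacts: dict, order: list, expected: str) -> bool:
    """Recompute the manifest locally and compare in constant time."""
    return hmac.compare_digest(manifest_digest(artifacts, order), expected)


# Placeholder artifact contents for the example.
bundle = {name: f"contents of {name}".encode("utf-8") for name in ORDER}
sealed = manifest_digest(bundle, ORDER)

tampered = dict(bundle)
tampered["constraint_report"] = b"contents of constraint_report (edited)"
```

An auditor holding the bundle and the published manifest value needs nothing else: `verify(bundle, ORDER, sealed)` succeeds, and any edited artifact makes it fail.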

Evidence System

8 artifacts sealed on every job

No configuration required. Every generation job, regardless of engine, size, or tier, produces the same 8-artifact evidence bundle. Each artifact is individually addressable by hash; the bundle is sealed collectively by the SHA-256 Artifact Manifest, and the output itself by the BLAKE3 Determinism Proof.

Share the bundle with your auditors, compliance team, or model validation reviewers. They can independently verify every claim without accessing SynthLabTech.

Evidence Bundle: 8/8 ✓
Contract K: blake3:a94f2e...
Run Manifest: v2.1.0
Constraint Report: pass ✓
Determinism Proof: BLAKE3 ✓
Privacy Report: k=5 safe
Utility Metrics: Δ=0.97
Artifact Manifest: sha256:...
Timing Telemetry: 2.4s
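
The "k=5" entry in the Privacy Report refers to k-anonymity: every combination of quasi-identifier values appears in at least k rows. A minimal way to compute k for a table is sketched below. The column names and sample rows are made up for the example, and the platform's own report may compute additional measures (l-diversity, t-closeness) that this sketch does not cover.

```python
from collections import Counter


def k_anonymity(rows: list, quasi_identifiers: list) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        return 0
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical table: two quasi-identifier groups of sizes 5 and 6.
table = (
    [{"age_band": "30-39", "zip3": "941", "diagnosis": "A"} for _ in range(5)]
    + [{"age_band": "40-49", "zip3": "100", "diagnosis": "B"} for _ in range(6)]
)
k = k_anonymity(table, ["age_band", "zip3"])
```

Here `k` is 5, because the smallest group of rows sharing an age band and ZIP prefix has five members.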

AI Orchestrator

From natural language to running job.

The SynthLabTech AI Orchestrator translates conversational intent into fully formed Contract K specifications, then routes execution across all five engines. It includes a multi-provider LLM gateway with automatic failover, PII redaction, and an organizational rules engine that is enforced before any call reaches an engine.

Tool Registry

Structured tool calls map precisely to engine operations — no ambiguity between intent and execution.

LLM Gateway

Multi-provider with automatic fallback. OpenAI, Anthropic, and local models. No vendor lock-in.

PII Redaction

Automatic scrubbing of personally identifiable information before any data leaves your session.
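
A pattern-based redaction pass is one simple way to picture this step. The sketch below is only an illustration, assuming regular expressions for email addresses and US-style phone numbers. The patterns, the placeholder tokens, and the function name `redact` are not taken from the product, and production redaction typically covers many more identifier types.

```python
import re

# Illustrative patterns only: email addresses and US-style phone numbers.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace each matched identifier with a placeholder token."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text


prompt = "Contact jane.doe@example.com or 415-555-0137."
clean = redact(prompt)
```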

Rules Engine

Organizational policy enforcement at the orchestration layer — before any engine is invoked.

Contract K Builder

Natural language is parsed and converted into a fully deterministic, auditable Contract K specification.

Session Replay

Every orchestration session is logged and replayable. Full audit trail for compliance teams.

Architecture

Built for production from day one.

Two-layer architecture: a Rust kernel for deterministic generation, a Python brain for orchestration and ML. Designed for horizontal scale and tenant isolation.

Rust Core Heart

The generation kernel is written in Rust for deterministic, memory-safe execution. Counter-based RNG with IEEE-754 binary64 produces identical floating-point results across architectures.

Python Brain

The orchestration and ML layer runs in Python — Celery task workers, LLM gateway, and the AssistantBroker. Async, horizontally scalable, and GPU-aware.

Tenant Isolation

Row-level security in PostgreSQL. Each tenant's data is isolated at the database layer. Write-only secrets. Cross-tenant access is architecturally impossible.

Compute Routing

Jobs route to synth/cpu, synth/gpu, or train/cpu queues based on engine and tier. Credit pricing depends on both: Rapid Mode, for example, costs 3 credits per 5,000 rows on synth/cpu and 4 credits per 5,000 rows on synth/gpu.
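
The credit arithmetic can be estimated with a few lines. The rates below are the ones quoted on this page for each engine and tier. Rounding partial 5,000-row blocks up to a full block is an assumption, and `job_credits` is a hypothetical helper, not a platform API.

```python
import math

BLOCK_ROWS = 5000

# Credits per 5,000 rows, as quoted on this page for each (engine, tier) pair.
RATES = {
    ("rapid_rrf", "synth/cpu"): 3,
    ("rapid_rrf", "synth/gpu"): 4,
    ("research_trc", "synth/cpu"): 4,
    ("virtual_scada", "synth/cpu"): 3,
}


def job_credits(engine: str, tier: str, rows: int) -> int:
    """Estimate credits for a job, billing partial blocks as full blocks (assumption)."""
    blocks = math.ceil(rows / BLOCK_ROWS)
    return blocks * RATES[(engine, tier)]


example = job_credits("rapid_rrf", "synth/cpu", 50000)  # 10 blocks at 3 credits each
```

For the 50,000-row contract shown earlier, this gives 10 blocks × 3 credits = 30 credits on synth/cpu.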

Canonical Lift Pipeline

Every engine output passes through the Canonical Lift stage: constraint enforcement, schema normalization, and output sealing — in that order, every time.

Zero-Trust API

Scoped API keys with granular permissions. Rate limiting per tenant. Comprehensive audit log on every API call. No unauthenticated endpoints.

Ready to generate verifiable data?

Free plan included. No credit card required. Full evidence bundles on every job from day one.