Telemetry Platform

AWS-backed QEMU experiment runner with artifact pipelines, statistical regression analysis, and LLM-assisted triage for OS workloads.

infrastructure

Telemetry Platform is an experiment and observability system for TinyOS and other low-level workloads. Instead of treating kernel testing as one-off local runs, it turns each QEMU execution into a structured experiment with versioned artifacts, metadata, and comparable metrics. The platform dispatches workloads through cloud-backed runners, stores outputs such as kernel logs and metrics, applies statistical analysis to detect regressions and variation across runs, and uses LLM-assisted summarization and triage to help interpret failures, anomalies, and performance shifts.

date: 2026 - Present

status: active

GitHub

overview

OS experimentation is often ad hoc: local runs are hard to reproduce, logs are hard to compare, and noisy performance data makes regressions difficult to trust. This project turns systems testing into a structured telemetry and analysis pipeline.

implementation

Automated workload execution under QEMU, exporting each run's logs, metadata, and metrics as structured artifacts.

Built a queued cloud runner backed by AWS services so experiments could be dispatched, executed, and collected consistently.

Added statistical analysis over repeated runs to measure variance, compare distributions, and detect meaningful regressions.

Integrated LLM-based log and artifact analysis to summarize outcomes, surface likely root causes, and assist with triage.

Connected the platform to CI so regression-oriented workloads run alongside code changes rather than as manual checks.
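To make "structured artifacts for each run" concrete, here is a minimal sketch of what a per-run record could look like. The `RunRecord` fields, the `add_artifact` and `to_manifest` helpers, and the sample values are all assumptions for illustration, not the platform's actual schema.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunRecord:
    """One QEMU execution, described well enough to compare with other runs.

    Hypothetical schema: every field name here is an assumption.
    """
    run_id: str
    workload: str
    commit: str
    qemu_args: list[str]
    exit_code: int
    metrics: dict[str, float] = field(default_factory=dict)
    artifacts: dict[str, str] = field(default_factory=dict)  # name -> sha256 hex


def add_artifact(record: RunRecord, name: str, content: bytes) -> None:
    """Register an output (for example the kernel log) by its content hash."""
    record.artifacts[name] = hashlib.sha256(content).hexdigest()


def to_manifest(record: RunRecord) -> str:
    """Serialize with sorted keys so identical runs yield identical manifests."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


record = RunRecord(
    run_id="run-0001",
    workload="boot-smoke",
    commit="abc1234",
    qemu_args=["-nographic", "-m", "128M"],
    exit_code=0,
    metrics={"boot_ms": 412.0},
)
add_artifact(record, "kernel.log", b"[    0.000] booting\n")
manifest = to_manifest(record)
```

Hashing artifact contents and sorting keys is one way to keep manifests diffable across runs; a real pipeline would likely also record timestamps and storage locations.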

challenges

Experiment outputs must stay structured enough for comparison, not just storage.

Performance data is noisy, so the platform has to distinguish genuine regressions from ordinary variance.

Cloud orchestration, artifact indexing, and traceability are just as important as the QEMU runner itself.

LLM-assisted analysis has to be useful for debugging without replacing the raw evidence engineers need to inspect.

The workload set needs to be representative enough to catch meaningful issues without turning into unmaintainable noise.
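One way to separate genuine regressions from ordinary variance is a permutation test over repeated runs, combined with a minimum effect size. The sketch below uses only the standard library; the function names, the thresholds (`alpha`, `min_relative_shift`), and the choice of a permutation test are assumptions, since the page does not say which statistical method the platform uses.

```python
import random
import statistics


def permutation_p_value(baseline, candidate, n_resamples=5000, seed=0):
    """One-sided permutation test: is the candidate mean larger than baseline?

    Returns the (add-one corrected) fraction of label shufflings whose mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = statistics.mean(candidate) - statistics.mean(baseline)
    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_base:]) - statistics.mean(pooled[:n_base])
        if diff >= observed:
            hits += 1
    # The add-one correction avoids reporting an impossible p-value of 0.
    return (hits + 1) / (n_resamples + 1)


def is_regression(baseline, candidate, alpha=0.01, min_relative_shift=0.05):
    """Flag a regression only if the shift is both large enough and significant."""
    shift = statistics.mean(candidate) / statistics.mean(baseline) - 1.0
    if shift < min_relative_shift:
        return False
    return permutation_p_value(baseline, candidate) < alpha


# Illustrative latencies in milliseconds (made-up numbers, not measured data).
baseline = [100, 102, 98, 101, 99, 103, 97, 100]
slower = [110, 112, 108, 111, 109, 113, 107, 110]
jitter = [101, 99, 100, 102, 98, 100, 103, 97]
```

Requiring both a minimum relative shift and a small p-value keeps tiny but statistically detectable differences from being reported as regressions.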

outcomes

Built a bridge between low-level OS experimentation and production-style telemetry discipline.

The platform makes regression detection, artifact review, and failure triage far more systematic.

Statistical analysis improves confidence in performance conclusions across repeated runs.

LLM-assisted summaries make logs and experiment outputs faster to interpret during debugging and iteration.

architecture notes

The platform uses queued jobs, cloud workers, containerized QEMU execution, and durable artifact storage for repeatable systems experiments.

Metadata indexed in DynamoDB makes each run queryable by workload, build, commit, and experiment configuration.

Statistical aggregation across repeated runs helps identify latency shifts, instability, and performance regressions with more confidence.

LLM-based analysis layers on top of raw logs and metrics to summarize failures, compare runs, and assist with debugging.

GitHub Actions integrates smoke tests and regression-oriented workloads directly into the development loop.
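The queue-plus-index flow described in these notes could be sketched as follows. The message schema, the `pk`/`sk` key layout, and the helper names are hypothetical. `dispatch` expects an object that behaves like a boto3 SQS client (`send_message`) and one that behaves like a boto3 DynamoDB `Table` resource (`put_item`); both are passed in so the logic can be exercised without AWS credentials.

```python
import json


def job_message(run):
    """Body of the queue message a worker would consume (hypothetical schema)."""
    keys = ("run_id", "workload", "commit", "config")
    return json.dumps({k: run[k] for k in keys}, sort_keys=True)


def metadata_item(run):
    """Index record for one run (hypothetical key layout).

    The partition key groups runs by workload. The sort key starts with the
    commit, so "all runs of workload X at commit Y" becomes a key-prefix query.
    """
    return {
        "pk": f"WORKLOAD#{run['workload']}",
        "sk": f"COMMIT#{run['commit']}#RUN#{run['run_id']}",
        "build": run["build"],
        "config": run["config"],
    }


def dispatch(sqs_client, table, queue_url, run):
    """Write the index record, then enqueue the job for a worker."""
    table.put_item(Item=metadata_item(run))
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=job_message(run))
```

Build and configuration are stored as plain attributes here; querying by them efficiently would need secondary indexes, which this sketch leaves out.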

stack

Python, QEMU, Docker, AWS, GitHub Actions, LLMs, Statistical Analysis

highlights

Processed 200+ experiment runs across 10+ OS workloads.

Stored structured artifacts including kernel logs, run metadata, metrics, and regression comparison outputs.

Connected local workload automation to AWS SQS, EC2, S3, DynamoDB, and CI-based execution flows.

Added LLM-assisted summarization to interpret logs, flag anomalies, and explain likely regression causes.

Used statistical analysis across repeated runs to separate real regressions from noise and runtime variance.
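For the LLM-assisted summarization, one plausible shape is a prompt assembled from raw evidence that also carries pointers back to the underlying artifacts, so a summary never replaces the logs themselves. The function name, prompt wording, and parameters below are assumptions, and no model call is made.

```python
def build_triage_prompt(run_id, log_text, metrics, artifact_paths, tail_lines=40):
    """Assemble a triage prompt from a log tail, metrics, and artifact pointers."""
    tail = log_text.splitlines()[-tail_lines:]
    metric_lines = [f"{name}: {value}" for name, value in sorted(metrics.items())]
    evidence = [f"{name} -> {path}" for name, path in sorted(artifact_paths.items())]
    return "\n".join(
        [
            f"Run: {run_id}",
            "Summarize the failure and quote the log lines you rely on verbatim.",
            "",
            "Metrics:",
            *metric_lines,
            "",
            f"Kernel log (last {len(tail)} lines):",
            *tail,
            "",
            "Raw artifacts (for human inspection):",
            *evidence,
        ]
    )
```

Only the log tail is included to keep the prompt small; the artifact paths let an engineer check any claim in the summary against the full log.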

metrics

runs: 200+

workloads: 10+

artifact types

media

Structured run outputs with metrics, regression diffs, and summaries.

CI integration with smoke tests and regression workloads running alongside code changes.