Show HN: LogClaw – Open-source AI SRE that auto-creates tickets from logs

Robelkidin 19 points 14 comments March 12, 2026
logclaw.ai

Hi HN, I'm Robel. I built LogClaw because I was tired of paying for Datadog and still waking up to pages that said "something is wrong" with no context.

LogClaw is an open-source log intelligence platform that runs on Kubernetes. It ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring, not simple threshold alerting. The system extracts 8 failure-type signals (OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connection errors, auth failures) and combines them with statistical z-score analysis, blast radius, error velocity, and recurrence signals into a composite score. Critical failures (OOM, panics) take an immediate detection path that fires in <100ms, before a time window even completes. Detection catches 99.8% of critical failures while filtering noise (validation errors and 404s don't fire incidents).

Once an anomaly is confirmed, a 5-layer trace correlation engine groups logs by traceId, maps service dependencies, tracks error propagation cascades, and computes the blast radius across affected services. Then the Ticketing Agent pulls the correlated timeline, sends it to an LLM for root cause analysis, and creates a deduplicated ticket in Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad. The loop from log noise to a filed ticket takes about 90 seconds.

Architecture: OTel Collector → Kafka (Strimzi, KRaft mode) → Bridge (Python, 4 concurrent threads: ETL, anomaly detection, OpenSearch indexing, trace correlation) → OpenSearch + Ticketing Agent. The AI layer supports OpenAI, Claude, or Ollama, the last for fully air-gapped deployments. Everything deploys with a single Helm chart per tenant: namespace-isolated, with no shared data plane.

To try it locally: https://docs.logclaw.ai/local-development

What it does NOT do yet:
- Metrics and traces: this is logs-only right now. Metrics support is on the roadmap.
- The anomaly detection is signal-based + statistical (composite scoring with z-score), not deep learning. It catches 99.8% of critical failures but won't detect subtle performance drift patterns yet.
- The dashboard is functional but basic. We use OpenSearch Dashboards for the heavy lifting.

Licensed Apache 2.0. The managed cloud version is $0.30/GB ingested if you don't want to self-host.

Hi HN, I’m Robel. I built LogClaw after getting tired of waking up to alerts that only said “something is wrong” with no context.

LogClaw is an open-source log intelligence platform for Kubernetes. It ingests logs via OpenTelemetry and detects operational failures using signal-based anomaly detection rather than simple thresholds. Instead of looking at a single metric, LogClaw extracts failure signals from logs (OOMs, crashes, dependency failures, DB deadlocks, timeouts, etc.) and combines them with statistical signals like error velocity, recurrence, z-score anomalies, and blast radius to compute a composite anomaly score. Critical failures bypass time windows and trigger detection in <100ms.

Once an anomaly is confirmed, a correlation engine reconstructs the trace timeline across services, detects error propagation, and computes the blast radius. A ticketing agent then generates a root-cause summary and creates deduplicated incidents in Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad.

Architecture: OTel Collector → Kafka → Detection Engine → OpenSearch → Ticketing Agent

Repo: https://github.com/logclaw/logclaw

Would love feedback from people running large production systems.
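For readers wondering what "composite scoring" with an immediate path might look like, here is a minimal Python sketch under stated assumptions. It is not LogClaw's code. The regex patterns, the six signals shown (of the eight listed), the component weights, the saturation caps, the LogRecord fields, and every function name (extract_signals, is_immediate, blast_radius, composite_score) are hypothetical, and error velocity is left out for brevity. It shows critical signals bypassing the window, a z-score of the window's failing-record count against earlier windows, and a blast radius computed by grouping records on trace ID and counting the services that share a trace with a failure.

```python
import re
import statistics
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical regexes for six of the eight failure signals named in the post.
SIGNAL_PATTERNS = {
    "oom": re.compile(r"out of memory|oomkilled", re.IGNORECASE),
    "crash": re.compile(r"\bpanic\b|segfault", re.IGNORECASE),
    "timeout": re.compile(r"timed out|timeout", re.IGNORECASE),
    "db_deadlock": re.compile(r"deadlock", re.IGNORECASE),
    "connection_error": re.compile(r"connection (refused|reset)", re.IGNORECASE),
    "auth_failure": re.compile(r"unauthorized|permission denied", re.IGNORECASE),
}
# Signals that skip the time window entirely (the "immediate path").
CRITICAL_SIGNALS = {"oom", "crash"}
# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"signals": 0.4, "zscore": 0.3, "blast_radius": 0.2, "recurrence": 0.1}


@dataclass
class LogRecord:
    service: str
    trace_id: str
    message: str


def extract_signals(message: str) -> set[str]:
    """Return the names of every failure signal whose pattern matches."""
    return {name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(message)}


def is_immediate(record: LogRecord) -> bool:
    """Critical signals fire per record, without waiting for the window to close."""
    return bool(extract_signals(record.message) & CRITICAL_SIGNALS)


def zscore(current: float, history: list[float]) -> float:
    """Standard score of the current failing count against past windows."""
    if len(history) < 2:
        return 0.0
    stdev = statistics.stdev(history)
    return 0.0 if stdev == 0 else (current - statistics.mean(history)) / stdev


def clamp01(x: float) -> float:
    # Keep each normalized component inside [0, 1].
    return max(0.0, min(1.0, x))


def blast_radius(records: list[LogRecord]) -> int:
    """Count distinct services that share a trace ID with any failing record."""
    by_trace: dict[str, list[LogRecord]] = defaultdict(list)
    for r in records:
        by_trace[r.trace_id].append(r)
    affected = set()
    for trace in by_trace.values():
        if any(extract_signals(r.message) for r in trace):
            affected.update(r.service for r in trace)
    return len(affected)


def composite_score(records: list[LogRecord], error_history: list[float], recurrences: int) -> float:
    """Weighted sum of normalized components for one closed time window."""
    failing = [r for r in records if extract_signals(r.message)]
    distinct = set().union(*(extract_signals(r.message) for r in failing))
    components = {
        "signals": clamp01(len(distinct) / 3),                     # 3+ signal types saturate
        "zscore": clamp01(zscore(len(failing), error_history) / 3),  # z >= 3 saturates
        "blast_radius": clamp01(blast_radius(records) / 5),          # 5+ services saturate
        "recurrence": clamp01(recurrences / 5),                      # 5+ repeats saturate
    }
    return sum(WEIGHTS[k] * v for k, v in components.items())


window = [
    LogRecord("api", "t1", "upstream request timed out"),
    LogRecord("orders", "t1", "deadlock detected while updating row"),
    LogRecord("api", "t2", "GET /health 200"),
]
print(is_immediate(LogRecord("worker", "t3", "container OOMKilled")))  # True
print(round(composite_score(window, error_history=[0, 1, 0, 1], recurrences=2), 3))  # 0.646
```

In this sketch a caller would run is_immediate on each record as it arrives and composite_score only when a window closes. The score threshold that turns a window into an incident is left open, since the post does not state one.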

Discussion Highlights (8 comments)

blutoot

I'm a little confused. An agent's value-add is to automate what a human actor (in this case, an SRE) does and thus reduce the time to recovery, etc. A human SRE never manually detects an error: we already have well-established anomaly detection implementations, and wiring them to some ticket generation tool is also an established pattern. My confusion is about what value the "agent" brings here. Nothing wrong with competing with the Datadogs of the world.

gostsamo

when are you renaming it to LogMolt?

ramon156

You forgot to remove the bottom part, which is the same message but shortened. Did people just give up in general? I hate this world so much

rob

Hey bud, forgot to delete the original prompt at the end.

f311a

Why is this upvoted? The author did not even bother to read what he wrote.

> SOC 2 Type II ready

Huh? You vibecoded the repo in a week and claim it's ready?

mrweasel

LLMs aren't the fastest thing in the world; how much data can you realistically parse per second?

maknee

How effective are LLMs at triaging issues? Has anyone found success using them to find the root cause? I've only been able to triage effectively for toy examples.

8note

As an iteration: what I'd want from an SRE agent is that it sets up and tests automated alarms. I don't want non-determinism in whether my pager goes off when something breaks.

I also want the agent to get a first look at issues once a ticket has been written: find relevant logs, metrics, and dashboards, and put them into the ticket. Then I want it to take a first guess at an RCA, and at whether the issue will resolve itself by waiting, so that by the time I'm actually awake, I can read through and decide if anything actually needs to be done.

I'd also be fine writing up agent skills for how to solve common problems, and having it run through those, but only if it's rock solid. I don't want the agent to make a second issue when I just woke up.
