What shipped
bolder-ai 0.1.0 is live on PyPI:
pip install bolder-ai
It's the official Python SDK for BEval Studio — the hosted evaluation platform for production LLM systems. Install it, set an API key, and every LLM call your app makes shows up in your BEval dashboard alongside latency, token counts, cost, and status.
The distribution name is bolder-ai. The import name is beval.
import beval
beval.init() # reads BEVAL_API_KEY from env
The three ways to use it
1. Log anything directly
beval.log(
    kind="llm",
    model_id="gpt-4o-mini",
    input="What is the capital of France?",
    output="Paris.",
    latency_ms=312,
    tokens_in=7,
    tokens_out=2,
)
No schema to learn. The fields map 1:1 to what the dashboard displays. Logging is fire-and-forget — the call returns in microseconds and the network POST happens on a background thread. If the gateway is down, your app doesn't care.
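To make the fire-and-forget behavior concrete, here is a minimal sketch of the pattern using only the standard library. It is not the bolder-ai implementation: the class name `FireAndForgetLogger`, the `send` callable, and the `flush` method are hypothetical names chosen for illustration. The caller only enqueues; a daemon thread does the sending and swallows delivery errors.

```python
import queue
import threading


class FireAndForgetLogger:
    """Hypothetical illustration of fire-and-forget logging (not the SDK's code)."""

    def __init__(self, send, capacity=10_000):
        self._send = send                          # callable that performs the network POST
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0                           # entries discarded because the queue was full
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def log(self, **entry):
        """Enqueue and return immediately; never block and never raise."""
        try:
            self._queue.put_nowait(entry)
        except queue.Full:
            self.dropped += 1                      # drop on overflow instead of blocking

    def _drain(self):
        """Background loop: send entries one by one, ignoring delivery errors."""
        while True:
            entry = self._queue.get()
            try:
                self._send(entry)
            except Exception:
                pass                               # a failed send never reaches the caller
            finally:
                self._queue.task_done()

    def flush(self, timeout=5.0):
        """Wait up to `timeout` seconds for the queue to empty; return True if it did."""
        waiter = threading.Thread(target=self._queue.join, daemon=True)
        waiter.start()
        waiter.join(timeout)
        return not waiter.is_alive()
```

With this shape, `log(...)` costs one non-blocking queue insert on the caller's thread, which is why a slow or unreachable gateway does not slow the app down.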
2. Auto-wrap your OpenAI or Anthropic client
import beval
from openai import OpenAI

beval.init()
client = beval.wrap(OpenAI())

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello."}],
)
Every chat.completions.create is captured — input messages, output, token usage, latency, errors. Image parts are detected automatically and logged as kind="vlm" with the image inlined. Same story for beval.wrap(Anthropic()).
Zero changes to your call sites.
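For intuition on the image detection, the sketch below shows one way a wrapper could decide between kind="llm" and kind="vlm" by scanning message content parts. It assumes the multi-part content shapes used by the OpenAI chat API (parts with type "image_url") and the Anthropic Messages API (blocks with type "image"). The function name `infer_kind` is hypothetical, and the SDK's actual logic may differ.

```python
# Content-part types that carry an image:
# "image_url" in OpenAI chat messages, "image" in Anthropic message blocks.
IMAGE_PART_TYPES = {"image_url", "image"}


def infer_kind(messages):
    """Return "vlm" if any message contains an image part, otherwise "llm"."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue                               # plain string content has no image parts
        for part in content:
            if isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES:
                return "vlm"
    return "llm"


text_only = [{"role": "user", "content": "Hello."}]
with_image = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ],
}]
```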
3. Decorate agent functions
@beval.trace
def run_agent(query: str) -> str:
    # ...
    return answer

@beval.trace(name="tool:search", kind="agent")
async def search(q): ...
Captures arguments, return value, latency, and exceptions. Works for both sync and async functions. Exceptions are logged with status="failure" and the exception class + message — then re-raised.
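The behavior described above can be sketched with a small stand-in decorator. This is an illustration, not the SDK's source: the `sink` parameter, the "success" status value, the default `kind`, and the record field names are assumptions. Only status="failure" and the "exception class + message, then re-raise" behavior come from the description above.

```python
import functools
import inspect
import time


def trace(func=None, *, name=None, kind="agent", sink=print):
    """Hypothetical tracing decorator; `sink` receives one dict per call."""

    def decorate(fn):
        label = name or fn.__name__

        def record(start, status, args, kwargs, **extra):
            sink({
                "name": label,
                "kind": kind,
                "status": status,
                "input": {"args": args, "kwargs": kwargs},
                "latency_ms": (time.perf_counter() - start) * 1000,
                **extra,
            })

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except Exception as exc:
                    record(start, "failure", args, kwargs,
                           error=f"{type(exc).__name__}: {exc}")
                    raise                          # the caller still sees the exception
                record(start, "success", args, kwargs, output=result)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                record(start, "failure", args, kwargs,
                       error=f"{type(exc).__name__}: {exc}")
                raise
            record(start, "success", args, kwargs, output=result)
            return result
        return wrapper

    # Support both the bare form (@trace) and the configured form (@trace(name=...)).
    return decorate(func) if callable(func) else decorate
```

Branching on `inspect.iscoroutinefunction` is what lets one decorator cover both sync and async functions: the async wrapper has to `await` inside the timed region, otherwise it would only measure how long it takes to create the coroutine.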
What's under the hood
The SDK is deliberately small and boring:
- One hard dependency: httpx. Optional extras for openai and anthropic integrations.
- One background thread draining an in-memory queue. 10,000-entry capacity, drop-on-overflow.
- Retries on 408 / 429 / 5xx with exponential backoff. Network errors never raise to your code.
- Graceful shutdown: atexit drains the queue with a 5-second timeout so logs buffered during the last request still make it out.
- Redaction hook at init() for scrubbing PII before send.
Installed footprint: ~22 KB wheel.
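As a rough illustration of the retry policy in the list above, the sketch below retries on 408, 429, and 5xx with a doubling delay and never lets an exception escape. The function names, the attempt count, and the base delay are assumptions made for the example; the post does not state the SDK's actual values.

```python
import time

RETRYABLE_STATUSES = {408, 429}


def is_retryable(status):
    """408, 429, and any 5xx response are worth retrying."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def send_with_retry(post, payload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call post(payload), which returns an HTTP status code.

    Returns True if the payload was accepted (2xx), False otherwise.
    Never raises: network errors are treated as retryable failures.
    """
    for attempt in range(max_attempts):
        try:
            status = post(payload)
        except Exception:
            status = None                          # network error: retry
        if status is not None and not is_retryable(status):
            return 200 <= status < 300             # final answer, success or not
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)       # 0.5s, 1s, 2s, ...
    return False
```

A production version would usually add random jitter to the delay so that many clients recovering from the same outage do not retry in lockstep; it is left out here to keep the delays easy to read.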
Why it exists
BEval Studio has run for months on direct API ingest from the VLM mock app and the bolder-fit-agent production workload. Every team integrating BEval wrote essentially the same HTTP client, retry loop, background thread, and OpenAI wrapper. We consolidated it.
The SDK talks to a single endpoint (POST /api/v1/logs/ingest) that's been stable since BEval launched. That means:
- Existing custom integrations keep working
- The SDK is backward compatible across BEval dashboard upgrades
- New SDK versions won't force backend changes
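For readers with a custom integration, the sketch below shows roughly what building a request to that endpoint could look like with the standard library. Only the method and path (POST /api/v1/logs/ingest) come from this post. The base URL is a placeholder, the JSON body simply mirrors the beval.log fields as an assumption about the wire format, and authentication is left out because the scheme is not described here.

```python
import json
import urllib.request

# Placeholder host; substitute the ingest host of your own BEval deployment.
BASE_URL = "https://beval.example.invalid"


def build_ingest_request(entry, base_url=BASE_URL):
    """Build (but do not send) a POST request for one log entry.

    Assumption: the endpoint accepts a JSON object shaped like the
    beval.log(...) keyword arguments. Auth headers are omitted on purpose.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/v1/logs/ingest",
        data=json.dumps(entry).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_ingest_request({
    "kind": "llm",
    "model_id": "gpt-4o-mini",
    "input": "What is the capital of France?",
    "output": "Paris.",
    "latency_ms": 312,
})
```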
Already running in production
The same day we published bolder-ai 0.1.0 to PyPI, we integrated it into bolder-fit-agent — our coach-facing agentic service that drives multi-turn LiteLLM + MCP agent runs. Every agent turn now shows up in the BEval dashboard with session ID, plan type, turn index, tool calls, and cost.
Total integration diff: 7 files, 159 insertions. One of those files is the SDK init helper. Most of the rest is a single helper function that builds a log entry around the existing LLM call site.
What's next
This is 0.1. The short list for 0.2 and beyond:
- Batch ingest: today the SDK posts one log per call. Fine for most workloads, wasteful for high-throughput ones. A batch endpoint is the top priority.
- Streaming support: wrappers currently don't log when stream=True. We'll accumulate chunks and log on stream end, with time-to-first-token as a first-class metric.
- Traces and spans: flat logs work for single LLM calls; agent runs want a parent/child tree. The SDK will grow a context manager and nested @trace support once the backend trace model lands.
- LangChain, LlamaIndex, LiteLLM callback integrations.
- Direct-to-S3 image upload for VLM payloads over ~64 KB.
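The streaming item is the easiest of these to picture in code. The sketch below is one possible shape and not a committed design: a generator passes text chunks through to the caller untouched, accumulates them, and reports once when the stream ends. The names `logged_stream`, `on_end`, and `ttft_ms` are hypothetical.

```python
import time


def logged_stream(chunks, on_end, clock=time.perf_counter):
    """Yield text chunks unchanged; call on_end(record) once when the stream finishes."""
    start = clock()
    first_token_at = None
    parts = []
    try:
        for text in chunks:
            if first_token_at is None and text:
                first_token_at = clock()           # first non-empty chunk has arrived
            parts.append(text)
            yield text
    finally:
        # Runs on normal exhaustion, on errors, and when the consumer closes the generator early.
        end = clock()
        on_end({
            "output": "".join(parts),
            "ttft_ms": None if first_token_at is None else (first_token_at - start) * 1000,
            "latency_ms": (end - start) * 1000,
        })
```

Because the record is emitted in a `finally` block, a consumer that stops reading early still produces a log entry containing whatever output arrived before it stopped.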
Get started
- Install: pip install bolder-ai
- Create an API key in your BEval dashboard → Settings → API Keys
- Set BEVAL_API_KEY in your environment
- Add three lines to your app: beval.init() at startup, and either beval.log(...), beval.wrap(client), or @beval.trace
Full docs: docs.bolder.services. Source lives on PyPI at pypi.org/project/bolder-ai.
If you're running LLMs in production and want structured visibility without standing up your own observability stack, book a call. We'll get you on the dashboard same-day.
