2025-04-09
JSONL Datasets, Evals/Scoring, and more

JSONL Datasets

Datasets are now supported as JSONL files. This allows you to test your prompts in bulk against large datasets, and supports streaming.

Read Docs
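A JSONL dataset is simply one JSON record per line, which is what makes streaming it row-by-row straightforward. The sketch below is a minimal, generic parser; the field names (`input`, `expected`) are illustrative assumptions, not Puzzlet's actual dataset schema.

```typescript
// Hypothetical JSONL dataset row — the shape is an assumption for illustration.
interface DatasetRow {
  input: Record<string, unknown>;
  expected?: string;
}

// Parse JSONL text: one JSON object per line, blank lines skipped.
// A streaming implementation would apply the same per-line logic to a
// readline interface over a file stream instead of a full string.
function parseJsonl(text: string): DatasetRow[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as DatasetRow);
}

const jsonl =
  '{"input":{"question":"2+2?"},"expected":"4"}\n' +
  '{"input":{"question":"Capital of France?"},"expected":"Paris"}\n';

const rows = parseJsonl(jsonl);
console.log(rows.length); // 2
```

Because each line is independent, a large dataset never needs to be held in memory all at once.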

Evals & Scoring

We rolled out our initial evals support. Evals allow you to evaluate your prompts against a set of data, and get a score. More to come here soon.

Read Docs
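Conceptually, an eval runs a prompt over each row of a dataset and scores the outputs. The sketch below illustrates that flow with a pluggable scorer; `runPrompt`, the row shape, and the exact-match scorer are stand-ins for illustration, not Puzzlet's actual API.

```typescript
// Hypothetical eval row and scorer shapes — assumptions for illustration.
interface EvalRow {
  input: string;
  expected: string;
}

type Scorer = (output: string, expected: string) => number;

// Simplest possible scorer: 1 if the output matches exactly, else 0.
const exactMatch: Scorer = (output, expected) => (output === expected ? 1 : 0);

// Run every row through the prompt and return the mean score in [0, 1].
function runEval(
  rows: EvalRow[],
  runPrompt: (input: string) => string,
  score: Scorer
): number {
  const total = rows.reduce(
    (sum, row) => sum + score(runPrompt(row.input), row.expected),
    0
  );
  return rows.length === 0 ? 0 : total / rows.length;
}

// Toy "model" that uppercases its input, to show the flow end to end.
const evalRows: EvalRow[] = [
  { input: "hi", expected: "HI" },
  { input: "ok", expected: "no" },
];
console.log(runEval(evalRows, (s) => s.toUpperCase(), exactMatch)); // 0.5
```

Swapping in a different `Scorer` (semantic similarity, regex match, LLM-as-judge) changes the metric without changing the run loop.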

Other

  • Consolidating prompts, evals, and datasets into single “files”
  • Officially rolled out alerts
  • Some CLI improvements
  • Minor bug fixes
2025-03-12
Sessions, Alerts, Trace UI Improvements, Onboarding Improvements

Sessions

Sessions provide a way to group related traces together, making it easier to monitor and debug complex workflows in your LLM applications. By organizing traces into sessions, you can track the entire lifecycle of a user interaction or a multi-step process.

Read Docs
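The core idea is that every trace carries a session identifier, so related traces can be pulled together into one unit. A minimal sketch of that grouping, with illustrative field names that are assumptions rather than Puzzlet's trace schema:

```typescript
// Hypothetical trace record — sessionId ties related traces together.
interface Trace {
  sessionId: string;
  name: string;
  durationMs: number;
}

// Group a flat list of traces into per-session buckets.
function groupBySession(traces: Trace[]): Map<string, Trace[]> {
  const sessions = new Map<string, Trace[]>();
  for (const t of traces) {
    const group = sessions.get(t.sessionId) ?? [];
    group.push(t);
    sessions.set(t.sessionId, group);
  }
  return sessions;
}

const traces: Trace[] = [
  { sessionId: "s1", name: "plan", durationMs: 120 },
  { sessionId: "s1", name: "answer", durationMs: 340 },
  { sessionId: "s2", name: "plan", durationMs: 90 },
];
console.log(groupBySession(traces).get("s1")?.length); // 2
```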

Alerts

Now, you can get notified when your application is experiencing increased errors, latency, or costs. Configure alerts to notify you via Slack or a webhook.

Read Docs

Traces UI Improvements

Traces now have a more user-friendly UI, with a focus on providing important information at a glance.

Onboarding Improvements

We’ve improved our onboarding. Now, you can see your dashboard without having to sync your repo first. We also support modular onboarding, so you can skip steps you don’t need.

2025-02-18
Add Trace Examples to Datasets, Load Trace in Prompt, Re-indexing, App UI Improvements, Bug Fixes

Adding Examples to Datasets

You can now add production trace data to your datasets with a single click.

Read Docs

Adding Examples to Prompts

You can now add production trace examples to your prompts. This allows you to iterate on and test your prompts with real data.

Read Docs

Re-indexing

You can now re-index your prompts and datasets. This allows you to perform a fresh pull of the content from your synced repository.

App UI Improvements

You can now easily view your app’s repo configuration, including repo name, branch, and more.

2025-01-27
Type Safety, Datasets, and more

Type Safety

Puzzlet aims to provide developers with the best developer experience possible. As part of this, we’ve just added type safety to our platform.

  • Types can now be generated via our CLI
  • Fetching prompts from our CDN or AgentMark is now type-safe
  • Prompts now support run/compile/deserialize functions

Read more about Type Safety
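The payoff of generated types is that the compiler checks both the prompt name and its input variables. The sketch below shows the general pattern; the `Prompts` interface stands in for CLI-generated types, and `compilePrompt` is an illustrative helper, not Puzzlet's real API.

```typescript
// Hypothetical generated types: each prompt name maps to its input props.
// In practice this interface would be emitted by the CLI, not hand-written.
interface Prompts {
  "summarize.md": { props: { text: string } };
  "translate.md": { props: { text: string; language: string } };
}

// The prompt name constrains the props type, so mismatches fail to compile.
function compilePrompt<K extends keyof Prompts>(
  name: K,
  props: Prompts[K]["props"]
): string {
  // A real implementation would render the template; we interpolate
  // for illustration only.
  return `${name}: ${JSON.stringify(props)}`;
}

// compilePrompt("summarize.md", { language: "fr" }); // rejected at compile time
console.log(compilePrompt("summarize.md", { text: "hello" }));
```

The same `keyof`-indexed pattern applies to run, compile, and deserialize calls: one generated interface constrains them all.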

Datasets

Datasets now allow you to test your prompts in bulk against a large set of data.

  • Run your datasets in bulk against your prompts
  • View previous runs, with inputs/outputs
  • View traces associated with each run
  • View high-level metrics for each run

Read more about Datasets

Trace Grouping

Traces can now be grouped based on the trace function and the component function. trace groups at the root level, while component allows for sub-groups.

  • New function added: trace
  • New function added: component
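The relationship between the two can be pictured as a small tree: one root group per trace call, with component calls nesting beneath it. The signatures below are illustrative assumptions to show the shape of the idea, not Puzzlet's exact API.

```typescript
// Hypothetical group node: a named span with nested children.
interface Group {
  name: string;
  children: Group[];
}

// trace() opens a root-level group and runs the workload inside it.
function trace<T>(name: string, fn: (root: Group) => T): { root: Group; result: T } {
  const root: Group = { name, children: [] };
  return { root, result: fn(root) };
}

// component() opens a sub-group under an existing group.
function component<T>(parent: Group, name: string, fn: (g: Group) => T): T {
  const child: Group = { name, children: [] };
  parent.children.push(child);
  return fn(child);
}

const { root } = trace("handle-request", (g) => {
  component(g, "retrieve", () => "docs");
  component(g, "generate", () => "answer");
});
console.log(root.children.map((c) => c.name)); // the two sub-groups under the root
```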

CLI Improvements

Our CLI has been improved to provide a better developer experience.

  • Puzzlet init can optionally create an example app
  • Added pull-models to walk through adding new models to your platform

Read More about our CLI

Bug Fixes

  • Fixed a bug which could cause an app’s templates to be deleted when a new app was created
  • Fixed a bug which could cause some branches not to show up in the UI
  • Fixed a bug which could prevent newly created local prompts from being synced to the platform

Other

  • Improved UI for prompts input/output
  • Paginate traces
  • Improved UI theme for prompts
2025-01-16
Initial Puzzlet Release

Overview

Puzzlet is the git-based Prompt Engineering Platform that empowers both application developers and prompt engineers to collaborate seamlessly on GenAI products. Puzzlet enables application developers to manage their configuration, prompts, datasets, and evals in a git-based workflow, while also providing a hosted platform for collaboration with non-technical team members.

Features

  • Prompt Management
  • Observability
  • Datasets
  • CLI
  • Platform Management
  • Puzzlet SDK

Prompt Management

Puzzlet takes a developer-first approach to prompt management, treating prompts as files that live in your repository while still providing a platform for non-technical team members. All prompts are saved in AgentMark, a markdown-based format that is easy to write and read.

Read Docs

Observability

We build on top of OpenTelemetry for collecting telemetry data from your prompts. This helps you monitor, debug, and optimize your LLM applications in production. We provide traces, logs, metrics, and more.

Read Docs

Datasets

Create datasets to easily test your prompts in bulk against a large set of data.

Read Docs

CLI

We provide a CLI for initializing your Puzzlet app, customizing it, and deploying it to the cloud. Add new models to your platform with just a single command. You can also develop with Puzzlet locally using our serve command.

```bash
npx @puzzlet/cli@latest init
```

Read Docs

Platform Management

Puzzlet offers an intuitive platform for creating new git-synced apps, adding team members with roles, and setting up API keys for users.

Puzzlet SDK

Puzzlet’s SDK is simple and easy to use. We offer features like one-LOC observability, secure prompt fetching from our CDN, and more.

Read Docs

2025-01-03
Initial AgentMark Release

Features

  • Initial release of AgentMark
  • Support for OpenAI, Anthropic, and other LLM providers
  • MDX-based prompt templating
  • Type-safe prompt development
  • Tools and agents support

Documentation

  • Added comprehensive documentation
  • Included examples and guides
  • API reference documentation