Changelog
New features and improvements to Puzzlet
JSONL Datasets
Datasets are now supported as JSONL files. This lets you test your prompts in bulk against large datasets, and it supports streaming.
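To illustrate why JSONL works well for streaming: every line is a standalone JSON object, so a reader can handle one example at a time instead of loading the whole file. The sketch below is generic Python rather than Puzzlet's own API, and the `input` and `expected_output` field names are assumptions made for the example.

```python
import io
import json
from typing import IO, Iterator


def iter_jsonl(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without reading the whole file."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc


# Hypothetical dataset rows; the field names are illustrative only.
sample = io.StringIO(
    '{"input": {"question": "2 + 2"}, "expected_output": "4"}\n'
    "\n"
    '{"input": {"question": "3 + 3"}, "expected_output": "6"}\n'
)
rows = list(iter_jsonl(sample))
```

In practice you would pass an open file handle instead of the in-memory `sample`, and consume the iterator lazily rather than collecting it into a list.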
Evals & Scoring
We rolled out our initial evals support. Evals allow you to evaluate your prompts against a set of data, and get a score. More to come here soon.
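As a conceptual sketch of what "evaluate your prompts against a set of data, and get a score" can look like, the example below runs each example through a prompt and averages an exact-match score. It does not use Puzzlet's eval API; `run_eval`, `exact_match`, `fake_prompt`, and the field names are all hypothetical.

```python
from typing import Callable


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when output equals expected, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    run_prompt: Callable[[str], str],
    examples: list[dict],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run every example through run_prompt and return the mean score."""
    if not examples:
        raise ValueError("examples must not be empty")
    scores = [scorer(run_prompt(ex["input"]), ex["expected_output"]) for ex in examples]
    return sum(scores) / len(scores)


# Stand-in for a real model call so the sketch runs offline.
def fake_prompt(question: str) -> str:
    return {"capital of France?": "Paris", "2 + 2": "5"}.get(question, "")


examples = [
    {"input": "capital of France?", "expected_output": "paris"},
    {"input": "2 + 2", "expected_output": "4"},
]
score = run_eval(fake_prompt, examples)  # one of the two examples matches
```

Exact match is only one possible scorer; any function that maps an output and an expected value to a number can be passed as `scorer`.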
Other
- Consolidated prompts, evals, and datasets into single “files”
- Officially rolled out alerts
- Some CLI improvements
- Minor bug fixes
Sessions
Sessions provide a way to group related traces together, making it easier to monitor and debug complex workflows in your LLM applications. By organizing traces into sessions, you can track the entire lifecycle of a user interaction or a multi-step process.
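The grouping idea can be pictured with a small, self-contained sketch: traces that share a session identifier are collected together and ordered by start time, which gives you the full lifecycle of one interaction. The `Trace` shape and the `session_id` field are assumptions for illustration, not Puzzlet's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    session_id: str  # hypothetical field linking a trace to its session
    name: str
    start_ms: int


def group_by_session(traces: list[Trace]) -> dict[str, list[Trace]]:
    """Collect traces per session and order each group by start time."""
    sessions: dict[str, list[Trace]] = defaultdict(list)
    for trace in traces:
        sessions[trace.session_id].append(trace)
    for group in sessions.values():
        group.sort(key=lambda t: t.start_ms)
    return dict(sessions)


traces = [
    Trace("t3", "session-a", "summarize", start_ms=300),
    Trace("t1", "session-a", "classify", start_ms=100),
    Trace("t2", "session-b", "classify", start_ms=200),
]
sessions = group_by_session(traces)
```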
Alerts
You can now get notified when your application experiences increased errors, latency, or costs. Configure alerts to notify you via Slack or a webhook.
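One way to think about an alert is as a threshold check over a window of metrics. The sketch below is a simplified model of that idea, not how Puzzlet evaluates alerts; the metric names, threshold fields, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    requests: int
    errors: int
    p95_latency_ms: float
    cost_usd: float


@dataclass
class AlertThresholds:
    max_error_rate: float
    max_p95_latency_ms: float
    max_cost_usd: float


def triggered_alerts(metrics: WindowMetrics, limits: AlertThresholds) -> list[str]:
    """Return the names of the alerts whose threshold is exceeded in this window."""
    alerts = []
    # Guard against division by zero when the window saw no traffic.
    error_rate = metrics.errors / metrics.requests if metrics.requests else 0.0
    if error_rate > limits.max_error_rate:
        alerts.append("errors")
    if metrics.p95_latency_ms > limits.max_p95_latency_ms:
        alerts.append("latency")
    if metrics.cost_usd > limits.max_cost_usd:
        alerts.append("cost")
    return alerts


window = WindowMetrics(requests=200, errors=30, p95_latency_ms=1800.0, cost_usd=4.2)
limits = AlertThresholds(max_error_rate=0.05, max_p95_latency_ms=2500.0, max_cost_usd=3.0)
fired = triggered_alerts(window, limits)  # error rate and cost exceed their limits
```

A real setup would then forward each fired alert to the configured Slack channel or webhook.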
Traces UI Improvements
Traces now have a more user-friendly UI, with a focus on providing important information at a glance.
Onboarding Improvements
We’ve improved our onboarding. Now, you can see your dashboard without having to sync your repo first. We also support modular onboarding, so you can skip steps you don’t need.
Adding Examples to Datasets
You can now add production trace data to your datasets with a single click.
Adding Examples to Prompts
You can now add production trace examples to your prompts. This lets you iterate on and test your prompts with real data.
Re-indexing
You can now re-index your prompts and datasets. This performs a fresh pull of the content from your synced repository.
App UI Improvements
You can now easily view your app’s repo configuration, including repo name, branch, and more.
Type Safety
Puzzlet aims to provide developers with the best developer experience possible. As part of this, we’ve just added type safety to our platform.
- Types can now be generated via our CLI
- Fetching prompts from our CDN or AgentMark is now type-safe
- Prompts now support run/compile/deserialize functions
Read more about Type Safety
Datasets
Datasets now allow you to test your prompts in bulk against a large set of data.
- Run your datasets in bulk against your prompts
- View previous runs, with inputs/outputs
- View traces associated with each run
- View high-level metrics for each run
Read more about Datasets
Trace Grouping
Traces can now be grouped using the trace function and the component function. trace groups traces together at the root level, while component allows for sub-groups.
- New function added: trace
- New function added: component
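The relationship between the two functions can be sketched as a small tree: a trace opens a group at the root, and each component nests under whatever group is currently open. This is a conceptual model only. The `Recorder` class and its method signatures are hypothetical and are not the signatures of Puzzlet's `trace` and `component` functions.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Group:
    name: str
    kind: str  # "trace" or "component"
    children: list["Group"] = field(default_factory=list)


class Recorder:
    """Toy recorder that builds a tree of trace and component groups."""

    def __init__(self) -> None:
        self.roots: list[Group] = []
        self._stack: list[Group] = []

    @contextmanager
    def trace(self, name: str) -> Iterator[Group]:
        if self._stack:
            raise RuntimeError("trace groups must be opened at the root level")
        group = Group(name, "trace")
        self.roots.append(group)
        self._stack.append(group)
        try:
            yield group
        finally:
            self._stack.pop()

    @contextmanager
    def component(self, name: str) -> Iterator[Group]:
        if not self._stack:
            raise RuntimeError("component groups need an enclosing trace")
        group = Group(name, "component")
        # Attach to whichever group is currently open, allowing nested sub-groups.
        self._stack[-1].children.append(group)
        self._stack.append(group)
        try:
            yield group
        finally:
            self._stack.pop()


recorder = Recorder()
with recorder.trace("answer-question"):
    with recorder.component("retrieve-documents"):
        pass
    with recorder.component("generate-answer"):
        with recorder.component("format-citations"):
            pass
```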
CLI Improvements
Our CLI has been improved to provide a better developer experience.
- Puzzlet init can optionally create an example app
- Added pull-models to walk through adding new models to your platform
Read more about our CLI
Bug Fixes
- Fixed a bug which could cause an app’s templates to be deleted when a new app was created
- Fixed a bug which could cause some branches not to show up in the UI
- Fixed a bug which could prevent newly created local prompts from being synced to the platform
Other
- Improved UI for prompts input/output
- Paginated traces
- Improved UI theme for prompts
Overview
Puzzlet is the git-based Prompt Engineering Platform that empowers both application developers and prompt engineers to collaborate seamlessly on GenAI products. Puzzlet enables application developers to manage their configuration, prompts, datasets, and evals in a git-based workflow while also providing a hosted platform for seamless collaboration with non-technical team members.
Features
- Prompt Management
- Observability
- Datasets
- CLI
- Platform Management
- Puzzlet SDK
Prompt Management
Puzzlet takes a developer-first approach to prompt management, treating prompts as files that live in your repository while still providing a platform for non-technical team members. All prompts are saved in AgentMark, a markdown-based format that is easy to write and read.
Observability
We build on top of OpenTelemetry for collecting telemetry data from your prompts. This helps you monitor, debug, and optimize your LLM applications in production. We provide traces, logs, metrics, and more.
Datasets
Create datasets to easily test your prompts in bulk against a large set of data.
CLI
We provide a CLI for initializing your Puzzlet app, customizing it, and deploying it to the cloud. Add new models to your platform with a single command. You can also develop with Puzzlet locally using our serve command.
Platform Management
Puzzlet offers an intuitive platform for creating new git-synced apps, adding team members with roles, and setting up API keys for users.
Puzzlet SDK
Puzzlet’s SDK is simple and easy to use. It offers features such as one-line-of-code observability, secure fetching of prompts from our CDN, and more.
Features
- Initial release of AgentMark
- Support for OpenAI, Anthropic, and other LLM providers
- MDX-based prompt templating
- Type-safe prompt development
- Tools and agents support
Documentation
- Added comprehensive documentation
- Included examples and guides
- API reference documentation