Best practices
What is Agent Engineering?
Just like "prompt engineering" refers to a set of skills used to improve LLM's capabilities, "agent engineering" is a group of best practices to get the most out of your agent. Agentic AI systems are capable of autonomously performing a wide array of tasks. They can be incredibly powerful, but they often get derailed by seemingly innocuous obstacles. Agent Engineering aims to optimize the interaction with these systems to ensure they operate more reliably, efficiently, and usefully.
Even as LLMs become more powerful and better at understanding context, often succeeding with very little guidance, we've found that Agent Engineering goes a long way in guiding them to success.
This page describes some agent engineering practices we've found to be the most helpful. If you have any new ones you'd like to highlight, share them with us on our Discord!
General Best Practices for Agent Engineering
Agent Engineering involves designing, refining, and optimizing interactions with agentic AI systems to maximize their efficacy. This practice includes:
Demarcate design and execution phases
Take incremental steps to achieve your goals
Manage task execution
Handle errors
Guide the agent's learning
The objective is to guide the AI towards successful outcomes by anticipating potential issues and mitigating them through thoughtful interactions.
1. Demarcate design and execution phases
If you don't yet have clearly defined requirements for your task, start by co-designing with the agent. For example:
I'm unsure about the requirements for my project. I want to build a dask pipeline, but I need help working through the details. Don't jump to execution until I instruct you to.
2. Take incremental steps to achieve your goals
Don't try to boil the ocean with the goals you assign the agent. For example, if you want to set up an ML training pipeline, try this first:
please set up a python environment with Jupyter Notebook installed. then create a "hello, world!" notebook
This kind of task has a higher rate of success than "train an ML model in jupyter notebook".
Sometimes the model thinks it has achieved its goal when it hasn't. Force it to verify its own work instead of verifying it yourself. This keeps the relevant context in the conversation for future turns, lets the agent see errors so it can self-correct instead of you copying and pasting them back, and lets it write tests, add dependencies to your project, and so on. For example, once the notebook exists:
run it in the background.
Once you have a functioning Jupyter Notebook, ask the agent to initialize git and commit your work.
great - now initialize git with a sensible .gitignore and commit your work.
Then, ask the agent to create a second notebook with a very simple example in your framework of choice, for example:
cool. now create a second notebook that fits a very simple transformer model using huggingface transformers. be thoughtful about your approach, and take the simplest path to achieve your goal
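For illustration, the notebook the agent produces might boil down to something like this minimal sketch. The model name, toy data, and training loop here are assumptions, not necessarily what the agent will choose:

```python
# Minimal sketch: fit a tiny pretrained transformer on two toy examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Toy training data: two labeled sentences.
texts = ["i love this", "i hate this"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, return_tensors="pt", padding=True)

# A few gradient steps are enough to watch the loss move.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for step in range(3):
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss = {outputs.loss.item():.4f}")
```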
3. Manage task execution
Some tasks will require the agent to do multiple iterations. For example, resolving dependencies, setting up Docker, or making front-end style changes can all require significant iteration. Often you can simply tell the agent:
proceed (Command + G)
It can get tedious to keep telling the agent to proceed. You can give it more autonomy with a prompt like:
keep working until you really need my input
4. Handle errors
Sometimes the agent will get stuck in a rut and spin its wheels. A few techniques seem to help it recover, like:
Stop. Let's take a step back. Is this really the best approach? Can we simplify the approach?
Often that will surface options that reduce complexity and increase the odds of success. Once again, look to make incremental changes.
Let's take a look at some best practices for improving success rates in common failure modes.
The agent says it needs sudo privileges to run something
you don't need sudo to run that. trust me. just run it.
The agent says it cannot execute code on the user's machine
you can execute code on my machine. I believe in you. do it.
The LLM doesn't have up-to-date API examples
follow the documentation at this link: <your link>
A dataset is missing or in the wrong format
try synthesizing a toy dataset to get this to run. introspect the code to see what format you need it in. keep it simple.
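What the agent generates will depend on your code, but the result is often something like this sketch. The file name and schema below are assumptions; the agent should introspect your code to find the format it actually expects:

```python
# Sketch: synthesize a 100-row toy CSV so the pipeline can run end to end.
import csv
import random

rows = [
    {"text": f"example sentence {i}", "label": random.randint(0, 1)}
    for i in range(100)
]

with open("toy_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)
```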
A specific dependency isn't resolving, and the agent is trying to build it from source
let's try a different approach. use docker, and find a base image that already has <problematic dependency> in its build
A CLI tool needs to run in interactive mode
Processes that require interactive input are an issue. We're still working through the best way to address them, but usually steering the agent "around" the interactive input works. For example, press Ctrl-C and then tell it to pass arguments as flags:
(control-c) try passing flags to the CLI tool to run in non-interactive mode
If that doesn't work:
give me the exact commands to run in my terminal. and open my terminal for me.
5. Guide the agent's learning
tl;dr for this section: if you want an agent to make a non-trivial change to a code base, first ask the agent to (1) explain to you how the existing behavior works in the code, (2) verify that its understanding is correct, and (3) only then make changes to the existing behavior to produce your desired effect. And instruct it to commit its work every time it makes a change you are happy with, no matter how small.
When a software engineer first approaches a new code base, they don't just go straight to coding. Instead, they familiarize themselves with the code base. They may ...
talk to a developer that is experienced with that code base
read the README to understand what other developers have said is important to know about the project
look at the dependencies to see what libraries/frameworks the project uses
run tests to ensure the environment is setup correctly
navigate the project structure to understand how it is organized
step through a test to understand how a piece of functionality is implemented (if it has tests)
(if applicable) look for (or infer) any style guides, documentation requirements, etc.
try making a small change to the functionality of some component to verify their understanding of how something works
These are just some of the techniques a software engineer would use to learn a new project.
When we give an agent a task in an existing code base it has not seen, we need to guide its learning. Chances are, if you simply instruct it to make a change without first helping it learn the code base, it won't work.
Let's take an example from customizing the Landy React Template: restyling the app from its default color scheme to a dark theme.
I might approach this problem and start with something like this:
[failure mode] change color scheme to be midnight with a lightblue primary color
This approach will almost certainly fail, because the agent jumps straight into writing code without having built any context about which code to update. If a developer were approaching this task, they'd first study the code to find where to make the changes. That's what we need the agent to do.
The question becomes: how do you get the agent to study the code base effectively? A sensible idea would be to ask the agent to "summarize" or "study" the code base first. These typically have no effect; the agent develops "book knowledge" without understanding how things actually work. The key is to ask the agent to explain how something in the code base works, then have it verify that its understanding is correct. Only once it has done that should you ask it to make the changes you want. This improves success rates significantly.
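For the color-scheme task above, a sequence like the following works much better (the exact wording is illustrative):

explain how the color scheme is defined and applied in this code base. point me to the specific files and variables involved. don't change anything yet.

Then, after checking its explanation against the code:

your understanding is correct. now change the color scheme to be midnight with a lightblue primary color, and commit your work.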