Prompt engineering, perhaps the most widely shared cross-departmental tech marketing term since “endpoint security”, is an emerging field focused on designing and refining prompts to use language models (LMs) efficiently across a wide variety of applications and research topics. More specifically, the discipline helps us understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of tasks, such as question answering and arithmetic reasoning. Developers use it to design robust and effective prompting techniques that interface with LLMs and other tools.
NOTE: All examples are tested with text-davinci-003 (using OpenAI’s playground) unless otherwise specified, with the default configurations, e.g., temperature=0.7 and top-p=1.
Basic Prompts
You can already achieve a lot with prompts, but the quality of results depends on how much information you provide. A prompt can contain information like the instruction or question you are passing to the model and include other details such as inputs or examples.
Here is a basic example of a simple prompt:
Prompt:
The sky is
Output:
blue
The sky is blue on a clear day. On a cloudy day, the sky may be gray or white.
As you can see, the language model outputs a continuation of strings that make sense given the context "The sky is". The output might be unexpected or far from the task we want to accomplish.
This basic example also highlights the need to provide more context or instructions about what specifically we want to achieve.
Let’s try to improve it a bit:
Prompt:
Complete the sentence:
The sky is
Output:
so beautiful today.
Is that better? Well, we told the model to complete the sentence so the result looks a lot better as it follows exactly what we told it to do (“complete the sentence”). This approach of designing optimal prompts to instruct the model to perform a task is what’s referred to as prompt engineering.
The example above is a basic illustration of what’s possible with LLMs today. Today’s LLMs can perform all kinds of advanced tasks that range from text summarization to mathematical reasoning to code generation.
A Word on LLM Settings
When working with prompts, you will be interacting with the LLM via an API or directly. You can configure a few parameters to get different results for your prompts.
Temperature – In short, the lower the temperature, the more deterministic the results, in the sense that the most probable next token is always picked. Increasing the temperature leads to more randomness, encouraging more diverse or creative outputs; we are essentially increasing the weights of the other possible tokens. In terms of application, we might want to use a lower temperature for something like fact-based QA to encourage more factual and concise responses. For poem generation or other creative tasks, it might be beneficial to increase the temperature.
Top_p – Similarly, top_p, a sampling technique used together with temperature known as nucleus sampling, lets you control how deterministic the model is when generating a response. If you are looking for exact and factual answers, keep this low. If you are looking for more diverse responses, increase it to a higher value.
The general recommendation is to alter one, not both.
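To make these settings concrete, here is a minimal sketch of how they could be passed when calling the model from code. It assumes the legacy openai Python package (v0.x) that was current for text-davinci-003; the API key is a placeholder and max_tokens is an arbitrary choice for illustration:

import openai  # legacy OpenAI Python SDK (v0.x); newer SDK versions expose a different interface

openai.api_key = "YOUR_API_KEY"  # placeholder; substitute your own key

# Lower temperature for fact-based QA; raise temperature (or top_p) for creative
# tasks. Per the recommendation above, alter one of the two, not both.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The sky is",
    temperature=0.7,  # the default used throughout this guide
    top_p=1,          # the default; leave at 1 while tuning temperature
    max_tokens=64,
)
print(response["choices"][0]["text"])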
Before starting with some basic examples, keep in mind that your results may vary depending on the version of LLM you are using.
Standard Prompts
We have tried a very simple prompt above. A standard prompt has the following format:
<Question>?
This can be formatted into a QA format, which is standard in a lot of QA datasets, as follows:
Q: <Question>?
A:
Given the standard format above, one popular and effective technique for prompting is referred to as few-shot prompting where we provide exemplars. Few-shot prompts can be formatted as follows:
<Question>?
<Answer>
<Question>?
<Answer>
<Question>?
<Answer>
<Question>?
And you can already guess that its QA format version would look like this:
Q: <Question>?
A: <Answer>
Q: <Question>?
A: <Answer>
Q: <Question>?
A: <Answer>
Q: <Question>?
A:
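If you are assembling few-shot prompts programmatically, the QA format above reduces to joining exemplar pairs with a final unanswered question. Here is a minimal sketch in Python; the helper function and the example questions are illustrative, not part of any library:

# Illustrative helper: build a few-shot QA prompt from (question, answer) pairs.
def build_few_shot_prompt(exemplars, question):
    lines = []
    for q, a in exemplars:
        lines.append(f"Q: {q}?")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}?")
    lines.append("A:")  # leave the final answer for the model to complete
    return "\n".join(lines)

exemplars = [
    ("What is the capital of France", "Paris"),
    ("What is the capital of Japan", "Tokyo"),
]
print(build_few_shot_prompt(exemplars, "What is the capital of Spain"))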
Keep in mind that it’s not required to use QA format. The format depends on the task at hand. For instance, you can perform a simple classification task and give exemplars that demonstrate the task as follows:
Prompt:
This is awesome! // Positive
This is bad! // Negative
Wow that movie was rad! // Positive
What a horrible show! //
Output:
Negative
Few-shot prompts enable in-context learning, which is the ability of language models to learn tasks given only a few examples. We will see more of this in action in the upcoming guides.
Elements of a Prompt
As we cover more and more examples and applications that are possible with prompt engineering, you will notice that there are certain elements that make up a prompt.
A prompt can contain any of the following components:
Instruction – a specific task or instruction you want the model to perform
Context – can involve external information or additional context that can steer the model to better responses
Input Data – the input or question that we are interested in finding a response for
Output Indicator – the type or format of the output
Not all the components are required for a prompt and the format depends on the task at hand. We will touch on more concrete examples in upcoming guides.
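To make the components concrete, here is a hedged sketch that assembles a prompt from the four elements above; the variable names mirror the list and the wording is illustrative:

# Illustrative assembly of the four prompt components described above.
instruction = "Classify the text into neutral, negative or positive."
context = "You are labeling short customer reviews for a restaurant."  # optional steering context
input_data = "Text: I think the food was okay."
output_indicator = "Sentiment:"  # signals the expected format of the answer

prompt = "\n".join([instruction, context, input_data, output_indicator])
print(prompt)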
General Tips for Designing Prompts
Here are some tips to keep in mind while you are designing your prompts:
Start Simple
As you get started with designing prompts, you should keep in mind that it is an iterative process that requires a lot of experimentation to get optimal results. Using a simple playground like OpenAI’s or Cohere’s is a good starting point.
You can start with simple prompts and keep adding more elements and context as you aim for better results. Versioning your prompt along the way is vital for this reason. As you read this guide, you will see many examples where specificity, simplicity, and conciseness will often give you better results.
When you have a big task that involves many different subtasks, you can try to break down the task into simpler subtasks and keep building up as you get better results. This avoids adding too much complexity to the prompt design process at the beginning.
The Instruction
You can design effective prompts for various simple tasks by using commands to instruct the model what you want to achieve such as “Write”, “Classify”, “Summarize”, “Translate”, “Order”, etc.
Keep in mind that you also need to experiment a lot to see what works best. Try different instructions with different keywords, contexts, and data and see what works best for your particular use case and task. Usually, the more specific and relevant the context is to the task you are trying to perform, the better. We will touch on the importance of sampling and adding more context in the upcoming guides.
Others recommend placing instructions at the beginning of the prompt. It’s also recommended to use a clear separator like “###” between the instruction and the context.
For instance:
Prompt:
### Instruction ###
Translate the text below to Spanish:
Text: "hello!"
Output:
¡Hola!
Specificity
Be very specific about the instruction and task you want the model to perform. The more descriptive and detailed the prompt is, the better the results. This is particularly important when you have a desired outcome or style of generation you are seeking. There aren’t specific tokens or keywords that lead to better results. It’s more important to have a good format and descriptive prompt. Providing examples in the prompt is very effective to get desired output in specific formats.
When designing prompts, you should also keep in mind the length of the prompt, as there are limitations on how long it can be. Too many unnecessary details are not necessarily a good approach; the details should be relevant and contribute to the task at hand. This is something you will need to experiment with a lot. We encourage a lot of experimentation and iteration to optimize prompts for your applications.
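Because the prompt plus the completion must fit within the model's context window (about 4,097 tokens for text-davinci-003), it can help to count tokens before sending a long prompt. Here is a minimal sketch using OpenAI's tiktoken tokenizer library, assuming it is installed (pip install tiktoken):

import tiktoken  # OpenAI's tokenizer library

# Look up the tokenizer that text-davinci-003 uses.
enc = tiktoken.encoding_for_model("text-davinci-003")

prompt = "Extract the name of places in the following text. ..."
n_tokens = len(enc.encode(prompt))
print(f"Prompt uses {n_tokens} tokens; leave room in the context window for the completion.")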
As an example, let’s try a simple prompt to extract specific information from a piece of text.
Prompt:
Extract the name of places in the following text.
Desired format:
Place: <comma_separated_list_of_place_names>
Input: "Although these developments are encouraging to researchers, much is still a mystery. “We often have a black box between the brain and the effect we see in the periphery,” says Henrique Veiga-Fernandes, a neuroimmunologist at the Champalimaud Centre for the Unknown in Lisbon. “If we want to use it in the therapeutic context, we need to understand the mechanism."
Output:
Place: Champalimaud Centre for the Unknown, Lisbon
The input text is obtained from this Nature article.
Avoid Impreciseness
Given the tips above about being detailed and improving format, it’s easy to fall into the trap of wanting to be too clever about prompts and potentially creating imprecise descriptions. It’s often better to be specific and direct. The analogy here is very similar to effective communication — the more direct, the more effective the message gets across.
For example, you might be interested in learning the concept of prompt engineering. You might try something like:
Explain the concept of prompt engineering. Keep the explanation short, only a few sentences, and don't be too descriptive.
It’s not clear from the prompt above how many sentences to use or what style. You might still get somewhat good responses with the above prompt, but the better prompt would be one that is very specific, concise, and to the point. Something like:
Use 2-3 sentences to explain the concept of prompt engineering to a high school student.
To do or not to do?
Another common tip when designing prompts is to avoid saying what not to do but say what to do instead. This encourages more specificity and focuses on the details that lead to good responses from the model.
Here is an example of a movie recommendation chatbot failing at exactly what I don’t want it to do because of how I wrote the instruction — focusing on what not to do.
Prompt:
The following is an agent that recommends movies to a customer. DO NOT ASK FOR INTERESTS. DO NOT ASK FOR PERSONAL INFORMATION.
Customer: Please recommend a movie based on my interests.
Agent:
Output:
Sure, I can recommend a movie based on your interests. What kind of movie would you like to watch? Do you prefer action, comedy, romance, or something else?
Here is a better prompt:
Prompt:
The following is an agent that recommends movies to a customer. The agent is responsible to recommend a movie from the top global trending movies. It should refrain from asking users for their preferences and avoid asking for personal information. If the agent doesn't have a movie to recommend, it should respond "Sorry, couldn't find a movie to recommend today.".
Customer: Please recommend a movie based on my interests.
Agent:
Output:
Sorry, I don't have any information about your interests. However, here's a list of the top global trending movies right now: [list of movies]. I hope you find something you like!
Text Summarization
One of the standard tasks in natural language generation is text summarization. Text summarization can include many different flavors and domains. One of the most promising applications of language models is the ability to summarize articles and concepts into quick and easy-to-read summaries. Let’s try a basic summarization task using prompts.
Let’s say I am interested in learning about antibiotics. I could try a prompt like this:
Prompt:
Explain antibiotics
A:
Output:
Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body’s immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.
The “A:” is an explicit prompt format that’s used in question answering. I used it here to tell the model that an answer is expected. In this example, it’s not clear how this is useful versus not using it, but we will leave that for later examples. Let’s just assume that this is too much information and we want to summarize it further. We can instruct the model to summarize into one sentence like so:
Prompt:
Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body’s immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.
Explain the above in one sentence:
Output:
Antibiotics are medications used to treat bacterial infections by either killing the bacteria or stopping them from reproducing, but they are not effective against viruses and overuse can lead to antibiotic resistance.
Without paying too much attention to the accuracy of the output above, which is something we will touch on in a later guide, the model tried to summarize the paragraph in one sentence. You can get clever with the instructions but we will leave that for a later chapter. Feel free to pause here and experiment to see if you get better results.
Information Extraction
While language models are trained to perform natural language generation and related tasks, they are also very capable of performing classification and a range of other natural language processing (NLP) tasks.
Here is an example of a prompt that extracts information from a given paragraph.
Prompt:
Author-contribution statements and acknowledgements in research papers should state clearly and specifically whether, and to what extent, the authors used AI technologies such as ChatGPT in the preparation of their manuscript and analysis. They should also indicate which LLMs were used. This will alert editors and reviewers to scrutinize manuscripts more carefully for potential biases, inaccuracies and improper source crediting. Likewise, scientific journals should be transparent about their use of LLMs, for example when selecting submitted manuscripts.
Mention the large language model based product mentioned in the paragraph above:
Output:
The large language model based product mentioned in the paragraph above is ChatGPT.
There are many ways we can improve the results above, but this is already very useful.
By now it should be obvious that you can ask the model to perform different tasks by simply instructing it what to do. That’s a powerful capability that AI product builders are already using to build powerful products and experiences.
Paragraph source: ChatGPT: five priorities for research
Question Answering
One of the best ways to get the model to respond with specific answers is to improve the format of the prompt. As covered before, a prompt can combine instructions, context, input, and output indicators to get improved results. While these components are not required, using them is good practice: the more specific you are with the instruction, the better the results you will get. Below is an example of how this would look following a more structured prompt.
Prompt:
Answer the question based on the context below. Keep the answer short. Respond "Unsure about answer" if not sure about the answer.
Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.
Question: What was OKT3 originally sourced from?
Answer:
Output:
Mice.
Context obtained from Nature.
Text Classification
So far, we have used simple instructions to perform a task. As a prompt engineer, you will need to get better at providing better instructions. But that’s not all! You will also find that for harder use cases, just providing instructions won’t be enough. This is where you need to think more about the context and the different elements you can use in a prompt. Other elements you can provide are input data or examples.
Let’s try to demonstrate this by providing an example of text classification.
Prompt:
Classify the text into neutral, negative or positive.
Text: I think the food was okay.
Sentiment:
Output:
Neutral
We gave the instruction to classify the text and the model responded with 'Neutral', which is correct. Nothing is wrong with this, but let’s say that what we really need is for the model to give the label in the exact format we want. So instead of Neutral, we want it to return neutral. How do we achieve this? There are different ways. We care about specificity here, so the more information we can provide in the prompt, the better the results. We can try providing examples to specify the correct behavior. Let’s try again:
Prompt:
Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment: neutral
Text: I think the food was okay.
Sentiment:
Output:
neutral
Perfect! This time the model returned neutral, which is the specific label I was looking for. It seems that the example provided in the prompt helped the model to be specific in its output. To highlight why being specific sometimes matters, check out this example and spot the problem:
Prompt:
Classify the text into nutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:
Output:
Neutral
What is the problem here?
Conversation
Perhaps one of the more interesting things you can achieve with prompt engineering is instructing the LLM system on how to behave, its intent, and its identity. This is particularly useful when you are building conversational systems like customer service chatbots.
For instance, let’s create a conversational system that’s able to generate more technical and scientific responses to questions. Note how we are explicitly telling it how to behave through the instruction. This is sometimes referred to as role prompting.
Prompt:
The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.
Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of blackholes?
AI:
Output:
Sure! Black holes are regions of spacetime where the gravitational force is so strong that nothing, not even light, can escape from it. They are created when a very massive star dies and its core collapses in on itself, forming a singularity of infinite density. The intense gravity of the singularity pulls in all the matter and radiation around it, creating the black hole.
Our AI research assistant sounds a bit too technical, right? Okay, let’s change this behavior and instruct the system to give more accessible answers.
Prompt:
The following is a conversation with an AI research assistant. The assistant answers should be easy to understand even by primary school students.
Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of black holes?
AI:
Output:
Sure! Black holes are created when a very massive star runs out of fuel and collapses in on itself. The gravity of the star is so strong that nothing, not even light, can escape its pull. This creates a point in space-time with an intense gravitational force called a black hole.
I think we made some progress. You can continue improving it. I am sure if you add more examples you might get even better results.
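One practical detail behind examples like this: the model has no memory between calls, so a conversational system re-sends the growing transcript each turn. Here is a minimal sketch of that pattern; the ask helper and the complete callback are illustrative, not part of any library:

# Illustrative sketch: carry a conversation by re-sending the growing transcript.
persona = ("The following is a conversation with an AI research assistant. "
           "The assistant answers should be easy to understand even by primary school students.")
history = []

def ask(user_message, complete):
    # `complete` is any function mapping a prompt string to a completion string,
    # e.g., a wrapper around the completions API sketched earlier.
    history.append(f"Human: {user_message}")
    prompt = persona + "\n" + "\n".join(history) + "\nAI:"
    reply = complete(prompt)
    history.append(f"AI: {reply}")  # keep the reply so later turns have context
    return reply

# Dummy completion function so the sketch runs standalone.
print(ask("Hello, who are you?", lambda p: "Greeting! I am an AI research assistant."))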
Code Generation
One application where LLMs are quite effective is code generation. Copilot is a great example of this. There are a vast number of code-generation tasks you can perform with clever prompts. Let’s look at a few examples below.
First, let’s try a simple program that greets the user.
Prompt:
/*
Ask the user for their name and say "Hello"
*/
Output:
let name = prompt("What is your name?");
console.log(`Hello, ${name}!`);
You can see that we didn’t even need to specify the language to use.
Let’s switch levels a bit. I want to show you how powerful LLMs can be with a little more effort in designing the prompts.
Prompt:
"""
Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science Department
"""
Output:
SELECT StudentId, StudentName
FROM students
WHERE DepartmentId IN (SELECT DepartmentId FROM departments WHERE DepartmentName = 'Computer Science');
This is very impressive. In this case, we provided data about the database schema and asked it to generate a valid MySQL query.
Reasoning
Perhaps one of the most difficult tasks for an LLM today is one that requires some form of reasoning. Reasoning is one of the areas I am most excited about, due to the types of complex applications that can emerge from LLMs.
There have been some improvements in tasks involving mathematical capabilities. That said, it’s important to note that current LLMs struggle to perform reasoning tasks so this requires even more advanced prompt engineering techniques. We will cover these advanced techniques in the next guide. For now, we will cover a few basic examples to show arithmetic capabilities.
Prompt:
What is 9,000 * 9,000?
Output:
81,000,000
Let’s try something more difficult.
Prompt:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
No, the odd numbers in this group add up to an odd number: 119.
That’s incorrect! Let’s try to improve this by improving the prompt.
Prompt:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even.
Output:
Odd numbers: 15, 5, 13, 7, 1
Sum: 41
41 is an odd number.
Much better, right?
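When experimenting with arithmetic prompts like this, it is worth verifying the model's claims yourself rather than trusting the output. A few lines of Python are enough to check the odd-number sum above:

# Verify the model's arithmetic: collect the odd numbers, sum them, check parity.
numbers = [15, 32, 5, 13, 82, 7, 1]
odds = [n for n in numbers if n % 2 == 1]
total = sum(odds)
print(odds)   # [15, 5, 13, 7, 1]
print(total)  # 41
print("even" if total % 2 == 0 else "odd")  # odd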