Tips for optimizing your auto label prompts
There are a number of techniques you can use to improve the accuracy of your auto labels. Here are a few things to try:
- Break down more complex questions into parts. For example, if you want to know whether an article meets several inclusion criteria and you want the auto labeller to provide a Yes or No answer, ask it to work through a set of questions, one per criterion. In the following example, we want to include articles that are systematic reviews or meta-analyses about the impacts of wildfires on health, the environment or economic factors. We can break this down and give the auto labeller the following prompt in the label's Question section (a scripted sketch of the same technique appears after the prompt):
    Answer true or false for the following questions about this article.
    1. Is this a systematic review or meta-analysis?
    2. Does this focus on the impacts of wildfires?
    3. Does this include at least one health, environmental or economic impact of wildfires?
    If all of the answers are true, include this article. If any of the answers are false, exclude the article.
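The same decomposition can be scripted outside the tool if you want to test a prompt against a handful of known articles. The sketch below is a minimal illustration of the technique, not the auto labeller's actual implementation; `ask_model` is a hypothetical placeholder for whichever LLM client you use.

```python
# Minimal sketch of the decomposition technique. ask_model is a hypothetical
# placeholder for a real LLM client; this is not the auto labeller's internals.

QUESTIONS = [
    "Is this a systematic review or meta-analysis?",
    "Does this focus on the impacts of wildfires?",
    "Does this include at least one health, environmental or economic impact of wildfires?",
]

def ask_model(prompt: str) -> str:
    """Placeholder: swap in a real call to your LLM provider's SDK."""
    raise NotImplementedError

def should_include(title: str, abstract: str) -> bool:
    """Include the article only if the model answers 'true' to every sub-question."""
    article = f"Title: {title}\nAbstract: {abstract}"
    for question in QUESTIONS:
        answer = ask_model(
            f"{article}\n\nAnswer true or false: {question}\n"
            "Reply with exactly one word: true or false."
        )
        if answer.strip().lower() != "true":
            return False  # any false answer excludes the article
    return True
```

Requiring every sub-question to come back true mirrors the "if any of the answers are false, exclude the article" instruction in the prompt above.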
- Use another generative AI tool such as ChatGPT, Microsoft Copilot or Google Gemini to improve your prompt. Provide your prompt along with some examples of correct and incorrect answers, and ask it to help you revise the prompt for better results. One hypothetical way to assemble such a request is sketched below.
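The helper below is illustrative only; the `build_revision_request` name and the wording of the request are assumptions, and you would paste the resulting text into whichever chat tool you prefer.

```python
# Hypothetical helper for drafting a prompt-revision request to a chat model.
# The structure and wording are illustrative only.
def build_revision_request(prompt: str, correct: list[str], incorrect: list[str]) -> str:
    return (
        "I use the following prompt to screen articles:\n\n"
        f"{prompt}\n\n"
        "It answered these examples correctly:\n"
        + "\n".join(f"- {ex}" for ex in correct)
        + "\n\nIt answered these examples incorrectly:\n"
        + "\n".join(f"- {ex}" for ex in incorrect)
        + "\n\nPlease revise the prompt so it handles the incorrect examples "
        "better without breaking the correct ones."
    )
```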
- Filter out records without abstracts. When the auto labeller runs on citation data alone (not PDFs), it is more accurate when it has an abstract to work from rather than just a title. We therefore recommend skipping records without abstracts during the initial human screening; that way, when you set your article filters for the auto labeller, it will only include articles with abstracts.
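If you prepare your citation file outside the tool, a filter like the one below can drop abstract-less records up front. This is a sketch that assumes a CSV export with an `abstract` column; adjust the file and column names to match your data.

```python
import pandas as pd

# Sketch: drop citation records with no abstract before screening.
# Assumes a CSV export with an 'abstract' column; adjust names to your data.
records = pd.read_csv("citations.csv")
has_abstract = records["abstract"].notna() & records["abstract"].str.strip().ne("")
records[has_abstract].to_csv("citations_with_abstracts.csv", index=False)
```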
- Review the Probability and Reasoning. When this option is enabled for a label, you can review the auto labeller's reasoning and its estimate of how likely its answer is to be correct (i.e. its confidence). To find this information, click on the individual article, scroll down to Auto Labels, and click the dropdown arrow to open the reasoning text for that label.
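If you export your auto label results, the probability can also help you decide which answers to check first. The sketch below assumes a hypothetical export with `title`, `answer` and `probability` columns; the file name, column names and the cut-off of 20 records are illustrative only.

```python
import pandas as pd

# Sketch: surface the least-confident auto labels for human review first.
# 'auto_labels.csv' and its columns are a hypothetical export format.
labels = pd.read_csv("auto_labels.csv")
least_confident = labels.sort_values("probability").head(20)
print(least_confident[["title", "answer", "probability"]])
```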
- Consider a scale of relevance to improve sensitivity. Rather than asking for a yes/no answer on inclusion, you can use a categorical label that asks the auto labeller to rate each article on a 5-point relevance scale. You can then filter results from highly likely to be relevant (1) to unlikely to be relevant (5), with an additional 'undecidable' category. For more information about this approach, see Sandnor et al. (2024).
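If you export the scale answers, mapping them back to screening decisions is a small step. The sketch below assumes ratings are stored as the strings '1' to '5' plus 'undecidable'; the cut-off of 2 is an illustrative choice, not a recommendation from the source.

```python
# Sketch: map a 5-point relevance rating (1 = highly likely to be relevant,
# 5 = unlikely to be relevant, plus 'undecidable') to a screening decision.
# The cut-off of 2 is an illustrative choice, not a recommendation.
def triage(rating: str) -> str:
    if rating == "undecidable":
        return "flag for human review"
    return "include" if int(rating) <= 2 else "exclude"

for rating in ["1", "3", "5", "undecidable"]:
    print(rating, "->", triage(rating))
```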