Overview of Sysrev's Auto-labeler
Sysrev has a built-in generative AI Auto-label feature that uses OpenAI's GPT-4o model to automate the labeling process. Sysrev also generates an Auto-label report that lets users compare auto-labeling results with human labeling, so that labels can be assessed, improved, and optimized for accuracy while keeping the assessment process transparent.
There are a few important things to know about the Auto-label feature:
- Restricted to Premium and Enterprise Accounts: The Auto-labeler is only available to users with verified payment methods.
- It costs money to run the Auto-labeler: Running the Auto-labeler requires funds in your Sysrev project.
- The cost of running the Auto-labeler varies based on how many records you are labeling, the number of labels that are activated for auto-labeling and the amount of content that is being 'read' by the Auto-labeler (e.g., citation only vs full PDF), among other factors.
- Only owners and admins on the project can run the Auto-labeler: If your user status for a project is set to 'member', you will not be able to use the Auto-labeler.
- The Auto-labeler does not operate across labels: In other words, each label is independent from other labels. The Auto-labeler will not be able to 'read' answers from other labels.
- Be careful when editing prompts that have already run on a large number of records: If you change an Auto-label (for example, by fixing a typo in the prompt, adding a category, or adjusting a setting such as 'Require answer'), it will rerun on all previously labeled articles the next time you run it on the full (unfiltered) project. If you add new articles and do not change a previously run Auto-label, it will run only on the new articles, which avoids unnecessary costs. To prevent reprocessing all records, run the Auto-labeler on a filtered set or avoid making any changes later in a project.
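The rerun rule in the last point above can be pictured with a small Python sketch. This is only a mental model, not Sysrev's implementation: the function names, the fingerprinting approach, and the shape of the label definition are all assumptions made for illustration.

```python
import hashlib
import json


def definition_fingerprint(label_definition: dict) -> str:
    """Hash the editable parts of an Auto-label (hypothetical structure).

    Any edit (prompt typo fix, new category, changed setting) yields a new hash.
    """
    canonical = json.dumps(label_definition, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def records_to_process(record_ids, labeled_with, label_definition, article_filter=None):
    """Return the record ids that the next run would touch under this model.

    labeled_with maps record id -> fingerprint of the definition used last time.
    article_filter is an optional set of record ids that limits the run.
    """
    current = definition_fingerprint(label_definition)
    candidates = [r for r in record_ids if article_filter is None or r in article_filter]
    # A record is processed if it was never labeled or was labeled with an older definition.
    return [r for r in candidates if labeled_with.get(r) != current]


# Illustrative data: records 1-3 were already labeled, 4 and 5 are new.
label = {"prompt": "Is this a randomized trial?", "categories": ["yes", "no"], "require_answer": True}
done = {rid: definition_fingerprint(label) for rid in (1, 2, 3)}

print(records_to_process([1, 2, 3, 4, 5], done, label))  # [4, 5]

label["prompt"] = "Is this a randomised trial?"  # a small edit to the prompt
print(records_to_process([1, 2, 3, 4, 5], done, label))  # [1, 2, 3, 4, 5]
print(records_to_process([1, 2, 3, 4, 5], done, label, article_filter={4, 5}))  # [4, 5]
```

Under this model, an unchanged Auto-label only picks up new articles, any edit brings every record back into scope, and a filter is what keeps an edited Auto-label from reprocessing the whole project.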
There are some useful settings that you can apply to control how the Auto-labeler runs, including:
- You can choose which Auto-labels to activate: Each label you create has settings to indicate whether the label should be included or ignored in the auto-labeling process. Note that the default is to include the label in the auto-labeling process.
- You can choose what information the Auto-label should 'read': You can choose, at the label level, whether the Auto-labeler should consider citation information (e.g., title and abstract) only or should also 'read' attached content like PDFs.
- You can have the Auto-label pre-populate label answers or not: You can set whether or not the Auto-label fills in label answers. In this way, you can use the Auto-labeler to speed up human screening (rather than replace it), where human screeners verify or correct auto-label answers rather than starting from scratch.
- You can choose to Auto-label only a subset of records: You can set an article filter and/or limit the Auto-labeler to a certain number of records to control how many records are labeled in each run of the Auto-labeler.
- You can allow the Auto-label to skip articles when no category options are appropriate or it can't locate an answer: By turning off 'Require answer' for a categorical or string Auto-label, you allow the Auto-label to skip articles if no category option is a good match or it can't find an answer to a string label question. If 'Require answer' is on, the Auto-label is forced to choose an answer from the options you give it. Consider providing a 'null' or 'not applicable' option in your categories if you want to require answers to a label.
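To see why a 'not applicable' category matters when 'Require answer' is on, here is a minimal Python sketch of the options open to a categorical Auto-label. It is a simplified illustration, not Sysrev's code: `allowed_responses`, `can_decline`, the `SKIP` marker, and the example categories are all hypothetical.

```python
from typing import List, Optional

SKIP = None  # hypothetical marker meaning "leave this article unlabeled"


def allowed_responses(categories: List[str], require_answer: bool) -> List[Optional[str]]:
    """List what the Auto-label may return for one article under this model."""
    options: List[Optional[str]] = list(categories)
    if not require_answer:
        options.append(SKIP)  # skipping is only possible when 'Require answer' is off
    return options


def can_decline(categories: List[str], require_answer: bool,
                null_options=("null", "not applicable")) -> bool:
    """True if the Auto-label can avoid picking a substantive category."""
    options = allowed_responses(categories, require_answer)
    has_null_category = any(o in null_options for o in options if o is not None)
    return SKIP in options or has_null_category


designs = ["RCT", "cohort", "case report"]

print(can_decline(designs, require_answer=False))                      # True
print(can_decline(designs, require_answer=True))                       # False
print(can_decline(designs + ["not applicable"], require_answer=True))  # True
```

In the second case an article that fits none of the categories would still have to be assigned one of them, which is the situation the 'null' or 'not applicable' option is meant to prevent.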