Figure 1: In real-world applications, we believe a human-machine loop exists where humans and machines augment each other. We call it Augmented Artificial Intelligence.
How do we build and evaluate an AI system for real-world applications? In most AI research, evaluating a method follows a process of training, validation, and testing. Experiments are usually stopped once the models reach good test performance on the reported datasets, because the real-world data distribution is assumed to be captured by the validation and test data. However, real-world applications are often more complicated than a single train-validate-test cycle. The biggest difference is the constantly changing data. For example, wildlife datasets change in class composition all the time due to animal invasion, reintroduction, recolonization, and seasonal movements. A model trained, validated, and tested on existing data can easily break down when newly collected data contains new species. Fortunately, out-of-distribution detection methods can help us flag samples of new species. However, when we want to extend the recognition capacity (i.e., to be able to recognize the new species in the future), the best we can do is fine-tune the model with newly made annotations. In other words, we need to incorporate human effort/annotations regardless of how the models perform on previous test sets.
When human annotations are unavoidable, a real-world recognition system becomes an endless loop of data collection → annotation → model fitting (Figure 2). As a result, a single round of model evaluation does not capture the true generalization of the entire recognition system, because the model will be updated with new data annotations and a new round of evaluation will follow. Given this loop, we argue that, instead of building a model with better test performance, focusing on how much human effort can be saved is a more general and practical goal in real-world applications.
Figure 2: In the data collection, annotation, and model update loop, the goal of optimization becomes minimizing the human annotation requirement rather than single-pass recognition performance.
In the paper we published last year in Nature Machine Intelligence, we discussed incorporating human-in-the-loop into wildlife recognition and proposed to examine the efficiency of human effort during model updates rather than simple test performance. As a demonstration, we designed a recognition framework that combines active learning, semi-supervised learning, and human-in-the-loop (Figure 3). We also incorporated a time component into this framework to indicate that the recognition model is never frozen at any time step. Generally speaking, at each time step, when new data is collected, the recognition model actively selects which data to annotate based on a prediction confidence metric. Low-confidence predictions are sent for human annotation, while high-confidence predictions are passed on to downstream tasks or used as pseudo-labels for model updates.
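As a minimal sketch, the confidence-based routing step described above might look like the following. The threshold value, prediction format, and function name are illustrative assumptions, not the paper's exact implementation:

```python
def route_predictions(predictions, threshold=0.9):
    """Split model predictions into trusted pseudo-labels (high confidence)
    and samples that are sent to human annotators (low confidence)."""
    pseudo_labeled, needs_annotation = [], []
    for sample_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            # High confidence: trust the prediction as a pseudo-label.
            pseudo_labeled.append((sample_id, label))
        else:
            # Low confidence: request a human annotation.
            needs_annotation.append(sample_id)
    return pseudo_labeled, needs_annotation

preds = {
    "img_001": ("lion", 0.97),     # confident -> pseudo-label
    "img_002": ("baboon", 0.42),   # uncertain -> human annotation
    "img_003": ("warthog", 0.93),  # confident -> pseudo-label
}
pseudo, to_annotate = route_predictions(preds)
```

After routing, the pseudo-labels and the returned human annotations can both feed the next model update, closing the loop in Figure 3.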
Figure 3: Here we present an iterative recognition framework that can maximize the utility of modern image recognition methods and minimize reliance on manual annotations for model updating.
Regarding the efficiency of human annotation for model updates, we split the evaluation into 1) the percentage of high-confidence predictions on validation data (i.e., the human annotation effort saved); 2) the accuracy of high-confidence predictions (i.e., reliability); and 3) the percentage of novel categories detected as low-confidence predictions (i.e., sensitivity to novelty). With these three metrics, optimizing the framework amounts to minimizing human effort (i.e., maximizing the high-confidence percentage) while maximizing model update performance and high-confidence accuracy.
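As a sketch, assuming per-sample confidence scores, predicted labels, ground-truth labels, and novelty flags are available, the three metrics might be computed as follows (the threshold and interfaces are illustrative, not the paper's exact definitions):

```python
def annotation_savings(confidences, threshold=0.9):
    """Metric 1: fraction of validation samples predicted with high
    confidence, i.e. the share of human annotation effort saved."""
    high = [c for c in confidences if c >= threshold]
    return len(high) / len(confidences)

def high_confidence_accuracy(predictions, labels, confidences, threshold=0.9):
    """Metric 2 (reliability): accuracy over high-confidence predictions only."""
    pairs = [(p, y) for p, y, c in zip(predictions, labels, confidences)
             if c >= threshold]
    return sum(p == y for p, y in pairs) / len(pairs)

def novelty_sensitivity(is_novel, confidences, threshold=0.9):
    """Metric 3 (sensitivity to novelty): fraction of novel-category
    samples that are flagged as low confidence."""
    novel_conf = [c for n, c in zip(is_novel, confidences) if n]
    return sum(c < threshold for c in novel_conf) / len(novel_conf)
```

Note that metrics 1 and 2 trade off against each other through the threshold: raising it improves reliability but reduces the annotation effort saved.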
We reported a two-step experiment on a large-scale wildlife camera trap dataset collected in Gorongosa National Park, Mozambique, for demonstration purposes. The first step initialized a model with only part of the dataset. In the second step, new data containing both known and novel classes was fed to the initialized model. Following the framework, the model made predictions on the new data with confidence estimates: high-confidence predictions were trusted as pseudo-labels, and low-confidence predictions received human annotations. The model was then updated with both the pseudo-labels and the annotations, ready for future time steps. In the second-step validation, 72.2% of predictions were high-confidence, the accuracy of high-confidence predictions was 90.2%, and 82.6% of new classes were detected as low-confidence. In other words, our framework saved 72% of the human effort that annotating all the second-step data would have required; whenever the model was confident, 90% of its predictions were correct; and 82% of novel samples were successfully detected. Details of the framework and experiments can be found in the original paper.
Looking more closely at Figure 3, in addition to the data collection – human annotation – model update loop, there is another human-machine loop hidden in the framework (Figure 1). This is a loop in which humans and machines constantly improve each other through model updates and human intervention. For example, when AI models cannot recognize novel classes, human intervention provides the information needed to extend the model’s recognition capability. Conversely, as AI models become more and more general, the requirement for human effort shrinks. In other words, human effort is used more efficiently.
Moreover, the trust-based human-in-the-loop framework we proposed is not limited to new class detection; it can also help with problems such as long-tailed distributions and cross-domain discrepancies. When AI models are less confident, human intervention steps in to help improve them. Likewise, human effort is saved whenever the AI models are confident, and sometimes human errors can even be corrected (Figure 4). In this way, the relationship between humans and machines becomes synergistic, and the goal of AI development shifts from replacing human intelligence to mutually augmenting human and machine intelligence. We call this type of AI: Augmented Artificial Intelligence (A2I).
Ever since we started working on artificial intelligence, we have asked ourselves: what are we creating AI for? At first, we believed that AI should ideally replace human effort entirely in simple and tedious tasks such as large-scale image recognition and driving. So, for a long time, we pushed our models toward so-called “human-level performance.” However, this goal of replacing human effort inherently sets up an opposition, a mutually exclusive relationship, between humans and machines. In real-world applications, the performance of AI methods is limited by many factors, such as long-tailed distributions, multi-domain discrepancies, label noise, weak supervision, and out-of-distribution detection. Most of these problems can be alleviated to some degree with appropriate human intervention. The framework we proposed is just one example of how these separate problems can be boiled down to high-confidence versus low-confidence prediction problems, and how human effort can be introduced into the whole AI system. We do not see this as cheating or shying away from difficult problems. It is a more human-centric approach to AI development, where the focus is on how much human effort is saved rather than on how many test images a model can recognize. Before the realization of Artificial General Intelligence (AGI), we believe it is worthwhile to further explore this direction of human-machine interaction and A2I, so that AI can start to have more impact in various practical fields.
Figure 4: Examples of high confidence predictions that do not match the original annotations. Many high-confidence predictions that were flagged as incorrect based on validation labels (provided by students and citizen scientists) were in fact correct upon closer inspection by wildlife experts.
Acknowledgments: We thank all co-authors of the paper “Iterative human and automated identification of wildlife images” for their contributions and discussions in the preparation of this blog. The views and opinions expressed in this blog are solely those of the authors of this article.
This blog post is based on the following paper published in Nature Machine Intelligence:
 Miao, Zhongqi, Ziwei Liu, Kaitlyn M. Gaynor, Meredith S. Palmer, Stella X. Yu, and Wayne M. Getz. “Iterative human and automated identification of wildlife images.” Nature Machine Intelligence 3, no. 10 (2021): 885–895. (Link to preprint)