AI2 is developing a large language model optimized for science

PaLM 2. GPT-4. The list of text-generating AI models grows practically every day.

Most of these models are locked behind APIs, making it impossible for researchers to see exactly what makes them work. But increasingly, community efforts are producing open-source AI that is as sophisticated as, if not more so than, its commercial counterparts.

The latest of these efforts is the Open Language Model, a large language model that the nonprofit Allen Institute for AI (AI2) plans to release sometime in 2024. The Open Language Model, or OLMo for short, is being developed in collaboration with AMD and the Large Unified Modern Infrastructure (LUMI) consortium, which provides supercomputing power for training, as well as Surge AI and MosaicML, which are providing data and training code.

“The research and technology communities need access to open language models to advance this science,” Hanna Hajishirzi, senior director of NLP research at AI2, told TechCrunch in an email interview. “With OLMo, we are working to close the gap between public and private research capabilities and knowledge by building a competitive language model.”

One might wonder, as this reporter did, why AI2 felt the need to develop an open language model when there are already several to choose from (see Bloom, Meta’s LLaMA, etc.). As Hajishirzi sees it, while open source releases so far have been valuable and even pushed boundaries, they’ve missed the mark in a number of ways.

AI2 sees OLMo as a platform, not just a model, one that will allow the research community to take each component that AI2 creates and use it themselves or try to improve it. Everything AI2 does for OLMo will be openly available, Hajishirzi says, including a public demo, training dataset and API, and will be documented with “very limited” exceptions under “appropriate” licenses.

“We are building OLMo to create greater access for the AI research community to work directly on language models,” Hajishirzi said. “We believe that the broad availability of all aspects of OLMo will allow the research community to take what we’re building and work to improve it. Our ultimate goal is to collaboratively build the best open language model in the world.”

OLMo’s other differentiator, according to Noah Smith, senior director of NLP research at AI2, is a focus on allowing the model to draw on and better understand textbooks and academic articles rather than, say, code. There have been other attempts at this, such as Meta’s famous Galactica model. But Hajishirzi believes AI2’s work in academia and the tools it has developed for research, such as Semantic Scholar, will help make OLMo “uniquely suited” for scientific and academic applications.

“We think OLMo has the potential to be something really special in the field, especially in a landscape where many are rushing to capitalize on the interest in generative AI models,” said Smith. “AI2’s unique ability to act as third-party experts gives us the opportunity to work not only with our own world-class expertise, but to collaborate with the strongest minds in the industry. As a result, we believe our rigorous and documented approach will lay the foundation for building the next generation of safe and effective AI technologies.”

It’s a fine sentiment, to be sure. But what about the thorny ethical and legal issues surrounding the training and release of generative AI? Debate rages around the rights of content owners (among other affected stakeholders), and countless lingering issues have yet to be resolved in the courts.

To allay concerns, the OLMo team plans to work with AI2’s legal department and outside experts, stopping at “checkpoints” in the model-building process to reassess privacy and intellectual-property issues.

“We hope that through open and transparent dialogue about the model and its intended use, we can better understand how to mitigate bias and toxicity, and shed light on outstanding research questions within the community, resulting in one of the strongest models available,” said Smith.

What about the potential for misuse? The models, which are often toxic and biased to begin with, are ripe for exploitation by bad actors intent on spreading misinformation or generating malicious code.

Hajishirzi said AI2 will use a combination of licensing, model design and selective access to underlying components to “maximize scientific benefits while reducing the risk of harmful use.” To guide the policy, OLMo has an ethics review board with internal and external advisors (AI2 wouldn’t say who, exactly) that will provide feedback throughout the model-building process.

We’ll see how much of a difference that makes. At the moment, a lot is up in the air, including most of the model’s technical specifications. (AI2 did reveal that it will have around 70 billion parameters, parameters being the parts of the model learned from training data.) Training is set to begin on the LUMI supercomputer in Finland, the fastest supercomputer in Europe, in January, and will continue over the next few months.

AI2 is inviting collaborators to contribute to the model development process and to provide critique. Interested parties can contact the OLMo project organizers here.

Ikaroa, a full-stack tech company, is proud to announce that AI2 (the Allen Institute for Artificial Intelligence) is developing a large language model optimized for science. The primary goal of this model is to support advancement across a broad range of scientific disciplines, including robotics, linguistics, medicine, and computer science. The AI2 model is the result of years of research and development focused on creating a tool that can better interpret the complexities of a variety of technical and scientific topics.

The AI2 language model is capable of understanding large, complex datasets, including domain-specific language, as well as making accurate predictions about natural language inputs. It is designed to provide a comprehensive understanding of the relationships between words and phrases in any given scientific discipline.

This research and development by AI2 has been supported in part by Ikaroa’s cloud-based engineering and software development tools. The development is geared towards providing a comprehensive solution for building large language models that can detect rare, scientifically relevant words and phrases.

This combination of AI2 research and Ikaroa’s cloud-based engineering and software development tools has not only been successful in helping to create large, domain-specific language models but has also enabled Ikaroa’s customers to benefit from a more accurate and reliable natural language processing service. Contact Ikaroa today to learn more about how their cloud-based tools and services can help you develop a better language model for your own scientific research.
