On Monday, about a dozen engineers and executives from Databricks, a data science and AI company, gathered in conference rooms connected over Zoom to learn whether they had succeeded in building a top AI language model. The team had spent months, and roughly $10 million, training DBRX, a large language model similar in design to the one behind OpenAI’s ChatGPT. But they wouldn’t know how powerful their creation really was until the final tests of its abilities came back.
“We’ve surpassed everything,” Jonathan Frankle, chief neural network architect at Databricks and leader of the team that built DBRX, finally told the team, which responded with cheers, whoops, and applause emojis. Frankle doesn’t usually drink coffee, but he was sipping an iced latte after pulling an all-nighter to write up the results.
Databricks will release DBRX under an open-source license, allowing others to build on top of it. Frankle shared data showing that, across about a dozen benchmarks measuring the model’s ability to answer general knowledge questions, perform reading comprehension, solve tricky logic puzzles, and generate high-quality code, DBRX outperformed every other open-source model available.
It beat Meta’s Llama 2 and Mistral’s Mixtral, two of the most popular open-source AI models available today. “Yes!” shouted Ali Ghodsi, CEO of Databricks, when the scores appeared. “Wait, did we beat Elon’s thing?” Frankle replied that they had indeed surpassed the Grok AI model recently open-sourced by Musk’s xAI, adding, “I will consider it a success if we get a mean tweet from him.”
To the team’s surprise, DBRX also matched GPT-4, OpenAI’s closed model that powers ChatGPT and is widely regarded as the pinnacle of machine intelligence, on several measures. “We’ve set a new state of the art for open-source LLMs,” Frankle said with a grin.
Building Blocks
By open-sourcing DBRX, Databricks is adding momentum to a movement that challenges the secretive approach of the biggest companies in the generative AI boom. OpenAI and Google keep their GPT-4 and Gemini large language models closely held, but some rivals have released their models for others to build on, arguing that this spurs innovation by putting the technology in the hands of more researchers, entrepreneurs, startups, and established businesses.
Databricks also says it wants to be open about the work involved in creating its model, something Meta has not done for some key details of how its Llama 2 model was built. The company will publish a blog post describing the effort, and it also invited WIRED to spend time with Databricks engineers as they made key decisions during the final stages of the multimillion-dollar training of DBRX. That provided a glimpse of how complex and challenging it is to build a leading AI model, but also of how recent innovations in the field promise to bring costs down. That, combined with the availability of open-source models like DBRX, suggests that AI development isn’t about to slow down any time soon.
Ali Farhadi, CEO of the Allen Institute for AI, argues that greater transparency around the building and training of AI models is badly needed. The field has grown increasingly secretive in recent years as companies seek an edge over rivals. Openness is especially important, he argues, given concerns about the hazards that advanced AI models could pose. “Any effort to be open makes me happy,” says Farhadi. “I do think that a big part of the market will move toward open models. We need more of this.”
Databricks has a particular reason for being open. While tech giants like Google have moved quickly to deploy new AI systems over the past year, Ghodsi says, many large companies in other industries have yet to widely apply the technology to their own data. Companies in finance, medicine, and other fields want ChatGPT-like tools but are wary of sending sensitive data into the cloud, he says. Databricks wants to help those businesses.
“We call it data intelligence: the smarts to understand your own data,” Ghodsi says. Databricks can adapt DBRX to a customer’s needs or build a bespoke model from scratch for their business. For major companies, he says, the cost of building something on the scale of DBRX makes perfect sense. “That’s the big business opportunity for us.” In July of last year, Databricks acquired MosaicML, a startup specializing in building AI models more efficiently, bringing on Frankle and others who went on to build DBRX. Neither company had ever attempted a project of that scale before.
Inner Workings
Like other large language models, DBRX is essentially a giant artificial neural network, a mathematical framework loosely inspired by biological neurons, that has been fed a huge amount of text data. DBRX and similar programs are based on the transformer, a type of neural network invented by Google researchers in 2017 that revolutionized machine learning for language.
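The transformer’s key operation, which lets a model weigh how relevant every token in a text is to every other token, can be sketched in a few lines. This is a minimal, generic illustration in NumPy with toy dimensions; it is not DBRX’s actual implementation, and the names and shapes are assumptions for demonstration only.

```python
# Minimal sketch of scaled dot-product attention, the core of a transformer.
# Toy shapes and random data for illustration; not DBRX's real configuration.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of the value rows, where the
    weights come from how well each query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens attending...
K = rng.normal(size=(4, 8))  # ...over 4 tokens
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Real transformers stack many such attention layers (with learned projections producing Q, K, and V) and train all the weights on enormous text corpora.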
Not long after the transformer’s invention, researchers at OpenAI began training versions of it on ever-larger collections of text scraped from the web and other sources, a process that can take months. Crucially, they found that as the models and their training data grew, the output became smarter, more useful, and more coherent.
Seeking still greater scale remains an obsession of OpenAI and other leading AI companies. OpenAI CEO Sam Altman has reportedly sought $7 trillion in funding to develop AI-specific chips, according to The Wall Street Journal. But size isn’t the only thing that matters when building a language model. Frankle says that dozens of decisions go into building an advanced neural network, with hints about how to train more efficiently gleaned from research papers and other details shared within the community. Especially tricky is keeping thousands of computers working in concert, linked by switches and fiber-optic cables that don’t always behave.
“You’ve got these unstable [network] switches doing terabits per second of bandwidth, coming in from a bunch of different directions,” Frankle said before the final training run was complete. “It’s hard to comprehend even for someone who has worked in computer science their whole life.” That Frankle and others at MosaicML are experts in this obscure science helps explain why Databricks paid $1.3 billion to acquire the startup last year.
The data fed to a model also has a huge impact on the final result, and it is perhaps the one area where Databricks is not being fully transparent. “Data quality, data cleaning, data filtering, data prep are all very important,” says Naveen Rao, a vice president at Databricks and previously founder and CEO of MosaicML. “These models are really a function of that. You can almost think of that as the most important thing for model quality.”
AI researchers also continually refine the architecture of the latest AI models to make them better. One of the biggest recent advances comes from a design called “mixture of experts,” in which only certain parts of a model activate to respond to a given query, producing a model that is far more efficient to train and run. DBRX has around 132 billion parameters, the values that get adjusted during training. Llama 2 has 70 billion parameters, Mixtral has 45 billion, and Grok has 314 billion. But DBRX activates only about 36 billion of them for a typical query. Databricks says that tweaks to the model designed to make better use of the underlying hardware helped improve training efficiency by 30 to 50 percent. The company also says they make the model respond more quickly and consume less power.
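The mixture-of-experts idea can be sketched compactly: a small gating network scores a set of expert sub-networks and routes each input through only the top few, so most parameters stay idle for any given query. The sketch below uses toy sizes and a plain NumPy gate; the 16-experts-choose-4 routing is how DBRX is publicly described, but everything else here is an illustrative assumption, not Databricks’ implementation.

```python
# Illustrative mixture-of-experts routing: a gate picks the top-k experts
# per token, so only a fraction of all parameters run for each query.
# Toy dimensions; DBRX is described as routing each token to 4 of 16 experts.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 16, 4, 32
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # one weight matrix each
gate_w = rng.normal(size=(d, n_experts))                       # gating network weights

def moe_forward(x):
    """Route a single token vector x through its top-k experts only."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]             # indices of the k best-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                          # renormalized gate weights over top-k
    # Only top_k expert matrices are applied; the other 12 stay inactive.
    return sum(p * (x @ experts[i]) for p, i in zip(probs, top))

x = rng.normal(size=d)
y = moe_forward(x)
print(y.shape)  # (32,)
```

Because only 4 of the 16 expert matrices multiply the input, the compute per query scales with the active parameters rather than the total, which is the efficiency gain the article describes.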
Start-Up
The highly technical art of training a giant AI model can occasionally come down to an emotional decision. Two weeks ago, the Databricks team faced a multimillion-dollar question: how to get the most out of their model.
The team had been training the model for two months on 3,072 powerful Nvidia H100 GPUs rented from a cloud provider. It was already performing well on a number of benchmarks, and about a week’s worth of supercomputer time remained.
Team members traded ideas in Slack for how to use the remaining week of computing power. One idea was to create a version of the model tuned to generate computer code, or a much smaller model for fun. The team also considered stopping work on making the model any larger and instead feeding it carefully curated data that could boost its performance on a specific set of capabilities, an approach known as curriculum learning. Or they could simply keep going as they were, making the model bigger and, hopefully, more capable. Some on the team dubbed this last route the “fuck it” option, and some seemed drawn to it.
While the discussion remained friendly, strong opinions surfaced as engineers pushed for their preferred approaches. In the end, Frankle deftly steered the team toward the data-centric option. And two weeks later, it appears to have paid off handsomely. “The curriculum learning was better; it made a big difference,” Frankle says.
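Databricks has not detailed its curriculum-learning pipeline, but the general idea is simple: score training examples by some notion of difficulty or relevance, then present them to the model in a deliberate order or in phases rather than at random. The sketch below is a generic, hypothetical illustration of that idea, using text length as a stand-in difficulty score; it is not the team’s actual method.

```python
# Generic curriculum-learning sketch: order training examples from easy
# to hard and serve them in phases. The difficulty function here is a toy
# proxy (word count), purely for illustration.

def difficulty(example: str) -> int:
    # Toy proxy: longer examples are treated as harder.
    return len(example.split())

def curriculum_phases(dataset, n_phases=3):
    """Yield the dataset in n_phases chunks, easiest examples first."""
    ordered = sorted(dataset, key=difficulty)
    size = -(-len(ordered) // n_phases)  # ceiling division
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]

data = ["a b", "a", "a b c d e", "a b c", "a b c d"]
phases = list(curriculum_phases(data))
print([len(p) for p in phases])  # [2, 2, 1]
```

In a real training run the “difficulty” score would be replaced by whatever signal the team cares about, such as data quality ratings or task relevance, and each phase would correspond to a stretch of training steps.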
Frankle was less successful at predicting other outcomes of the project. He had doubted DBRX would prove especially good at generating computer code, since the team didn’t focus on that. He felt so sure that he promised to dye his hair blue if he was wrong. Monday’s results showed DBRX outperforming every other open AI model on standard coding benchmarks. “We have a really good code model on our hands,” he said at Monday’s big reveal. “I have an appointment today to dye my hair.”
Weighing the Risks
The final version of DBRX is the most powerful AI model yet to be released openly, for anyone to use or modify. (At least if they aren’t a company with more than 700 million users, a restriction Meta also places on its own open-source AI model, Llama 2.) Recent debate about the potential dangers of ever more powerful AI has sometimes centered on whether making models openly available is simply too risky. Some experts worry that open models could too easily be misused by criminals or terrorists intent on committing cybercrime or developing biological or chemical weapons. Databricks says it has already conducted safety tests on its model and will continue to probe it.
Stella Biderman, executive director of EleutherAI, a collaborative research project dedicated to open AI research, says there is little evidence suggesting that openness increases risk. She and others argue that we still lack a good understanding of how dangerous AI models really are or what might make them dangerous, and that greater transparency could help. “Most of the time, there’s no reason to think that open models pose a substantially higher risk than existing closed models,” Biderman says.
Earlier this month, EleutherAI joined Mozilla and about 50 other organizations and researchers in sending an open letter to US Secretary of Commerce Gina Raimondo, asking her to ensure that any future AI regulation leaves room for open-source AI projects. The letter argued that open models are good for economic growth because they help startups and small businesses get off the ground, and that they also “speed up scientific research.”
Databricks hopes DBRX can do both. Besides giving other AI researchers a new model to play with and useful tips for building their own, DBRX may contribute to a deeper understanding of how AI actually works, Frankle says. His team plans to study how the model changed during the final week of training, which could reveal how a powerful model acquires new capabilities. “The science we get to do at this scale is what gets me most excited,” he says.