Arkadium today launched GameLab, a new AI training and evaluation platform designed to help make AI more dependable in real-world environments.
Even today’s most popular AI models still hallucinate and make dumb mistakes. They need practical data on skills like reasoning and problem solving, and they need it in voracious quantities. And Arkadium CEO Kenny Rosenblatt has figured out exactly what to feed these models: gaming data.
Through live gameplay benchmarking and access to one of the world’s largest datasets of complex human decision-making, GameLab helps AI labs measure, train, and improve model reasoning.
While today’s frontier AI models can write code, draft research, and solve complex math equations, many still struggle with planning, memory, and judgment. As AI expands into higher-stakes domains, models must operate in dynamic environments where conditions change, tradeoffs emerge, and decisions have consequences.
“I am having more fun now than I have in the last decade, and where most game developers are saying, ‘How do I use AI to make game production faster or cheaper, with my art or my coding?’ We’re flipping that whole thing on its head and saying, ‘How do we use games to make the models better, right?'” Rosenblatt said.
He added, “It’s so revealing to see what I’ve seen over the last year and work on what we’ve been working on.”
Rosenblatt paused to show me how some popular AI models play some of Arkadium’s popular games like Block Champ, which is a Tetris-like puzzle game. He gave Elon Musk’s Grok all the instructions it needed to understand and play the game. But Grok only went a couple of moves before it made a mistake that no thinking human would make. It dropped a puzzle piece in a way that would cause a pile-up later in the game. The stupidity went on and on.
“We’re launching GameLab, and we have several games that we’re launching with, and they’re simple games. We didn’t want to go with the high-fidelity Call of Duty style games. We wanted to start simple because the current models are terrible, even at simple games, and my whole purpose is that I want AI to be real-world ready. And it’s not there yet,” Rosenblatt said.
GameLab is a dedicated division within Arkadium built to help close that gap. Powered by Arkadium’s gameplay ecosystem – 22 million monthly players, 1.7 billion gameplays per year, 150 billion human decisions, 200 billion gameplay images, more than 100 games, and 50+ player archetypes – GameLab gives AI developers access to one of the world’s richest datasets of human reasoning.
Unlike static internet data, gameplay captures how people make decisions, adapt to changing conditions, recover from mistakes, and solve problems over time. These signals can help train more capable AI models while providing a more realistic way to evaluate performance. GameLab has already signed agreements with leading frontier labs.
“We launched GameLab to close the gap between where AI is today and where it needs to be tomorrow,” said Rosenblatt. “Through the power of gameplay, GameLab reveals the hidden capability gaps in today’s frontier models.”
Rosenblatt added, “While these systems are remarkably impressive, they still make basic mistakes, struggle with judgment under uncertainty, and frequently fail at long-term strategic decision-making when information is incomplete. GameLab doesn’t just identify these weaknesses, we help solve them. By combining real human gameplay data, rigorous evaluations, training pipelines, and interactive environments, we give AI developers the tools they need to build models that are more capable, reliable, and ultimately more real-world ready.”
What GameLab offers to AI models

At launch, GameLab offers a suite of human data, benchmarking, training, and evaluation capabilities powered by Arkadium’s gameplay ecosystem, including:
- Human Data: Access to one of the world’s richest datasets of human decision-making generated from billions of gameplay interactions.
- Bespoke Solutions: Custom datasets and purpose-built games tailored to specific research objectives.
- Environments: Structured game environments for training, evaluation, and reinforcement learning.
- Benchmarking: Real-world model evaluation through head-to-head competitions, leaderboards, Cognitive Index Scores, and custom testing.
- RL & Training: Support for fine-tuning, reinforcement learning, and multimodal model development across language, vision, and world models.
GameLab launches with public leaderboards across three games (Block Champ, Daily Crossword, and Gin Rummy), with new games added every month. Each leaderboard compares leading AI models not only against one another, but also against an average human baseline. This allows GameLab to measure and visualize the gap between AI and human decision-making.
When models encounter games they have not been explicitly trained on, their performance often drops significantly, revealing limitations in reasoning, planning, and generalization. Humans, by contrast, can apply prior knowledge to unfamiliar situations and make effective decisions even under uncertainty.
Today’s most advanced AI models still struggle with this type of generalization. That is why games provide such a valuable testing environment: they are safe, measurable, and highly effective at exposing the strengths and weaknesses of AI cognition.
GameLab is built by a multidisciplinary team of game developers, designers, software engineers, data scientists, and AI/ML researchers operating inside Arkadium’s profitable, 25-year-old business, giving the team continuous access to real human gameplay, production-grade infrastructure, and a live player ecosystem that purpose-built research labs cannot easily replicate.
GameLab is available now at https://www.gamelab.com. Organizations interested in AI benchmarking, human decision-making datasets, or model training partnerships can learn more through the platform, or by contacting [email protected].
Examples of how far AI has to go to beat human players

Grok’s average score in Block Champ was 165 points. The average human score is 1,379, and the top human player scored 707,000. There are similar results for every model. Claude, said to be the smartest, also made errors. The same goes for a crossword puzzle game.
“These models are great at coding. They can help you with research, but they’re not ready for the real world, and we’re just using games to showcase their lack of intelligence,” Rosenblatt said.
In Arkadium’s Gin Rummy, Rosenblatt got ChatGPT 5 to go up against DeepSeek, both popular AI models. DeepSeek performed the worst, winning at Gin Rummy less than 1% of the time.
“We watch their strategic decision making. It’s really fun because they make such bozo mistakes,” Rosenblatt said. “DeepSeek performed the worst. It’s most important for a human to play gin rummy. It just cost us time. But it costs ChatGPT $2.27 [in token costs] to think through that game. That’s how expensive it is for these games to reason.”
Token-based cost estimates can be alarming. Consider that ChatGPT gets about 900 points in Gin Rummy on the leaderboards, while the average human player gets about 1,300 points.
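To put those gaps in proportion, here is a minimal Python sketch that expresses each reported model score as a percentage of the average human score. The scores are the ones quoted above. The metric and the names `human_ratio` and `gap_report` are assumptions made for illustration, not GameLab’s own scoring method or its Cognitive Index Scores.

```python
# Scores reported in this article, in points. The Gin Rummy figures are
# approximate ("about 900" and "about 1,300").
REPORTED = {
    "Block Champ": {"model": "Grok", "model_score": 165, "human_avg": 1379},
    "Gin Rummy": {"model": "ChatGPT", "model_score": 900, "human_avg": 1300},
}


def human_ratio(model_score: float, human_avg: float) -> float:
    """Return the model's score as a fraction of the average human score."""
    if human_avg <= 0:
        raise ValueError("human_avg must be positive")
    return model_score / human_avg


def gap_report(reported: dict) -> dict:
    """Map each game to the model's score as a percentage of the human average."""
    return {
        game: round(100 * human_ratio(row["model_score"], row["human_avg"]), 1)
        for game, row in reported.items()
    }


if __name__ == "__main__":
    for game, pct in gap_report(REPORTED).items():
        model = REPORTED[game]["model"]
        print(f"{game}: {model} scores {pct}% of the average human score")
```

By this rough measure, Grok reaches about 12% of the average human score in Block Champ, and ChatGPT reaches about 69% in Gin Rummy.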
The fix? Arkadium’s GameLab teaches the AI models to think

Rosenblatt said Arkadium’s 25-year history of making games, and of learning from the data of human gameplay, is what makes this possible. Rosenblatt started the company with his wife, Jessica Rovello, who is executive chair of the company. Together, they built it on the strength of simple games that were casual in nature and could be played on sites such as newspaper websites. Over time, they branched out and pivoted, and GameLab represents another change.
“We have non personally identifiable information about human play, so we can take our human play data and use that to train the models to understand human decision making. The AI that exists has used something called reinforcement learning, which is like what AlphaGo did. It just played to win, or like IBM Watson, which just played to crush an opponent,” Rosenblatt said.
Those computer programs didn’t learn how humans played. They just played to win.
“Our human data shows all the trajectories of a player who’s aggressive, a player who is a risk minimizer, a conservative player. So these player archetypes teach the AI other ways that humans reason,” Rosenblatt said.
Why this matters

Rosenblatt noted that when the AI models can solve these types of problems in games, they can use that knowledge to solve other problems better.
“This is the beautiful thing, it’s called generalizing. Now that I’ve gotten better at planning. I’ve used games that have what’s called long-horizon planning,” Rosenblatt said. “I can take that knowledge and apply it to realms outside of the game industry. Because it’s still the same skill set, right? So the whole point is not to build the best AI game playing agent. No, it’s to use games to learn skills that can be applied in other places.”
As an example, he noted that after the team trained some open source models, those models got better at finance, coding, and math, because games like gin rummy involve probabilities and strategy. The models generalize outside of the game space, and the data makes the AI more capable.
Rosenblatt said that Arkadium’s data is very expensive. That’s because you can’t scrape it off the internet. He noted that many of the models being used now “scraped their way to intelligence.”
He added, “They watched every video and sucked in that data. The [data in games] is all data that is not scrapeable and behind the scenes, right? You can’t easily get these trajectories, and this is not bot playing bot. This is real human decision making data that is very hard to get. When I talk to researchers to get humans into their lab at $25 an hour, it’s very hard to fill up a room and get a statistically significant number of participants.”
He said Arkadium’s 22 million users give researchers human decision-making data at scale, making their research more sound. Arkadium’s data focuses on the cognitive side, where games test visual processing, physical reasoning, memory, strategic decision making, long-horizon planning and more.
Games can be good for that, but not every game helps. Some physical-world data is useful in training robots how to operate in the real world. But games often have characters with superhuman capabilities, and those characters don’t obey the laws of physics, Nvidia CEO Jensen Huang said when he answered my question about whether games were useful for physical AI training. It turns out that kind of game data is not as useful in training robots to be smarter.
“Just give you an idea, when we put our human players against AI, humans win every time, and this is the difference between a specialized model like AlphaGo that only knew how to play one game versus humans and general models that do a bunch of things,” Rosenblatt said. “So our humans win against AI general models.”
Rosenblatt said, “To Jensen’s point, yes, on the robotic side, first-person shooter games and superhero games may be helpful for robotics, but the AI models today, they still don’t know how to think. They predict the next token. They don’t think, and we’re trying to show that you don’t need these high fidelity first person shooters in order to train models. We are giving them better data, so that they can learn how to do things in better ways. So we’re trying to make the models better.”
Investing in your data

Rosenblatt said that Arkadium has 100 games and a body of owned intellectual property and owned game data that has built up over 25 years based on play from 22 million humans. Rosenblatt caught on to the value of the data when the AI model companies started reaching out to ask for the data.
“I was like scratching my head, and then I went deep over the last year to try to understand this space and the value of the data,” he said.
The data contains more than 150 billion human decisions and more than 200 billion images. Collecting that data is the hard part. But Arkadium owns its own intellectual property. And Rosenblatt, a data engineer, knew the value of the data for analytics to improve the games. He didn’t have the foresight to anticipate the AI era. But he guesses that 98% of all game companies are failing to capture their data in a form that computers can understand and that can be used to train models.
“It started as a small effort, and then it just snowballed, because the number of partners just started doubling,” Rosenblatt said. “I would say it’s a significant effort within Arkadium at the moment.”
And it turns out that this is a game company where AI is actually creating jobs, rather than destroying them. The company is adding roles in research, data science, machine learning, reinforcement learning and more. The team has 86 people right now.
“I think we’re just getting started,” Rosenblatt said. “I’m an entrepreneur who focuses on profits, so I’m ready to scale as necessary, but I’m not going to overbuild, right? I’ve been around 25 years. I’ll be around another 25 years, because I’m fiscally responsible.”
Rosenblatt said his company is working with frontier labs. But there are several hundred more companies that are building specialized models, and those are potential customers for Arkadium’s data, he said. GameLab is creating games that are good for AI to play, but Arkadium as a company is still creating games that are fun for humans.
“Engaging the end user is first and foremost. It has to be, and it just has this added benefit of capturing super valuable, non-personally identifiable data that helps the researchers advance their work,” Rosenblatt said.
In fact, fewer than 1% of all gaming companies make it to more than $20 million in revenue and last more than 20 years. Arkadium is unique that way.
Over time, Arkadium hopes to create more complex games for the AI to ingest, so that it can help AI learn to handle cognitive challenges.
“Our focus is really, ‘How do we get more comfortable with AI in the real world?’ Do I want AI flying a commercial airline? No. Do I want AI operating on my ankle? No. The whole point of GameLab is to show how far away it is from making smart decisions, and if we get to the day where it’s flawless, then my comfort level will grow with AI,” Rosenblatt said.