These startups are building cutting-edge AI models without the need for a data center

By: blockbeats|2025/05/01 18:05:30

Researchers have trained a new kind of large language model (LLM) using GPUs scattered around the world and a mix of private and public data. The effort suggests that the dominant way of building artificial intelligence could be disrupted.

Flower AI and Vana, two startups pursuing unconventional approaches to building AI, worked together to create the new model, called Collective-1.

Flower has developed techniques that allow training to be spread across hundreds of computers connected over the internet. Some firms already use the company's technology to train AI models without pooling compute resources or data in one place. Vana, for its part, supplied sources of data including private messages from X, Reddit, and Telegram.

By modern standards, Collective-1 is small: it has 7 billion parameters, the values that together give a model its abilities, compared with the hundreds of billions of parameters in today's most advanced models, such as those powering ChatGPT, Claude, and Gemini.

Nic Lane, a computer scientist at the University of Cambridge and co-founder of Flower AI, said the distributed approach promises to scale well beyond Collective-1. Lane added that Flower AI is currently training a 30 billion parameter model on conventional data and plans to train a 100 billion parameter model later this year, close to the size offered by industry leaders. "This could fundamentally change people's perception of AI, so we are going all-in," Lane said. He also said the startup is incorporating images and audio into training to create multimodal models.

Distributed model building may also shake up the power dynamics shaping the AI industry.

Currently, AI companies build models by combining massive amounts of training data with huge quantities of compute concentrated in data centers. These data centers are packed with advanced GPUs networked together with ultra-high-speed fiber-optic cables. The companies also rely heavily on datasets created by scraping publicly accessible (though sometimes copyrighted) material, including websites and books.

This approach means that only the wealthiest companies, and nations with access to large numbers of powerful chips, can feasibly develop the most powerful and valuable models. Even open-source models like Meta's Llama and DeepSeek's R1 are built by companies with access to large data centers. A distributed approach could allow smaller companies and universities to build advanced AI by pooling disparate resources. It could also enable countries that lack conventional infrastructure to build a stronger model by networking several data centers together.

Lane believes the AI industry will increasingly look to new methods that let training break out of individual data centers. The distributed approach "allows you to scale computation in a more elegant way than a data center model," he said.

Helen Toner, an AI governance expert at the Center for Security and Emerging Technology, said Flower AI's approach is "interesting and potentially quite relevant" to AI competition and governance. "It may be hard to keep up at the cutting edge, but it may be an interesting fast-follower approach," Toner said.

Divide and Conquer

Distributed AI training involves rethinking how the computation used to build powerful AI systems is divided up. Creating an LLM means feeding a model huge amounts of text and adjusting its parameters so that it produces useful responses to prompts. In a data center, the training process is split so that different parts run on different GPUs, and the results are periodically consolidated into a single master model.

The new approach allows work typically done in large data centers to be performed on hardware potentially miles apart and connected by relatively slow or unreliable internet connections.
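The idea of training on separate machines and only occasionally consolidating the results can be illustrated with a toy example. The sketch below is not Flower AI's or Photon's actual method; it is a minimal illustration of local SGD with periodic parameter averaging (the general idea behind federated averaging), using a made-up linear regression task. All function names, the synthetic data, and the hyperparameters are assumptions chosen for illustration.

```python
import random

def make_shard(n, rng):
    """Synthetic data for y = 3x + 1 plus small noise (illustrative only)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)))
    return data

def local_sgd(params, shard, steps, lr, rng):
    """Run `steps` SGD updates on one worker's own data, with no network traffic."""
    w, b = params
    for _ in range(steps):
        x, y = rng.choice(shard)
        err = (w * x + b) - y      # prediction error on one sample
        w -= lr * err * x          # gradient step for the slope
        b -= lr * err              # gradient step for the intercept
    return (w, b)

def average(param_list):
    """Consolidate worker results into one model by averaging each parameter."""
    n = len(param_list)
    return (sum(p[0] for p in param_list) / n,
            sum(p[1] for p in param_list) / n)

def train(num_workers=4, rounds=50, local_steps=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Each simulated worker holds its own private shard of data.
    shards = [make_shard(200, rng) for _ in range(num_workers)]
    global_params = (0.0, 0.0)
    for _ in range(rounds):
        # Workers start from the shared model and train independently...
        local_results = [
            local_sgd(global_params, shard, local_steps, lr, rng)
            for shard in shards
        ]
        # ...and communicate only once per round, when results are averaged.
        global_params = average(local_results)
    return global_params

if __name__ == "__main__":
    w, b = train()
    print(f"w={w:.3f}, b={b:.3f}")
```

In this sketch, raising `local_steps` reduces how often workers must communicate, which is the kind of trade-off that matters when machines are linked by slow or unreliable connections.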

Some major companies are also exploring distributed learning. Last year, Google researchers demonstrated a new scheme called DIstributed PAth COmposition (DiPaCo) for segmenting and integrating computation to make distributed learning more efficient.

To build Collective-1 and other LLMs, Lane and academic collaborators in the UK and China developed a new tool called Photon that makes distributed training more efficient. Lane said Photon improves on Google's approach with a more efficient way of representing the data in a model and a more efficient scheme for sharing and consolidating training. The process is slower than conventional training but more flexible, allowing new hardware to be added to speed up training, Lane said.

Photon was developed in collaboration with researchers at Beijing University of Posts and Telecommunications and Zhejiang University. The team released the tool under an open-source license last month, allowing anyone to use the approach.

Vana, Flower AI's partner in building Collective-1, is developing new ways for users to share their personal data with AI builders. Vana's software lets users contribute private data from platforms like X and Reddit to the training of a large language model, specify what kinds of end uses are permitted, and even benefit financially from their contributions.

Anna Kazlauskas, co-founder of Vana, stated that the idea is to make unused data available for AI training while giving users more control over how their information is used in AI. "This data is usually unable to be included in AI models because it's not public," Kazlauskas said. "This is the first time that data contributed directly by users is being used to train foundational models, with users owning the AI model created from their data."

University College London computer scientist Mirco Musolesi has suggested that a key benefit of distributed AI training approaches may be unlocking novel data. "Extending this to cutting-edge models will allow the AI industry to leverage vast amounts of distributed and privacy-sensitive data, such as in healthcare and finance, for training without the risks of centralization," he said.
