
D-GN CEO Johanna Cabildo on Why Smarter AI Starts With Better Data

Diana Paluteder

In a recent conversation with Johanna Cabildo, Co-founder and CEO of D-GN, we dug into why the future of AI may hinge less on algorithms and more on who’s behind the data.

While much of the industry focuses on model coordination or blockchain-based agent networks, D-GN is betting on something more fundamental: building high-quality, community-labeled training data to make AI models sharper, faster, and more reliable.

In our chat, Cabildo shared how better data leads to better models, why decentralization matters for equity, and how D-GN is helping businesses gain a real edge in the AI race.


Your team emphasizes that smarter AI starts with better data, not bigger models. How did D-GN become focused on high-quality training data as its core mission?

You know, everyone’s chasing the next breakthrough model, but they’re missing the forest for the trees. We realized early on that the most sophisticated algorithm in the world is only as good as what you feed it. Think about it: would you rather have a brilliant chef working with spoiled ingredients, or a good chef with the finest, freshest components?

The real turning point came during my work on the Saudi Aramco x droppGroup AI project. We watched droppGroup’s models outperform giants like OpenAI, not because we had more compute, but because our data pipelines were simply better: more accurate, more structured and purpose-built. It became painfully clear that even the top players were under pressure to cut corners, scraping the internet, chasing scale and sacrificing quality for speed. Their models were fast, but their results were often biased, inconsistent and expensive to run.

That’s when it hit me: we don’t need bigger GPUs, we need better data. Better collection. Better structure. Better delivery to the AI developer. That’s the origin of D-GN. Not just another AI company, but the data infrastructure that makes all AI smarter, faster and more trustworthy.

Many AI companies promise accuracy, but few tackle role-specific performance. Can you explain what “role-specific” training means in practice and why it matters in enterprise AI?

Role-specific training is like the difference between a general practitioner and a heart surgeon. Both are doctors, but you want the specialist when your life depends on it.

Take our work with autonomous vehicles – generic object recognition might identify a stop sign 85% of the time. But our Data Guardians don’t just label it “stop sign.” They annotate regional variations, weather conditions, vandalism patterns, visibility angles. When that data trains an AI model, it doesn’t just see stop signs, it understands them in context. That’s the difference between 85% accuracy and 99% accuracy, which, in autonomous driving, literally saves lives.

For enterprises, this means their AI doesn’t just work in ideal conditions; it performs in the messy, complex real world where we actually live.

In a space obsessed with autonomous agents and coordination, D-GN has carved out a unique niche. Why did you deliberately steer away from the agent-layer hype?

Because agents are only as smart as their foundation. It’s like everyone’s building skyscrapers on quicksand and wondering why they keep falling down.

The agent layer is exciting – I get it. But what’s the value of an autonomous agent making decisions based on biased, incomplete or outright flawed data? The whole promise of the agentic economy is untethered autonomy and intelligent efficiency. You can’t achieve that if the underlying data is polluted by poor methods or compromised ethics. We decided to solve the foundational problem first. Once you have AI that truly understands the world accurately, then you can unleash it to act autonomously.

We’re not anti-agent. We’re pro-intelligence. And real intelligence starts with understanding reality correctly, which starts with data that reflects reality accurately.

D-GN’s datasets are designed to make AI smarter and more efficient. Can you share examples of clients using this data to improve task-specific results?

I can’t name specific clients due to NDAs, but I can share the impact patterns we’re seeing. One digital human platform saw their lip-sync accuracy jump from 72% to 99.2% after training on our continuously captured facial movement dataset. That’s the difference between an uncanny valley gimmick and a believable virtual assistant based on a real human.

We had an emotion AI company reduce their cross-cultural misinterpretation errors by 73% using our globally diverse emotional expression data. In customer service applications, misreading emotions doesn’t just create awkward interactions – it destroys trust and can escalate conflicts unnecessarily.

A game studio working on next-generation NPCs increased their behavioral realism scores by 95% after implementing our continuous learning pipeline for human micro-expressions and contextual responses. Without this kind of live, nuanced data, their characters felt robotic and players couldn’t maintain immersion.

The pattern is consistent: companies come to us when they’re stuck in the gimmick phase, say with digital humans that work in demos but fail in real-world scenarios. Our human-verified, continuously learning emotional and behavioral datasets help them break through to actual commercial deployment – the difference between a novelty and a tool people actually need to use.

A lot of public attention is still focused on model architecture. What do you think is being overlooked when it comes to the actual substance of AI: the data?

The entire conversation about AI ethics, bias and safety is really a conversation about data quality, but nobody wants to admit it. 

You can build the most elegant, sophisticated model architecture in the world, but if you train it on biased data, you get biased AI. If you train it on incomplete data, you get brittle AI. If you train it on synthetic or recycled data, you get AI that’s increasingly disconnected from reality.

What’s being overlooked is that data isn’t just fuel for AI – it’s the DNA. It determines not just what AI can do, but who it serves, how it behaves and whether it actually makes the world better or just amplifies existing problems at scale.

In 2025, many organizations are trying to fine-tune AI for very specific tasks. How is your team building datasets that are not just clean, but context-rich and adaptable to nuanced roles?

We’ve built what we call a “human-in-the-loop” approach, but not the way most people think of it. Our Data Guardians aren’t just labeling data – they’re cultural translators, context providers, edge-case identifiers.

When we’re building a dataset for, say, content moderation, we don’t just flag “inappropriate content.” Our contributors identify cultural nuances, context-dependent appropriateness, evolving slang, regional sensitivities. The AI doesn’t just learn rules – it learns the subtle art of human judgment.

Our Quality Assurance Score (QAS) system ensures that contributors with deep domain expertise in specific areas are the ones shaping datasets for those domains. A medical professional labels medical data, a legal expert handles legal content, native speakers tackle multilingual challenges.
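The QAS routing Cabildo describes can be pictured as a simple matching rule: only contributors whose score in a given domain clears a quality bar get that domain’s tasks. The sketch below is purely illustrative – the interview doesn’t disclose QAS’s actual scale, fields, or threshold logic, so every name and number here is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    name: str
    # Hypothetical per-domain Quality Assurance Scores on a 0-100 scale;
    # the real QAS system's schema is not public.
    qas: dict

def route_task(task_domain: str, contributors: list[Contributor],
               min_score: float = 80.0) -> list[Contributor]:
    """Return contributors qualified for a domain, highest score first."""
    qualified = [c for c in contributors if c.qas.get(task_domain, 0) >= min_score]
    return sorted(qualified, key=lambda c: c.qas[task_domain], reverse=True)

pool = [
    Contributor("medic", {"medical": 95, "legal": 20}),
    Contributor("lawyer", {"legal": 92}),
    Contributor("generalist", {"medical": 60, "legal": 55}),
]
print([c.name for c in route_task("medical", pool)])  # ['medic']
```

The point of the threshold is the one Cabildo makes: a medical professional labels medical data, a legal expert handles legal content, and a generalist who clears neither bar gets neither task.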

What infrastructure or tooling is needed to support the high-integrity data labeling D-GN aims for, and how do you maintain that quality at scale?

We’ve built our entire infrastructure on blockchain for a reason – immutable audit trails. Every annotation, every quality score, every contributor action is recorded permanently. You can trace any piece of labeled data back to who created it, when and with what quality metrics.
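The audit-trail idea – every annotation recorded permanently and traceable back to its creator – can be sketched as a hash-linked log, where each record commits to the hash of the one before it. This is a minimal illustration of the concept, not D-GN’s actual on-chain format; the field names and hashing scheme are assumptions.

```python
import hashlib
import json
import time

def record_annotation(chain: list, contributor_id: str, item_id: str,
                      label: str, quality_score: float) -> dict:
    """Append an annotation record whose hash covers the previous record,
    so altering any earlier entry invalidates everything after it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "contributor": contributor_id,
        "item": item_id,
        "label": label,
        "quality_score": quality_score,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    # Canonical serialization (sorted keys) so the hash is reproducible.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append(body)
    return body

chain = []
record_annotation(chain, "guardian-42", "img-001", "stop sign (vandalized)", 0.97)
record_annotation(chain, "guardian-7", "img-002", "stop sign (snow-covered)", 0.94)
# Each record names who created it, when, and with what quality metric,
# and the links make the history tamper-evident.
print(chain[1]["prev_hash"] == chain[0]["hash"])  # True
```

A real blockchain adds consensus and replication on top, but the traceability property Cabildo describes rests on this same chaining structure.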

Our Dynamic Truth Discovery system uses AI to flag anomalies in real-time, but humans make the final calls. We’ve gamified the process – contributors earn reputation scores, unlock better missions, join elite squads. Quality isn’t just required, it’s rewarded and celebrated.

The key insight is that scale without quality is just scaled mediocrity. We’d rather have 10,000 highly skilled, motivated contributors than 100,000 people just clicking through tasks like bots.

You’ve previously talked about making AI development more equitable. How does your data approach help shift power away from centralized players and toward a more diverse ecosystem?

Data is power, and for too long, that power has been concentrated in the hands of a few tech giants who can afford to scrape the entire internet. We’re democratizing data creation.

Our Data Guardians retain ownership stakes in the datasets they help create. When an enterprise licenses a dataset, contributors get ongoing royalties in USDT. For the first time, the people whose intelligence and effort train AI systems actually benefit from that AI’s success.

More importantly, we’re creating data that represents diverse perspectives, cultures and contexts – not just what’s easily scrapable from English-language websites. This means AI trained on our data actually works for everyone, not just Silicon Valley’s worldview.

What’s your take on the current regulatory push around AI transparency and safety? How can better training data help companies stay compliant and ethical?

Regulators are asking the right questions, but most AI companies can’t answer them because they don’t know what’s in their training data. They scraped it from unknown sources, processed it through black boxes and hoped for the best.

Our onchain approach means every piece of training data has a provenance. Companies can prove their AI wasn’t trained on copyrighted material, biased datasets or questionable sources. They can demonstrate compliance not with promises, but with immutable proof.

Better training data isn’t just about better performance – it’s about building AI you can actually trust and regulate. I see a world where regulators demand that all AI models, from GPT to Grok, are onchain.

Finally, with so much noise in the AI space, what’s one misconception about “training data” you’d like to clear up once and for all?

That more data always equals better AI. It doesn’t.

Quality beats quantity every single time. A thousand perfectly labeled, contextually rich examples will outperform a million poorly labeled ones. We’ve seen companies spend millions on massive datasets that actually make their AI worse because the data was noisy, biased or just wrong. As they say in computing, garbage in, garbage out.

But here’s the real differentiating factor that everyone’s missing: data variances. Think about it logically for a second – if we’re all training AI models on the same homogenized datasets, what do you get? The same predictable outputs. That’s not moving the human race forward or advancing the AI race.

The variances in data – the edge cases, cultural nuances, real-world messiness – that’s what separates best-in-class models from mediocre ones. It’s the difference between AI that works in controlled environments and AI that thrives in the chaos of actual human experience.

The future belongs to AI trained on smaller, smarter, human-verified datasets with rich variances – not bigger, dumber, scraped ones that flatten human complexity into algorithmic sameness. That’s not just our business model, it’s our mission, making AI actually intelligent, not just impressively large.

At D-GN, we’re building an ecosystem of partner organizations that share this vision – stakeholders who live by ‘AI for the good of humanity.’ Because human-verified, context-rich, purpose-built data with real-world variances isn’t just better – it’s the only path to AI we can actually trust with the decisions that matter.


