Diffbot’s AI model doesn’t guess — it knows, thanks to a trillion-fact knowledge graph












Diffbot, a small Silicon Valley company known for maintaining one of the largest indexes of web knowledge, has released an AI model designed to tackle a persistent problem for large language models: factual accuracy.



The new model, based on Meta’s Llama 3.3, is an open-source implementation of a technique called graph retrieval-augmented generation (GraphRAG).



Unlike traditional AI models that rely on static training data, Diffbot’s LLM leverages real-time data from its Knowledge Graph, a dynamic database with over a trillion interconnected facts.



According to Mike Tung, Diffbot’s CEO, the goal is to distill general-purpose reasoning into approximately 1 billion parameters, emphasizing the model’s ability to query external knowledge sources rather than storing vast amounts of data internally.



How it works



Diffbot’s Knowledge Graph is an automated database that continuously crawls the web, categorizing entities and extracting structured information through computer vision and NLP.



Refreshed every few days with new data, the Knowledge Graph serves as a continuously updated resource for Diffbot’s AI model, enabling it to retrieve information at query time rather than relying solely on preloaded knowledge.



By querying live sources for up-to-date information, the model aims to enhance accuracy and transparency compared to traditional LLMs.
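The retrieve-then-answer pattern described above can be illustrated with a minimal Python sketch. This is not Diffbot’s code or API: the `Fact` record, the `retrieve` and `build_prompt` helpers, and the toy data are all hypothetical, and a small in-memory list stands in for the Knowledge Graph. It only shows the general idea of pulling facts connected to an entity and handing them to a model together with their sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One edge in a toy knowledge graph, plus where it came from."""
    subject: str
    predicate: str
    obj: str
    source_url: str  # hypothetical provenance field
    crawled_on: str  # ISO date of the crawl that produced the fact


# Invented example data; a real graph would be queried over a network API.
TOY_GRAPH = [
    Fact("Acme Corp", "headquartered_in", "Springfield",
         "https://example.com/acme", "2025-01-02"),
    Fact("Acme Corp", "ceo", "Jane Doe",
         "https://example.com/acme-team", "2025-01-03"),
    Fact("Jane Doe", "alma_mater", "Example University",
         "https://example.com/jane", "2024-12-30"),
]


def retrieve(graph, entity, hops=1):
    """Collect facts reachable from `entity` by following up to `hops` edges."""
    frontier, seen, found = {entity}, set(), []
    for _ in range(hops):
        next_frontier = set()
        for fact in graph:
            if fact.subject in frontier and fact not in seen:
                seen.add(fact)
                found.append(fact)
                next_frontier.add(fact.obj)  # expand outward from the object
        frontier = next_frontier
    return found


def build_prompt(question, facts):
    """Format retrieved facts as numbered, source-attributed context."""
    lines = [
        f"[{i}] {f.subject} {f.predicate} {f.obj} "
        f"(source: {f.source_url}, crawled {f.crawled_on})"
        for i, f in enumerate(facts, 1)
    ]
    return (
        "Answer using only the numbered facts and cite them.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


if __name__ == "__main__":
    facts = retrieve(TOY_GRAPH, "Acme Corp", hops=2)
    print(build_prompt("Where did the CEO of Acme Corp study?", facts))
```

Because each fact carries a source URL and crawl date into the prompt, an answer can be traced back to the records it relied on, which is the kind of auditability the approach is aiming for.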



How Diffbot’s Knowledge Graph beats traditional AI at finding facts



Diffbot’s model scored 81% on the FreshQA benchmark and 70.36% on MMLU-Pro, surpassing other AI models.



Notably, Diffbot has made its model fully open-source, enabling organizations to customize and deploy it internally, addressing concerns about data privacy and vendor lock-in.



Open-source AI could transform how enterprises handle sensitive data



Amid criticisms of large language models generating false information, Diffbot’s approach offers a path focused on factual grounding rather than sheer model size.



Industry experts see potential in Diffbot’s Knowledge Graph for enterprise applications that require accuracy and auditability. Clients such as Cisco, DuckDuckGo, and Snapchat already use the platform.



The model is available as an open-source release on GitHub, with a public demo at diffy.chat. Deployment options range from a single Nvidia A100 GPU to dual H100 GPUs.



Looking ahead, Tung envisions AI’s future in better organizing and accessing human knowledge, emphasizing the importance of real-time information over static data for improved decision-making.



Diffbot’s innovative approach challenges the status quo in AI development, emphasizing the value of accuracy and transparency over model size.