The boom in artificial intelligence-based technologies continues to shape industries around the world. Businesses in the midst of such change should be considering one pivotal question: Is AI technology’s real value rooted in its algorithm, or in the datasets with which it is trained? Here, we explore why the latter may hold the greater long-term value.
By 2033, artificial intelligence (AI) will have overtaken the Internet of Things in its share of the global frontier technology market. That is according to a report by UN Trade & Development (UNCTAD), which also estimates that the global AI market will soar from $189 billion in 2023 to a value of $4.8 trillion in the space of a decade.
What opportunities does AI present in the UK?
AI clearly presents significant opportunity, and significant risk, to businesses that are willing to research, develop, and invest in its capabilities, both globally and nationally.
In the UK, the private sector reportedly invests an average of £200m a day in the nation’s AI sector. This, according to the Department for Science, Innovation and Technology, equates to an average of £8.3m every hour.
AI also features heavily in the UK’s economic plans, with Secretary of State for Science, Innovation and Technology Peter Kyle stating that the UK Government will be “putting AI in the driving seat to power the government’s Plan for Change and deliver better lives for everyone in the country”.
Why is valuing AI’s IP so challenging?
There are, however, many practical challenges that must be addressed if businesses are to develop, and commercially exploit, AI capabilities.
One such challenge is protecting and valuing the intellectual property (IP) behind AI technology, something that most businesses will need to consider as they look to grow, scale, and secure further investment.
Furthermore, the most valuable IP may not be in the most obvious place.
If we consider a ‘traditional’ software-as-a-service business, an immediate priority is often to review the technical effect of the system containing the algorithm and highlight any elements that may be suitable for patent protection, notwithstanding that software code is not recognised as patentable subject matter per se in many jurisdictions, including the EU. In addition, the patent system operates on completely different, and longer, timescales than software innovation, making it hard for patents to ‘keep up with’ innovation in software. The overall IP package can therefore be reinforced by copyright on the code itself, design rights on the user interface and by any associated trade marks or branding.
Because of the inherent difficulties of patenting software, however, performance breakthroughs are often rapidly replicated or re-coded, and even commoditised, by competitors. As such, businesses choosing to make an algorithm the defining asset of their organisation will often have a narrow window in which to seek protection for their IP and, in so doing, secure a competitive edge.
Many AI-based technology businesses may, understandably, replicate the software-as-a-service approach. At a foundational level, algorithms are the ‘engines’ of AI, determining how the system will ‘learn’, adapt, and make decisions.
The performance capabilities of such systems have also increased significantly in recent years, due largely to advances in neural network architectures, transformer models, and optimisation techniques.
The big difference between traditional software and AI software, however, is the capability of AI to ‘learn’ from training datasets.
Why are AI training data so valuable?
In contrast to some of the algorithms themselves, high-quality, proprietary datasets can be rare, expensive, and difficult to replicate. They can require significant amounts of human input to curate them and flush out any biases and inaccuracies. They can also require careful and sensitive navigation of usage licences and copyright issues.
These datasets are the backbone of most AI technologies, determining the quality of their output and providing the context, nuance and depth required for such technologies to function effectively in real-world applications.
Indeed, similar model architectures can yield dramatically different results depending on the datasets with which they are trained. Put nonsense in, and you can expect nonsense out, regardless of how sophisticated the algorithm is.
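The ‘nonsense in, nonsense out’ point can be illustrated with a deliberately small sketch. Everything in it is hypothetical: the nearest-centroid classifier, the function names, and the toy figures are assumptions chosen for illustration and do not come from any real system. The same algorithm is trained twice, once on carefully labelled records and once on carelessly labelled ones, and then scored on the same test set.

```python
from statistics import mean

def train_centroids(samples):
    """Learn one centroid (the mean feature value) per label."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, test_set):
    """Fraction of test records that receive the correct label."""
    correct = sum(predict(centroids, x) == label for x, label in test_set)
    return correct / len(test_set)

# Hypothetical toy data: low values belong to class 0, high values to class 1.
features = [1, 2, 3, 4, 7, 8, 9, 10]
curated_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # carefully checked labels
careless_labels = [1, 1, 0, 1, 0, 0, 1, 0]  # the same records, mostly mislabelled

curated_set = list(zip(features, curated_labels))
careless_set = list(zip(features, careless_labels))
test_set = [(0, 0), (3, 0), (4.5, 0), (6.5, 1), (8, 1), (11, 1)]

# Identical algorithm, different training data.
curated_accuracy = accuracy(train_centroids(curated_set), test_set)
careless_accuracy = accuracy(train_centroids(careless_set), test_set)

print(f"Trained on curated data:  {curated_accuracy:.0%} correct")
print(f"Trained on careless data: {careless_accuracy:.0%} correct")
```

The code that learns and predicts is the same in both runs, so any gap in accuracy is attributable to the training data alone. Real systems are far more complex, but the dependency runs the same way.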
In sectors such as healthcare, finance, and legal services, the quality of datasets is even more crucial. Data in these industries are often highly regulated and siloed, making access to specialised, regulation-compliant, and responsibly sourced datasets a highly prized strategic advantage for AI technology businesses that wish to tap into these markets.
Many businesses are already alive to the value of training data as a commodity. One San Francisco-based startup, for example, is launching an AI data marketplace that will “help content creators monetise their intellectual property for AI training while giving developers and businesses a way to source licensed training data,” according to an article by Digiday.
TechCrunch also states that the market for AI training data will grow from roughly $2.5bn to $30bn within a decade, demonstrating just how valuable legally compliant and properly curated datasets have the potential to be, although this figure is dwarfed by the value of AI quoted above.
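To put the two forecasts quoted in this article on a comparable footing, the sketch below converts each into an implied compound annual growth rate. It assumes that both ‘a decade’ horizons mean exactly ten years and takes the rounded figures above at face value; the function name `implied_cagr` is purely illustrative, and the results are rough arithmetic rather than forecasts from either source.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value and horizon."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures as quoted in the article; the ten-year horizon is an assumption.
ai_market_cagr = implied_cagr(189e9, 4.8e12, 10)      # $189bn to $4.8tn
training_data_cagr = implied_cagr(2.5e9, 30e9, 10)    # roughly $2.5bn to $30bn

print(f"Overall AI market:   {ai_market_cagr:.1%} a year")
print(f"AI training data:    {training_data_cagr:.1%} a year")
```

On these assumptions, the overall AI market would grow at roughly 38% a year and the training data market at roughly 28% a year, so both forecasts imply sustained, rapid growth even though the absolute sums differ greatly.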
Which is more valuable: AI algorithm or dataset?
The answer to this question will, of course, depend on the nature of the business. Algorithms are not irrelevant and retain value as data interpreters, particularly as they are made more efficient, scalable, and generalised.
Nevertheless, any training data held by a business is likely to be more valuable over the long term and should not be overlooked as an asset.
Whilst the technical effect of an algorithm can be copied, datasets that are unique, evolving and content-specific are harder to replicate.
In the race to build smarter AI technology, it is therefore arguably datasets that could prove more capable of differentiating a business from its competitors and delivering a competitive advantage, thus making them the defining asset when it comes to protecting and commercialising a business’s IP.