This article is part of the New Beings series of articles from Zenitech and Stevens & Bolton, examining the practicalities, issues and possibilities of using artificial intelligence (AI) in development.
1. AI and intellectual property rights
There are two main points when considering intellectual property (IP) in the context of AI – protection and ownership.
The question of ownership of IP rights in AI-generated content is not straightforward and, at the moment, varies between jurisdictions. In some countries, material generated by AI is not eligible for copyright protection due to the lack of human authorship.
In the UK, "computer-generated works" are eligible for copyright protection and the author/owner of such computer-generated works is held to be the person by whom the arrangements necessary for the creation of the work are undertaken. However, the legislation dates from 1988 and does not necessarily translate well to modern, complex AI platforms.
Infringement and use of third-party IP
All AI systems need to be trained and learn from data. In some cases, that data is specialised/unique and provided by the parties developing the AI (e.g. medical data sets). However, in others, it is obtained from publicly available (but not necessarily freely usable) data – including unsophisticated scraping of the web.
Again, there is a divergence in how this is addressed in different jurisdictions. The UK government recently put its plans to introduce a data mining exemption to copyright infringement on hold. However, on 29 June 2023 the UK Intellectual Property Office announced that work had started on a voluntary code of practice for copyright and AI, and that legislation could be considered if the code of practice is not adopted or agreement is not reached.
Rights holders across various jurisdictions are already bringing claims for infringement, including the claim brought by Getty Images against Stability AI, the developer of Stable Diffusion.
Of particular relevance to software developers is the extent to which an AI may have been trained on third-party or open-source code. Is it putting it into the code it is generating for you? Is the correct licence included?
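One practical first check on these questions is to scan generated code for tell-tale licence text before it goes into a codebase. The following is a minimal sketch only: the marker list and the function name `find_licence_markers` are illustrative assumptions, the patterns are far from exhaustive, and a clean result does not show that the code is free of third-party material.

```python
import re

# Hypothetical, non-exhaustive patterns that often appear in licence headers.
LICENCE_MARKERS = [
    r"SPDX-License-Identifier:\s*[\w.\-+]+",
    r"GNU (?:Lesser |Affero )?General Public License",
    r"Licensed under the Apache License",
    r"Permission is hereby granted, free of charge",  # opening words of the MIT licence
    r"Copyright \(c\)",
]


def find_licence_markers(source: str) -> list[str]:
    """Return every licence-related snippet found in a piece of source code."""
    hits = []
    for pattern in LICENCE_MARKERS:
        for match in re.finditer(pattern, source, flags=re.IGNORECASE):
            hits.append(match.group(0))
    return hits


# Example: a generated snippet that carries over a copyleft licence header.
generated = "# SPDX-License-Identifier: GPL-3.0-only\ndef f():\n    return 1\n"
for hit in find_licence_markers(generated):
    print("Review before use:", hit)
```

A hit is a prompt for human review of the licence position, not a conclusion about infringement.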
Thaler v Comptroller General
Stephen Thaler created the AI known as DABUS (Device for the Autonomous Bootstrapping of Unified Sentience). The AI went on to create two new products – a food container that it constructed using fractal geometry (which would facilitate rapid reheating of food) and a flashing beacon that could be used for emergencies.
Thaler has been attempting to patent the products made by DABUS, but various courts around the world have ruled against him – saying that Thaler cannot patent something that wasn’t created by a human. The final appeal against the UK ruling was heard by the UK Supreme Court in March.
Whilst the case focuses on ownership and inventorship in the field of patents, the implications of the judgment could be huge. For example, what does this mean for the things you create for companies that use AI as part of the development process?
2. Liability and accountability
Then there’s the question of liability. Some questions that need to be considered include:
- If an AI is generating faulty/infringing code, is that because it was trained on bad code in the first place, or due to how it is generating the results?
- Is the data scientist or developer liable, or the company executives who signed off on its deployment?
- Can you even tell where the problem originated?
- If AI-generated code that is licensed out to another entity causes harm to an end user, would the licensor and/or licensee be liable?
- Who will compensate the injured parties? What position might the parties’ respective insurers adopt?
- What standard of care would apply if AI is making the decisions that cause harm?
3. Privacy and data protection
AI requires access to large amounts of data to function, which raises concerns about the privacy and security of personal information. AI should comply with privacy and data protection laws to ensure that personal data is processed lawfully.
A number of typical characteristics of AI may appear to be at odds with the underlying principles of data protection law, including the principles of transparency, data minimisation and accountability:
- Transparency – it can be difficult for organisations to explain AI systems to individuals because of their inherent complexity.
- Data minimisation – are you striking the balance between data minimisation and statistical accuracy?
- Accountability – can you demonstrate compliance with GDPR and other data protection principles?
- Cross-border considerations – where is the data being processed? Can this be identified? Is it all UK-based?
4. Discrimination and bias
AI systems can perpetuate and amplify existing biases and prejudices, which could result in discriminatory practices. Tackling this requires a legal framework that addresses fairness and transparency.
The Equality and Human Rights Commission (EHRC) has drawn attention to the real risk posed by the use of AI, with biases within the systems often stemming from their training data.
Imbalanced training data:
- Imbalanced training data can lead to discriminatory results. For example, if men are over-represented in the training data, women are statistically "less important" – this may impact the results of AI (e.g. by suggesting men are more likely to repay loans, if more men are represented).
- These issues will apply to any population under-represented in the training data. For example, if a facial recognition model is trained on a disproportionate number of faces belonging to a particular ethnicity and gender, it will perform better when recognising individuals in that group and worse on others.
- It may be possible to balance the training data by adding or removing data about under- or over-represented subsets of the population.
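As a rough illustration of balancing by adding data, the sketch below oversamples the smaller group until each group is the same size. The record layout, the `sex` field and the function name `oversample` are assumptions made for the example. Duplicating records adds no new information about the under-represented group, so this is one possible approach rather than a complete fix.

```python
import random
from collections import Counter


def oversample(records, group_key, seed=0):
    """Duplicate randomly chosen records from smaller groups until every
    group is as large as the largest one."""
    rng = random.Random(seed)
    by_group = {}
    for record in records:
        by_group.setdefault(record[group_key], []).append(record)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Top up this group with random duplicates (none if already at target).
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical training set in which men outnumber women four to one.
applicants = [{"sex": "M"}] * 8 + [{"sex": "F"}] * 2
balanced = oversample(applicants, "sex")
print(Counter(r["sex"] for r in balanced))
```

The alternative of removing data (undersampling the larger group) avoids duplicates but discards potentially useful records.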
Training data could reflect past discrimination:
- For example, if, in the past, loan applications from women were rejected more frequently than those from men due to prejudice, then any model based on such training data is likely to reproduce the same pattern of discrimination.
- You could either modify the data, change the learning process, or modify the model after training.
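Whichever of these options is chosen, it helps to measure the pattern first. The sketch below compares approval rates between groups in a set of past decisions. The data, the group labels and the function names `approval_rates` and `parity_gap` are hypothetical, and a gap in approval rates is only one of several possible fairness measures.

```python
def approval_rates(decisions):
    """Map each group to its approval rate, given (group, approved) pairs."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {group: approved[group] / totals[group] for group in totals}


def parity_gap(decisions):
    """Difference between the highest and lowest group approval rates."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical loan history: 7 of 10 men approved, 4 of 10 women approved.
history = (
    [("M", True)] * 7 + [("M", False)] * 3
    + [("F", True)] * 4 + [("F", False)] * 6
)
print(approval_rates(history))
print(f"Approval-rate gap: {parity_gap(history):.2f}")
```

Running the same measurement on a model's predictions, before and after any change to the data or the model, gives a simple indication of whether the historic pattern is being reproduced.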
5. Governance and regulation
The rapid development of AI requires an adequate legal framework to regulate its use, development, and deployment. The regulatory framework should balance innovation and safety while also promoting ethical practices. It’s a situation that’s evolving rapidly, with the UK government playing a particularly high-profile role.
AI is ever-changing and quickly adapting, so regulation is often trying to catch up. As a result, it can become unclear which regulations apply and whether new regulations will come into force.
UK – current legislation
Currently, in the UK, AI is governed by numerous different pieces of legislation (data protection law, Equality Act 2010, product safety laws, consumer rights law, tort law, financial services regulation etc.), so it can be difficult to ascertain what is applicable and whether you are abiding by it.
White paper on regulating AI
The government recently (29 March 2023) released a white paper discussing the regulation of AI and proposing a change in regulatory approach.
It acknowledged industry concerns “that conflicting or uncoordinated requirements from regulators create unnecessary burdens and that regulatory gaps may leave risks unmitigated, harming public trust and slowing AI adoption.”
It proposes a principles-based framework for regulators to interpret and apply to AI within their remits. The framework is intended to be pro-innovation, proportionate, trustworthy, adaptable, clear and collaborative, and is built around five cross-sectoral principles: safety, security and robustness; appropriate transparency and explainability; fairness; accountability and governance; and contestability and redress.
A strength of this approach is that regulators would still be able to exercise discretion and expert judgement regarding the relevance of each principle to their individual domains. Initially, the principles will be issued by the government on a non-statutory basis and applied by regulators within their remits.
Following a period of non-statutory implementation, and when parliamentary time allows, the government anticipates that it will want to strengthen and clarify regulators’ mandates by introducing a new duty requiring them to have due regard to the principles.
The government has also just published the Terms of Reference for a working group, the role of which will include “identifying, developing and codifying good practice on the use of copyright, performance and database material in relation to AI, including data mining”.
G7 leaders are calling for some international standards around AI (generative AI in particular).
In summary: the law is playing catch up with the development of AI. There are hefty issues to consider in terms of ownership, bias, copyright and IP protection, and regulation, and clarity will come from legal test cases in the coming months and years.
Application to development and testing
The wider issue of governance and regulation – while doubtless important – is less applicable to the specific uses of AI for software development that we have been looking at as part of this project. Similarly, discrimination, bias and the processing of personal data within AI systems have the potential to have hugely negative effects on society as the use of such tools becomes more widespread, but again they are beyond the scope of this work.
ChatGPT, AutoGPT, Copilot and Tabnine
The first thing to note is that there are variations in approach, even across just the three sets of terms that cover these tools. This is not a problem per se, but a developer who uses multiple tools in a single project is potentially introducing a high degree of legal complexity should any issue or claim arise.
For example, even as regards jurisdiction:
- AutoGPT makes use of OpenAI’s GPT-4 language model, and both it and ChatGPT are covered by OpenAI’s terms. Those terms are governed by Californian law and require mandatory arbitration to resolve disputes arising from them.
- Copilot’s terms are governed by the law of British Columbia, Canada and again require arbitration of disputes under Canadian rules, and
- Tabnine’s terms are governed by Israeli law and users are required to submit to the jurisdiction of the courts of Tel Aviv.
The overarching theme is that, as one would expect, the tools all go to huge lengths to exclude any liability whatsoever arising from their use and/or cap that liability at a low level.
In the case of OpenAI and Copilot, the terms state that the services are provided “AS IS”, which is language recognised under US law as excluding all implied warranties through which the law might otherwise protect a buyer. All three sets of terms also include broad exclusions of liability and indemnities in favour of the provider for loss and damage arising from customers’ use.
As an experimental open-source application that makes use of (and requires a subscription to) GPT-4, AutoGPT’s GitHub repository includes an overlaid disclaimer, which states that it:
“is an experimental application and is provided “as-is” without any warranty, express or implied. By using this software, you agree to assume all risks associated with its use, including but not limited to data loss, system failure, or any other issues that may arise.”
- OpenAI’s terms limit its liability to the greater of: the amount the user paid for the service that gave rise to the claim during the 12 months before the liability arose; or one hundred dollars (USD100), and
- Copilot’s total aggregate liability from any and all claims is limited to the total amount of fees paid by the customer in the 12 months immediately preceding the date the cause of action first arose.
- In all three cases the terms seek to impose a 12-month limitation on bringing claims arising from their use, and OpenAI and Copilot also seek to exclude users from joining class actions relating to the services.
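To see how low these caps can be in practice, the arithmetic can be sketched as follows. The function names and the fee figures are illustrative assumptions, and the calculation reflects only the summary of the terms given above, not the full wording of either contract.

```python
from decimal import Decimal


def openai_cap(fees_paid_last_12_months: Decimal) -> Decimal:
    """Greater of the fees paid in the prior 12 months or USD 100,
    following the summary of OpenAI's terms above."""
    return max(fees_paid_last_12_months, Decimal("100"))


def copilot_cap(fees_paid_last_12_months: Decimal) -> Decimal:
    """Total fees paid in the prior 12 months, following the summary
    of Copilot's terms above."""
    return fees_paid_last_12_months


# Hypothetical developer paying USD 20 a month for each service.
annual_fees = Decimal("20") * 12
print("OpenAI cap:", openai_cap(annual_fees))
print("Copilot cap:", copilot_cap(annual_fees))

# A user on a free tier has paid nothing in the period.
print("OpenAI cap (free tier):", openai_cap(Decimal("0")))
print("Copilot cap (free tier):", copilot_cap(Decimal("0")))
```

On these assumed figures the maximum recoverable sum is a few hundred dollars at most, whatever the scale of the loss that the code causes.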
Tabnine is the odd one out insofar as it does give specific reassurance to users concerning the open-source software that it uses to deliver code, saying:
“We note that as part of the development of the Services provided by Codota, Tabnine uses certain “Free and Open Source Software” or “FOSS”. In that respect Tabnine represents that its use of such FOSS is in compliance with the licence terms thereof (however Tabnine makes no other representations and/or warranties in connection with such FOSS).”
All the platforms are protective of their own technologies, expressly prohibiting the use of scraping/spidering technology to “steal” their underlying data. However, they take slightly differing approaches to their outputs and to what they will do with data uploaded by users.
OpenAI assigns all its right, title and interest in its outputs to the user. However, it also makes clear that an output may not be unique and, therefore, that a user does not acquire exclusive rights to particular outputs or solutions.
Copilot retains all rights but instead grants the user a licence to outputs. It also states that the customer grants Copilot:
- An irrevocable, perpetual licence to use or incorporate into the service any suggestions, enhancement requests, recommendations or other feedback provided by customer, and
- A worldwide, royalty-free, non-exclusive, irrevocable licence to reproduce, process and display the customer’s data in an aggregated and anonymized format for Copilot’s internal business purposes, including without limitation to develop and improve the service, the system and Copilot’s other products and services.
Tabnine also grants users a licence to outputs, rather than assigning rights. However, it does provide reassurance that if users give Tabnine permission to access their code for analysis and “Tailor Made Services”:
“such code shall be used by Tabnine solely in order to adjust and upgrade the standard Services to provide you, and you only, with the Tailor Made Services. No other users shall be granted with any access to the Tailor Made Services provided to you, …[and]…, any code provided by you to Tabnine shall not be stored and/or used by Tabnine, and (d) for the avoidance of doubt, except with respect to creating the Tailor Made Services, Tabnine shall not be granted any intellectual property rights in the code shared by you which was provided solely for the limited use by Tabnine for creating the Tailor Made Services”.
QA Testing Tools
In addition to these development tools we have also looked at some of the QA and testing tools used by the team, namely:
These tools are, to an extent, different to ChatGPT, AutoGPT, Copilot and Tabnine as their focus is more on testing code than creating it. However, from a legal perspective their user terms are similar. They seek to prevent users from reproducing the tools or using them for unauthorised purposes, and seek as far as possible to limit liability arising from their use.
They do, though, generally recognise the distinction between: (i) users’ own data that they upload into the tool, and (ii) the results/reports that the tools produce.
In summary, therefore, the legal relationship between AI platforms and users is generally one-sided and (even from the traditionally risk-averse point of view of a lawyer) the advice is best summed up as “proceed with caution”.
Of the development tools we have looked at, Tabnine seems to have the clearest and best-drafted terms for software development. It also appears to have the lowest risk profile, making a selling point of the fact it uses properly licensed open-source software as its sole training source.
There have already been some very high-profile cases in this area, most notably the ongoing US claim involving Copilot, in which Microsoft and GitHub are defending allegations that Copilot infringes the copyright in a large number of publicly available source-code repositories on which it has been trained. That claim may, from a US perspective at least, provide some helpful clarity as to how the courts might treat some of these issues, but in the meantime the complexity, cost and potential jurisdictional issues involved in such claims will make it difficult for any single developer to seek redress should liability arise from their use of AI platforms.
That being the case, developers should take what steps they can to control their “downstream” risk. In particular, it is usual for clients to require warranties around the code that is delivered to them – including as to authorship and whether it includes open-source code.
Using AI tools can almost immediately cut across those warranties, and it is interesting to note that OpenAI’s terms specifically state that it is a breach to represent that output from the service was human-generated when it is not. It would be advisable, therefore, for developers to look at the terms of their customer contracts to be clear about the warranties they are offering and, if they have not done so already, make specific allowance for the use of AI platforms.
As regards testing tools, the risks seem on the whole slightly lower, but it is still really important to be clear about which tools you are using and to check that their terms are compatible with your use.
Clearly these are incredible tools, but they do pose some very specific risks for developers. And just as the technical members of Zenitech’s team have highlighted the need to review carefully the outputs from AI platforms before putting them into use, the same is doubtless true from a legal perspective.
This article was first published on Zenitech.