The use of generative AI in software development is on the rise but not widely understood. Danny Jeffrey and Paul Armstrong from Womble Bond Dickinson explain how these new tools are being used, as well as some of the risks associated with their application.

What is generative AI?

Generative AI is a branch of artificial intelligence able to produce novel and realistic content, such as text, images, or audio, in response to data inputted by users of the platform. Unlike traditional AI systems that perform specific tasks based on pre-defined rules or supervised learning, generative AI systems can learn from unlabelled data and generate new data that follows the same distribution, outputting data in a variety of formats. Generative AI systems rely on advanced techniques, such as foundation models, which are large-scale neural networks (acting in a manner not dissimilar to the human brain) that can be pre-trained on large amounts of data and fine-tuned for various downstream tasks, producing contextually appropriate responses. Examples of foundation models include generative pre-trained transformers (GPT), which can generate natural language text; variational autoencoders (VAE), which can generate images or sounds; and generative adversarial networks (GAN), which can generate realistic images or videos by pitting two neural networks against each other.

Generative AI has enormous potential for innovation and value creation across various industries and domains, such as marketing, healthcare, finance, entertainment, education, and art. Whilst it may seem from recent headlines that this technology is brand new, similar tools and models date back some years, albeit they were not publicly accessible. There is a range of generative AI tools freely available on the market already, including ChatGPT, a chatbot that can have natural and engaging conversations; DALL-E, an image generator that can create realistic images from text prompts; Stable Diffusion, an art generator that can create high-quality paintings from sketches; and GitHub Copilot, a code generator that can write software code from natural language descriptions.

How is generative AI used in software development?

Generative AI has the potential to unleash developer productivity, enhance the customer experience, and foster innovation. According to research conducted by McKinsey, generative AI-assisted tools can deliver impressive time (and cost) savings in many common developer tasks, such as documenting code functionality (50% faster), writing new code (47% faster), and optimising existing code (63% faster). One of the most prominent examples, GitHub Copilot, is a cloud-based artificial intelligence tool which assists users by auto-completing and suggesting code. Not only that, but when provided with a programming problem described in natural language, Copilot is capable of generating code that solves the problem. Whilst Copilot is not the only AI-assisted tool that can perform some of these functions, it does have some unique features which make it both particularly helpful and particularly risky to use – we will cover these in detail later.
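To make that workflow concrete, here is a minimal sketch of the kind of completion such a tool might offer: the developer writes only the comment and the function signature, and the assistant proposes the body. The example is illustrative rather than an actual Copilot output.

```python
# Developer's prompt: "check whether a string is a valid 10-digit ISBN".
# Everything inside the function body is the sort of completion an
# AI assistant might suggest from the comment and signature alone.

def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` is a valid ISBN-10 (hyphens/spaces allowed)."""
    isbn = isbn.replace("-", "").replace(" ", "")
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char == "X" and position == 9:  # 'X' stands for 10 as a check digit
            digit = 10
        elif char.isdigit():
            digit = int(char)
        else:
            return False
        total += digit * (10 - position)
    return total % 11 == 0

print(is_valid_isbn10("0-306-40615-2"))  # True
```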

Generative AI can also enable developers to create novel and diverse content, such as images, video, music, and speech. This ability to create novel outputs has been put to a variety of uses, but Ubisoft's R&D department, La Forge, has widely touted its utility in video game development for particularly mundane world-building tasks. Ghostwriter, a tool developed in-house, relieves developers of one of video game production's most laborious tasks: writing NPC dialogue. Whilst the tool isn't used to write or orchestrate the wider lore or questlines, it can generate conversations between background NPCs based only on a short subject input by the developer. This unique tool allows scriptwriters and developers more time to focus on the narrative of the main storylines.
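Ubisoft has not published Ghostwriter's internals, but the general pattern (a short subject in, a few lines of throwaway background dialogue out) can be sketched with any large language model API. The snippet below uses the OpenAI Python client purely as a stand-in; the model name and prompts are assumptions for illustration only.

```python
from openai import OpenAI  # stand-in only: Ghostwriter is a proprietary in-house tool

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def background_chatter(subject: str, lines: int = 4) -> str:
    """Generate throwaway small talk between two background NPCs."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name for this sketch
        messages=[
            {"role": "system",
             "content": "Write terse background dialogue between two "
                        "market-stall NPCs in a medieval fantasy city."},
            {"role": "user",
             "content": f"Subject: {subject}. Write {lines} short lines."},
        ],
    )
    return response.choices[0].message.content

print(background_chatter("rumours about the new harbour tax"))
```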

Legal considerations

Generative AI poses some unique challenges and risks in software development that must be addressed by developers and their clients. Generative AI can raise some complex questions regarding the ownership, authorship, liability, and accountability of the generated code and content. For instance, who owns the intellectual property rights of the generated code and content? Who is responsible for any errors or damages caused by the generated code and content? How can developers ensure that they are not infringing on any existing copyrights or patents?

The primary concern with a generative AI-assisted software writing application is that its code suggestions can be lifted from open-source software and, by failing to identify or attribute the original work, may violate open-source licences. The implication, therefore, is that developers who use this sort of tool are exposed to those same risks.

Whilst open-source software licences generally refer to a set of terms and conditions which stipulate end-user obligations when a component of open-source code is used in software, there are differing types which impose different requirements as to how code may be used and redistributed:

  • Permissive licences: these licences allow use with very few restrictions. Developers are able to modify and distribute the code on the basis that they provide attribution of the original code to the original developers (an illustrative attribution header follows this list).
  • Copyleft licences: these include a reciprocity obligation stating that derivative works based on original code provided under a copyleft licence must be released under the same terms and conditions as the original code, and that the source code containing changes must be made available or provided upon request. It is these licences that commercial entities are particularly wary of, as compliance may require making in-house code publicly available.
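As a sketch of what permissive-licence attribution looks like in practice, the header below carries the upstream copyright notice alongside a reused utility; the project name and snippet are hypothetical.

```python
# Reused from the (hypothetical) "example-utils" project under the MIT licence.
# The original copyright notice must travel with the code:
#
#   Copyright (c) 2021 Example Utils Contributors
#
#   Permission is hereby granted, free of charge, to any person obtaining
#   a copy of this software... (remainder of the MIT licence text retained)

def clamp(value: float, low: float, high: float) -> float:
    """Constrain `value` to the range [low, high]."""
    return max(low, min(high, value))
```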

Ultimately, what matters most in relation to open-source software licences is that they require developers to provide attribution by including the original copyright notice. Given that some of these generative AI applications strip code of its licence information, developers who use them may be completely unaware that they are violating licence terms. The question, then, is whether AI-assisted software development tools are inadvertently creating derivative works of copyleft-licensed code. This remains a question for the courts, and indeed there are ongoing cases in both the UK and US to this effect. Whilst the precise answer remains to be confirmed, it seems reasonable that a lot would depend on the length and comprehensiveness of the software's suggestions: the more complex and specific the suggestion, the more likely it is to be a derivative work of copyleft-licensed code.

While there are a few players in this space, some tools take an approach to licences that makes them particularly risky compared with alternatives trained only on permissively licensed code. The latter kind also generally stick to standardised suggestions, and so are less likely to suggest code which can be traced back to copyleft-licensed code.

Other challenges

Another problem that may arise from using these tools is that the application could copy code containing a security vulnerability and introduce it into code you are going to use. Without properly reviewing the code suggested by the assistive software, developers risk meaningfully compromising the integrity of their codebase.
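A classic illustration of the kind of vulnerability that circulates widely in public code, and which an assistant could plausibly echo, is SQL built by string interpolation. This is a hypothetical sketch, not a documented Copilot suggestion:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable pattern common in public code: interpolating user input
    # straight into the query allows SQL injection (e.g. username = "' OR '1'='1").
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # The reviewed fix: a parameterised query lets the driver escape the input.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
```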

What’s more, the code may not work or even make sense. By design, the application predictively suggests commonly used code, but in doing so may reproduce common mistakes. This can result in code which is either unnecessarily imprecise or which operates improperly.
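One well-known "common mistake" that appears throughout public Python code, and could therefore be reproduced verbatim by a predictive tool, is the mutable default argument:

```python
def add_tag(tag, tags=[]):  # bug: the default list is created once and shared
    tags.append(tag)
    return tags

print(add_tag("a"))  # ['a']
print(add_tag("b"))  # ['a', 'b'] -- state leaks between unrelated calls

def add_tag_fixed(tag, tags=None):  # the idiomatic correction
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```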

Software developers can overcome some of these issues by treating the code in the same way they would treat code produced by a junior developer. The code should be labelled as AI-generated and treated as though it was produced by an employee who requires supervision.
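There is no industry-standard convention for this labelling, but even a simple comment banner (the format below is entirely hypothetical) makes AI-assisted sections easy to find during review and licence audits:

```python
# --- AI-GENERATED: accepted from an AI coding assistant, 2023-08-14 ---
# Reviewed-by: <pending>   Licence-checked: <pending>
# Treat as junior-developer code until both fields are completed.
def normalise_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())
# --- END AI-GENERATED ---
```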

Solutions

Some of the solutions to the problems outlined above include configuring your application so it blocks public code suggestions and sticks to using an internal source code bank only. It’s also clear that thoroughly testing and analysing all code, as if it were produced by a junior developer, is necessary. You can also run projects through licence-checking tools that analyse code for plagiarism. Incidentally, the way to combat exuberant AI-assisted software may well be more AI.
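To give a flavour of how such licence-checking might work, the sketch below fingerprints short spans of a source file against an index of known open-source fragments. The index, hash choice, and window size are all placeholders; a real audit would rely on a dedicated scanning tool with a comprehensive corpus.

```python
import hashlib
from pathlib import Path

# Placeholder index mapping fragment hashes to their upstream origin.
# In practice this would be built by a dedicated licence-scanning tool.
KNOWN_OSS_FRAGMENTS = {
    "00000000000000000000000000000000": "example-project (GPL-3.0)",
}

def fingerprint(fragment: str) -> str:
    """Hash a whitespace-normalised code fragment."""
    normalised = " ".join(fragment.split())
    return hashlib.md5(normalised.encode()).hexdigest()

def scan_file(path: Path, window: int = 5) -> None:
    """Flag any `window`-line span whose fingerprint matches the index."""
    lines = path.read_text().splitlines()
    for start in range(max(len(lines) - window + 1, 1)):
        span = "\n".join(lines[start:start + window])
        origin = KNOWN_OSS_FRAGMENTS.get(fingerprint(span))
        if origin:
            print(f"{path}:{start + 1}: possible match with {origin}")

scan_file(Path("src/main.py"))  # hypothetical path
```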

As in all areas of business where you’re trying something new, use common sense and develop a good understanding of how it works, as well as its applications and risks, before you use it in a commercial operation. It’s worth remembering that any piece of suggested code which is very clearly from another source, or even has comments still attached, shouldn’t be used.