Global Courant
“Learn to code.” That three-word pejorative is perpetually on the lips and at the fingertips of internet trolls and tech bros whenever media layoffs are announced. A useless sentiment in its own right, but with the recent advent of code-generating AIs, knowing the ins and outs of a programming language like Python could soon be about as useful as knowing how to fluently speak a dead language like Sanskrit. In fact, these genAIs are already helping professional software developers code faster and more effectively by handling much of the programming grunt work.
How coding works
Two of today’s most widely distributed and written coding languages are Java and Python. The former almost single-handedly revolutionized cross-platform operation when it was released in the mid-’90s and now drives “everything from smartcards to space vehicles,” according to Java Magazine in 2020 — not to mention Wikipedia’s search function and all of Minecraft. The latter actually predates Java by a few years and serves as the code base for many modern apps like Dropbox, Spotify and Instagram.
They differ significantly in how they run. Java needs to be compiled (its human-readable code translated ahead of time into bytecode that the Java Virtual Machine executes) before it can run, while Python is an interpreted language, which means that its human-readable code is translated and executed by an interpreter as the program runs, with no separate compile step for the programmer. Both approaches make it easier to write code for multiple platforms, whereas code compiled directly to machine code tends to be focused on a specific processor type. Regardless of how they run, the actual code-writing process is nearly identical between the two: somebody has to sit down, crack open a text editor or Integrated Development Environment (IDE) and actually write out all those lines of instruction. And up until recently, that somebody was typically a human.
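In practice the compiled-versus-interpreted line is blurrier than it sounds. As a rough illustration (the `add` function and the `<example>` filename are made up for this sketch), the standard Python interpreter first turns source text into bytecode and then runs that bytecode, all without the programmer asking for a compile step:

```python
import dis

# A tiny program, held as plain source text (hypothetical example).
source = "def add(a, b):\n    return a + b\n"

# compile() produces a code object containing bytecode, not native machine code.
code_obj = compile(source, "<example>", "exec")

# exec() has the interpreter run that bytecode, which defines add().
namespace = {}
exec(code_obj, namespace)
add = namespace["add"]
result = add(2, 3)

# List the bytecode instruction names the interpreter steps through for add().
instructions = [ins.opname for ins in dis.get_instructions(add)]
print(result, instructions)
```

The exact instruction names vary between Python versions, so the printed list should be read as a peek under the hood rather than a stable output.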
The “classical programming” writing process of today isn’t that different from the process used in the days of ENIAC, with a software engineer taking a problem, breaking it down into a series of sub-problems, writing code to solve each of those sub-problems in order, and then repeatedly debugging and recompiling the code until it runs. “Automatic programming,” on the other hand, removes the programmer by a degree of separation. Instead of a human writing each line of code individually, the person creates a high-level abstraction of the task for the computer to then generate low-level code to address. This differs from “interactive” programming, which allows you to code a program while it is already running.
Today’s conversational AI coding systems, like what we see in GitHub’s Copilot or OpenAI’s ChatGPT, remove the programmer even further by hiding the coding process behind a veneer of natural language. The programmer tells the AI what they want programmed and how, and the machine can automatically generate the required code.
Among the first of this new breed of conversational coding AIs was Codex, which was developed by OpenAI and released in late 2021. By this point, OpenAI had already implemented GPT-3 (the precursor to the GPT-3.5 model that powers the public BingChat), a large language model remarkably adept at mimicking human speech and writing after being trained on billions of words from the public web. The company then fine-tuned that model using 100-plus gigabytes of GitHub data to create Codex. It is capable of generating code in 12 different languages and can translate existing programs between them.
Codex is adept at generating small, simple or repeatable assets, like “a big red button that briefly shakes the screen when clicked” or regular functions like the email address validator on a Google Web Form. But no matter how prolific your prose, you won’t be using it for complex projects like coding a server-side load balancing program — it’s just too complicated an ask.
Google’s DeepMind developed AlphaCode specifically to address such challenges. Like Codex, AlphaCode was first trained on multiple gigabytes of existing GitHub code archives, but was then fed thousands of coding challenges pulled from online programming competitions, like figuring out how many binary strings with a given length do not contain consecutive zeroes.
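For a sense of what such a contest problem asks, here is a minimal sketch of a solution to the binary-string example, reading "do not contain consecutive zeroes" as "no two zeros sit next to each other." The function names are hypothetical and this is one common approach, not code produced by AlphaCode. It tracks how many valid strings end in a one versus a zero as the length grows, and a slow brute-force count is included as a sanity check.

```python
from itertools import product


def count_no_consecutive_zeros(n: int) -> int:
    """Count binary strings of length n in which no two zeros are adjacent."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Counts of valid strings of the current length, split by last character.
    # The empty string is counted under ends_in_one so a '0' may follow it.
    ends_in_one, ends_in_zero = 1, 0
    for _ in range(n):
        # A '1' can follow anything; a '0' can only follow a '1'.
        ends_in_one, ends_in_zero = ends_in_one + ends_in_zero, ends_in_one
    return ends_in_one + ends_in_zero


def brute_force_count(n: int) -> int:
    """Enumerate every binary string of length n and count the valid ones."""
    return sum("00" not in "".join(bits) for bits in product("01", repeat=n))


print(count_no_consecutive_zeros(3))  # the valid strings are 010, 011, 101, 110, 111
```

The fast version does a fixed amount of work per character of length, while the brute-force check doubles its work with every extra character, which is the kind of gap contest problems are designed to expose.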
To do this, AlphaCode will generate as many as a million code candidates, then reject all but the 1 percent or so that pass its test cases. The system will then group the remaining programs based on the similarity of their outputs and sequentially test them until it finds a candidate that successfully solves the given problem. Per a 2022 study published in Science, AlphaCode managed to correctly answer those challenge questions 34 percent of the time (compared to Codex’s single-digit success on the same benchmarks, that’s not bad). DeepMind even entered AlphaCode in a 5,000-competitor online programming contest, where it surpassed nearly 46 percent of the human competitors.
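The generate, filter and group routine described above can be sketched in a few lines. This is a toy illustration, not DeepMind's implementation: the candidate "programs" are plain Python functions, and `select_candidates`, `probe_inputs` and `max_submissions` are names invented for the example. Trying the largest group first is also an assumption of this sketch.

```python
from collections import defaultdict


def _safe_run(program, value):
    """Run a candidate on one input, recording a crash instead of raising."""
    try:
        return program(value)
    except Exception:
        return "<error>"


def select_candidates(candidates, example_tests, probe_inputs, max_submissions=10):
    # Step 1: keep only the candidates that pass every example test.
    survivors = [
        prog for prog in candidates
        if all(_safe_run(prog, x) == expected for x, expected in example_tests)
    ]
    # Step 2: group survivors that behave identically on extra probe inputs.
    clusters = defaultdict(list)
    for prog in survivors:
        signature = tuple(_safe_run(prog, x) for x in probe_inputs)
        clusters[signature].append(prog)
    # Step 3: take one representative per group, largest group first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:max_submissions]]


# Hypothetical candidates for the task "double the input".
candidates = [
    lambda n: n * 2,
    lambda n: n + n,
    lambda n: n ** 2,                   # fails the second example test
    lambda n: 2,                        # fails both example tests
    lambda n: n * 2 if n < 100 else 0,  # passes examples, breaks on large input
]
picks = select_candidates(candidates, example_tests=[(2, 4), (3, 6)], probe_inputs=[5, 1000])
print(len(picks))
```

Grouping matters because many surviving candidates are the same solution written differently; submitting one per group avoids spending attempts on duplicates.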
Now even the AI has notes
Just as GPT-3.5 serves as a foundational model for ChatGPT, Codex serves as the basis for GitHub’s Copilot AI. Trained on billions of lines of code assembled from the public web, Copilot offers cloud-based AI-assisted coding autocomplete features through a subscription plugin for the Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs).
Initially released as a developer’s preview in June of 2021, Copilot was among the very first coding-capable AIs to reach the market. More than a million devs have leveraged the system in the two years since, GitHub’s VP of Product, Ryan J Salva, told Engadget during a recent interview. With Copilot, users can generate runnable code from natural language text inputs as well as autocomplete commonly repeated code sections and programming functions.
Salva notes that prior to Copilot’s release, GitHub’s previous machine-generated coding suggestions were only accepted by users 14 – 17 percent of the time, “which is fine. It means it was helping developers along.” In the two years since Copilot’s debut, that figure has grown to 35 percent, “and that’s netting out to just under half of the amount of code being written (on GitHub) — 46 percent by AI to be exact.”
“(It’s) not a matter of just percentage of code written,” Salva clarified. “It’s really about the productivity, the focus, the satisfaction of the developers who are creating.”
As with the outputs of natural language generators like ChatGPT, the code coming from Copilot is largely legible. But because Copilot, like any large language model, was trained on the open internet, GitHub made sure to incorporate additional safeguards against the system unintentionally producing exploitable code.
“Between when the model produces a suggestion and when that suggestion is presented to the developer,” Salva said, “we at runtime perform … a code quality analysis for the developer, looking for common errors or vulnerabilities in the code like cross-site scripting or path injection.”
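To make one of those vulnerability classes concrete, here is a minimal sketch of what guarding against path injection can look like. It is not GitHub's analysis, and `resolve_inside` and the `/srv/uploads` directory are hypothetical. The idea is to resolve a user-supplied path and refuse it if it lands outside the directory the program meant to expose.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Return the absolute path for user_path, refusing anything outside base_dir."""
    base = os.path.realpath(base_dir)
    # Joining and then resolving collapses any "../" segments and symlinks.
    target = os.path.realpath(os.path.join(base, user_path))
    # If the shared prefix is not the base directory, the path escaped it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return target


# A well-behaved request stays under the (hypothetical) uploads directory.
print(resolve_inside("/srv/uploads", "reports/q1.txt"))

# A traversal attempt is rejected instead of reaching a system file.
try:
    resolve_inside("/srv/uploads", "../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

Code that skips a check like this, and simply glues user input onto a directory name, is the sort of pattern an automated scan would be expected to flag.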
That auditing step is meant to improve the quality of recommended code over time rather than monitor or police what the code might be used for. Copilot can help developers create the code that makes up malware, and the system won’t prevent it. “We’ve taken the position that Copilot is there as a tool to help developers produce code,” Salva said, pointing to the numerous White Hat applications for such a system. “Putting a tool like Copilot in their hands … makes them more capable security researchers,” he continued.
As the technology continues to develop, Salva expects generative AI coding to expand far beyond its current technological boundaries. That includes “taking a big bet” on conversational AI. “We also see AI-assisted development really percolating up into other parts of the software development life cycle,” he said, like using AI to autonomously repair CI/CD build errors, patch security vulnerabilities, or review human-written code.
“Just as we use compilers to produce machine-level code today, I do think they’ll eventually get to another layer of abstraction with AI that allows developers to express themselves in a different language,” Salva said. “Maybe it’s a natural language like English or French, or Korean. And that then gets ‘compiled down’ to something that the machines can understand,” freeing up engineers and developers to focus on the overall growth of the project rather than the nuts and bolts of its construction.
From coders to gabbers
With human decision-making still firmly wedged within the AI programming loop, at least for now, we have little to fear from having software write software. As Salva noted, computers already do this to a degree when compiling code, and digital gray goo has yet to take over because of it. Instead, the most immediate challenges facing programming AI mirror those of generative AI in general: inherent biases skewing training data, model outputs that violate copyright, and concerns surrounding user data privacy when it comes to training large language models.
GitHub is far from alone in its efforts to build an AI programming buddy. OpenAI’s ChatGPT is capable of generating code, as are the already countless indie variants being built on top of the GPT platform. So too is Amazon’s AWS CodeWhisperer system, which provides much of the same autocomplete functionality as Copilot, but optimized for use within the AWS framework. After multiple requests from users, Google incorporated code generation and debugging capabilities into Bard this past April as well, ahead of its ecosystem-wide pivot to embrace AI at I/O 2023 and the release of Codey, Alphabet’s answer to Copilot. We can’t be sure yet what generative coding systems will eventually become or how they might impact the tech industry. We could be looking at the earliest iterations of a transformative democratizing technology, or it could be Clippy for a new generation.