Does GitHub Copilot improve code quality? Here’s what the data says

AI has fundamentally changed software development in the two years since GitHub Copilot became available to the public. In that time, GitHub Copilot has helped developers code up to 55% faster. Previous research also showed that 85% of developers felt more confident in their code and 88% felt more in flow with GitHub Copilot.

However, the question remains: is the quality of code written with GitHub Copilot objectively better or worse?

To answer this question, we conducted a randomized controlled trial to understand how functional, readable, reliable, maintainable, concise, and likely to be approved code created with GitHub Copilot is.

For the study, we recruited 202 developers with at least five years of experience. Half were randomly assigned GitHub Copilot access and the other half were told not to use AI tools. Participants were all asked to complete a coding task and write API endpoints for a web server. We then evaluated the code with unit tests and expert developer review.

Overall, our results show that code created with GitHub Copilot is more functional, more readable, of better overall quality, and more likely to be approved.

Key findings of this study:

Increased functionality: Developers with GitHub Copilot access were 56% more likely to pass all 10 unit tests in the study, suggesting that GitHub Copilot helps developers write more functional code by a wide margin.
Improved readability: In blind reviews, code written with GitHub Copilot had significantly fewer code readability errors, allowing developers to write, on average, 13.6% more lines of code without encountering readability issues.
Overall better code quality: Readability improved by 3.62%, reliability by 2.94%, maintainability by 2.47%, and conciseness by 4.16%. All of these improvements were statistically significant, and they were consistent with those reported in the 2024 DORA report.
Higher approval rates: Developers were 5% more likely to approve code written with GitHub Copilot, meaning that code is ready for merging sooner, reducing the time to fix bugs or deliver new features.

Here’s a deep dive into what we found:

Code written with GitHub Copilot was more functional

Code that doesn’t work can’t be called high quality. So we first looked at functionality, measured by how many unit tests the code passed. We found that code created with GitHub Copilot passed significantly more tests (p=0.002). In fact, developers with GitHub Copilot access were 56% more likely to pass all 10 unit tests in the study (p=0.04). This means that using GitHub Copilot helps developers write code that is significantly more functional.

Editor’s note: The image above has been updated to only show the percentage of study participants who passed all 10 unit tests. An earlier version included both those who passed all unit tests in the study and those who failed.
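To make a phrase like "56% more likely" concrete, here is a minimal Python sketch of one common way to express such a figure: the relative difference between two groups' pass rates. The article does not spell out the exact calculation used in the study, so treat this as an assumption. The function name and the counts below are made-up placeholders, not study data.

```python
def relative_likelihood(passed_a: int, total_a: int,
                        passed_b: int, total_b: int) -> float:
    """Return how much more likely group A is to pass than group B.

    The result is a fraction: 0.5 means "50% more likely".
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    rate_a = passed_a / total_a
    rate_b = passed_b / total_b
    if rate_b == 0:
        raise ValueError("group B pass rate is zero, so the ratio is undefined")
    return rate_a / rate_b - 1.0


# Hypothetical counts for illustration only (NOT the study's data):
# 30 of 100 developers pass in group A, 20 of 100 in group B.
example = relative_likelihood(30, 100, 20, 100)
print(f"Group A is {example:.0%} more likely to pass all tests")
```

A relative figure like this says nothing about the absolute pass rates, which is why the p-value and the group sizes matter when interpreting it.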

Developers found code written with GitHub Copilot to be easier to read

The 25 developers who wrote code that passed all 10 unit tests in the first phase of the study were randomly assigned to a blind review of anonymized submissions, both those written with and without GitHub Copilot. Reviewers found that code created with GitHub Copilot had fewer code readability errors.

Our analysis of developers’ line-by-line code reviews found that code written with GitHub Copilot had significantly fewer code errors: developers using GitHub Copilot wrote 18.2 lines of code per code error, compared with only 16.0 without it. This corresponds to an average of 13.6% more lines of code per code error with GitHub Copilot (p=0.002). This can translate into real time savings, because each of these code errors requires developer intervention. For example, teams not using GitHub Copilot can find themselves with up to 13% more comments or suggestions to address, and these accumulate over time.

	Average number of code errors	Mean lines of code	Average lines of code per code error	% difference
With GitHub Copilot	4.63	84.3	18.2	13.6%
Without GitHub Copilot	5.35	85.7	16.0	-11.9%
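The relationship between the columns in the table above can be checked with a short Python sketch. It uses only the rounded averages shown in the table. The variable and function names are illustrative, and the study's own analysis presumably worked from unrounded, per-submission data.

```python
# Rounded group averages copied from the table above.
AVG_ERRORS_WITH, AVG_LINES_WITH = 4.63, 84.3
AVG_ERRORS_WITHOUT, AVG_LINES_WITHOUT = 5.35, 85.7


def lines_per_error(avg_lines: float, avg_errors: float) -> float:
    """Average lines of code written per flagged code error."""
    return avg_lines / avg_errors


with_copilot = lines_per_error(AVG_LINES_WITH, AVG_ERRORS_WITH)
without_copilot = lines_per_error(AVG_LINES_WITHOUT, AVG_ERRORS_WITHOUT)

# Relative difference, taking each group in turn as the baseline.
gain_vs_without = with_copilot / without_copilot - 1
drop_vs_with = without_copilot / with_copilot - 1

print(f"lines per error: with={with_copilot:.1f}, without={without_copilot:.1f}")
print(f"relative difference: {gain_vs_without:+.1%} / {drop_vs_with:+.1%}")
```

Because the inputs are rounded averages, the results land near, but not exactly on, the reported 13.6% and -11.9%.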

Additionally, the differences noted by developers were not limited to errors per line of code. They also rated code created with GitHub Copilot as more readable, reliable, maintainable, and concise by roughly 2 to 4% (p=0.003, p=0.01, p=0.041, p=0.002, respectively). Although these differences were small, they were statistically significant and contribute to a better code base.

Dependent variable	Mean difference	P-value
Readable	3.62%	0.003
Reliable	2.94%	0.01
Maintainable	2.47%	0.041
Concise	4.16%	0.002

Code created with GitHub Copilot was more likely to be approved

Finally, we found that developers were also 5% more likely to approve code created with GitHub Copilot (p=0.014). In practice, this means that developers using GitHub Copilot write code that can be merged more quickly, reducing the time to fix bugs or deliver new features.

The end result

So what do these results say about how GitHub Copilot improves code quality? While the number of commits and lines of code changed was significantly higher for the GitHub Copilot group, the average commit size was slightly smaller. This suggests that GitHub Copilot allowed developers to iterate on the code to improve its quality. Our hypothesis is that because developers spent less time making their code work, they were able to focus more on improving its quality. This is consistent with our previous findings that developers felt more confident using GitHub Copilot. It also suggests that, with the greater confidence GitHub Copilot gave them, they were likely able to iterate without fear of introducing bugs into the code.

This is the first controlled study to examine the impact of GitHub Copilot on code quality, and it shows that GitHub Copilot helps developers write high-quality code. We suspect that other studies may not have found an improvement in code quality with GitHub Copilot, not because of the tool itself, but because developers may have lacked the opportunity or incentive to focus on quality. This data builds on our previous research and shows that GitHub Copilot is a powerful product that helps developers code faster and increases job satisfaction, while empowering teams to move quickly and increase their creativity and innovation.

Here on the GitHub Customer Research team, we’re constantly conducting new research into the effectiveness of our products as we work to be home to 1 billion developers – so stay tuned for more insights and developments in the near future.

Methodology

In the first phase of the study, we recruited 243 developers with at least five years of Python experience. They were randomly assigned to either use GitHub Copilot or not. Each group completed a coding exercise for a web server containing fictional restaurant reviews, with 10 unit tests to evaluate functionality. We received valid submissions from 202 developers: 104 with GitHub Copilot and 98 without.
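As a rough illustration of the random assignment step, the sketch below shuffles a list of participant IDs and splits it into two groups. This is a generic shuffle-and-split approach, not necessarily the procedure the study used. The function name, group labels, integer IDs, and seed are all hypothetical.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two groups of near-equal size."""
    rng = random.Random(seed)          # a seed makes the split reproducible
    shuffled = list(participant_ids)   # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {
        "copilot": shuffled[:midpoint],
        "control": shuffled[midpoint:],
    }


# 243 recruited developers, represented here by placeholder integer IDs.
assignment = assign_groups(range(243), seed=0)
print({group: len(ids) for group, ids in assignment.items()})
```

Randomizing at the participant level is what lets differences between the groups be attributed to the tool rather than to who chose to use it.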

In the second phase, developers were randomly assigned submissions to review using a provided rubric. They didn’t know whether the code was written with GitHub Copilot. Each submission was reviewed by at least 10 different participants, resulting in 1,293 reviews. The developers used the rubric to provide a line-by-line review focused on identifying code errors. They also rated each submission overall for readability, reliability, maintainability, and conciseness, and indicated whether the submission should be approved. If you have any further questions about the methodology, please contact press@github.com

How do we define a code error?

In this study, we defined code errors as any code that reduces the understandability of the code. These were not functional errors that would prevent the code from working as intended, but rather errors that reflect poor coding practices. These code errors were derived from the scientific literature on code readability and code complexity, and they were used in the rubric provided during the code review. They included: inconsistent naming, unclear identifiers, excessive line length, excessive white space, lack of documentation, repeated code, excessive branching or loop depth, inadequate separation of functionality, and variable complexity.
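To illustrate the kind of issue this definition covers, here is a small made-up Python example (not taken from any study submission). Both functions behave identically. The first shows three of the rubric's categories: unclear identifiers, lack of documentation, and excessive branching depth. The second addresses them. All names and the sample data are hypothetical.

```python
# Harder to read: single-letter names, no documentation, three nested ifs.
def f(d, m):
    r = []
    for x in d:
        if x is not None:
            if "rating" in x:
                if x["rating"] >= m:
                    r.append(x["name"])
    return r


# Same behavior with descriptive names, a docstring, and flat control flow.
def restaurants_rated_at_least(reviews, min_rating):
    """Return names of reviewed restaurants whose rating meets the minimum."""
    return [
        review["name"]
        for review in reviews
        if review is not None
        and "rating" in review
        and review["rating"] >= min_rating
    ]


sample_reviews = [
    {"name": "Cafe A", "rating": 5},
    None,
    {"name": "Diner B", "rating": 2},
    {"name": "Unrated C"},
]
print(restaurants_rated_at_least(sample_reviews, 4))
```

Neither version is functionally wrong, which is the point: a reviewer would flag the first only for readability.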

Acknowledgments: We would like to thank Lizzie Redford, Ph.D., and Sida Peng, Ph.D., for their assistance with the study design and statistical analysis of this research.

Ready to create high-quality code quickly?
Get started with GitHub Copilot >
