Overwhelmed by duplicate issues being filed on your GitHub project? Or by competing pull requests that are chewing up your team’s time? Your communication style, or your way of delegating work, may be part of the problem, researchers have found.
As any project manager knows, developers work through issues and pull requests (PRs) on Git and Git-based services such as GitHub and GitLab.
A group of researchers from Federal University of Pará, Brazil and University of British Columbia studied these coding behaviors, charting them on a graph to see what “hidden patterns” could be found.
Searching for these patterns in your own team’s development may reveal areas where workflow can be optimized.
“If your project has a disproportionate amount of competing PRs, or ‘duplicate issue hubs,’ it might be a sign to revisit your code review or bug reporting practices,” noted Emilie Ma, one of the researchers, who spoke at the Linux Foundation Open Source Summit earlier this year.
How Researchers Track GitHub Behavior
The team looked at 56 GitHub projects, capturing all the issues and PRs these projects generated. In a graph model (captured in Neo4j), issues and PRs were represented as nodes, and the links between them were represented as edges.
In a git workflow, issues are created to identify work to be done. The resulting code created is then bundled into PRs that then are typically reviewed before being merged into the core body of code. In an ideal world, a single PR resolves the issue.
In terms of status, issues could be open (meaning work or discussion was under way) or closed. The same goes for PRs, except that a finished PR can reach one more status: merged. Both issues and PRs can be duplicates. Authors and timestamps were also collected.
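The model described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ Neo4j schema: the class names `Node` and `WorkGraph`, the field names and the edge labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Node:
    """An issue or a pull request, with the attributes the article lists."""
    number: int
    kind: str        # "issue" or "pr" (hypothetical labels)
    status: str      # "open", "closed", or "merged" (PRs only)
    author: str
    created_at: datetime


@dataclass
class WorkGraph:
    """Nodes keyed by number, plus labeled edges between them."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (source, target, label)

    def add_node(self, node):
        self.nodes[node.number] = node

    def link(self, source, target, label):
        # Refuse edges to unknown nodes so the graph stays consistent.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append((source, target, label))

    def neighbors(self, number, label=None):
        """Numbers linked to `number` in either direction, optionally by label."""
        found = []
        for source, target, edge_label in self.edges:
            if label is not None and edge_label != label:
                continue
            if source == number:
                found.append(target)
            elif target == number:
                found.append(source)
        return found


# Made-up example data: PR 2 resolves issue 1.
graph = WorkGraph()
graph.add_node(Node(1, "issue", "closed", "alice", datetime(2024, 1, 2)))
graph.add_node(Node(2, "pr", "merged", "bob", datetime(2024, 1, 3)))
graph.link(2, 1, "fixes")
print(graph.neighbors(1))
```

A single issue linked to a single merged PR, as in this toy data, corresponds to the “ideal world” case mentioned above.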
“This graph-based approach provides a window into a set of collaborative software engineering practices that have not been previously described,” the researchers wrote.
In the process, they built a visualization tool, WorkflowsExplorer, to display the results.
Previous studies looked at issues and pull requests independently, though there is value in studying them in tandem. “Issues and PRs are coupled in practice: Issues are frequently resolved with PRs, and PRs are associated with Issues,” the researchers wrote.

The researchers’ methodology (“Revealing Software Development Work Patterns with PR-Issue Graph Topologies”).
Basic GitHub Workflow Patterns
As a result of these labors, the researchers found eight distinct workflow types, or behavior patterns, which made up over 1,000 instances of dev actions.
“Each of these workflow type definitions is associated with a work practice,” Ma said.

Graphs of the different types of workflow patterns, or behaviors, found by the research team (“Revealing Software Development Work Patterns with PR-Issue Graph Topologies”).
Not surprisingly, 35.7% of the relationships found were simple resolutions of an issue. But there were lots of other patterns, some good and others not so much.
Here is one workflow type they found, “Competing PRs”: Two or more coders separately propose a feature, and each submits a PR.
In the case of Competing PRs, “Contributors tend to be overeager to contribute their own implementations of a task without otherwise communicating,” Ma said.
That only one of the PRs has been accepted suggests that the project has less-than-optimal communications, as there is duplicate work going on. One PR may be rejected because it hampers performance too much, but another is accepted.
On the upside, however, this behavior allows the project to “be more picky” about accepting PRs, Ma said.
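A team could look for this pattern in its own data with a check along the following lines. It is a rough sketch based only on the description above (several PRs from different authors aimed at the same issue, with exactly one merged), not the paper’s formal definition. The record layout and the function name `find_competing_prs` are hypothetical.

```python
from collections import defaultdict

# Made-up records: (pr_number, linked_issue, author, status)
prs = [
    (101, 7, "alice", "merged"),
    (102, 7, "bob", "closed"),
    (103, 8, "carol", "merged"),
    (104, 9, "dave", "open"),
    (105, 9, "erin", "open"),
]


def find_competing_prs(prs):
    """Map each issue to its PRs when several authors competed and one PR won."""
    by_issue = defaultdict(list)
    for number, issue, author, status in prs:
        by_issue[issue].append((number, author, status))

    competing = {}
    for issue, group in by_issue.items():
        authors = {author for _, author, _ in group}
        merged = [number for number, _, status in group if status == "merged"]
        # Assumed rule: two or more PRs, two or more authors, exactly one merged.
        if len(group) >= 2 and len(authors) >= 2 and len(merged) == 1:
            competing[issue] = sorted(number for number, _, _ in group)
    return competing


competing = find_competing_prs(prs)
print(competing)
```

In this toy data only issue 7 qualifies. Issue 9 has two PRs but neither has been merged yet, so the assumed rule leaves it out.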
Another pattern: Duplicate Issues. Here multiple issues are raised, independent of one another. If you get a few duplicate issues, you have a “duplicate issue hub,” Ma explained. This is another potential negative for the project.
Breaking changes are a frequent cause of duplicate issue hubs. Such hubs are a sign that the project could communicate upcoming changes more clearly.
“Duplicate issue hubs tend to arise when contributors aren’t aware of the work going on in a project, or if they just haven’t bothered to search through previous issues. And this causes additional maintenance burden,” Ma said.
“It might be a sign to reevaluate how you’re messaging the change that’s causing those duplicate issues, to better inform end users,” Ma said.
Overall, duplicate issues come up less frequently than you would guess, however. The researchers found only 15 instances of duplicate issue nodes across the 90,000 nodes studied.
She pointed to one Apache project that created a weekly bot to assemble an issue listing all the PRs that were merged that week, letting everyone know what updates have been made.
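Spotting a duplicate issue hub amounts to counting how many issues are marked as duplicates of the same original. The sketch below assumes duplicate links are available as `(duplicate, original)` pairs. The threshold of three is an arbitrary stand-in for “a few,” not a figure from the research.

```python
from collections import Counter

# Made-up duplicate links: (duplicate_issue, original_issue)
duplicate_edges = [(21, 20), (22, 20), (23, 20), (31, 30)]


def find_duplicate_hubs(duplicate_edges, min_duplicates=3):
    """Return {original_issue: duplicate_count} for issues at or above the threshold."""
    counts = Counter(original for _, original in duplicate_edges)
    return {issue: n for issue, n in counts.items() if n >= min_duplicates}


hubs = find_duplicate_hubs(duplicate_edges)
print(hubs)
```

Lowering `min_duplicates` to 1 would also flag issue 30, which has a single duplicate and is unlikely to deserve attention on its own.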
An overeager developer may lead to another problematic pattern, that of solving several issues in a single PR (Divergent PR). This can slow down the team because these hydra-headed PRs will require more time to review, as the reviewer may not be conversant in all the issues being addressed.
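Flagging such PRs for extra review attention can be as simple as counting the issues each PR is linked to. This is an illustrative sketch: the `pr_links` mapping and the cutoff of two issues are assumptions, and the paper’s own definition of a Divergent PR may be stricter.

```python
# Made-up mapping: PR number -> issues the PR claims to address
pr_links = {
    201: [11],
    202: [12, 13, 14],
    203: [],
    204: [15, 16],
}


def find_divergent_prs(pr_links, min_issues=2):
    """Return PRs linked to at least `min_issues` distinct issues."""
    divergent = {}
    for pr, issues in pr_links.items():
        distinct = sorted(set(issues))
        if len(distinct) >= min_issues:
            divergent[pr] = distinct
    return divergent


divergent = find_divergent_prs(pr_links)
print(divergent)
```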
Not all patterns are problematic, though.
For adding big features, a shop may use Decomposed PR, which involves not one but multiple PRs chained together. Each one is part of a job (“frontend”) and may rely on other PRs (“backend”) to complete the issue. Often, these are completed by a single author, who can submit one PR, then start on another one while the first is being reviewed.
This approach is often regarded as a positive pattern, as it makes code easier to write, review and commit, especially for larger projects.
What Workflow Types Say About Your Project
Overall, workflow types were found in all the projects studied, though larger projects had more instances of these patterns. The largest of these projects had more than 150 workflow type instances.
“There’s a link between the maturity of a project and its need for structured and highly organized collaboration that manifests itself in these workflow types,” Ma said.
To validate their findings, the research team interviewed a number of project developers, who saw value in this approach in helping improve code review and project management practices. Divergent PRs, for example, could be a signal for more code review prioritization.
“Think of this PR/Issue Graph as a sort of Grafana to monitor your project’s collaboration health, [one] that can help identify problem areas and serve as a global reference point to understand your project as a whole,” Ma said.
The paper detailing all the work, “Revealing Software Development Work Patterns with PR-Issue Graph Topologies,” will be presented July 18 at the ACM International Conference on the Foundations of Software Engineering, taking place in Brazil. Cleidson de Souza, Jesse Wong, Dongwook Yoon and Ivan Beschastnikh were the other authors of this work.
The post What GitHub Pull Requests Reveal about Your Team’s Dev Habits appeared first on The New Stack.