Uniqueness Is Rare on GitHub
Key Summary
This InApps.net article, published in 2022, discusses a study of code duplication on GitHub, as reviewed by Adrian Colyer, and highlights how much of the code in open source projects is not original. It also looks at what that means for collaboration and dependency management.
Key Points:
- Context: The paper DéjàVu: A Map of Code Duplicates on GitHub reveals that 82% of files in non-forked Java, C++, Python, and JavaScript projects on GitHub are duplicated or slightly modified from other projects, challenging the notion of code uniqueness.
- Core Insight: Open-source development thrives on building upon existing work, but widespread code duplication, especially of libraries (e.g., via npm in JavaScript), raises concerns about security and software quality due to untracked dependencies.
- Key Findings:
- Duplication Rates: Java has the least duplication (~50% similar files), while JavaScript’s smaller file structure and library use increase duplication. Modifications are often minor (e.g., added comments, reordered code).
- Challenges: Committing library code as application code reduces the likelihood of incorporating upstream updates, potentially missing critical security or performance improvements.
- Solutions: GitHub and tools like Libraries.io offer dependency tracking to monitor and update components against their original sources, improving security and quality in the software supply chain.
- Outcome: While code reuse is a strength of open source, the article underscores the need for better dependency management to ensure software integrity, encouraging developers to use tools and metrics to maintain robust, secure projects.
OK, I admit it. I rely on Adrian Colyer to read dense computer science articles that are loaded with math beyond my comprehension. On his blog, Colyer promises to review “an interesting/influential/important paper from the world of computer science” each weekday (Whew! He must have a long commute to his day job, as a Venture Partner at London’s Accel).
A recent edition caught the interest of many people — a paper asserting that most files on GitHub are not original. At the heart of many developers’ open source world, GitHub enables collaboration within a version control system. It turns out that most collaboration is building on top of the work of others. According to the authors of the paper Colyer studied, DéjàVu: A Map of Code Duplicates on GitHub, eighty-two percent of files in non-forked projects written in Java, C++, Python or JavaScript are found in another project’s code base.
Java has the fewest duplicated files, but even here about half of the other files can be considered similar. These were likely cloned from another repository and only slightly modified, for example by adding comments, moving code around or adding a few extra lines. JavaScript’s tendency to use many smaller files skews the numbers somewhat. More significantly, many projects include libraries available through npm. This is a problem because committing library components as application code makes it less likely that upstream changes in frameworks and libraries will ever be picked up.
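To make the difference between an exact copy and a "slightly modified" copy concrete, here is a minimal sketch in Python. It is an illustration only, not the tooling used in the paper. It assumes the files are Python source, because it relies on the standard library's `tokenize` module, and the function names `file_hash`, `token_hash` and `group_clones` are made up for this example. The idea is that two files with different raw hashes but the same hash over their tokens differ only in comments, blank lines or spacing.

```python
import hashlib
import io
import tokenize
from collections import defaultdict
from typing import Dict, List


def file_hash(source: str) -> str:
    """Hash of the raw text: matches only character-for-character copies."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def token_hash(source: str) -> str:
    """Hash of the token stream, ignoring comments, blank lines and spacing."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and blank lines do not change behavior
        if tok.type in (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE):
            # Keep the block structure, but not the exact whitespace used.
            tokens.append(tokenize.tok_name[tok.type])
        else:
            tokens.append(tok.string)
    return hashlib.sha256("\x00".join(tokens).encode("utf-8")).hexdigest()


def group_clones(files: Dict[str, str]) -> List[List[str]]:
    """Group file paths whose token streams are identical."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[token_hash(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    original = "def add(x, y):\n    return x + y\n"
    commented = "# copied helper\ndef add(x, y):\n\n    return x + y  # sum\n"
    changed = "def add(x, y):\n    return x - y\n"
    files = {"a/util.py": original, "b/util.py": commented, "c/util.py": changed}
    print(group_clones(files))
```

A hash over tokens catches added comments and reformatting, but not reordered code or a few extra lines. Catching those needs a similarity measure over the tokens rather than an equality check.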
By its very nature, open source proves that imitation is a form of flattery, but has this gone too far? Of course not. Long live copycats. Yet the prevalence of dependencies creates unique challenges for security and software quality. There are ways to address these issues. GitHub has created tools to identify dependencies. Along with many security companies, Libraries.io has created tools to check your repositories’ components against their original source in the software supply chain. From a metrics perspective, we continue to gain consensus on just how to track these types of ecosystem dependencies. Stay tuned.
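As a rough first step toward that kind of check, the sketch below lists npm packages that sit inside a repository's `node_modules` directories, along with the version each one declares. It is a hedged illustration, not how GitHub's or Libraries.io's tools work. The function name `find_vendored_packages` is hypothetical, and the script only inspects the files on disk, so run it on a fresh clone, where anything under `node_modules` must have been committed.

```python
import json
import os
import sys


def find_vendored_packages(repo_root):
    """Return (name, version, relative_path) for packages under node_modules."""
    packages = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if "package.json" not in filenames:
            continue
        parent, dirname = os.path.split(dirpath)
        # Scoped packages live one level deeper: node_modules/@scope/name
        if os.path.basename(parent).startswith("@"):
            parent = os.path.dirname(parent)
        if os.path.basename(parent) != "node_modules":
            continue  # an ordinary project manifest, not a vendored package
        try:
            with open(os.path.join(dirpath, "package.json"), encoding="utf-8") as fh:
                manifest = json.load(fh)
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest, skip it
        if not isinstance(manifest, dict):
            continue
        packages.append((
            manifest.get("name", dirname),
            manifest.get("version", "unknown"),
            os.path.relpath(dirpath, repo_root),
        ))
    return sorted(packages)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, version, path in find_vendored_packages(root):
        print(f"{name} {version} ({path})")
```

The resulting list of names and versions is what you would then compare against the upstream registry to see which vendored copies have fallen behind.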
Source: InApps.net