The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of data. The dataset was created ... |
Apr 11, 2023 · CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language. |
A collection of datasets (and other resources) for big code analysis. If you want to contribute to this list, please send a pull request. |
This is a cleaner version of the Github-code dataset, with the following filters added: Average line length < 100; Alpha numeric characters fraction > ... |
This repository contains all the needed tools and scripts to reproduce the datasets, as well as the academic papers they may relate to. |
This is a list of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. |
This repository gathers all the code used to build the BigCode datasets, such as The Stack, as well as the preprocessing needed for model training. |
CoDesc is a large, noise-removed parallel dataset of source code and corresponding natural language descriptions. This dataset is procured from several similar ... |
This dataset is a collection of 1052 GitHub repositories, along with other columns such as the primary language used in each, fork count, open pull requests, and ... |
A new vulnerable source code dataset for deep-learning-based vulnerability detection (RAID 2023) https://surrealyz.github.io/files/pubs/raid23-diversevul.pdf |