The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1TB of data. The dataset was created ... |
Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Create a dataset · Dataset features · Know your dataset · Create a dataset card |
The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode ... |
This dataset is a summary and reformat pulled from GitHub code. You should make your own assumptions based on this. |
The dataset covers a wide range of programming languages, such as Python, TypeScript, JavaScript, Ruby, Julia, Rust, C++, Bash, Java, C#, and Go. It also ... |
The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools - huggingface/datasets. |
This is a cleaner version of the Github-code dataset, with the following filters added: average line length < 100; alphanumeric character fraction > ... |
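The two filters named in the snippet above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the dataset authors' code: the function names are hypothetical, the alphanumeric threshold is truncated in the snippet and is therefore left as a required argument, and the choice to measure the fraction over all characters (including whitespace and newlines) is a guess.

```python
def average_line_length(source: str) -> float:
    """Mean number of characters per line; 0.0 for an empty file."""
    lines = source.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def alphanumeric_fraction(source: str) -> float:
    """Share of characters that are letters or digits (assumption:
    measured over every character, whitespace included)."""
    if not source:
        return 0.0
    return sum(ch.isalnum() for ch in source) / len(source)


def passes_filters(
    source: str,
    min_alnum_fraction: float,  # hypothetical: the real threshold is elided in the snippet
    max_avg_line_length: float = 100.0,  # "Average line length < 100" from the snippet
) -> bool:
    """Keep a file only if it satisfies both filters."""
    return (
        average_line_length(source) < max_avg_line_length
        and alphanumeric_fraction(source) > min_alnum_fraction
    )


if __name__ == "__main__":
    sample = "x = 1\ny = 2\n"
    # 0.3 is an arbitrary value chosen for this demo only.
    print(passes_filters(sample, min_alnum_fraction=0.3))
```

A file consisting of one very long line, such as minified or generated code, fails the first check, while a file made mostly of punctuation or whitespace fails the second.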
The CodeSearchNet corpus is a dataset of 2 million (comment, code) pairs from open source libraries hosted on GitHub. It contains code and documentation for several ... |
This repository is the backend that provides the dataset viewer with pre-computed data through an API, for all the datasets on the Hub. The frontend viewer ... |
Source code dataset is a collection of GitHub awesome repos; it contains Python, Java, C++, and other programming languages. |