code datasets

codeparrot/github-code · Datasets at Hugging Face huggingface.co › datasets › github-code

The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of data. The dataset was created ...

Machine Learning Datasets - Papers With Code paperswithcode.com › datasets

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. Code Generation 53 · Images 2977 · CIFAR-10 (Canadian Institute... · Texts 2813

A collection of datasets for machine learning for big code - GitHub github.com › CUHK-ARISE › ml4code-dataset

A collection of datasets (and other resources) for big code analysis. If you want to contribute to this list, please send a pull request.

Machine Learning Datasets | Papers With Code paperswithcode.com › datasets › task=code-gen...

Lyra is a dataset for code generation that consists of Python code with embedded SQL. This dataset contains 2,000 carefully annotated database manipulation ...

List of code generation datasets (open source) - Reddit www.reddit.com › datasets › comments › list_of...

May 30, 2023 · The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of data. 7. MBPP.

150k Python Source Code Dataset - Kaggle www.kaggle.com › datasets › veeralakrishna

We provide a dataset consisting of parsed ASTs that were used to train and evaluate the DeepSyn tool.

github/CodeSearchNet: Datasets, tools, and benchmarks for ... github.com › github › CodeSearchNet

Apr 11, 2023 · CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language.

codeparrot/github-code-clean · Datasets at Hugging Face huggingface.co › datasets › github-code-clean

This is a cleaner version of the Github-code dataset; we add the following filters: Average line length < 100; Alphanumeric character fraction > ...

Code and datasets - Amazon Science www.amazon.science › code-and-datasets

Find the latest code and datasets from Amazon scientists and researchers, which have been released across GitHub and other platforms.

Machine Learning Datasets | Papers With Code paperswithcode.com › datasets

We collect, organize and open-source the large-scale multimodal instruction dataset, Infinity-MM, consisting of tens of millions of samples. Through quality ... Datasets · Anomaly Detection 116 · Face Recognition 65 · Time series 241

Related searches

papers with code datasets

code contests dataset

medical datasets

datasets python