code dataset huggingface

codeparrot/github-code · Datasets at Hugging Face huggingface.co › datasets › github-code

The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1TB of data. The dataset was created ...

Datasets - Hugging Face huggingface.co › docs › datasets

Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Create a dataset · Dataset features · Know your dataset · Create a dataset card

bigcode/the-stack · Datasets at Hugging Face huggingface.co › datasets › bigcode › the-stack

The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode ...

jtatman/python-code-dataset-500k - Hugging Face huggingface.co › datasets › python-code-datase...

This dataset is a summary and reformat pulled from github code. You should make your own assumptions based on this.

nampdn-ai/tiny-codes · Datasets at Hugging Face huggingface.co › datasets › tiny-codes

The dataset covers a wide range of programming languages, such as Python, TypeScript, JavaScript, Ruby, Julia, Rust, C++, Bash, Java, C#, and Go. It also ...

huggingface/datasets: The largest hub of ready-to-use ... - GitHub github.com › huggingface › datasets

The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools - huggingface/datasets.

codeparrot/github-code-clean · Datasets at Hugging Face huggingface.co › datasets › github-code-clean

This is a cleaner version of the GitHub-code dataset; we add the following filters: Average line length < 100; Alphanumeric characters fraction > ...
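
The two filters named in this snippet can be sketched in plain Python. This is a minimal illustration, not the dataset authors' implementation: the function names are hypothetical, and because the alphanumeric-fraction threshold is cut off in the snippet, it is left as a required parameter that the caller has to supply.

```python
def average_line_length(source: str) -> float:
    """Mean number of characters per line (newlines excluded)."""
    lines = source.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def alphanumeric_fraction(source: str) -> float:
    """Share of characters in the file that are letters or digits."""
    if not source:
        return 0.0
    return sum(ch.isalnum() for ch in source) / len(source)


def passes_filters(
    source: str,
    min_alnum_fraction: float,          # ASSUMPTION: real threshold is truncated in the snippet
    max_avg_line_length: float = 100.0,  # from the snippet: "Average line length < 100"
) -> bool:
    """Return True if a file satisfies both heuristic filters."""
    return (
        average_line_length(source) < max_avg_line_length
        and alphanumeric_fraction(source) > min_alnum_fraction
    )


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    # 0.25 is a placeholder threshold chosen only for this demo.
    print(passes_filters(sample, min_alnum_fraction=0.25))
```

Heuristics of this kind are typically meant to drop minified or machine-generated files (very long lines) and data blobs (few letters or digits); the placeholder threshold in the demo should be replaced with whatever value the dataset card actually states.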

code-search-net/code_search_net · Datasets at Hugging Face huggingface.co › datasets › code_search_net

CodeSearchNet corpus is a dataset of 2 million (comment, code) pairs from open-source libraries hosted on GitHub. It contains code and documentation for several ...

huggingface/dataset-viewer: Backend that powers the ... - GitHub github.com › huggingface › dataset-viewer

This repository is the backend that provides the dataset viewer with pre-computed data through an API, for all the datasets on the Hub. The frontend viewer ...

shibing624/source_code · Datasets at Hugging Face huggingface.co › datasets › source_code

The source code dataset is a collection of GitHub awesome repos; it contains Python, Java, C++, and other programming languages.

Related searches

datasets load_dataset

huggingface datasets

how to download a dataset from huggingface

python code dataset

huggingface codesearchnet

huggingface create dataset

code datasets

datasets pytorch