the stack dataset

bigcode/the-stack · Datasets at Hugging Face huggingface.co › datasets › bigcode › the-stack

The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode ...

Datasets - BigCode www.bigcode-project.org › Docs › About

As part of the BigCode project, we released and will maintain The Stack, a 6.4 TB dataset of permissively licensed source code in 358 programming languages, ...

The Stack Dataset - Papers With Code paperswithcode.com › dataset › the-stack

The Stack contains over 3TB of permissively-licensed source code files covering 30 programming languages crawled from GitHub. The dataset was created as ...

bigcode/the-stack-v2 · Datasets at Hugging Face huggingface.co › datasets › the-stack-v2

Mar 1, 2024 · The Stack v2 contains over 3B files in 600+ programming and markup languages. The dataset was created as part of the BigCode Project, an open ...

The Stack: 3 TB of permissively licensed source code - arXiv arxiv.org › cs

Nov 20, 2022 · We introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages.

bigcode-project/the-stack-v2 - GitHub github.com › bigcode-project › the-stack-v2

In this repository you can find the code for building The Stack v2 dataset, as well as the extra sources used to make StarCoder2data.

BigCode Dataset - GitHub github.com › bigcode-project › bigcode-dataset

This repository gathers all the code used to build the BigCode datasets such as The Stack as well as the preprocessing necessary used for model training.

The Stack: 3 TB of permissively licensed source code openreview.net › forum

Feb 7, 2023 · The paper introduces a dataset, called The Stack, consisting of 3.1 TB of permissively licensed code in 30 languages.

BigCode Collaboration Introduces The Stack - ServiceNow Blog www.servicenow.com › blogs › big-code-collab...

Oct 27, 2022 · Researchers from the project have released The Stack, a 3TB dataset of permissively licensed source code, to the research community.

xarray.Dataset.stack docs.xarray.dev › stable › generated › xarray.D...

Stack any number of existing dimensions into a single new dimension. New dimensions will be added at the end, and by default the corresponding coordinate ...
