This is the dataset used for training StarCoder and StarCoderBase. It contains 783 GB of code in 86 programming languages, and includes 54 GB of GitHub issues plus 13 GB of Jupyter notebooks.
Aug 17, 2023 · The StarCoder models are 15.5B-parameter models trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded.
May 9, 2023 · StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages.
StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, spanning 80+ programming languages.
StarCoder Search: full-text search over code in the pretraining dataset. StarCoder Membership Test: a fast check for whether a given piece of code was present in the pretraining dataset.
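A membership test of this kind can be approximated locally by hashing file contents and checking a snippet against the index. The sketch below is illustrative only: it assumes the dataset is published as bigcode/the-stack on the Hugging Face Hub with per-language directories under data/ and a "content" column, and it only indexes a small streamed slice, whereas the real service precomputes its index over the full corpus.

import hashlib
from datasets import load_dataset

def content_hash(code: str) -> str:
    # Hash normalized file contents; stripping outer whitespace is a
    # simplifying assumption, not how the real service normalizes.
    return hashlib.sha256(code.strip().encode("utf-8")).hexdigest()

# Stream a small slice of one language subset and build an exact-match index.
# (Access to bigcode/the-stack may require accepting the dataset's terms
# on the Hub and logging in with an access token.)
stream = load_dataset("bigcode/the-stack", data_dir="data/python",
                      split="train", streaming=True)
index = {content_hash(ex["content"]) for ex in stream.take(10_000)}

snippet = "def hello():\n    print('hello world')\n"
print("likely present" if content_hash(snippet) in index
      else "not found in this slice")

Note that exact-match hashing only catches verbatim copies; detecting near-duplicates would need fuzzier techniques such as MinHash.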
The Stack consists of 6.4 TB of permissively licensed source code in 384 programming languages, and includes 54 GB of GitHub issues and repository-level metadata.
Feb 29, 2024 · We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks.
Feb 29, 2024 · This results in a training set that is 4× larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens.
The Stack contains over 3 TB of permissively licensed source code files covering 30 programming languages crawled from GitHub.
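Since the full corpus is far too large to download casually, a practical way to inspect it is to stream one language subset. This is a minimal sketch: the repository id, data_dir layout, and column names follow the bigcode/the-stack dataset card and are assumptions here, not guarantees, and gated access may require logging in with huggingface-cli.

from datasets import load_dataset

# Stream a single language subset instead of downloading the multi-terabyte corpus.
ds = load_dataset("bigcode/the-stack", data_dir="data/python",
                  split="train", streaming=True)

# Peek at a few files and their sizes (in characters).
for example in ds.take(3):
    print(example["max_stars_repo_path"], "-", len(example["content"]), "chars")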