codeparrot github code clean

codeparrot/github-code-clean · Datasets at Hugging Face huggingface.co › datasets › github-code-clean

This is a cleaner version of the Github-code dataset; we add the following filters: average line length < 100; alphanumeric character fraction > ...

codeparrot/github-code-clean · Discussions - Hugging Face huggingface.co › datasets › discussions

We're on a journey to advance and democratize artificial intelligence through open source and open science.

blog/codeparrot.md at main · huggingface/blog - GitHub github.com › huggingface › blog › blob › code...

In this step by step guide, we'll learn how to train a large GPT-2 model called CodeParrot, entirely from scratch.

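codeparrot/github-code-clean | ATYUN.COM official site - Artificial Intelligence www.atyun.com › datasets › info › github-code...

ATYUN (AiTechYun): This is a cleaner version of the Github-code dataset; we added the following filters: average line length less than 100; alphanumeric character fraction greater than 0.25; remove auto-generated files (keyword search); remove ...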
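
The two numeric filters named in the snippets above (average line length under 100, alphanumeric fraction above 0.25) can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual preprocessing code: the function names are hypothetical, the exact way the dataset counts lines and characters is assumed, and the keyword-based filter for auto-generated files is left out because the snippets do not list the keywords.

```python
def average_line_length(text: str) -> float:
    """Mean number of characters per line (0.0 for empty text)."""
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def alphanumeric_fraction(text: str) -> float:
    """Share of characters in the text that are letters or digits."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)


def passes_filters(
    text: str,
    max_avg_line_length: float = 100.0,  # threshold quoted in the snippet
    min_alnum_fraction: float = 0.25,    # threshold quoted in the snippet
) -> bool:
    """Hypothetical helper: keep a file only if it passes both numeric filters."""
    return (
        average_line_length(text) < max_avg_line_length
        and alphanumeric_fraction(text) > min_alnum_fraction
    )
```

For example, `passes_filters("def add(a, b):\n    return a + b\n")` keeps an ordinary source file, while a single 500-character line or a file made only of punctuation is rejected.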

CodeParrot - GitHub github.com › Code-Parrot-ai

Supercharged Frontend Development: Create pixel perfect UI 10x faster - CodeParrot.

SwayamInSync/PythonCoder: Code-gen model for Python ... github.com › SwayamInSync › PythonCoder

PythonCoder is a code generation model trained only on a Python dataset (codeparrot/codeparrot-clean). It is a custom model with a context window of 1024 ...

codeparrot_training.py - GitHub github.com › examples › codeparrot › scripts

Iterable dataset that returns constant length chunks of tokens from stream of text files. Args: tokenizer (Tokenizer): The processor used ...
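
The idea in the docstring above, turning a stream of text files into constant-length chunks of tokens, can be sketched in plain Python. This is an illustrative sketch under stated assumptions and not the code from `codeparrot_training.py`: `constant_length_chunks`, `toy_tokenize`, and the separator token value are hypothetical, and a real setup would plug in an actual tokenizer.

```python
from typing import Callable, Iterable, Iterator, List


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token id per character."""
    return [ord(ch) for ch in text]


def constant_length_chunks(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    seq_length: int,
    separator_token: int,
) -> Iterator[List[int]]:
    """Yield token lists of exactly seq_length tokens from a stream of texts.

    Texts are tokenized, joined with a separator token, and cut into
    fixed-size chunks. A trailing partial chunk is dropped.
    """
    buffer: List[int] = []
    for text in texts:
        buffer.extend(tokenize(text))
        buffer.append(separator_token)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
```

With the toy tokenizer, `list(constant_length_chunks(["abc", "de"], toy_tokenize, 3, 0))` produces two chunks of three tokens each; the leftover separator at the end is discarded because it does not fill a chunk.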

CodeParrot - GitHub github.com › CodeParrot

CodeParrot has 4 repositories available. Follow their code on GitHub.

codeparrot_1M | Kaggle www.kaggle.com › heyytanay › codeparrot-1m

1 Million tokenized Python files from the Codeparrot dataset in Lance format.

Models - Hugging Face hf.rst.im › models › github-code

We're on a journey to advance and democratize artificial intelligence through open source and open science.

Related searches

codeparrot codeparrot clean

github huggingface/datasets

codeparrot huggingface

code explanation dataset