Jun 8, 2023 · Looking for folks to share llama.cpp settings/strategies (and models) that help write creative (interesting), verbose (long), true-to-prompt stories.
Oct 28, 2024 · Llama.cpp's performance improves from the 36-37 tokens/s range to 50-51 for the 1x tests, and from 10-11 tokens per second to just above 15 for the 4x test.
Oct 2, 2023 · n-gpu-layers: Comes down to your video card and the size of the model. Set it to "51" and load the model, then look at the command prompt. If ...
May 2, 2024 · To figure out the optimal number of layers, I'd suggest flipping the switch in the Nvidia control panel so it crashes if you overflow, and then keep ...
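For reference, layer offloading is controlled from the command line. A minimal sketch (the model path, layer count, and prompt here are placeholders, not values from the posts above):

    # Offload 51 layers to the GPU; reduce the count if VRAM overflows.
    ./llama-cli -m models/model.gguf -ngl 51 -p "Write a short story"

-ngl is shorthand for --n-gpu-layers; in older llama.cpp builds the binary was called ./main.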
Jul 29, 2024 · With "llama-server -p port_number" you simply open localhost:[port number] and get an entire web interface with chat and continue modes ...
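A minimal sketch of that setup (model path and port are placeholders; recent llama.cpp builds spell the flag --port):

    # Serve the model, then open http://localhost:8080 for the built-in web UI
    ./llama-server -m models/model.gguf --port 8080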
Mar 17, 2024 · Does anyone have a guide on how to configure llama.cpp? I can't find much past the basic installation guide. I have it up and running, but I'm getting <1 tk/s.
4 days ago · It will guide you through the building process of llama.cpp, for CPU and GPU support (w/ Vulkan), describe how to use some core binaries ( ...
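Assuming the current CMake-based build (option names have changed across versions; GGML_VULKAN is the recent spelling, older trees used LLAMA_VULKAN), the process looks roughly like:

    # CPU-only build
    cmake -B build
    cmake --build build --config Release

    # Rebuild with Vulkan GPU support (requires the Vulkan SDK)
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release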
Aug 15, 2023 · On llama.cpp/llamacpp_HF, set n_ctx to 4096. Make sure to also set "Truncate the prompt up to this length" to 4096 ...
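That snippet refers to a web UI; for the plain llama.cpp CLI the equivalent is the -c / --ctx-size flag (model path is a placeholder):

    # Run with a 4096-token context window
    ./llama-cli -m models/model.gguf -c 4096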
Aug 3, 2024 · The key is to disable top-P and top-K and use a very low repetition penalty (around 1.02). Use min-P (around 0.05) and DRY instead. These are way better.
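In llama.cpp CLI terms those settings map roughly to the flags below (a sketch; the DRY flags exist only in newer builds, and the --dry-multiplier value is an assumption since the post gives none):

    # Neutralize top-K (0 = off) and top-P (1.0 = off), keep the repetition
    # penalty near-neutral, and rely on min-P plus DRY sampling instead.
    ./llama-cli -m models/model.gguf \
      --top-k 0 --top-p 1.0 \
      --repeat-penalty 1.02 \
      --min-p 0.05 \
      --dry-multiplier 0.8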
Sep 30, 2023 · I finally got NVLink set up on my dual 3090s and am getting 17 tok/s on 70B models, which is great. Curious to know if I can go even faster.