Jun 8, 2023 · Looking for folks to share llama.cpp settings/strategies (and models) that help write creative (interesting), verbose (long), true-to-prompt stories.
Oct 28, 2024 · Llama.cpp's performance improves from the 36-37 tokens/s range to 50-51 for the 1x tests, and from 10-11 tokens per second to just above 15 for the 4x test.
Oct 2, 2023 · n-gpu-layers: Comes down to your video card and the size of the model. Set it to "51" and load the model, then look at the command prompt. If ...
May 2, 2024 · To figure out the optimal number of layers, I'd suggest flipping the switch in the Nvidia control panel so it crashes if you overflow, and then keep ...
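For reference, layer offloading is controlled from the command line. A minimal sketch (the model path, layer count, and prompt here are placeholders, not values from the posts above):

    # Offload 51 layers to the GPU; reduce the count if VRAM overflows.
    ./llama-cli -m models/model.gguf -ngl 51 -p "Write a short story"

-ngl is shorthand for --n-gpu-layers; in older llama.cpp builds the binary was called ./main.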
Jul 29, 2024 · With "llama-server -p port_number" you simply open localhost:[port number] and get an entire web interface with chat and continue modes ...
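A minimal sketch of that setup (model path and port are placeholders; recent llama.cpp builds spell the flag --port):

    # Serve the model, then open http://localhost:8080 for the built-in web UI
    ./llama-server -m models/model.gguf --port 8080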
Mar 17, 2024 · Does anyone have a guide on how to configure llama.cpp? I can't find much past the basic installation guide. I have it up and running, but I'm getting <1 tk/s.
4 days ago · It will guide you through the building process of llama.cpp, for CPU and GPU support (w/ Vulkan), describe how to use some core binaries ( ...
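Assuming the current CMake-based build (option names have changed across versions; GGML_VULKAN is the recent spelling, older trees used LLAMA_VULKAN), the process looks roughly like:

    # CPU-only build
    cmake -B build
    cmake --build build --config Release

    # Rebuild with Vulkan GPU support (requires the Vulkan SDK)
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release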
Aug 15, 2023 · On llama.cpp/llamacpp_HF, set n_ctx to 4096. Make sure to also set "Truncate the prompt up to this length" to 4096 ...
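That snippet refers to a web UI; for the plain llama.cpp CLI the equivalent is the -c / --ctx-size flag (model path is a placeholder):

    # Run with a 4096-token context window
    ./llama-cli -m models/model.gguf -c 4096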
Aug 3, 2024 · The key is to disable top-P and top-K and use a very low repetition penalty (around 1.02). Use min-P (around 0.05) and DRY instead. These are way better.
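In llama.cpp CLI terms those settings map roughly to the flags below (a sketch; the DRY flags exist only in newer builds, and the --dry-multiplier value is an assumption since the post gives none):

    # Neutralize top-K (0 = off) and top-P (1.0 = off), keep the repetition
    # penalty near-neutral, and rely on min-P plus DRY sampling instead.
    ./llama-cli -m models/model.gguf \
      --top-k 0 --top-p 1.0 \
      --repeat-penalty 1.02 \
      --min-p 0.05 \
      --dry-multiplier 0.8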
Sep 30, 2023 · I finally got NVLink set up on my dual 3090s and am getting 17 tok/s on 70B models, which is great. Curious to know if I can go even faster.