I’d like to self-host a large language model (LLM).
I don’t mind if I need a GPU and all that; at least it will be running on my own hardware, and it will probably even be cheaper than the $20 per month everyone is charging.
What LLMs are you self-hosting? And what are you using to do it?
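For example, here’s the kind of setup I’m picturing, as a minimal sketch: Ollama serving the model, queried over its HTTP API. This assumes Ollama is installed, running on its default port (11434), and has already pulled a model; the model tag and prompt below are just placeholders.

```python
import requests

# Query a locally running Ollama server over its HTTP API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",  # any model tag you've pulled locally
        "prompt": "Explain self-hosting an LLM in one sentence.",
        "stream": False,         # return a single JSON object, not a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```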
I’m new to this, but I’m curious: why not Llama 3.1? Are Mistral-Nemo and Mistral-Small superior?
I’m running an RTX 4070 Super with Llama 3.1 8B and Llama 3.2 and like the results. But I’m open to a higher-fidelity model that works with my GPU at a reasonable speed.
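If it helps with sizing, here’s the rough back-of-envelope I use to guess whether a model fits in VRAM at a given quantization. The bytes-per-weight figures and the fixed overhead are rule-of-thumb assumptions, not exact numbers; real usage also depends on context length and KV cache.

```python
# Approximate bytes per parameter at common precisions (assumed rules of thumb).
BYTES_PER_WEIGHT = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def est_vram_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate in GB for a model with `params_b` billion parameters."""
    return params_b * BYTES_PER_WEIGHT[quant] + overhead_gb

# An 8B model on a 12 GB card like the 4070 Super:
for quant in ("fp16", "q8", "q4"):
    print(f"8B @ {quant}: ~{est_vram_gb(8, quant):.1f} GB")
# fp16 ~17.5 GB (won't fit), q8 ~9.5 GB (tight), q4 ~5.5 GB (comfortable)
```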