• 1 Post
  • 113 Comments
Joined 1 year ago
Cake day: March 22nd, 2024


  • My last Android phone was a Razer Phone 2, SD845 circa 2018. Basically stock Android 9.

    And it was smooth as butter. It had a 120 Hz screen while my iPhone 16 is stuck at 60, and I can feel it. And it flew through some heavy web apps I use while the iPhone chugs and jumps around, even though the new SoC should objectively blow away even modern Android devices.

    It wasn’t always this way; iOS used to be (subjectively) so much faster that it’s not even funny, at least back when I had an iPhone 6S(?). Maybe there was an inflection point? Or maybe it’s only the case with “close to stock” Android stuff that isn’t loaded with bloat.







  • Yeah, just paying for LLM APIs is dirt cheap, and they (supposedly) don’t scrape data. Again, I’d recommend OpenRouter and Cerebras! And you get your pick of models to try from them.

    Even a Framework 16 is not good for LLMs, TBH. The Framework Desktop is (as it uses a special AMD chip), but it’s very expensive. Honestly, the whole hardware market is so screwed up that most ‘local LLM enthusiasts’ buy a used RTX 3090 and stick it in a desktop or server, as apparently no one wants to produce something affordable :/
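    To make the “just pay for APIs” route concrete: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a call is just an HTTP POST. A minimal standard-library sketch (the model name and key below are placeholders, not recommendations):

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for OpenRouter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (needs a real key from openrouter.ai):
# with urllib.request.urlopen(build_chat_request("some/model", "Hello!", key)) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

    Since the API shape is OpenAI-compatible, the official `openai` Python client also works if you just point its `base_url` at OpenRouter.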






  • I don’t understand.

    Ollama is not actually Docker, right? It runs the same llama.cpp engine; it’s just embedded inside the wrapper app, not containerized. It does have a Docker preset you can use, yeah.

    And basically every LLM project ships a Docker container. I know for a fact llama.cpp, TabbyAPI, Aphrodite, Lemonade, vLLM, and SGLang do. It’s basically standard. There are all sorts of wrappers around them too.

    You are 100% right about security though, in fact there’s a huge concern with compromised Python packages. This one almost got me: https://pytorch.org/blog/compromised-nightly-dependency/

    This is actually a huge advantage for llama.cpp, as it’s free of Python and external dependencies by design. This is very unlike ComfyUI, which pulls in a gazillion external repos. Theoretically the main llama.cpp git repo could be compromised, but it’s a single, very well-monitored point of failure, and literally every “outside” architecture and feature is implemented from scratch, making it harder to sneak stuff in.


  • OK.

    Then LM Studio, with Qwen3 30B at IQ4_XS and low-temperature MinP sampling.

    That’s what I’m trying to say though, there is no one click solution, that’s kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes, they are specialized tools.

    Random example: on a Mac? Grab an MLX distillation, it’ll be way faster and better.

    Nvidia gaming PC? TabbyAPI with an exl3. Small GPU laptop? ik_llama.cpp. APU? Lemonade. Raspberry Pi? That’s important to know!

    What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Look at documents? Do you need it fast, or accurate?

    This is one reason ollama is so suboptimal, the other being just bad defaults (Q4_0 quants, 2048 context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on). A lot of people just try “ollama run,” I guess, then assume local LLMs are bad when it doesn’t work right.
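    For anyone wondering what MinP sampling actually does: it throws away candidate tokens whose probability is below some fraction of the most likely token’s, then renormalizes what’s left. A toy Python sketch of that filtering step (illustrative only, not any backend’s real implementation):

```python
# Toy sketch of the MinP ("min-p") filtering step.
# Keep only tokens with probability >= min_p * P(top token), then
# renormalize. (Low temperature sharpens the distribution beforehand.)

def min_p_filter(probs: list[float], min_p: float = 0.05) -> list[float]:
    """Zero out tokens below the min-p threshold and renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With min_p=0.2, tokens under 20% of the top token's probability are dropped:
filtered = min_p_filter([0.6, 0.3, 0.05, 0.05], min_p=0.2)
```

    The appeal over top-k or plain top-p is that the cutoff scales with how confident the model is: a sharp distribution keeps few tokens, a flat one keeps many.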



  • brucethemoose@lemmy.world to Selfhosted@lemmy.world: I’ve just created c/Ollama! (58 up / 1 down · edited 11 days ago)

    TBH you should fold this into localllama? Or open source AI?

    I have very mixed (mostly bad) feelings on ollama. In a nutshell, they’re kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). And that’s just the tip of the iceberg, they’ve made lots of controversial moves, and it seems like they’re headed for commercial enshittification.

    They’re… slimy.

    They like to pretend they’re the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

    It’s also a highly suboptimal way for most people to run LLMs, especially if you’re willing to tweak.

    I would always recommend Kobold.cpp, TabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, SGLang, the AMD Lemonade server, any number of backends over it. Literally anything but ollama.


    …TL;DR I don’t like the idea of focusing on ollama at the expense of other backends. Running LLMs locally should be the community, not ollama specifically.


  • Go over there and listen to x lines of dialog, only to go back and listen to y lines of dialog, only to return to the first person and listen to more…

    Other games use similar mechanics at times but often not with characters that are so far apart and/or not without some kind of fast travel system that can make the whole process faster.

    IMO the problem is character writing.

    As an example, AC Odyssey had some great gems, like the underworld, that island intrigue quest, anything involving Phoebe, though many “mundane” quests had quirky characters too. Kassandra’s VA killed it through the whole game. I still remember all that, and I remember enjoying the in-between because I loved the characters and scenery. It was such a compelling reward.

    It did have filler quests though.

    …If talking and exploring itself feels like a chore, then that’s the problem IMO. It shouldn’t be a low point between gameplay.




  • brucethemoose@lemmy.world to memes@lemmy.world: Benefit of the hindsight (19 up / 2 down · edited 1 month ago)

    I guess the problem NFTs try to solve is a central authority holding the initial verification tied to the video. If it’s on a blockchain, theoretically no one owns it, and the date/metadata is etched in stone, whereas otherwise some entity has to publish the initial hash.

    In other words, one can hash a video, yeah, but how do you know when that hashed video was taken? From where? There has to be some kind of hard-to-dispute initial record (and even then, that only works in contexts where the video’s earliest date is the proof, so to speak, like recording an event as it happens).
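    The hash itself is the easy half; the hard-to-dispute record of when it existed is what a blockchain (or a trusted publisher) would have to supply. A minimal standard-library sketch of that easy half:

```python
import hashlib

def hash_video(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# The resulting digest is what you'd timestamp (on-chain, or via some
# trusted third party); the video itself never has to leave your machine.
```

    Any single-bit edit to the file changes the digest completely, which is why the timestamped record only needs the hash, not the footage.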