aight so google quietly released gemma 4 yesterday and idk why this isn't blowing up more, because this shit is actually insane
first thing: the license. previous gemma models were kinda ass because they shipped under google's own custom terms with usage restrictions attached, so doing anything commercial meant digging through the fine print first. now it's apache 2.0. finally. use it, fine-tune it, sell it, do whatever. google took long enough lol
the models:
there are 4 of them: two "big" ones for workstations and two tiny ones for edge devices
the big ones:
- 26B MoE: sounds big but only 3.8B parameters are active per token, so basically 27B-class quality at roughly 4B compute cost. that's actually clever ngl
- 31B dense: 256K context, vision built in, handles different image aspect ratios natively
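quick back-of-envelope sketch of what those numbers mean in practice. the parameter counts are from the list above; everything else is an assumption on my part: the usual "about 2 FLOPs per active parameter per token" rule of thumb, and weights-only memory (no KV cache, no activations). the function names are made up for illustration, they're not from any gemma tooling.

```python
# Rough estimates for the two big models described above.
# Parameter counts come from the post; the 2-FLOPs-per-parameter rule and
# the weights-only memory figure are simplifying assumptions.

GB = 1e9  # decimal gigabytes


def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone at a given precision."""
    return total_params * bits_per_param / 8 / GB


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per active parameter."""
    return 2 * active_params


moe_total, moe_active = 26e9, 3.8e9  # 26B MoE, 3.8B active per token
dense_total = 31e9                   # 31B dense, all parameters active

active_fraction = moe_active / moe_total
compute_ratio = flops_per_token(dense_total) / flops_per_token(moe_active)

print(f"MoE active fraction: {active_fraction:.1%}")
print(f"31B dense costs ~{compute_ratio:.1f}x the MoE's per-token compute")
for bits in (16, 8, 4):
    moe_gb = weight_memory_gb(moe_total, bits)
    dense_gb = weight_memory_gb(dense_total, bits)
    print(f"{bits}-bit weights: MoE {moe_gb:.1f} GB, dense {dense_gb:.1f} GB")
```

the takeaway: the MoE saves compute, not memory. only about 15% of the weights do work on any given token, but all 26B still have to be loaded, so the memory footprint stays in the same ballpark as the 31B dense model.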
what's actually good:
- multimodal is native, not bolted on like with some other models
- thinking mode you can toggle on or off depending on whether you need it
- pretrained on 140 languages, and real-time translation works surprisingly well
- audio encoder got compressed hard: from 681M down to 305M params (roughly a 55% cut), way faster response times
honestly if you've been ignoring gemma because of the old license drama, this is worth a second look. it's not gonna beat gpt-5 or claude 4 on everything, but for local/self-hosted use cases it's probably the best option rn. the MoE architecture is smart and the edge models are genuinely impressive for the size
only thing i don't love is the google cloud pricing if you want to run the big version serverless, but whatever, you can self-host
grab it on huggingface: https://huggingface.co/collections/google/gemma-4
if you want api access to this and other models without running your own hardware, check https://cheap-api.shop (no subscription, just pay per use)
more stuff like this in the discord: https://discord.gg/fxmgdwDvbH

