#inference (5 articles)

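ai May 4, 2026

Featherless.ai raises $20M: how 5-second hot-swapping makes 30,000+ open models serverless, and how to use it in practice

On April 30, 2026, Featherless.ai raised a $20M Series A led by AMD Ventures and Airbus Ventures. This article covers the details of the hot-swap technology that serves more than 30,000 Hugging Face models serverlessly at a flat monthly rate, and how developers can start using it today as an alternative to proprietary AI.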

#ai #open-source #inference #serverless #llm #amd #startup #developer-tools #huggingface

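ai May 2, 2026

NVIDIA Nemotron 3 Nano Omni: building edge AI agents with an Apache 2.0, 30B-parameter open multimodal model

On April 28, 2026, NVIDIA released Nemotron 3 Nano Omni. It is an Apache 2.0-licensed MoE model (30B total / 3B active parameters) that handles vision, audio, text, and code in a single model, with four times the throughput of Nemotron 2. This article explains how to implement it in edge AI agents.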

#nvidia #nemotron #open-source #multimodal #ai-agents #llm #moe #edge-ai #inference #apache2

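ai April 30, 2026

Llama 4 Scout practical guide: the reality of the 10M-token context, and when to run it locally versus through an API

Llama 4 Scout (MoE, 17B active / 109B total parameters, Llama Community License), released by Meta on April 5, advertises a 10M-token context window, but it comes with many practical constraints. This article lays out API operation at $0.08 per million input tokens and the realistic limits of local deployment on Apple Silicon and H100.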

#llama4 #meta #open-source #llm #ai #local-llm #moe #context-window #inference #ollama

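ai April 29, 2026

Google's TurboQuant compresses the LLM inference KV cache 6x: ICLR 2026, 3-bit quantization with zero accuracy loss

TurboQuant, presented by Google at ICLR 2026 in April 2026, compresses the KV cache, the biggest bottleneck in LLM inference, by 6x without any training while achieving zero accuracy loss. It is a two-stage algorithm that combines PolarQuant and QJL, and it has been validated on Gemma and Mistral.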

#turboquant #kv-cache #llm #quantization #inference #google #iclr #performance #ai #ml

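ai April 23, 2026

The core of Google Cloud Next 2026: Ironwood TPU reaches GA and the Gemini Enterprise Agent Platform ushers in the "age of inference"

Google Cloud Next 2026 was held April 22-23. This article takes a developer's view of the general availability of the seventh-generation TPU "Ironwood", the announcement of the Gemini Enterprise Agent Platform, and the preview of the eighth-generation TPU.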

#google #tpu #gemini #ai-agents #cloud #inference #google-cloud-next
