A running epic to track work on increasing model coverage beyond attention-based text-to-text models.
Some imminent features / discussions this entails:
- Add support for handling logit soft capping (used in Gemini, Grok, Gemma-2, etc.). The attention system currently fails for google/gemma-2-27b-it. ("warm-up") See the soft-capping sketch after this list.
- Consider linear attention and other state-space model approaches (see the linear-attention sketch after this list)
- Long-context and seqlen-dependent attention masking (sliding window, chunked attention, ...); see the masking sketch after this list
- VLM and other many-to-text models
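
Logit soft capping itself is a small change: raw logits are squashed with a scaled tanh before the softmax. A minimal sketch in PyTorch, assuming a per-model `soft_cap` hyperparameter (Gemma-2 uses 50.0 for attention logits); the function name is illustrative, not an existing API:

```python
import torch

def soft_cap_logits(logits: torch.Tensor, soft_cap: float) -> torch.Tensor:
    """Squash raw logits into (-soft_cap, soft_cap) with a scaled tanh."""
    return soft_cap * torch.tanh(logits / soft_cap)

# Applied to attention scores just before the softmax.
scores = torch.randn(2, 8, 16, 16)              # (batch, heads, q_len, k_len)
probs = torch.softmax(soft_cap_logits(scores, soft_cap=50.0), dim=-1)
```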
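For the linear-attention discussion, the key difference from softmax attention is that a running state replaces the full score matrix, so cost grows linearly with sequence length. A rough sketch of the causal recurrence, with an illustrative feature map and no claim about which variant would be adopted:

```python
import torch
import torch.nn.functional as F

def phi(x: torch.Tensor) -> torch.Tensor:
    return F.elu(x) + 1.0                       # positive feature map (illustrative choice)

def linear_attention(q, k, v):
    """Causal linear attention: carry a (d_k x d_v) state instead of an O(n^2) score matrix."""
    q, k = phi(q), phi(k)
    state = q.new_zeros(q.size(-1), v.size(-1)) # running sum of outer(k_t, v_t)
    norm = q.new_zeros(q.size(-1))              # running sum of k_t for normalization
    out = []
    for t in range(q.size(0)):
        state = state + torch.outer(k[t], v[t])
        norm = norm + k[t]
        out.append((state.T @ q[t]) / (norm @ q[t] + 1e-6))
    return torch.stack(out)

q, k, v = (torch.randn(16, 32) for _ in range(3))  # (seq_len, dim)
y = linear_attention(q, k, v)                       # (seq_len, d_v)
```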
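For seqlen-dependent masking, a sliding-window mask restricts each query to the most recent `window` key positions on top of the usual causal constraint. A minimal sketch with illustrative names and shapes:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: True where query position i may attend to key position j."""
    i = torch.arange(seq_len).unsqueeze(1)      # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)      # key positions, row vector
    return (j <= i) & (i - j < window)          # causal AND within the window

mask = sliding_window_mask(seq_len=8, window=4)
scores = torch.randn(8, 8).masked_fill(~mask, float("-inf"))
probs = torch.softmax(scores, dim=-1)
```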