[WIP] Refactor Model Design#13794

Draft
DN6 wants to merge 21 commits into main from refactor-model-metadata

DN6 (Collaborator) commented May 22, 2026

What does this PR do?

This refactor turns models into self-contained modules that declare their capabilities in one place. Per-model conversion code moves next to the model, and a unified metadata() API makes feature attributes inspectable from any model class.

Motivation

Today, features are added to models through a mix of class attributes and mixins. Mixins define their own class attributes as well, so when examining a model class it isn't immediately clear which attributes and features are relevant or available.

Models are defined in a single file, so we end up using centralized utility files for things like model-specific weight and LoRA conversions. These files have grown enormous as they accumulate code to handle per-model variants and their idiosyncrasies.

The new design makes the mixins model-agnostic and has each mixin reach for the per-model metadata it needs through small handler objects attached to the model class.

Proposed Structure

Using Flux as a reference:

models/transformers/flux/
├── __init__.py
├── _ip_adapter.py        # FluxIPAdapterMixin + converters (internal)
├── _lora.py              # FLUX_LORA handler + per-format converters (internal)
├── _weight_mapping.py    # FLUX_WEIGHT_MAPPING handler + key tables (internal)
└── model.py              # FluxTransformer2DModel class declaration

Two patterns live next to model.py, picked per subsystem based on whether the behavior actually generalizes across models:

  1. Handler + shared mixin: for features where the steps are the same across models and only the data or conversion function varies. The model opts in by inheriting the shared mixin defined in loaders/ and assigning its handler as a class attribute. LoRA and single-file weight mapping fit here:

    class FluxTransformer2DModel(ModelMixin, LoRAModelMixin, ...):
        _lora = LoRAHandler("...")               # handler instance, consumed by LoRAModelMixin

  2. Per-model mixin: for features that vary too much across models for a single shared mixin to be useful. Each model gets its own mixin declared right next to the model and inherited directly. IP-Adapter is the showcase:

    class FluxTransformer2DModel(..., FluxIPAdapterMixin):
        ...

This should simplify developing on top of these models: modifications or enhancements stay within one folder. If a model has a very specific feature that doesn't generalize to other models, it can be kept isolated there too (e.g. FreeNoise for the AnimateDiff UNet). Additionally, if a custom model modifies an existing diffusers model (e.g. Self-Forcing Wan), the per-model folder layout lends itself well to custom code loading with AutoModel.
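
To make the handler + shared mixin pattern concrete, here is a minimal, self-contained sketch. It does not use the classes from this PR: ToyLoRAHandler, ToyLoRAMixin, ToyTransformer, the convert_lora method, the handler's constructor signature, and the prefix-stripping converters are all hypothetical stand-ins. Only the idea is taken from the description above: the mixin holds the model-agnostic steps and reads per-model data from a handler assigned as the `_lora` class attribute.

```python
from typing import Callable, Dict, List

StateDict = Dict[str, float]  # stand-in for a real tensor state dict
Converter = Callable[[StateDict], StateDict]


class ToyLoRAHandler:
    """Hypothetical per-model handler: converters keyed by checkpoint format name."""

    def __init__(self, converters: Dict[str, Converter]) -> None:
        self.converters = dict(converters)

    @property
    def formats(self) -> List[str]:
        return sorted(self.converters)

    def convert(self, state_dict: StateDict, fmt: str) -> StateDict:
        if fmt not in self.converters:
            raise ValueError(f"Unsupported format {fmt!r}; expected one of {self.formats}")
        return self.converters[fmt](state_dict)


class ToyLoRAMixin:
    """Hypothetical shared mixin: same steps for every model, data comes from cls._lora."""

    _lora = None  # models opt in by assigning a handler instance here

    @classmethod
    def convert_lora(cls, state_dict: StateDict, fmt: str) -> StateDict:
        if cls._lora is None:
            raise NotImplementedError(f"{cls.__name__} does not declare a _lora handler")
        return cls._lora.convert(state_dict, fmt)


def strip_prefix(prefix: str) -> Converter:
    """Build a toy converter that removes `prefix` from matching keys."""

    def convert(state_dict: StateDict) -> StateDict:
        return {
            (key[len(prefix):] if key.startswith(prefix) else key): value
            for key, value in state_dict.items()
        }

    return convert


class ToyTransformer(ToyLoRAMixin):
    # Per-model data lives next to the model; the mixin itself stays generic.
    _lora = ToyLoRAHandler({
        "kohya": strip_prefix("lora_unet_"),
        "xlabs": strip_prefix("xlabs."),
    })


converted = ToyTransformer.convert_lora({"lora_unet_proj": 1.0, "bias": 2.0}, "kohya")
print(converted)
print(ToyTransformer._lora.formats)
```

Under this sketch, supporting a new checkpoint format means adding one entry to the model's own handler, with no change to the shared mixin.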

Features Introduced

Model capability introspection via Model.metadata()

Each model exposes a metadata() classmethod that returns a metadata object keyed by the class attribute that controls each feature. Each displayed row tells you exactly what to set or inherit to change the behavior.

>>> print(FluxTransformer2DModel.metadata())
FluxTransformer2DModel feature attributes
──────────────────────────────────────────────────────────────────────────────────
  _supports_gradient_checkpointing  True
  _supports_group_offloading        True
  _no_split_modules                 FluxTransformerBlock, FluxSingleTransformerBlock
  _skip_layerwise_casting_patterns  pos_embed, norm
  _repeated_blocks                  FluxTransformerBlock, FluxSingleTransformerBlock
  _cp_plan                          True
  _weight_mapping                   flux-depth, flux-dev, flux-fill, flux-schnell
  _lora                             bfl, kohya, kontext, xlabs
  _supports_cache                   True
  _supports_ip_adapter              True

The returned ModelMetadata exposes each feature value as an attribute (meta._supports_ip_adapter, meta._lora, ...), supports keys() / values() / items() for mapping-style iteration, and supports the in operator for presence checks. meta.describe(verbose=True) adds an indented description and docs link under each row, which can be useful for agents.
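
As a rough illustration of the access patterns listed above, the sketch below builds a small read-only mapping with attribute access. It is not the ModelMetadata implementation from this PR: ToyMetadata, ToyModel, and the `_FEATURES` tuple are hypothetical names, and describe(verbose=True) is intentionally left out because its output format is not specified here.

```python
from collections.abc import Mapping


class ToyMetadata(Mapping):
    """Hypothetical read-only view over a model class's feature attributes."""

    def __init__(self, model_name, features):
        self._model_name = model_name
        self._features = dict(features)

    # Mapping supplies keys(), values(), items(), get() and `in`
    # on top of these three methods.
    def __getitem__(self, key):
        return self._features[key]

    def __iter__(self):
        return iter(self._features)

    def __len__(self):
        return len(self._features)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        features = self.__dict__.get("_features", {})
        if name in features:
            return features[name]
        raise AttributeError(name)

    def __str__(self):
        width = max(map(len, self._features), default=0)
        rows = [
            f"  {name.ljust(width)}  {self._format(value)}"
            for name, value in self._features.items()
        ]
        return "\n".join([f"{self._model_name} feature attributes", *rows])

    @staticmethod
    def _format(value):
        if isinstance(value, (list, tuple)):
            return ", ".join(map(str, value))
        return str(value)


class ToyModel:
    _supports_gradient_checkpointing = True
    _lora = ("bfl", "kohya")

    # Hypothetical registry of attribute names that count as features.
    _FEATURES = ("_supports_gradient_checkpointing", "_lora", "_supports_ip_adapter")

    @classmethod
    def metadata(cls):
        found = {name: getattr(cls, name) for name in cls._FEATURES if hasattr(cls, name)}
        return ToyMetadata(cls.__name__, found)


meta = ToyModel.metadata()
print(meta)
print(meta._lora, "_supports_ip_adapter" in meta, list(meta.keys()))
```

Subclassing collections.abc.Mapping is one way to get the mapping-style methods and presence checks for free; whether the PR takes the same route is not stated in the description.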

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
