A framework for ignite
======================

This guideline is written for ignite contributors interested in further enhancing and expanding the PyTorch Ignite feature, usability and broadening the user spectrum. In the following a concept of a rapid-feature-development framework is suggested underlying the current Ignite concept. It is meant to provide the full infrastructural code layer on which feature can quickly evolve. It should help to overcome possible limitations of the current Ignite architecture. For full understanding of this guideline a base knowledge of the current Ignite architecture is very helpful as the text will compare the new framework with the current version.

The guideline is kept short & simple and serves the only purpose to quickly give other contributors an overview of the
current version's shortcomings and arising advantages of the framework. It basically consists of :doc:`this page<index>`, the :doc:`Quickunderstanding Application<quickunderstanding_app>` and the :doc:`Quickunderstanding Feature Dev<quickunderstanding_feature_dev>`. All statement made here are just based on my own impressions & analysis and I'm open for discussion & feedback of any kind, THX!

If you just want to checkout the features of the framework, skip the following sections and proceed with a :doc:`Quickunderstanding Application<quickunderstanding_app>`.

Motivation
----------

This framework concept for PyTorch Ignite aims to enhance the current high-level training application to a
rapid feature prototyping framework that serves the full user spectrum, not only application developers but also researchers & engineers and the feature developer who are enhancing Ignite itself. To be successful on the long term, I believe it is necessary to serve the full spectrum of users, especially the researchers to end up in high quality application APIs.

The personal motivation arose when I tried to implement some new features and efficient APIs and reached restrictions within the architecture and provided infrastructure. Additionally, I was lacking a certain overview of the training processes I just had implemented when debugging.

Two issues
----------

I am a fan of Ignite and that's why I'm trying to contribute, but I discovered 2 shortcomings in the architecture and the implementation, that caused me quite some restrictions and coding infrastructure instead of programming new features (what I actually wanted to do). The issues are:

* **Engine centered architecture**: In current Ignite the `Engine` is the architectural center with the training `state` as attribute. The training `state` atttribute is a transient object that is only instantiated when the `Engine` is in `run`-mode and vanishes afterwards. Also the `state` holds only a selective fraction of all variables and parameters that make up the real training state. So `Engine` is a kind of static object and `state` is transient. This does not represent the reality of the training process. In reality the training starts with an initial state holding all variables, parameters including e.g. model variables, hyperparameters etc. which then are modified while the state goes through different transitions. The main transitions of the `state` are `Engine`\s (normally more than one). **So the state should be the architectual center holding ALL variables, parameters, values, transitions etc. and the Engine is (just) the main trainsition of the state.** This small twist causes quite some complications for features and APIs which are listed below.
* **Event is broken in many pieces**: Currently an `Event` is an `Enum` that has to be explicitly `fire_event`\ed, and implicitly `_fire_event`\ed so the `event_handlers` handle further callbacks. The `Event` is always fired after some other training value has changed, e.g. the model output was updated `ITERATION_COMPLETED` is fired. Also if you want to fire a non-standard event, you first have to create it, register it at each `Engine` that is supposed to use it and then the firing has to be implemented... But in reality an **Event is nothing more than a value change of a training state variable that triggers callbacks**. So all these pieces above can be put together by implementing a state variable as a descriptor.

Resulting Ignite limitations
-----------------------------

Even though Ignite is a perfect, zero-boilerplate tool for standard applications I experienced limitations when working with more complex use cases, e.g. multiple engines (up to 3 very common) with customized events and metrics being compared in shared charts etc.. Again these are just experiences I have made (you may disagree).

Have a look on the "current" Ignite architecture overview for better understanding of the following limitations. (Please excuse if it may be slightly outdated/altered, but the base structure should still be representative.)

.. image:: _static/img/index/architecture_current_ignite_analysis.png
   :target: _static/img/index/architecture_current_ignite_analysis.png

|

On the application level:

* `Event`-system:
    * complicated with many commands
    * new `Event` musst be register for each Engine separatly
    * global `Events` would have to be fired in all `Engines` separately
    * no overview over all existing `Events`
    * only very view default `Events` implemented (currently 9?)
* for non-standardized use cases boilerplate code strongly increased
* every non-standard `Event` causes high boilerplate code amount (despite the nice `every/once` solution)
* charts are restricted in number of engines & x-axis choice
* no overview during debugging, `state` only exists when `Engine` is running
* for simple debugging of your application you need approx. 60% knowledge of the current Ignite code to understand how an `Event` works, where the callbacks are & *when* i can find them (vanishing `state`)
* no analysis tools provided for the application, e.g. data flow graph or time sequential analysis, which may be impossible to implement in current architecture

On feature development level:

* restrictive architecture:
    * If you're honest, the architecture is laid out for a single engine: any current extensions for 2 engines (`another engine`) are breakouts from the current restrictive architecture and end with ``another_engine``... for 3 engines one would really need ``a_third_engine``.
    * Only ``Engine`` can hold and fire ``Event``\s and store variables (in ``Engine``\'s ``state``\) which can be properly reference by other features (referencing is done by passing the complete engine to the feature). This strongly restricts all other features in their possibilities.
    * As seen in the class diagram, the architecture is restricted to one type of ``Engine``, to ``Metric``\s, and ``OutputHandler``\s. The other provided features (red box "Nice Features", e.g. ``EarlyStopping``, ``TerminateOnNan``) are not integrated by the architecture.
* The framework abstraction layer is missing, feature and infrastructure code are entangled:
    * feature development requires full knowledge of code (only new metrics are rapidly implemented with `Metric`)
    * new features very often require new infrastructure code to get the variables you need, especially if they're located on different engines
* no clear design rules for new features (despite for ``Metric``) implying risk of architecture degradation
* no global state values/objects providing system: the distribution of variables on different engines and vanishing variables of `state` forced me into a lot of boilerplate code or even made me brake the architecture ending in a hack

On the infrastructure level:

* no separation of infrastructure and feature code
* developing new features always requires coding infrastructure to get the required objects (e.g. model output of different engines) into the feature
* almost no inheritance structure which leads to redundant code in different classes as each is a superclass on each own
* lacking inheritance structure also implies higher risk of update abnormalities and hinders fast or major infrastructure changes

Improvements from an underlying framework
------------------------------------------

You will experience the improvements given by the framework when working on all 3 levels: application implementation, feature development and framework development. The separation of these working areas is already the first improvement. Try out the benifits in detail & hands-on for the first to levels in the :doc:`Quickunderstanding Application<quickunderstanding_app>` and :doc:`Quickunderstanding Feature Dev<quickunderstanding_feature_dev>`.

Before you go through the theoretically described enhancements these few `no-comment-teasers` of the training ``state`` in the debugger will give you nice insights what's ahead. It shows the Ignite example `mnist_with_tensorboard.py <https://github.com/pytorch/ignite/blob/master/examples/mnist/mnist_with_tensorboardx.py>`_ :doc:`transferred to the framework architecture<examples/mnist_with_tensorboard_logger_and_high_level_apis>` just before the `engines are started`:

.. image:: _static/img/index/teaser_state_in_debugger.png
   :target: _static/img/index/teaser_state_in_debugger.png

|

And here the engine ``state.engines.trainer`` unfolded:

.. image:: _static/img/index/teaser_engine_in_debugger.png
   :target: _static/img/index/teaser_engine_in_debugger.png

|

Or setting up all the below Tensorboard charts with these two simple comands:

.. code-block:: python

    # Automatically identify and generate metric chats comparing the different engines
    EnginesMetricsComparisonCharts(x_axis_ref=state.trainer.n_samples_ref, n_identical_metric_name_suffixes=1)
    # Automatically generate for each engine a summary of all metric charts
    EnginesMetricsCharts(x_axes_refs=state.trainer.n_samples_ref, n_identical_metric_name_suffixes=1)

By the way, if you had set up 10x more metrics and some more engines, these two command would not change to provide all comparative and single metric charts of all engines.

.. image:: _static/img/index/teaser_tensorboard.png
   :target: _static/img/index/teaser_tensorboard.png

|

The framework development is not yet documented. For the moment, I can tell you this: While in the current Ignite architecture I found it challenging to find out how & where I could possibly add a little syntactic sugar, in contrast the challenge with the framework is rather cherry picking, trying to keep the syntax pythonic and not putting in too much magic (I hope I didn't already!). This also remains the ongoing challenge of the future framework programming to adjust the syntax to optimal workflow.

Have a look at the framework scheme showing all main components of the framework.

.. image:: _static/img/index/framework_component_scheme.png
   :target: _static/img/index/framework_component_scheme.png

|

The **framework provides 5 basic components** (& of course some subclasses):

* **state**: a global container providing **all** `state objects`/values/variables/parameters etc. to any feature.
* **state comtainer**: sub containers of `state` to categorize similar features together, e.g. engines, metrics, optimiziers, etc.
* **state object container**: the final sub container which hold the actual `state objects`\/values and is the base class for all features. This base class (to be 110% precise, the supertypeclass `MetaFrameStateObjectContainer`) has by default all functionalities integrated which currently only ``Engine`` provides, i.e. holding referencable state values, a fully blown event system (incl. firing, event_handlers, attaching etc.) and even more. This is why the default ``Engine`` in the framework is also just another feature derived from the `state object container` (see framework architecture diagram below).
* **state object**: finally the `state object` is the actual state value (e.g. engine output) and full event system all-in-one. It attaches to the `state object container` as class bound descriptor (class attribute) and comes with an additional callback list, which holds the callables that are called each time the value of the `state object` is set. The `state object` descriptor is the pendant to the current complete `Event` system of Ignite and `setting the state object value` is the pendant to firing an event.
* **state objects reference**: a integrated referencing object enabling the data flow between the `state object containers`. It creates a reference object of `state objects` which are handed into a `state object container` at initialization to provide update values of the `state objects` during training iterations.

Of course many subclasses of these basic components already exist & there are far more to come.

Have a look a the new framework architecture below and compare it with the `current Ignite architecture above <index.html#resulting-ignite-limitations>`_.

.. image:: _static/img/index/architecture_diagram_analysis.png
   :target: _static/img/index/architecture_diagram_analysis.png

|

Described in a general sense:

The framework is a low-level code layer running under the hood providing feature developers with any kind of training objects, data, outputs, syntactic sugar, analysis tools (and visualizations soon), extendable plugins, and so on... anywhere, anytime and especially with the need of zero code knowledge to use it. The required user knowledge is reduced to basic design rules and features which the framework provides mainly as ``Base*``-classes and from which on she/he starts coding. The framework itself runs mainly on ``Meta*``- and so-called ``Frame*``-classes.

In more details, the main characteritics of the framework architecture are:

* `state` is the architectural center
* `state object` is an all-in-one state value + event system provided in one piece to the feature developer
* every feature is equipped with `referencable state objects`, i.e. a full event system... so ``Engine`` is `just another feature`.
* clear inheritance structure of metaclasses and classes for easy major updates and low risk of update abnormalities
* strict separation of framework code layer and feature development code: framework runs on ``Meta*``\-classes & ``Frame*``\-classes and the interface from framework to feature development is defined by ``Bases*``\-classes. No framework code knowledge is required for feature development.
* framework code is based strongly on metaclasses & dunder methods providing additional flexibility
* every feature is fully integrated into the architecture
* even external PyTorch class instances are fully integrated as `state components` by exposing user defined variables & parameters of the instance as `state objects` to the framework
* next to training features, `state` also offers configuration, maintenance & analysis features/plugins to increase workflow efficiency

If you're still interested after this rough summary, you may now proceed with the :doc:`Quickunderstanding Application<quickunderstanding_app>`

How to integrate the new framework
----------------------------------

After identifying the two issues I tried of course to fix them within the existing code base. But after having a look at the new framework you probably will agree that this would not have been possible. Therefore, integrating the framework means making code adjustments to every feature class. Still all concepts of Ignite were picked up and integrated into the framework so only the code related to the infrastructure has to changed (simplified), basically the concepts of all existing features remain. You may realize 100% identical code blocks of current Ignite feature in already transferred/frameworked features.

Because of above reasons, before daring to suggest a major change, I went a long way on my own to provide you with a fully up & running framework with all basic features already implemented, so you can start playing with it straight away & make up your opinion based on facts.

Just in case enough contributers are interest in this concept, I could provide the full code and pull-request it on an experimental repository branch.

For an overview on the current implementation status, see below.


Current implementation status of the framework
----------------------------------------------

At the moment, I would approximate that 95% of the default framework and APIs are already implemented. However, this estimation excludes parallel & distributed computing features and performance optimization, which would be the next big step to eliminate any GIL-like show stopper risks. The basic framework is up & running. Any further transfer of existing Ignite features to the framework should probably be done after a full discussion about this version of the framework API.

Further I would except the framework code changes to diminish quickly as the syntax and infrastructure will reach their optimum soon. But, thanks to the new architecture tons of new analysis tools, visualization tools and features seem obvious to be implemented now, so I would expect a rapid increase of new developments on the `state plugins` and `feature` side.

In bullet points you will currently find already implemented features & classes i.e.:

* all base features of current Ignite, e.g. all event handling functions, every/once-filters, engines, metrics, output handlers, charts, summary writer...
* additional features demonstrated in :doc:`quickunderstanding_app`, :doc:`quickunderstanding_feature_dev` and the :doc:`other example scripts<examples>`
* ``Base*``-classes for all categories (, e.g. ``BaseMetric``, ``BaseOutputHandler``, ``BaseEngine``, ``BaseChart``,...)
* application classes, e.g. ``Engine``, ``OutputHandler``, ``Accuracy``, ``Loss``, ``AverageOutput``
* high-level application classes/pipelines, e.g. ``SupervisedTrainer/Evaluator`` (roughly), ``EnginesMetricsCharts``, ``EnginesMetricsComparisonCharts``

Missing implementations:

* distributed & parallel computing
* concept for automated output handling

Sorry for...
------------

...currently not providing you with a real documentation and references and no tests so far. At the moment the :doc:`Quickunderstanding Application<quickunderstanding_app>`, the :doc:`Quickunderstanding Feature Dev<quickunderstanding_feature_dev>` and the :doc:`examples` may be the best way to get started. I may provide further documentation on demand and will proceed depending on the feedback...


Content
-------

.. toctree::
   :maxdepth: 2
   :caption: Documentation

   self
   quickunderstanding_app
   quickunderstanding_feature_dev
   quickunderstanding_custom_state
   comments_on_framework_understanding
   examples
   todos
   faq

.. warning:: **GIBBERISH PATCHY & DEPRECIATED REFERENCES AHEAD**

        Take nothing for granted... to be updated.

|

.. toctree::
   :maxdepth: 2
   :caption: Features & Devs References

   engine
   transitions
   metrics
   output_handlers
   tensorboard
   pipelines
   states
   maintenance
   feature_dev_tools
   utils
   exceptions


.. toctree::
   :maxdepth: 2
   :caption: Framework References

   framework/framework
   framework/frame_state_objects
   framework/modular_state_objects
   framework/frame_state_objects_reference
   framework/state_component_attrs_integration
   framework/framework_utils