publications
2024
- wtf: Where to Fuse? L. Petersson, 2024
This thesis investigates fusion techniques in multimodal transformer models, focusing on enhancing the capabilities of large language models in understanding not just text, but also other modalities like images, audio, and sensor data. The study compares late fusion (concatenating modality tokens after separate encoding) and early fusion (concatenating before encoding) techniques, examining their respective advantages and disadvantages. It also examines a mid-fusion approach, aiming to combine the strengths of both methods. The effectiveness of this approach is evaluated in terms of accuracy and computational impact on the Visual Question Answering (VQA) task. Using a pretrained T5 model, the research incorporates image tokens (calculated by a Vision Transformer, ViT) into intermediate activations of the model. The findings indicate that standard early fusion techniques underperform with larger decoders, while late fusion with a smaller decoder yields the best results on the VQA task. This conclusion also extends to pooled modality tokens. Additionally, the thesis includes a comprehensive literature study, identifying benchmark datasets for video understanding in multimodal learning and highlighting datasets that demand a robust understanding of all involved modalities. This research contributes to the field by exploring and validating a novel fusion technique in multimodal learning, offering insights into its practical applications and limitations.
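The difference between the two concatenation orders described in the abstract can be sketched in a few lines. This is a toy illustration only, not the thesis implementation: tokens are plain lists of floats, and `mix` is a hypothetical stand-in for an encoder layer (it adds the sequence mean to every token, a crude proxy for attention-style mixing). The function names and example values are assumptions made for this sketch.

```python
# Toy sketch of early vs. late fusion by concatenation.
# `mix` is a hypothetical stand-in for a transformer encoder, not T5 or ViT.

def mix(tokens):
    """Add the per-feature sequence mean to every token (crude token mixing)."""
    mean = [sum(col) / len(tokens) for col in zip(*tokens)]
    return [[x + m for x, m in zip(tok, mean)] for tok in tokens]

def early_fusion(text_tokens, image_tokens):
    """Concatenate along the sequence axis first, then encode jointly."""
    return mix(text_tokens + image_tokens)

def late_fusion(text_tokens, image_tokens):
    """Encode each modality separately, then concatenate the results."""
    return mix(text_tokens) + mix(image_tokens)

text = [[1.0, 0.0], [0.0, 1.0]]   # two text tokens, feature width 2
image = [[4.0, 4.0]]              # one image token, feature width 2

early = early_fusion(text, image)
late = late_fusion(text, image)

# Both variants yield three tokens, but only in early fusion do the
# text tokens already carry information from the image token.
print(len(early), len(late))
print(early[0], late[0])
```

In this sketch the output sequence length is the same for both variants; what changes is whether the encoder sees the other modality while processing each token.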
2021
- mbse: Leveraging the Eclipse Modeling Framework to work with Electronic Datasheets. L. Petersson and D. Perillo. Model Based Space Systems and Software Engineering, 2021
This abstract provides a practical guide to leveraging the Eclipse Modeling Framework (EMF) for working with Electronic Datasheets (EDS). Starting from the SOIS EDS definition, available on the SANA website, it will be explained how to set up an EMF working environment and how to generate a Tree Editor for editing and visualizing EDSs. It will also be explained how to exploit the Acceleo Model2Text (M2T) transformation language to navigate EDS models and to generate artefacts in an almost automated manner. The problem of validating EDS models will also be discussed. A simple EDS use case will serve as a running example throughout the abstract. All the code mentioned in this abstract will be made available on the ESSR website.