Search | arXiv e-print repository

arXiv:2012.04964 [pdf, ps, other]

On Knowledge Distillation for Direct Speech Translation

Authors: Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi

Abstract: Direct speech translation (ST) has shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze eventual drawbacks of this approach and how to alleviate them maintaining the benefits in terms of translation quality.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Accepted at CLiC-IT 2020

arXiv:2012.04955 [pdf, ps, other]

Breeding Gender-aware Direct Speech Translation Systems

Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

Abstract: In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vocal characteristics) that is otherwise lost in the cascade framework. Although such ability proved to be useful for gender translation, direct ST is nonetheless affected by gender bias just like its cascade counterpart, as well as machine translation and numerous other natural language processing applications. Moreover, direct ST systems that exclusively rely on vocal biometric features as a gender cue can be unsuitable and potentially harmful for certain users. Going beyond speech signals, in this paper we compare different approaches to inform direct ST models about the speaker's gender and test their ability to handle gender translation from English into Italian and French. To this aim, we manually annotated large datasets with speakers' gender information and used them for experiments reflecting different possible real-world scenarios. Our results show that gender-aware direct ST solutions can significantly outperform strong - but gender-unaware - direct ST models. In particular, the translation of gender-marked words can increase up to 30 points in accuracy while preserving overall translation quality.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Outstanding paper at COLING 2020

Journal ref: In Proceedings of the 28th International Conference on Computational Linguistics, Dec 2020, 3951-3964. Online

arXiv:2010.14761 [pdf, other]

doi 10.1088/1742-5468/abcd31

Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures

Authors: Carlo Baldassi, Enrico M. Malatesta, Matteo Negri, Riccardo Zecchina

Abstract: We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lays inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also, in the unbalanced case, that the performances can be improved.

Submitted 17 November, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 19 pages, 4 figures. arXiv admin note: text overlap with arXiv:2006.07897

arXiv:2009.04707 [pdf, other]

On Target Segmentation for Direct Speech Translation

Authors: Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi

Abstract: Recent studies on direct speech translation show continuous improvements by means of data augmentation techniques and bigger deep learning models. While these methods are helping to close the gap between this new approach and the more traditional cascaded one, there are many incongruities among different studies that make it difficult to assess the state of the art. Surprisingly, one point of discussion is the segmentation of the target text. Character-level segmentation has been initially proposed to obtain an open vocabulary, but it results on long sequences and long training time. Then, subword-level segmentation became the state of the art in neural machine translation as it produces shorter sequences that reduce the training time, while being superior to word-level models. As such, recent works on speech translation started using target subwords despite the initial use of characters and some recent claims of better results at the character level. In this work, we perform an extensive comparison of the two methods on three benchmarks covering 8 language directions and multilingual training. Subword-level segmentation compares favorably in all settings, outperforming its character-level counterpart in a range of 1 to 3 BLEU points.

Submitted 10 September, 2020; originally announced September 2020.

Comments: 14 pages single column, 4 figures, accepted for presentation at the AMTA2020 research track

arXiv:2008.02270 [pdf, other]

Contextualized Translation of Automatically Segmented Speech

Authors: Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi

Abstract: Direct speech-to-text translation (ST) models are usually trained on corpora segmented at sentence level, but at inference time they are commonly fed with audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sentences. This segmentation mismatch degrades considerably the quality of ST models' output. So far, researchers have focused on improving audio segmentation towards producing sentence-like splits. In this paper, instead, we address the issue in the model, making it more robust to a different, potentially sub-optimal segmentation. To this aim, we train our models on randomly segmented data and compare two approaches: fine-tuning and adding the previous segment as context. We show that our context-aware solution is more robust to VAD-segmented input, outperforming a strong base model and the fine-tuning on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.

Submitted 5 August, 2020; originally announced August 2020.

Comments: Interspeech 2020

arXiv:2007.08174 [pdf, other]

Existence, energy identity and higher time regularity of solutions to a dynamic visco-elastic cohesive interface model

Authors: Matteo Negri, Riccardo Scala

Abstract: We study the dynamics of visco-elastic materials coupled by a common cohesive interface (or, equivalently, two single domains separated by a prescribed cohesive crack) in the anti-plane setting. We consider a general class of traction-separation laws featuring an activation threshold on the normal stress, softening and elastic unloading. In strong form, the evolution is described by a system of PDEs coupling momentum balance (in the bulk) with transmission and Karush-Kuhn-Tucker conditions (on the interface). We provide a detailed analysis of the system. We first prove existence of a weak solution, employing a time discrete approach and a regularization of the initial data. Then, we prove our main results: the energy identity and the existence of solutions with acceleration in $L^\infty(0,T; L^2)$.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2006.05754 [pdf, ps, other]

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

Authors: Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi

Abstract: Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with: i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian/French).

Submitted 10 June, 2020; originally announced June 2020.

Comments: 9 pages of content, accepted at ACL 2020

arXiv:2006.02965 [pdf, other]

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Authors: Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

Abstract: This paper describes FBK's participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems' ability to translate English TED talks audio into German texts. The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation. Participants can decide whether to work on custom segmentation or not. We used the provided segmentation. Our system is an end-to-end model based on an adaptation of the Transformer for speech data. Its training process is the main focus of this paper and it is based on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multi-task learning using the CTC loss. Finally, after the training with word-level knowledge distillation is complete, our ST models are fine-tuned using label smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, which is an excellent result compared to recent papers, and 23.7 BLEU on the same data segmented with VAD, showing the need for researching solutions addressing this specific data condition.

Submitted 4 June, 2020; originally announced June 2020.

Comments: Accepted at IWSLT2020

arXiv:2006.01080 [pdf, other]

Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

Authors: Alina Karakanta, Matteo Negri, Marco Turchi

Abstract: Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation highly depends on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling: a) a direct end-to-end and b) a classical cascade approach. We discuss the benefit of having access to the source language speech for improving the conformity of the generated subtitles to the spatial and temporal subtitling constraints and show that length is not the answer to everything in the case of subtitling-oriented ST.

Submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted at IWSLT 2020

arXiv:2003.14402 [pdf, other]

Low Resource Neural Machine Translation: A Benchmark for Five African Languages

Authors: Surafel M. Lakew, Matteo Negri, Marco Turchi

Abstract: Recent advents in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks. In this work, we benchmark NMT between English and five African LRL pairs (Swahili, Amharic, Tigrigna, Oromo, Somali [SATOS]). We collected the available resources on the SATOS languages to evaluate the current state of NMT for LRLs. Our evaluation, comparing a baseline single language pair NMT model against semi-supervised learning, transfer learning, and multilingual modeling, shows significant performance improvements both in the En-LRL and LRL-En directions. In terms of averaged BLEU score, the multilingual approach shows the largest gains, up to +5 points, in six out of ten translation directions. To demonstrate the generalization capability of each model, we also report results on multi-domain test sets. We release the standardized experimental data and the test sets for future works addressing the challenges of NMT in under-resourced settings, in particular for the SATOS languages.

Submitted 31 March, 2020; originally announced March 2020.

Comments: Accepted for AfricaNLP workshop at ICLR 2020

arXiv:2002.10829 [pdf, other]

MuST-Cinema: a Speech-to-Subtitles corpus

Authors: Alina Karakanta, Matteo Negri, Marco Turchi

Abstract: Growing needs in localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automatisation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires high-quality, large, task-specific training data. The existing subtitling corpora, however, are missing both alignments to the source language audio and important information about subtitle breaks. This poses a significant limitation for developing efficient automatic approaches for subtitling, since the length and form of a subtitle directly depends on the duration of the utterance. In this work, we present MuST-Cinema, a multilingual speech translation corpus built from TED subtitles. The corpus is comprised of (audio, transcription, translation) triplets. Subtitle breaks are preserved by inserting special symbols. We show that the corpus can be used to build models that efficiently segment sentences into subtitles and propose a method for annotating existing subtitling corpora with subtitle breaks, conforming to the constraint of length.

Submitted 25 February, 2020; originally announced February 2020.

Comments: Accepted at LREC 2020

arXiv:1910.13998 [pdf, other]

Adapting Multilingual Neural Machine Translation to Unseen Languages

Authors: Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi

Abstract: Multilingual Neural Machine Translation (MNMT) for low-resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to a LRL has shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to an unseen LRL using data selection and model adaptation. In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance. We extensively explore data selection in popular multilingual NMT settings, namely in (zero-shot) translation, and in adaptation from a multilingual pre-trained model, for both directions (LRL-en). We further show that dynamic adaptation of the model's vocabulary results in a more favourable segmentation for the LRL in comparison with direct adaptation. Experiments show reductions in training time and significant performance gains over LRL baselines, even with zero LRL data (+13.0 BLEU), up to +17.0 BLEU for pre-trained multilingual model dynamic adaptation with related data selection. Our method outperforms current approaches, such as massively multilingual models and data augmentation, on four LRL.

Submitted 30 October, 2019; originally announced October 2019.

Comments: Accepted at the 16th International Workshop on Spoken Language Translation (IWSLT), November, 2019

arXiv:1910.11726 [pdf, other]

A quasi-static model for craquelure patterns

Authors: Matteo Negri

Abstract: We consider the quasi-static evolution of a brittle layer on a stiff substrate; adhesion between layers is assumed to be elastic. Employing a phase-field approach we obtain the quasi-static evolution as the limit of time-discrete evolutions computed by an alternate minimization scheme. We study the limit evolution, providing a qualitative discussion of its behaviour and a rigorous characterization, in terms of parametrized balanced viscosity evolutions. Further, we study the transition layer of the phase-field, in a simplified setting, and show that it governs the spacing of cracks in the first stages of the evolution. Numerical results show a good consistency with the theoretical study and the local morphology of real life craquelure patterns.

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1910.10663 [pdf, ps, other]

Instance-Based Model Adaptation For Direct Speech Translation

Authors: Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi

Abstract: Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora. We tackle this limitation with a method to improve data exploitation and boost the system's performance at inference time. Our approach allows us to customize "on the fly" an existing model to each incoming translation request. At its core, it exploits an instance selection procedure to retrieve, from a given pool of data, a small set of samples similar to the input query in terms of latent properties of its audio signal. The retrieved samples are then used for an instance-specific fine-tuning of the model. We evaluate our approach in three different scenarios. In all data conditions (different languages, in/out-of-domain adaptation), our instance-based adaptation yields coherent performance gains over static models.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 6 pages, under review at ICASSP 2020

arXiv:1910.03320 [pdf, other]

One-To-Many Multilingual End-to-end Speech Translation

Authors: Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

Abstract: Nowadays, training end-to-end neural models for spoken language translation (SLT) still has to confront with extreme data scarcity conditions. The existing SLT parallel corpora are indeed orders of magnitude smaller than those available for the closely related tasks of automatic speech recognition (ASR) and machine translation (MT), which usually comprise tens of millions of instances. To cope with data paucity, in this paper we explore the effectiveness of transfer learning in end-to-end SLT by presenting a multilingual approach to the task. Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to the input sequences a language token that specifies the target language. However, when tested in speech translation, our experiments show that MT-like target forcing, used as is, is not effective in discriminating among the target languages. Thus, we propose a variant that uses target-language embeddings to shift the input representations in different portions of the space according to the language, so to better support the production of output in the desired target language. Our experiments on end-to-end SLT from English into six languages show important improvements when translating into similar languages, especially when these are supported by scarce data. Further improvements are obtained when using English ASR data as an additional language (up to +2.5 BLEU points).

Submitted 8 October, 2019; originally announced October 2019.

Comments: 8 pages, one figure, version accepted at ASRU 2019

arXiv:1910.00478 [pdf, other]

Machine Translation for Machines: the Sentiment Classification Use Case

Authors: Amirhossein Tebbifakhr, Luisa Bentivogli, Matteo Negri, Marco Turchi

Abstract: We propose a neural machine translation (NMT) approach that, instead of pursuing adequacy and fluency ("human-oriented" quality criteria), aims to generate translations that are best suited as input to a natural language processing component designed for a specific downstream task (a "machine-oriented" criterion). Towards this objective, we present a reinforcement learning technique based on a new candidate sampling strategy, which exploits the results obtained on the downstream task as weak feedback. Experiments in sentiment classification of Twitter data in German and Italian show that feeding an English classifier with machine-oriented translations significantly improves its performance. Classification results outperform those obtained with translations produced by general-purpose NMT models as well as by an approach based on reinforcement learning. Moreover, our results on both languages approximate the classification accuracy computed on gold standard English tweets.

Submitted 1 October, 2019; originally announced October 2019.

arXiv:1909.13327 [pdf, other]

Natural representation of composite data with replicated autoencoders

Authors: Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer

Abstract: Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the 'local entropy' rather than the standard loss, resulting in a more robust inference, and enhancing the performance on this type of data considerably. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.

Submitted 29 September, 2019; originally announced September 2019.

Comments: 11 pages, 4 figures

arXiv:1909.07342 [pdf, other]

Multilingual Neural Machine Translation for Zero-Resource Languages

Authors: Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi

Abstract: In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT). However, NMT systems are limited in translating low-resourced languages, due to the significant amount of parallel data that is required to learn useful mappings between languages. In this work, we show how the so-called multilingual NMT can help to tackle the challenges associated with low-resourced language translation. The underlying principle of multilingual NMT is to force the creation of hidden representations of words in a shared semantic space across multiple languages, thus enabling a positive parameter transfer across languages. Along this direction, we present multilingual translation experiments with three languages (English, Italian, Romanian) covering six translation directions, utilizing both recurrent neural networks and transformer (or self-attentive) neural networks. We then focus on the zero-shot translation problem, that is how to leverage multi-lingual data in order to learn translation directions that are not covered by the available training material. To this aim, we introduce our recently proposed iterative self-training method, which incrementally improves a multilingual NMT on a zero-shot direction by just relying on monolingual data. Our results on TED talks data show that multilingual NMT outperforms conventional bilingual NMT, that the transformer NMT outperforms recurrent NMT, and that zero-shot NMT outperforms conventional pivoting methods and even matches the performance of a fully-trained bilingual system.

Submitted 16 September, 2019; originally announced September 2019.

Comments: 15 pages, Published on Italian Journal of Computational Linguistics (IJCoL) -- Multilingual Neural Machine Translation for Low-Resource Languages, June 2018

arXiv:1908.10111 [pdf, other]

Weak solutions for gradient flows under monotonicity constraints

Authors: Matteo Negri, Masato Kimura

Abstract: We consider the gradient flow of a quadratic non-autonomous energy under monotonicity constraint in time and natural regularity assumptions. We provide first a notion of weak solution, inspired by the theory of curves of maximal slope, and then existence (employing time-discrete schemes with different "implementations" of the constraint), uniqueness, power and energy identity, comparison principle and continuous dependence. As a byproduct, we show that the energy identity gives a selection criterion for the (non-unique) evolutions obtained by other notions of solutions. We finally show that, for autonomous energies, the solutions obtained with the monotonicity constraint actually coincide with those obtained with a fixed obstacle, given by the initial datum.

Submitted 27 August, 2019; originally announced August 2019.

MSC Class: 49J40; 35K86

arXiv:1907.09814 [pdf, other]

doi 10.1016/j.cma.2020.112858

$Γ$-convergence for high order phase field fracture: continuum and isogeometric formulations

Authors: Matteo Negri

Abstract: We consider second order phase field functionals, in the continuum setting, and their discretization with isogeometric tensor product B-splines. We prove that these functionals, continuum and discrete, $Γ$-converge to a brittle fracture energy, defined in the space $GSBD^2$. In particular, in the isogeometric setting, since the projection operator is not Lagrangian (i.e., interpolatory) a special construction is needed in order to guarantee that recovery sequences take values in $[0,1]$; convergence holds, as expected, if $h = o (\varepsilon)$, being $h$ the size of the physical mesh and $\varepsilon$ the internal length in the phase field energy.

Submitted 9 September, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1904.01895 [pdf, ps, other]

doi 10.1007/s00205-019-01468-4

Analysis of staggered evolutions for nonlinear energies in phase field fracture

Authors: Stefano Almi, Matteo Negri

Abstract: We consider a class of separately convex phase field energies employed in fracture mechanics, featuring non-interpenetration and a general softening behavior. We analyze the time-discrete evolutions generated by a staggered minimization scheme, where fracture irreversibility is modeled by a monotonicity constraint on the phase field variable. After recasting the staggered scheme by means of gradient flows, we characterize the time-continuous limits of the discrete solutions in terms of balanced viscosity evolutions, parametrized by their arc-length with respect to the L2-norm (for the phase field) and the H1-norm (for the displacement field). By a careful study of the energy balance we deduce that time-continuous evolutions may still exhibit an alternate behavior in discontinuity times.

Submitted 3 April, 2019; originally announced April 2019.

arXiv:1811.01389 [pdf, other]

Improving Zero-Shot Translation of Low-Resource Languages

Authors: Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico

Abstract: Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zero-shot) translation directions not observed at training time. We investigate here a zero-shot translation in a particularly low-resource multilingual setting. We propose a simple iterative training procedure that leverages a duality of translations directly generated by the system for the zero-shot directions. The translations produced by the system (sub-optimal since they contain mixed language from the shared vocabulary) are then used together with the original parallel data to feed and iteratively re-train the multilingual network. Over time, this allows the system to learn from its own generated and increasingly better output. Our approach proves effective in improving the two zero-shot directions of our multilingual model. In particular, we observed gains of about 9 BLEU points over a baseline multilingual model and up to 2.08 BLEU over a pivoting mechanism using two bilingual models. Further analysis shows that there is also a slight improvement in the non-zero-shot language directions.

Submitted 4 November, 2018; originally announced November 2018.

Comments: Published at the International Workshop on Spoken Language Translation (IWSLT), Tokyo, Japan, December 2017

arXiv:1811.01137 [pdf, other]

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

Authors: Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi

Abstract: We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach allows extending an initial model for a given language pair to cover new languages by adapting its vocabulary as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model). The parameter transfer mechanism is evaluated in two scenarios: i) to adapt a trained single language NMT system to work with a new language pair and ii) to continuously add new language pairs to grow to a multilingual NMT system. In both scenarios our goal is to improve the translation performance, while minimizing the training convergence time. Preliminary experiments spanning five languages with different training data sizes (i.e., 5k and 50k parallel sentences) show a significant performance gain ranging from +3.85 up to +13.63 BLEU in different language directions. Moreover, when compared with training an NMT model from scratch, our transfer-learning approach allows us to reach higher performance after training up to 4% of the total training steps.

Submitted 2 November, 2018; originally announced November 2018.

Comments: Published at the International Workshop on Spoken Language Translation (IWSLT), 2018

arXiv:1810.07652 [pdf, other]

Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018

Authors: Mattia Antonino Di Gangi, Roberto Dessì, Roldano Cattoni, Matteo Negri, Marco Turchi

Abstract: This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018. Our system relies on a state-of-the-art model based on LSTMs and CNNs, where the CNNs are used to reduce the temporal dimension of the audio input, which is in general much higher than machine translation input. Our model was trained only on the audio-to-text parallel data released for the task, and fine-tuned on cleaned subsets of the original training corpus. The addition of weight normalization and label smoothing improved the baseline system by 1.0 BLEU point on our validation set. The final submission also featured checkpoint averaging within a training run and ensemble decoding of models trained during multiple runs. On test data, our best single model obtained a BLEU score of 9.7, while the ensemble obtained a BLEU score of 10.24.

Submitted 16 October, 2018; originally announced October 2018.

Comments: 6 pages, 2 figures, system description at the 15th International Workshop on Spoken Language Translation (IWSLT) 2018

arXiv:1809.10026 [pdf, other]

Space-time least-squares isogeometric method and efficient solver for parabolic problems

Authors: Monica Montardini, Matteo Negri, Giancarlo Sangalli, Mattia Tani

Abstract: In this paper, we propose a space-time least-squares isogeometric method to solve parabolic evolution problems, well suited for high-degree smooth splines in the space-time domain. We focus on the linear solver and its computational efficiency: thanks to the proposed formulation and to the tensor-product construction of space-time splines, we can design a preconditioner whose application requires the solution of a Sylvester-like equation, which is performed efficiently by the fast diagonalization method. The preconditioner is robust w.r.t. spline degree and mesh size. The computational time required for its application, for a serial execution, is almost proportional to the number of degrees-of-freedom and independent of the polynomial degree. The proposed approach is also well-suited for parallelization.

Submitted 16 September, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: 29 pages, 8 figures

arXiv:1803.07274 [pdf, ps, other]

eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

Authors: Matteo Negri, Marco Turchi, Rajen Chatterjee, Nicola Bertoldi

Abstract: Training models for the automatic correction of machine-translated text usually relies on data consisting of (source, MT, human post-edit) triplets providing, for each source sentence, examples of translation errors with the corresponding corrections made by a human post-editor. Ideally, a large amount of data of this kind should allow the model to learn reliable correction patterns and effectively apply them at test stage on unseen (source, MT) pairs. In practice, however, their limited availability calls for solutions that also integrate in the training process other sources of knowledge. Along this direction, state-of-the-art results have been recently achieved by systems that, in addition to a limited amount of available training data, exploit artificial corpora that approximate elements of the "gold" training instances with automatic translations. Following this idea, we present eSCAPE, the largest freely-available Synthetic Corpus for Automatic Post-Editing released so far. eSCAPE consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly-available parallel corpora, and using the target side as an artificial human post-edit. Translations are obtained both with phrase-based and neural models. For each MT paradigm, eSCAPE contains 7.2 million triplets for English-German and 3.3 million for English-Italian, resulting in a total of 14.4 and 6.6 million instances respectively.
The usefulness of eSCAPE is proved through experiments in a general-domain scenario, the most challenging one for automatic post-editing. For both language directions, the models trained on our artificial data always improve MT quality with statistically significant gains. The current version of eSCAPE can be freely downloaded from: http://hltshare.fbk.eu/QT21/eSCAPE.html.

Submitted 20 March, 2018; originally announced March 2018.

Comments: Accepted at LREC 2018

arXiv:1802.03949 [pdf, other]

Spontaneous domain formation in disordered copolymers as a mechanism for chromosome structuring

Authors: Matteo Negri, Marco Gherardi, Guido Tiana, Marco Cosentino Lagomarsino

Abstract: Motivated by the problem of domain formation in chromosomes, we studied a co-polymer model where only a subset of the monomers feel attractive interactions. These monomers are displaced randomly from a regularly-spaced pattern, thus introducing some quenched disorder in the system. Previous work has shown that in the case of regularly-spaced interacting monomers this chain can fold into structures characterized by multiple distinct domains of consecutive segments. In each domain, attractive interactions are balanced by the entropy cost of forming loops. We show by advanced replica-exchange simulations that adding disorder in the position of the interacting monomers further stabilizes these domains. The model suggests that the partitioning of the chain into well-defined domains of consecutive monomers is a spontaneous property of heteropolymers. In the case of chromosomes, evolution could have acted on the spacing of interacting monomers to modulate in a simple way the underlying domains for functional reasons.

Submitted 12 February, 2018; originally announced February 2018.

arXiv:1707.09879 [pdf, ps, other]

Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

Authors: Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico

Abstract: The necessity of using a fixed-size word vocabulary in order to control the model complexity in state-of-the-art neural machine translation (NMT) systems is an important bottleneck on performance, especially for morphologically rich languages. Conventional methods that aim to overcome this problem by using sub-word or character-level representations solely rely on statistics and disregard the linguistic properties of words, which leads to interruptions in the word structure and causes semantic and syntactic losses. In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. Our method is based on unsupervised morphology learning and can be, in principle, used for pre-processing any language pair. We also present an alternative word segmentation method based on supervised morphological analysis, which aids us in measuring the accuracy of our model. We evaluate our method in Turkish-to-English NMT task where the input language is morphologically rich and agglutinative. We analyze different representation methods in terms of translation accuracy as well as the semantic and syntactic properties of the generated output. Our method obtains a significant improvement of 2.3 BLEU points over the conventional vocabulary reduction technique, showing that it can provide better accuracy in open vocabulary translation of morphologically rich languages.

Submitted 31 July, 2017; originally announced July 2017.

Comments: The 20th Annual Conference of the European Association for Machine Translation (EAMT), Research Paper, 12 pages

Journal ref: The Prague Bulletin of Mathematical Linguistics. No. 108, 2017, pp. 331-342

arXiv:1706.07238 [pdf, other]

doi 10.1016/j.csl.2017.06.003

Automatic Quality Estimation for ASR System Combination

Authors: Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

Abstract: Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, highly depends on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment-level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and it is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.

Submitted 22 June, 2017; originally announced June 2017.

arXiv:1702.01714 [pdf, ps, other]

DNN adaptation by automatic quality estimation of ASR hypotheses

Authors: Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi

Abstract: In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.

Submitted 6 February, 2017; originally announced February 2017.

Comments: Computer Speech & Language December 2016

arXiv:1612.02660 [pdf, ps, other]

Decision Theory in an Algebraic Setting

Authors: Maurizio Negri

Abstract: In decision theory an act is a function from a set of conditions to the set of real numbers. The set of conditions is a partition in some algebra of events. The expected value of an act can be calculated when a probability measure is given. We adopt an algebraic point of view by substituting the algebra of events with a finite distributive lattice and the probability measure with a lattice valuation. We introduce a partial order on acts that generalizes the dominance relation and show that the set of acts is a lattice with respect to this order. Finally we analyze some different kinds of comparison between acts, without supposing a common set of conditions for the acts to be compared.

Submitted 2 December, 2016; originally announced December 2016.

Comments: 22 pages, 1 figure

MSC Class: 91B06; 90B50

arXiv:1405.7813 [pdf, ps, other]

A Dutch Book theorem for partial subjective probability

Authors: Maurizio Negri

Abstract: The aim of this paper is to show that partial probability can be justified from the standpoint of subjective probability in much the same way as classical probability does. The seminal works of Ramsey and De Finetti have furnished a method for assessing subjective probabilities: ask about the bets the decision-maker would be willing to place. So we introduce the concept of partial bet and partial Dutch Book and prove for partial probability a result similar to the Ramsey-De Finetti theorem. Finally, we make a comparison between two concepts of bet: we can bet our money on a sentence describing an event, or we can bet our money on the event itself, generally conceived as a set. These two ways of understanding a bet are equivalent in classical probability, but not in partial probability.

Submitted 30 May, 2014; originally announced May 2014.

MSC Class: 60A05; 91B06; 91B16

arXiv:1405.0634 [pdf, other]

Numerical simulations of stick percolation: Application to the study of structured magnetorheologial elastomers

Authors: J. L. Mietta, R. M. Negri, P. I. Tamborenea

Abstract: In this article we explore how structural parameters of composites filled with one-dimensional, electrically conducting elements (such as sticks, needles, chains, or rods) affect the percolation properties of the system. To this end, we perform Monte Carlo simulations of asymmetric two-dimensional stick systems with anisotropic alignments. We compute the percolation probability functions in the direction of preferential orientation of the percolating objects and in the orthogonal direction, as functions of the experimental structural parameters. Among these, we considered the average length of the sticks, the standard deviation of the length distribution, and the standard deviation of the angular distribution. We developed a computer algorithm capable of reproducing and verifying known theoretical results for isotropic networks and which allows us to go beyond and study anisotropic systems of experimental interest. Our research shows that the total electrical anisotropy, considered as a direct consequence of the percolation anisotropy, depends mainly on the standard deviation of the angular distribution and on the average length of the sticks. A conclusion of practical interest is that we find that there is a wide and well-defined range of values for the mentioned parameters for which it is possible to obtain reliable anisotropic percolation under relatively accessible experimental conditions when considering composites formed by dispersions of sticks, oriented in elastomeric matrices.

Submitted 3 May, 2014; originally announced May 2014.

Comments: 27 pages, 11 figures

arXiv:1310.6172 [pdf, ps, other]

Partial Probability and Kleene Logic

Authors: Maurizio Negri

Abstract: There are two main approaches to probability, one of set-theoretic character where probability is the measure of a set, and another one of linguistic character where probability is the degree of confidence in a proposition. In this work we give a unified algebraic treatment of these approaches through the concept of valued lattice, obtaining as a by-product a translation between them. Then we introduce the concept of partial valuation for DMF-algebras (De Morgan algebras with a single fixed point for negation), giving an algebraic setting for probability of partial events. We introduce the concept of partial probability for propositions, substituting classical logic with Kleene's logic. In this case too we give a translation between set-theoretic and linguistic probability. Finally, we introduce the concept of conditional partial probability and prove a weak form of Bayes's Theorem.

Submitted 23 October, 2013; originally announced October 2013.

MSC Class: 03G10; 60B99; 03B50; 06D25

arXiv:0810.2693 [pdf, ps, other]

doi 10.1117/12.789621

The Gas Pixel Detector as an X-ray photoelectric polarimeter with a large field of view

Authors: Fabio Muleri, Paolo Soffitta, Ronaldo Bellazzini, Alessandro Brez, Enrico Costa, Sergio Fabiani, Massimo Frutti, Massimo Minuti, Maria Barbara Negri, Michele Pinchera, Alda Rubini, Gloria Spandre

Abstract: The Gas Pixel Detector (GPD) is a new generation device which, thanks to its 50 um pixels, is capable of imaging the photoelectron tracks produced by photoelectric absorption in a gas. Since the direction of emission of the photoelectrons is strongly correlated with the direction of polarization of the absorbed photons, this device has been proposed as a polarimeter for the study of astrophysical sources, with a sensitivity far higher than the instruments flown to date. The GPD has always been regarded as a focal plane instrument and has therefore been proposed for inclusion on next-generation space-borne missions together with grazing incidence optics. In this paper we instead explore the feasibility of a new kind of application of the GPD and of photoelectric polarimeters in general, i.e. an instrument with a large field of view. By means of an analytical treatment and measurements, we verify whether it is possible to preserve the sensitivity to the polarization for inclined beams, opening the way for the measurement of X-ray polarization for transient astrophysical sources. While severe systematic effects arise for inclinations greater than about 20 degrees, methods and algorithms to control them are discussed.

Submitted 15 October, 2008; originally announced October 2008.

Comments: 11 pages, 8 figures

Journal ref: Proceedings of SPIE Astronomical Instrumentation 2008 Conference, 23-28 June 2008 Marseille, France, vol. 7011-88

arXiv:0709.4623 [pdf, ps, other]

doi 10.1016/j.nima.2007.09.046

Low energy polarization sensitivity of the Gas Pixel Detector

Authors: F. Muleri, P. Soffitta, L. Baldini, R. Bellazzini, J. Bregeon, A. Brez, E. Costa, M. Frutti, L. Latronico, M. Minuti, M. B. Negri, N. Omodei, M. Pinchera, M. Pesce-Rollins, M. Razzano, A. Rubini, C. Sgro', G. Spandre

Abstract: An X-ray photoelectric polarimeter based on the Gas Pixel Detector has been proposed to be included in many upcoming space missions to fill the gap of about 30 years from the first (and to date only) positive measurement of polarized X-ray emission from an astrophysical source. The estimated sensitivity of the current prototype peaks at an energy of about 3 keV, but the lack of readily available polarized sources in this energy range has prevented the measurement of detector polarimetric performances. In this paper we present the measurement of the Gas Pixel Detector polarimetric sensitivity at energies of a few keV and the new, light, compact and transportable polarized source that was devised and built to this aim. Polarized photons are produced, from unpolarized radiation generated with an X-ray tube, by means of Bragg diffraction at nearly 45 degrees. The employment of mosaic graphite and flat aluminum crystals allow the production of nearly completely polarized photons at 2.6, 3.7 and 5.2 keV from the diffraction of unpolarized continuum or line emission. The measured modulation factor of the Gas Pixel Detector at these energies is in good agreement with the estimates derived from a Monte Carlo software, which was up to now employed for driving the development of the instrument and for estimating its low energy sensitivity. In this paper we present the excellent polarimetric performance of the Gas Pixel Detector at energies where the peak sensitivity is expected. These measurements not only support our previous claims of high sensitivity but confirm the feasibility of astrophysical X-ray photoelectric polarimetry.

Submitted 28 September, 2007; originally announced September 2007.

Comments: 15 pages, 12 figures. Accepted for publication in NIMA

Journal ref: Nucl.Instrum.Meth.A584:149-159,2008

arXiv:0709.3978 [pdf, ps, other]

doi 10.1117/12.734647

A very compact polarizer for an X-ray polarimeter calibration

Authors: Fabio Muleri, Paolo Soffitta, Ronaldo Bellazzini, Alessandro Brez, Enrico Costa, Sergio Fabiani, Massimo Frutti, Massimo Minuti, Maria Barbara Negri, Piermarco Pascale, Alda Rubini, Giuseppe Sindoni, Gloria Spandre

Abstract: We devised and built a light, compact and transportable X-ray polarized source based on the Bragg diffraction at nearly 45 degrees. The source is composed by a crystal coupled to a small power X-ray tube. The angles of incidence are selected by means of two orthogonal capillary plates which, due to the small diameter holes (10 um) allow good collimation with limited sizes. All the orders of diffraction defined by the crystal lattice spacing are polarized up to the maximum order limited by the X-ray tube voltage. Selecting suitably the crystal and the X-ray tube, either the line or the continuum emission can be diffracted, producing polarized photons at different energies. A very high degree of polarization and reasonable fluxes can be reached with a suitable choice of the capillary plates collimation. We present the source and test its performances with the production of nearly completely polarized radiation at 2.6, 5.2, 3.7 and 7.4 keV thanks to the employment of graphite and aluminum crystals, with copper and calcium X-ray tubes respectively. Triggered by the very compact design of the source, we also present a feasibility study for an on-board polarized source, coupled to a radioactive Fe55 nuclide and a PVC thin film, for the calibration of the next generation space-borne X-ray polarimeters at 2.6 and 5.9 keV.

Submitted 25 September, 2007; originally announced September 2007.

Comments: 12 pages, 22 figures

Journal ref: Proceedings of SPIE Optics + Photonics 2007 Conference - San Diego, vol. 6686-33

Showing 51–87 of 87 results for author: Negri, M