Research

Domain-specific ontological & terminological resources

After my PhD, I started working as a post-doctoral researcher at LIRMM on the PratikPharma project, in particular around the LIRMM Bioportal platform. My work focuses on applying semantic web and linguistic linked open data technologies to biomedical ontologies, with the ultimate goal of unifying and consolidating the representation and processing of ontological information, domain terminologies and lexical resources.

Multilingual Lexical Resources & PhD Work

The objective of my PhD thesis was to propose an architecture for the interoperability, scalability and sense-level alignment of multilingual lexical resources based on interlingual acceptions (Sérasset, 1994) in the context of Linguistic Linked Open Data (LLOD). I proposed a formalisation of interlingual acceptions, together with algorithms for their construction and update, applied to the use case of DBNary, an LLOD version of Wiktionary.

PhD Thesis Dissertation (in French) [PDF, 23.5MB] — PhD Defence Slides (in French) [PDF 22.5MB]

Abstract

Key Words: Ontolex, Word-sense alignment, Interlingual Acceptions, Multilingual Lexical Resources

When it comes to the construction of multilingual lexico-semantic resources, the first thing that comes to mind is that the resources we want to align should share the same data model and format (representational interoperability). With the emergence of standards such as LMF, and their implementation and widespread use for the production of resources in the form of lexical linked data (Ontolex), representational interoperability has ceased to be a major challenge for the production of large-scale multilingual resources. As far as the interoperability of sense-level multilingual alignments is concerned, however, a major challenge is the choice of a suitable interlingual pivot. Many resources use English senses as the pivot (e.g. BabelNet, Euro-WordNet), although this choice leads to a loss of contrast between English senses that are lexicalized with different words in other languages. The use of acception-based interlingual representations, a solution proposed over 20 years ago, could be viable. However, the manual construction of such language-independent pivot representations is very difficult because few experts speak enough languages fluently, and algorithms for their automatic construction have never materialized, mainly because of the lack of a formal axiomatic characterization that ensures the preservation of their correctness properties. In this thesis, we address this issue by first formalizing acception-based interlingual pivot architectures through a set of axiomatic constraints and rules that guarantee their correctness. We then propose algorithms for the initial construction and the update of interlingual acception-based multilingual resources by exploiting the combinatorial properties of pairwise bilingual translation graphs. Finally, we study the practical considerations of applying our construction algorithms to a tangible resource, DBNary (a lexical linked data resource extracted from Wiktionary).

Jury

  • Roberto Navigli — Assoc. Prof. Università Sapienza di Roma – Reviewer
  • Mathieu Lafourcade — MCF, UM – Reviewer
  • Denis Maurel — Pr. Université de Tours – Examiner
  • Nabil Hathout — DR CNRS. CLLE/ERSS Toulouse – Examiner
  • Eric Gaussier — Pr. Université Grenoble Alpes – President

Word Sense Disambiguation

For most of 2012 and 2013, my work focused on Word Sense Disambiguation and, more recently, on Multilingual Word Sense Disambiguation in the context of Task 12 of SemEval 2013. The approach combines knowledge-based semantic similarity measures with stochastic optimisation algorithms, for the most part Ant Colony Optimisation.

Presentations

Here is a video reenactment of my presentation at COLING 2012 for the paper “Ant Colony Algorithm for the Unsupervised Word Sense Disambiguation of Texts: Comparison and Evaluation”.