Research

UMR EuroMov DHM, Semantics and Taxonomy of Movement (SemTaxM)

Linked Open Data and Knowledge Graphs

Ontology Portals and Semantic Annotation

Keywords: Entity linking, Entity disambiguation, Ontology alignment, Metadata, Interoperability

I have worked on SIFR BioPortal, a French version of the NCBO BioPortal (the reference platform for biomedical ontologies and associated services), as well as AgroPortal, an adaptation of the technology to agronomy. My particular focus was semantic annotation and the improvement of the SIFR Annotator component. In this context, I developed the SIFR Annotator Proxy Servlet, which enables a seamless extension of the NCBO Annotator API. It gave rise to NCBO Annotator+ (an extended version of NCBO Annotator) and to SIFR Annotator, with several new features: semantic filtering based on UMLS types and semantic groups, and detection of negation, clinical context, and temporality.
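As a rough illustration of what a request to such an extended annotator might look like, the sketch below only builds a request URL and makes no network call. The base URL, the ontology acronym, and the parameter names `semantic_groups` and `negation` are assumptions made for this example; the documentation of the targeted deployment is the authority on the actual names.

```python
from urllib.parse import urlencode

# Assumption: base URL of an Annotator REST endpoint; a SIFR BioPortal or
# AgroPortal deployment would use its own host.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text, ontologies=(), semantic_groups=(),
                        negation=False, apikey="YOUR_API_KEY",
                        base_url=ANNOTATOR_URL):
    """Build a GET URL for an annotation request (nothing is sent)."""
    params = {"text": text, "apikey": apikey}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    # Assumed parameter names for the extended features described above.
    if semantic_groups:
        params["semantic_groups"] = ",".join(semantic_groups)
    if negation:
        params["negation"] = "true"
    return base_url + "?" + urlencode(params)


url = build_annotator_url(
    "No sign of myocardial infarction.",
    ontologies=["MESH"],
    semantic_groups=["DISO"],  # UMLS semantic group for disorders
    negation=True,
)
print(url)
```

Keeping the extensions as optional query parameters mirrors the idea of a proxy that leaves the original API untouched when no extended feature is requested.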

During a visit to the Stanford Center for Biomedical Informatics Research (BMIR) in 2018, I worked on integrating those extensions into the official NCBO BioPortal service.

Use case: I participated in the Practik Pharma ANR project, which studied computer science approaches to extract, compare, and validate state-of-the-art knowledge in the biomedical domain of pharmacogenomics (the study of how genetics impacts drug response phenotypes). SIFR BioPortal and the semantic annotation of clinical text were an integral part of the project, as the basis for identifying the concepts (disorders, phenotypes, drugs) that take part in pharmacogenomic relationships.

Multilingual Lexical Resources & PhD Work

The objective of my PhD thesis was to propose an architecture for the interoperability, scalability, and sense-level alignment of multilingual lexical resources, based on interlingual acceptions (Sérasset, 1994), in the context of Linguistic Linked Open Data (LLOD). I proposed a formalisation of interlingual acceptions, together with algorithms for their construction and update, using DBNary, an LLOD version of Wiktionary, as the use case.
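To give an intuition of what a sense-level interlingual pivot looks like, here is a minimal sketch that groups senses into candidate acceptions when they are mutually linked by translations, with one sense per language. This is a simplification written for illustration and not the algorithm of the thesis; the sense identifiers and translation links are toy data, not extracted from DBNary.

```python
from itertools import combinations, product

# Toy sense-level translation links (illustrative data only).
LINKS = {
    frozenset(pair) for pair in [
        ("eng:bank#finance", "fra:banque"),
        ("eng:bank#finance", "spa:banco"),
        ("fra:banque", "spa:banco"),
        ("eng:bank#river", "fra:rive"),
        ("eng:bank#river", "spa:orilla"),
        ("fra:rive", "spa:orilla"),
        # A noisy link that is not confirmed by the other language pairs.
        ("eng:bank#finance", "fra:rive"),
    ]
}


def language(sense):
    """Language code prefix of a sense identifier such as 'fra:rive'."""
    return sense.split(":", 1)[0]


def candidate_acceptions(links):
    """Return groups with one sense per language that are pairwise linked."""
    by_lang = {}
    for link in links:
        for sense in link:
            by_lang.setdefault(language(sense), set()).add(sense)
    groups = []
    # Try every combination of one sense per language and keep the ones
    # in which all pairs are connected by a translation link.
    for combo in product(*(sorted(by_lang[lang]) for lang in sorted(by_lang))):
        if all(frozenset(p) in links for p in combinations(combo, 2)):
            groups.append(set(combo))
    return groups


for group in candidate_acceptions(LINKS):
    print(sorted(group))
```

In this toy setting the noisy link is discarded because no third language confirms it, which hints at why the combinatorial structure of pairwise bilingual graphs is useful. The exhaustive enumeration is only workable for tiny examples.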

PhD Thesis Dissertation (in French) [PDF, 23.5MB] — PhD Defence Slides (in French) [PDF 22.5MB]

Abstract

Keywords: Ontolex, Word-sense alignment, Interlingual acceptions, Multilingual lexical resources

When it comes to the construction of multilingual lexico-semantic resources, the first thing that comes to mind is that the resources we want to align should share the same data model and format (representational interoperability). However, with the emergence of standards such as LMF and their implementation and widespread use for the production of resources in the form of lexical linked data (Ontolex), representational interoperability has ceased to be a major challenge for the production of large-scale multilingual resources. As far as the interoperability of sense-level multilingual alignments is concerned, however, a major challenge is the choice of a suitable interlingual pivot. Many resources choose English senses as the pivot (e.g. BabelNet, Euro-WordNet), although this choice leads to a loss of contrast between English senses that are lexicalized with different words in other languages. The use of acception-based interlingual representations, a solution proposed over 20 years ago, could be viable. However, the manual construction of such language-independent pivot representations is very difficult due to the lack of experts speaking enough languages fluently, and algorithms for their automatic construction have never materialized, mainly because of the lack of a formal axiomatic characterization that ensures the preservation of their correctness properties. In this thesis, we address this issue by first formalizing acception-based interlingual pivot architectures through a set of axiomatic constraints and rules that guarantee their correctness. Then, we propose algorithms for the initial construction and the update of interlingual acception-based multilingual resources by exploiting the combinatorial properties of pairwise bilingual translation graphs. Finally, we study the practical considerations of applying our construction algorithms to a tangible resource, DBNary (a lexical linked data resource extracted from Wiktionary).

Jury

  • Roberto Navigli – Assoc. Prof., Università Sapienza di Roma – Reviewer
  • Mathieu Lafourcade – MCF, UM – Reviewer
  • Denis Maurel – Pr., Université de Tours – Examiner
  • Nabil Hathout – DR CNRS, CLLE/ERSS Toulouse – Examiner
  • Eric Gaussier – Pr., Université Grenoble Alpes – President

Word Sense Disambiguation

For most of 2012 and 2013, my work focused on Word Sense Disambiguation and, more recently, on Multilingual Word Sense Disambiguation in the context of Task 12 of SemEval 2013, using knowledge-based semantic similarity measures combined with stochastic optimisation algorithms, for the most part Ant Colony Optimisation.
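The general setting can be sketched as follows: a knowledge-based similarity scores pairs of senses, and a search procedure looks for the sense assignment with the best global score. The example below uses a simple Lesk-style gloss overlap and, for brevity, replaces the stochastic search (Ant Colony Optimisation in my work) with exhaustive enumeration. The sense inventory and glosses are hand-written toy data, and the function names are hypothetical.

```python
from itertools import product

# Toy sense inventory with hand-written glosses (illustrative only).
SENSES = {
    "bank": {
        "bank#finance": "institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water",
    },
    "deposit": {
        "deposit#money": "money placed in an institution for safekeeping",
        "deposit#sediment": "matter laid down by water or ice",
    },
}


def lesk_overlap(gloss_a, gloss_b):
    """Number of distinct words shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))


def configuration_score(config):
    """Sum of pairwise gloss overlaps between the chosen senses."""
    glosses = [SENSES[word][sense] for word, sense in config]
    return sum(
        lesk_overlap(a, b)
        for i, a in enumerate(glosses)
        for b in glosses[i + 1:]
    )


def disambiguate(words):
    """Enumerate all sense assignments and keep the best-scoring one."""
    choices = [[(word, sense) for sense in SENSES[word]] for word in words]
    return max(product(*choices), key=configuration_score)


best = disambiguate(["bank", "deposit"])
print(best, configuration_score(best))
```

Exhaustive enumeration grows exponentially with the number of words, which is the reason stochastic optimisation methods are used on real texts.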

Presentations

Here is a video reenactment of my presentation at COLING 2012 for the paper “Ant Colony Algorithm for the Unsupervised Word Sense Disambiguation of Texts: Comparison and Evaluation”.