David S. Batista, PhD

Berlin/Brandenburg Metropolitan Region
3943 followers, 500+ connections

About

I’m an experienced machine learning engineer and software developer, with a strong…

Experience

  • deepset

    Berlin, Germany

  • -

    Berlin, Germany

  • -

    Berlin, Germany

  • -

    Berlin, Germany

  • -

    Berlin, Germany

  • -

    Remote

  • -

    Berlin Area, Germany

  • -

    Lisbon Area, Portugal

  • -

    Lisbon Area, Portugal

  • -

    Lisbon Area, Portugal

  • -

    Lisbon, Portugal

  • -

    Lisbon, Portugal

Education

  • Instituto Superior Técnico

    -

    Thesis: “Large-Scale Semantic Relationship Extraction”

    • I researched methods for extracting semantic relationships between named entities and proposed a nearest-neighbour classifier that uses MinHash and locality-sensitive hashing (LSH) for efficient similarity search.

    • To obtain training data for the classifier, I proposed a bootstrapping technique based on distributional word representations; this work received an Honorable Mention for Best Short Paper at EMNLP'15.
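The thesis describes a nearest-neighbour classifier that relies on MinHash and locality-sensitive hashing to avoid comparing a new example against every training example. The following is a minimal Python sketch of that general technique, not the thesis implementation. The word-bigram features, the blake2b-derived hash family, the signature length (64) and band count (16), and the names `shingles`, `minhash`, `estimate_jaccard` and `MinHashKNN` are all illustrative assumptions.

```python
import hashlib
from collections import Counter, defaultdict

NUM_PERM = 64             # signature length (illustrative choice)
BANDS = 16                # number of LSH bands (illustrative choice)
ROWS = NUM_PERM // BANDS  # signature positions per band


def shingles(text, n=2):
    """Set of lowercase word n-grams; this feature choice is an assumption."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}


def _hash(seed, item):
    """One member of a hash-function family, selected by seed (blake2b-based)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(features):
    """MinHash signature: per hash function, the minimum hash over the feature set."""
    return tuple(min(_hash(seed, f) for f in features) for seed in range(NUM_PERM))


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing positions approximates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class MinHashKNN:
    """Hypothetical k-NN classifier whose candidate search uses banded LSH."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []               # list of (signature, label)
        self.buckets = defaultdict(set)  # (band index, band values) -> example indices

    def _bands(self, sig):
        return [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]

    def add(self, text, label):
        sig = minhash(shingles(text))
        self.examples.append((sig, label))
        for key in self._bands(sig):
            self.buckets[key].add(len(self.examples) - 1)

    def predict(self, text):
        sig = minhash(shingles(text))
        # Only examples sharing at least one band bucket become candidates.
        candidates = set()
        for key in self._bands(sig):
            candidates |= self.buckets.get(key, set())
        if not candidates:
            return None
        # Rank candidates by estimated similarity and vote among the top k.
        ranked = sorted(candidates,
                        key=lambda i: estimate_jaccard(sig, self.examples[i][0]),
                        reverse=True)
        votes = Counter(self.examples[i][1] for i in ranked[:self.k])
        return votes.most_common(1)[0][0]


clf = MinHashKNN(k=1)
clf.add("was acquired by", "acquisition")
clf.add("is the founder of", "founder")
print(clf.predict("was acquired by"))  # acquisition
```

Banding trades recall for speed: two examples become candidates only if all rows of at least one band agree, so highly similar examples are very likely to be compared, while dissimilar ones are usually skipped.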

  • -

    Thesis: “Geographic Text Mining”

    • Developed an information extraction system, based on Conditional Random Fields (CRFs) and a geographic ontology, that generates geographic summaries. Each summary lists all the geographic entities found in a document, mapped to unique concepts in the geographic ontology.

    • The system was applied to a crawl of the Portuguese Web (25 GB of raw text) using a Hadoop cluster, generating summaries for hundreds of thousands of documents.

  • -

    Exchange student for two semesters at the University of Karlsruhe, Germany.

  • -

Licenses & Certifications

Publications

Projects

  • Politiquices.PT - https://www.politiquices.pt

    -

    • A semantic graph connecting politicians through support/opposition relationships.
    • Each relationship in the graph is supported by archived news articles.
    • Awarded 2nd place at the “Arquivo.pt Awards 2021”, organised by the Portuguese Web Archive.
    • Attracted the interest of journalists, political scientists and researchers in the social sciences and humanities.

    Technical description:
    • Data: news headlines from almost 25 years of archived Portuguese newspapers.
    • Developed supervised models to detect relationships between politicians.
    • Entity linking of the politicians mentioned in the headlines to Wikidata.
    • Semantic graph connecting politicians through relationships supported by news articles.
    • The graph is indexed in a SPARQL engine and published through a web interface.

  • REACTION (Retrieval, Extraction and Aggregation Computing Technology for Integrating and Organizing News)

    -

    • I took part in REACTION, an initiative to develop a computational journalism platform, mostly for Portuguese.

    • The project developed information extraction, social media mining and information visualisation technologies for assisting journalists in the production of news stories.

  • GREASE-II - Geographic Reasoning for Search Engines

    -

    • I took part in GREASE-II, a project that researched methods for accessing large collections of documents and objects with geographically rich text and metadata, with an emphasis on the web.

    • The geographic content of a document is characterized by a geographic signature: a set of automatically extracted geographic tags, each mapped directly to a concept in a geographic ontology.

    • The geographic signatures were evaluated in multiple scenarios, such as improving geographic retrieval methods and faceted interfaces for text and image retrieval applications.


Honors & Awards

  • Arquivo.pt Award 2021 - 2nd place

    Arquivo.PT

    2nd prize at the Arquivo.pt Awards with the www.politiquices.pt project.

    The Arquivo.pt Award aims to annually promote innovative works based on the historical information preserved by Arquivo.pt. The works must be practical applications or studies based on the information accessible through Arquivo.pt, which demonstrate the usefulness of this public service and the importance of preserving the information published online.

  • Honorable Mention for Best Short Paper

    Program Committee Co-Chairs EMNLP'15

    "Semi-Supervised Bootstrapping of Relationship Extractors with Distributional Semantics" by David S. Batista, Bruno Martins and Mário J. Silva

  • PhD Fellowship

    Portuguese Science Foundation (FCT)

    Fellowship granted for the development of PhD research for 4 years (2011-2015)

Languages

  • Portuguese

    Native or bilingual proficiency

  • English

    Full professional proficiency

  • German

    Limited working proficiency

Recommendations received