Within the last decade, open data concepts have been gaining increasing interest in the area of drug discovery. With the launch of ChEMBL and PubChem, an enormous amount of bioactivity data was made easily accessible in the public domain. In addition, platforms that semantically integrate those data, such as the Open PHACTS Discovery Platform, permit querying across different domains of open life science data beyond the concept of ligand-target-pharmacology. However, most public databases are compiled from literature sources and are thus heterogeneous in their coverage. Moreover, assay descriptions are not uniform and most often lack relevant information in the primary literature and, consequently, in the databases derived from it. This raises the question of how useful large public data sources are for deriving computational models. In this perspective, we highlight selected open-source initiatives and outline the possibilities as well as the limitations of exploiting this huge amount of bioactivity data.