Towards AI

Overview of DBAI@NeurIPS’21

Author(s): Nantia Makrynioti

The Workshop on Databases and AI (DBAI) was successfully held last December in conjunction with the virtual NeurIPS’21 conference. The purpose of DBAI is to inspire a conversation on the power of the relational data structure and relational database systems (RDBMS) for machine learning (ML) algorithms. Research in the areas of relational learning, relational algebra and probabilistic programming has demonstrated the benefits of exploiting the relational data structure in ML tasks, including integrating domain knowledge, avoiding redundant calculations and managing workflows. Yet there is still a disconnect between the relational world and the world of machine learning, most noticeably manifested in the amount of time wasted on denormalizing data and moving it out of the database in order to train ML models. Furthermore, although the intersection of database systems with ML is a hot area in data management, this is probably the first time that relational databases have been discussed in a NeurIPS workshop. Hence, another goal of DBAI is to draw attention to the possibilities that a synergy between the two communities can bring forward. This blog post gives an overview of DBAI’21 and highlights the main themes that were discussed in the invited and contributed talks, as well as during the panel discussion.


DBAI ran on Eastern Time and had an online attendance of 35 people. In addition, gatherings of around 15 students and faculty members were organized at four universities. We (the organizers) are very thankful to Snorkel AI and RelationalAI for their generous sponsorship, which funded the registrations and lunches for the physical gatherings. The schedule favored shorter talks, so that it could accommodate a diverse group of speakers with backgrounds in either ML or data management, and from both academia and industry. As a result, there were 7 invited and 5 contributed talks, as well as a panel discussion.

The workshop opened with Dan Olteanu’s (University of Zurich) insightful presentation on a first-principles approach that exploits the algebraic and combinatorial structure of relational data processing to improve the runtime performance of machine learning. Then, Paroma Varma (Snorkel AI) shared her state-of-the-art work on programmatically labeling training data, followed by Arun Kumar (UC San Diego), who highlighted how scalability, usability, and manageability concerns across the entire lifecycle of ML/AI applications can be addressed through the lens of database systems. David Chiang (University of Notre Dame) and Eriq Augustine (UC Santa Cruz) shifted the agenda towards more purely ML topics, presenting interesting ideas on different notations for weighted or probabilistic relations and on accelerating grounding in statistical relational learning. Finally, Molham Aref (RelationalAI) shared his insights on deep learning on relational data, and Olga Papaemmanouil (Brandeis University) presented a promising vision and preliminary results of AI-optimized database components.

The contributed talks spanned many interesting topics: data programming with knowledge bases, learned indices and buffer managers, relational algebra libraries for data science pipelines, and numerical reasoning in relational databases. Here too there was a balanced representation of industry labs and universities.

Panel Discussion

DBAI concluded with a very interesting panel discussion on AI workloads inside databases. The panelists were Guy Van den Broeck (UCLA), Alexander Ratner (Snorkel AI), Konstantinos Karanasos (Microsoft’s Gray Systems Lab), Molham Aref and Arun Kumar, and the discussion was moderated by Parisa Kordjamshidi.

Below is a summary of the main points that emerged from this discussion:

After two decades of in-RDBMS machine learning research and implementations, database systems have not made a compelling case for data scientists to move their workflows there. A transition phase is currently under way. The database community, drawing on its past experience, is looking for the crucial features, such as data versioning and data governance, that would make DBMSes attractive to data scientists. At the same time, the definition of in-RDBMS machine learning is becoming less rigid with the adoption of data lakes and interoperability with systems like TensorFlow and open formats like ONNX.


Concluding Remarks

Overall, we are very happy with the content of the first DBAI, which included insightful presentations and a constructive panel discussion. I’d like to sincerely thank my fellow organizers (Nikolaos Vasiloglou, Parisa Kordjamshidi, Maximilian Schleich, Kirk Pruhs and Zenna Tavares), the PC members, the speakers and panelists, the sponsors, the volunteers and, last but not least, the authors and attendees, each of whom contributed in their own way to making DBAI’21 a successful workshop. I really hope we will have the opportunity to organize another DBAI soon.
