ChatGPT Combined with Graph Database to Predict a FIFA 2022 Winner but Went Wrong

Last Updated on July 24, 2023 by Editorial Team

Author(s): NebulaGraph Database

Originally published on Towards AI.

*Feature Image Credit: The image was also generated with OpenAI, through the DALL-E 2 model & DALL-E 2 Outpainting. See the* *original image*.

The Hype of ChatGPT

In the hype for FIFA 2022 World Cup, I saw a blog post from Cambridge Intelligence, in which they leveraged limited information and correlations among players, teams, and clubs to predict the final winning team. As a graph technology enthusiast, I always would like to try similar things with NebulaGraph to share the ideas of graph algorithms to extract hidden information from the overall connections in a graph in the community.

The initial attempt was to complete it in about 2 hours, but I noticed the dataset needed to be parsed carefully from Wikipedia, and I happened to be not good at doing this job, so I put the idea on hold for a couple of days.

In the meantime, another craze, the OpenAI ChatGPT, was announced. As I had been a user of DALL-E 2 already(to generate feature images of my blog posts), I also quickly gave it a try. And I witnessed how other guys(via Twitter, blogs, HackerNews) tried to convince ChatGPT to do so many things that are hard to believe they could do:

Help to implement a piece of code at any time
Simulate any prompt interface: shell, python, virtual machine, or even a language you create
Act out almost any given persona, and chat with you
Write poetry, rap, prose
Find a bug in a piece of code
Explain the meaning of a complex regular expression/Open Cypher Query

ChatGPT’s ability to contextualize and understand has never been greater before, so much so that everyone is talking about a new way of working: how to master asking/convincing/triggering machines to help us do our jobs better and faster.

So, after trying to get ChatGPT to help me write complex graph database query statements, explain the meaning of complex graph query statements, and a large chunk of Bison code, which he/she had done them WELL, I realized: why not let ChatGPT write the code that extracts the data for me?

Grabbing data

I really tried it and the result is… good enough.

The whole process was basically like a coding interviewer, or a product manager, presenting my requirements, then ChatGPT giving me the code implementation. I then try to run the code, find the things that don’t make sense in the code, point them out, and give suggestions, and ChatGPT quickly understands the points I point out and makes the appropriate corrections, like:

I won’t list this whole process, but I will share the generated code and the whole discussion here.

The final generated data is a CSV file.

Raw version world_cup_squads.csv
Manually modified, separated columns for birthday and age world_cup_squads_v0.csv
It contains information/columns of a team, group, number, position, player name, birthday, age, number of international matches played, number of goals scored, and club served.

Team,Group,No.,Pos.,Player,DOB,Age,Caps,Goals,Club
Ecuador,A,1,1GK,Hernán Galíndez,(1987-03-30)30 March 1987,35,12,0,Aucas
Ecuador,A,2,2DF,Félix Torres,(1997-01-11)11 January 1997,25,17,2,Santos Laguna
Ecuador,A,3,2DF,Piero Hincapié,(2002-01-09)9 January 2002,20,21,1,Bayer Leverkusen
Ecuador,A,4,2DF,Robert Arboleda,(1991-10-22)22 October 1991,31,33,2,São Paulo
Ecuador,A,5,3MF,José Cifuentes,(1999-03-12)12 March 1999,23,11,0,Los Angeles FC

Final version with header removed world_cup_squads_no_headers.csv

Graph algorithm to predict the FIFA World Cup 2022

With the help of ChatGPT, I could finally try to predict the winner of the game with Graph Magic, before that, I need to map the data into the graph view.

If you don’t care about the process, go to the predicted result directly.

Graph modeling

Prerequisites: This article uses NebulaGraph(Open-Source) and NebulaGraph Explorer(Proprietary), which you can request a free trial of on AWS.

Graph Modeling is the abstraction and representation of real-world information in the form of a “vertex-> edge” graph. In our case, we will project the information parsed from Wikipedia as:

Vertices:

player
team
group
club

Edges:

groupedin (the team belongs to which group)
belongto (players belong to the national team)
serve (players serve in the club)

The age of the players, the number of international caps, and the number of goals scored are naturally fit as properties for the player tag(type of vertex).

The following is a screenshot of this schema in NebulaGraph Explorer (I will just call it Explorer later).

Then, we can click the save icon in the upper right corner and the button: Apply to Space to create a graph space with the defined schema:

Ingesting into NebulaGraph

With the graph modeling, we can upload the CSV file (the no-header version) into Explorer, by pointing and selecting the vid and properties that map the different columns to the vertices and edges.

Click Import, we then import the whole graph to NebulaGraph, and after it succeeded, we could also get the entire CSV → Nebula Importer configuration file: nebula_importer_config_fifa.yml, so that you reuse it in the future whenever to re-import the same data or share it with others.

Note: Refer to the Import Data Document

After importing, we can view the statistics on the schema view page, showing us that 831 players participated in the 2022 Qatar World Cup, serving in 295 different clubs.

Explore the graph

Let’s see what insights we could get from the information/ knowledge in the form of a graph.

Querying the data

Let’s start by showing all the data and see what we will get.

First, with the help of NebulaGraph Explorer, I simply did drag and drop to draw any vertex type (TAG) and any type of edge between vertex types (TAG), here we know that all the vertices are connected with others, so no isolated vertices will be missed by this query pattern:

Let it generate the query statement for me. Here, it defaults to LIMIT 100, so let's change it to something larger (LIMIT 10000) and let it execute in the Console.

Initial observation

The result renders out like this, and you can see that it naturally forms a pattern of clusters.

These peripheral clusters are mostly made up of players from clubs that are not traditionally strong ones (now we learned that they could win, though, who knows!). Many of those clubs have only one or two players who serve in one national team or region, so they are somewhat isolated from other clusters.

Graph algorithm-based analysis

After I clicked on the two buttons(Sized by Degrees, Colored by Louvain Algorithm) in Explorer (refer to the document for details), in the browser, we can see that the entire graph has become something like this:

Two graph algorithms are utilized to analyze the insights here.

change the display size of vertices to highlight importance using their degrees
using Louvain’s algorithm to distinguish the community of the vertices

You can see that the big red circle is the famous Barcelona, and its players are marked in red, too.

Winner Prediction Algorithm

In order to be able to make full use of the graph magic(with the implied conditions and information on the graph), my idea(stolen/inspired from this post) is to choose a graph algorithm that considers edges for node importance analysis, to find out the vertices that have higher importance, iterate and rank them globally, and thus get the top team rankings.

These methods actually reflect the fact that excellent players have greater community and connectivity at the same time. Meanwhile, to increase the differentiation between traditionally strong teams, I am going to take into account the information on appearances and goals scored.

Ultimately, my algorithm is:

Take all the (player)-serve->(club) relationships and filter them for players with too few goals and too few goals per game (to balance out the disproportionate impact of older players from some weaker teams)
Explore outwards from all filtered players to get national teams
Run the Betweenness Centrality algorithm on the above subgraph to calculate the node importance scores

Process of the Prediction

First, we take out the subgraph in the pattern of (player)-serve->(club) for those who have scored more than ten goals and have an average of more than 0.2 goals per game.

MATCH ()-[e]->()
WITH e LIMIT 10000
WITH e AS e WHERE e.goals > 10 AND toFloat(e.goals)/e.caps > 0.2
RETURN e

Note: For convenience, I have included the number of goals and caps as properties in the serve edge, too.

Then, we select all the vertices on the graph in the left toolbar, select the belongto edge of the outgoing direction, expand the graph outwards (traverse), and select the icon that marks the newly expanded vertices as flags.

Now that we have the final subgraph, we use the graph algorithm function within the browser to execute BNC (Betweenness Centrality):

The graph canvas then looks like this:

Result

Ultimately, we sorted according to the value of Betweenness Centrality to get the final winning team: Brazil! U+1F1E7U+1F1F7, followed by Belgium, Germany, England, France, and Argentina.

It turned out that the result was way too wrong…but worth a try, isn’t it? I hope you enjoy this small experiment!

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

I Used ChatGPT to Count My Calories

Resource-Efficient Fine-Tuning of DeepSeek-R1

TAI #138: OpenAI’s o3-Mini and Deep Research: A New Era of Reasoning Powered Agents?

Text Preprocessing for NLP: A Step-by-Step Guide to Clean Raw Text Data

DeepSeek AI — The Future is Here

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.