World Cup: Finding the Goal with $match

With the World Cup starting this week there are going to be a lot of matches to watch for the next month or so and there will hopefully be a lot of goals as well!

To add a little MongoDB related fun to the world party we’re going to do a few short posts around historical World Cup data going to back to the last time the Cup was hosted in the USA in 1994. That gives us a lot of goals to work with and some interesting data on the countries, years and players.

The World Cup is full of matches, and so are MongoDB aggregation pipelines.

Who Was Top That Year?

Let’s start off with a simple question:

Which countries scored the most goals in the 2022 World Cup?

To do this we will use an aggregation pipeline with a couple stages to $match to the correct World Cup year, then $group the goals by country and finally $sort by the most goals!

Here is a our query:

db.goals.aggregate([
  { 
    $match: { "tournament.year": 2022 }
  },
  {
    $group: {
      _id: "$scoringTeam.name",
      goals: { $sum: 1 }
    }
  },
  {
    $sort: { goals: -1 }
  }
]);

When a lot of people start learning about aggregation pipelines they think about stages like $group or $lookup or even $setWindowFields but really the most important stage in many pipelines is actually the simplest, $match.

A $match stage is so important because it reduces the number of documents in your pipeline, so each stage has to do less because you are only working with the data you actually need. So makes sure to match often and early!

Here, in our first stage we $match for documents that apply to the World Cup year we want to know about, 2022 and then group by the country and $sum up the goals, and then sort.

The Goal of Good Document Schema

Of course, part of being able to query well later is good document design (or schema) when we start. So let’s make sure we begin with a good formation!

Our goal documents look something like this:

{
  "_id": {
    "$oid": "6a275780ccdf3f3a0fdda55e"
  },
  "goalId": "G-1543",
  "tournament": {
    "id": "WC-1994",
    "year": 1994,
    "name": "1994 FIFA Men's World Cup"
  },
  "match": {
    "id": "M-1994-01",
    "name": "Germany vs Bolivia",
    "date": {
      "$date": "1994-06-17T00:00:00.000Z"
    },
    "stage": "group stage",
    "group": "Group C"
  },
  "scoringTeam": {
    "id": "T-31",
    "name": "Germany",
    "code": "DEU"
  },
  "player": {
    "id": "P-91373",
    "givenName": "Jürgen",
    "familyName": "Klinsmann",
    "displayName": "Jürgen Klinsmann",
    "team": {
      "id": "T-31",
      "name": "Germany",
      "code": "DEU"
    }
  },
  "goal": {
    "minute": {
      "label": "61'",
      "regulation": 61,
      "stoppage": 0
    },
    "period": "second half",
    "ownGoal": false,
    "penalty": false
  },
  "source": {
    "name": "Fjelstul World Cup Database",
    "collection": "goals"
  }
}

Since this is historical data we can quite robust about what we store without ever worrying about needing to update the data so we’ll store everything we’d need together including the data about the goal, player, team and game.

Finding the Goal Fast!

At the World Cup you want your team to be first to ball (and hopefully goal!) and the best way to do this in MongoDB is to make sure we index our data well.

Since we’re frequently matching on tournament.year, we created an index on that field. This will make our pipeline query even faster!

Looking at the document you’ll see the year is inside a subdocument called tournament … don’t worry you can create an index on a field inside a subdocument just like you would on a normal field in your document, just use dot notation.

db.goals.createIndex({
  "tournament.year": 1
});

And the Winner Is …

Now for what you are probably looking for, who was tops? Here are the top 5:

[
{
  "_id": "France", "goals": 16
},
{
  "_id": "Argentina", "goals": 15
},
{
  "_id": "England", "goals": 13
},
{
  "_id": "Portugal", "goals": 12
},
{
  "_id": "Netherlands", "goals": 10
}
]

So unsurprisingly France and Argentina toped the list!

But funny enough, the winner didn’t finish first in the goals race … so just shows you its not always about the goals you score, but also the goals you don’t let in … the less that go in the better chance you have!

Just like using $match to reduce the data in your pipeline!

Extra Credit

You can find the source of the goal data as jfjelstul/worldcup

Or, download my whole MongoDB goals collection:

What’s Up Next?

Next time we’ll use the same dataset to see which players have scored the most World Cup goals since 1994.

, ,

MongoDB for Jobseekers Book

If you’re fascinated by the intricacies of MongoDB and yearn to explore its limitless possibilities further, I invite you to delve into my comprehensive book, “MongoDB for Jobseekers.”

This book is your passport to unlocking MongoDB’s full potential, whether you’re a beginner or an experienced enthusiast. Seamlessly navigate the intricacies of data management, uncover advanced techniques, and gain insights that will elevate your proficiency.

Available on Amazon and other leading platforms, “MongoDB for Jobseekers” offers a comprehensive roadmap for honing your MongoDB skills. As you embark on your journey through the world of databases, this book will be your trusted companion, guiding you towards mastery and helping you stand out in the competitive landscape.