Sunday, June 30, 2019

What Makes a Great Women's Soccer Team? And Why The USWNT is a Statistical Anomaly

The continued success of the US Women's National Team (USWNT), coupled with the recent underperformance of the Men's National Team, is bringing American soccer fans both joy and frustration. While our women's team is on a surefire path to another World Cup title, the men's team didn't even qualify for the last one. This leads us to ask:

Is there a link between the strengths of a nation's male and female soccer teams? 





We can see that some countries, such as France, Brazil, and England, have top-ranked national teams for both genders. However, many other countries, especially those further down the rankings, may have a strong women's team and only an average men's team. The opposite may also be true for other countries. Here's a nice graph to illustrate:



If you draw a line from Aruba to France, you'll find many national teams whose women's and men's rankings are similar. However, countries such as Canada and Uruguay stand out as outliers from this pattern.

So, what's causing what here?


Almost every country had a men's soccer team before a women's soccer team. In fact, there are many countries that still don't have a women's team. Therefore, it makes more sense to predict the strength of the women's team based on the men's team of the same country. As we saw in the graph, the relationship is there, but can we add other factors to more accurately predict the women's soccer team ranking?

The factors that might make a national soccer team great (total population, GDP per capita, popularity of the sport) are already captured within the original metric: the FIFA World Rankings. When thinking of women's sports, however, it's easy to hypothesize that women's sports are stronger in countries where gender equality is higher. Also, while it's somewhat of a stereotype that lesbians are over-represented among female athletes, I think it's worth examining, especially after USA star Megan Rapinoe said "you can't win without gay players." So, we'll investigate whether there's a relationship between how LGBT-tolerant a country is and its performance on the women's soccer pitch.

Megan Rapinoe, a star for the USWNT, recently asserted that LGBT athletes contribute heavily to women's sports.

Data measures used for prediction:

1. The FIFA Men's World Ranking, which we'll call the "men's team score".
2. The UN's Gender Inequality Index (we'll be using the inverse, which we'll call the "equality score"). This is calculated from factors such as the proportion of women in government, women's status in the workplace, and the availability of reproductive healthcare.
3. An LGBT rights score collected by the Human Truth Foundation, a liberal think tank. It is calculated from factors including the legal status of gay marriage, discrimination protections, and levels of anti-gay hate crimes in a country.
When we compare all of these against the FIFA Women's Ranking, we can see which of the three data measures contribute, and how much, to the projected strength of a women's soccer team.

Measuring accuracy using R^2 score: 

R^2, or R-squared, is a common statistical measure of how well a prediction, typically a best-fit line (linear regression), fits the data. Simply put, if two values are strongly correlated, they'll have an R^2 score closer to 1; if they're not correlated, the R^2 score will be closer to 0. If we were trying to predict ice-cream sales as a function of how hot it is outside, our model might look like this:


If we get the results on the left, the correlation is very high, and a future prediction would likely be accurate. If we get the results on the right, the correlation is fairly low, and a prediction would not be very accurate.
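To make this concrete, here is a minimal Python sketch of how R^2 is computed for a straight-line fit, using the standard definition R^2 = 1 - (sum of squared residuals) / (total sum of squares). The temperature and sales numbers are made up for illustration; they are not real data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    return m, mean_y - m * mean_x


def r_squared(ys, predictions):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: outdoor temperature (°F) vs. daily ice-cream sales
temps = [60, 65, 70, 75, 80, 85, 90]
sales_tight = [200, 230, 262, 290, 318, 352, 380]  # close to a straight line
sales_noisy = [260, 190, 330, 220, 360, 250, 310]  # widely scattered

scores = {}
for label, sales in [("tight", sales_tight), ("noisy", sales_noisy)]:
    m, b = fit_line(temps, sales)
    predictions = [m * t + b for t in temps]
    scores[label] = r_squared(sales, predictions)
    print(f"{label}: R^2 = {scores[label]:.2f}")
```

The tightly clustered data scores very close to 1, while the scattered data comes out around 0.14, mirroring the left and right cases described above.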

For reference, here's how our three predictors match up:


Creating a model and evaluating the results

If we run a multiple regression (available in Excel), we can create a formula of the form y = m1x1 + m2x2 + m3x3 + b to calculate our desired metric: the women's ranking score. If we normalize everything to a common scale (0-100 in this case), we get this equation:
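The regression here was run in Excel. For readers who prefer code, a rough Python equivalent uses NumPy's least-squares solver. This is a sketch under assumptions: the post doesn't say exactly how the scores were normalized, so min-max scaling onto 0-100 is an assumption, and the 50 "countries" below are randomly generated rather than real data.

```python
import numpy as np


def normalize_0_100(values):
    """Min-max rescale a 1-D array onto a 0-100 scale (assumed normalization)."""
    values = np.asarray(values, dtype=float)
    return 100 * (values - values.min()) / (values.max() - values.min())


def fit_multiple_regression(X, y):
    """Least-squares fit of y = m1*x1 + ... + mk*xk + b.

    X has one row per country and one column per predictor.
    Returns (coefficients, intercept).
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([X, np.ones(len(X))])  # column of 1s estimates b
    solution, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return solution[:-1], solution[-1]


# Synthetic example: 50 hypothetical countries, three predictors in arbitrary raw units
rng = np.random.default_rng(0)
raw = rng.uniform(0, 1, size=(50, 3)) * [1500, 1.0, 10]
X = np.column_stack([normalize_0_100(col) for col in raw.T])
true_m, true_b = np.array([0.5, 0.2, 0.3]), 5.0
y = X @ true_m + true_b + rng.normal(0, 2, size=50)  # add some noise

m, b = fit_multiple_regression(X, y)
print("coefficients:", m.round(2), "intercept:", round(b, 2))
```

Because the synthetic data was built from known coefficients, the fitted values should land close to 0.5, 0.2, and 0.3, which is a handy sanity check before trusting the fit on real rankings.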

Women's Team Score = 0.46*(Men's Team Score) + 0.22*(Gender Equality Score) + 0.28*(LGBT Rights Score) + 6.5

From the coefficients (0.46, 0.22, 0.28), we see that the Men's Team Score has the biggest impact of the three variables; because all three predictors share the same 0-100 scale, their coefficients can be compared directly. The 0.46 coefficient can be read as "for a one-unit increase in the men's team score, with the other two scores held constant, the women's team score would be projected to increase by 0.46 points". This higher coefficient does *not* signify the prediction accuracy of the individual variable, only its relative weight in the model.
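Here is the fitted equation written as a small Python function, to show both how a prediction is produced and what the 0.46 coefficient means in practice. Only the coefficients and intercept come from the regression; the input scores are hypothetical.

```python
def predicted_women_score(men_score, equality_score, lgbt_score):
    """Women's team score predicted by the fitted equation (all inputs on 0-100)."""
    return 0.46 * men_score + 0.22 * equality_score + 0.28 * lgbt_score + 6.5


# Hypothetical country: men's score 80, equality score 70, LGBT rights score 60
base = predicted_women_score(80, 70, 60)
bumped = predicted_women_score(81, 70, 60)  # men's score up by one unit

print(round(base, 2))           # 75.5
print(round(bumped - base, 2))  # 0.46, the men's-team coefficient
```

Holding the equality and LGBT scores fixed while nudging the men's score by one point moves the prediction by exactly the men's-team coefficient, which is the "one-unit increase" reading in plain arithmetic.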

The R-squared value for this combined formula is 0.61. By definition, this means that "61% of the variance in the result variable is explained by the predictor variables, and the rest is due to other factors or natural variability."

I'll spare you a fancy chart: using this prediction formula, the top 5 women's teams would be:

1. Belgium
2. Netherlands
3. France
4. Denmark
5. Sweden
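Turning predicted scores into a top-5 list like the one above is just a sort. A minimal sketch with hypothetical country names and scores (the per-country predictions aren't listed here); because Python's sort is stable, tied countries simply keep their input order, which is a simplifying assumption rather than an official tie-break rule.

```python
def rank_by_score(scores):
    """Map each country to its rank (1 = highest predicted score).

    Python's sort is stable, so tied countries keep their input order.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: rank for rank, country in enumerate(ordered, start=1)}


def top_n(scores, n=5):
    """Names of the n countries with the highest predicted scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical predicted scores, not real results
predicted = {"Country A": 71.2, "Country B": 84.9, "Country C": 66.0, "Country D": 79.3}
print(top_n(predicted, n=3))  # ['Country B', 'Country D', 'Country A']
print(rank_by_score(predicted))
```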

The United States:  A Statistical Anomaly

The USWNT is dead-set on bringing another World Cup home. They're the #1 ranked team in the world, while the men's team is ranked only #31. And while the US does rank high on LGBT rights (though not nearly as high as the Netherlands or Iceland, for example), the UN's Gender Equality score for our nation is surprisingly... average. If we followed our formula above, the USWNT would come in at only #35 on the current Women's Rankings. Is the American secret to success great coaching? Generational talent? Or just a big ol' dose of FREEDOM? Either way, our ladies in red, white, and blue are performing well above mathematical expectations.
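One way to make "statistical anomaly" precise is to look at residuals: actual score minus predicted score. The sketch below flags any country whose residual exceeds 1.5 standard deviations of all residuals; that cutoff is an arbitrary assumption, and the scores are hypothetical.

```python
from statistics import pstdev


def over_performers(actual, predicted, threshold=1.5):
    """Countries whose actual score beats the prediction by more than
    `threshold` population standard deviations of the residuals."""
    residuals = {c: actual[c] - predicted[c] for c in actual}
    spread = pstdev(list(residuals.values()))
    return [c for c, r in residuals.items() if r > threshold * spread]


# Hypothetical actual vs. predicted women's team scores (0-100 scale)
actual = {"A": 61.0, "B": 54.0, "C": 70.5, "D": 47.5, "E": 88.0}
predicted = {"A": 60.0, "B": 55.0, "C": 70.0, "D": 48.0, "E": 78.0}
print(over_performers(actual, predicted))  # ['E']
```

The same idea works on ranks: predicted #35 against an actual #1 puts the USWNT 34 places above where the model expects it.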

Predictions for countries that currently do not have a women's team

So we can estimate a score and test it against a score that already exists. Neat, huh? But the real usefulness comes from predicting values that are still unknown. Here are five countries that do not have a women's team, or whose team is currently unranked due to inactivity. Using our formula, let's see how they'd rank if they hypothetically had a team:
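To answer "how would they rank?", a predicted score has to be slotted in among the teams that already have one. A minimal sketch using Python's `bisect` module, assuming the predicted score and the existing scores sit on the same 0-100 scale; the scores shown are hypothetical, and a new team tied with an existing one is placed behind it.

```python
import bisect


def hypothetical_rank(predicted_score, existing_scores):
    """Rank (1 = best) a new team would take among existing ranked teams.

    A team tied with an existing score is placed behind it.
    """
    ascending = sorted(existing_scores)
    # Number of existing teams scoring at least as high as the new team
    at_least_as_good = len(ascending) - bisect.bisect_left(ascending, predicted_score)
    return at_least_as_good + 1


existing = [90.0, 80.0, 70.0, 60.0]  # hypothetical current women's team scores
print(hypothetical_rank(75.5, existing))  # 3
```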




Other possible factors

So was our model good? Technically, yes. It's hard to get an R-squared score much higher than 0.6, especially in sociology. Otherwise, we'd already be able to forecast a lot more about social structures and behaviors. In practice, however, it's tempting to see this model as not much more than a rule of thumb. Is your men's team good? Do you care about women's and LGBT empowerment? Well yeehaw partner, you might have a good women's soccer team!

In future studies, there are other factors we could consider to improve our accuracy. For example, in most European and African countries, soccer is the #1 sport in popularity for both men and women. However, in the US, American football is the #1 male sport (basketball is #2), but soccer is the #1 female sport. The same goes for Australia and New Zealand, where men might play rugby, a sport that is not as widely played by women. It's no surprise, then, that the US, Australian, and New Zealand women's teams have been outperforming their male counterparts. Another potential consideration could be the average salaries of male and female soccer players. Many other factors could be studied, some with surprising relationships, and some with none at all.

------------

I hope you've enjoyed reading. Go Team USA!





2 comments:

  1. Hi Yoav, just wondering about your Hyprocure and how you are doing now 5 years later? I tried replying to the original post but you didn't seem to see it.

    Warmest regards,
    Bob

    1. Hi Bob - my apologies for not replying. If you still wanted to talk about Hyprocure, email me at mory32@gmail.com

