Rugby Red and Yellow Cards
Databasing and visualisation
Tools
Python
Requests
LXML
BS4
Power BI
Approach
I used a series of functions to extract specific pieces of data from the HTML of approximately 32,000 individual webpages.
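A minimal sketch of one such extraction function follows. The page layout it expects (each card event as a `<tr class="card-event">` whose cells carry the classes `player`, `card` and `minute`) is hypothetical, not the structure of the real source site. It also swaps BS4/lxml for the standard library's `html.parser` so it runs without extra installs. With Requests, each page's HTML would be fetched with `requests.get(url, timeout=10).text` before being passed in.

```python
from html.parser import HTMLParser


class CardEventParser(HTMLParser):
    """Collects card events from one match page.

    Assumes a hypothetical layout: each event is a <tr class="card-event">
    whose <td> cells have the classes "player", "card" and "minute".
    """

    FIELDS = {"player", "card", "minute"}

    def __init__(self):
        super().__init__()
        self.events = []    # one dict per card event
        self._row = None    # event currently being built
        self._field = None  # field name of the cell we are inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "card-event" in classes:
            self._row = {}
        elif tag == "td" and self._row is not None:
            # Remember which field this cell holds (None if it is not one we want)
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.events.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._row is not None and self._field:
            # Accumulate text, including text inside nested tags such as <a>
            self._row[self._field] = self._row.get(self._field, "") + data


def extract_card_events(html):
    """Return a list of {player, card, minute} dicts from one page's HTML."""
    parser = CardEventParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

Running `extract_card_events` over every downloaded page and concatenating the results gives one list of raw rows; whitespace and stray markup are deliberately left in place for the cleaning step.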
This data was then written to CSV, the most lightweight way to store non-relational data of this kind.
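Writing the extracted rows out needs only the standard `csv` module. This sketch assumes each row is a dict and that the column names in `FIELDNAMES` (including a hypothetical `match_id`) match what the scraper produced; they are illustrative, not the project's actual schema.

```python
import csv

# Hypothetical column set; adjust to whatever the scraper actually emits
FIELDNAMES = ["match_id", "player", "card", "minute"]


def write_events_csv(path, events):
    """Write card-event dicts to a CSV file with a fixed header row.

    Missing keys are written as empty strings; unexpected keys are ignored.
    """
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(events)
```

A fixed header keeps every file readable with `csv.DictReader` and loadable in Power BI without per-file column mapping.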
I then cleaned the data with Python scripts to remove redundant whitespace, leftover HTML markup and stray characters. I also made sure values were consistent and removed duplicate records.
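The cleaning pass can be approximated with standard-library tools. This is a sketch of the kind of script described, not the original: the regular expression for leftover tags, treating control and zero-width characters as the "odd characters", and the `CARD_ALIASES` mapping used to make card labels consistent are all assumptions. De-duplication here drops rows whose cleaned fields are identical.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")  # leftover HTML tags such as <br/>
# Hypothetical shorthand labels mapped to one consistent value
CARD_ALIASES = {"y": "yellow", "yc": "yellow", "r": "red", "rc": "red"}


def clean_text(value):
    """Strip HTML remnants, odd characters and redundant whitespace."""
    value = TAG_RE.sub(" ", value)                # drop tags
    value = html.unescape(value)                  # &nbsp; -> \xa0, &amp; -> &
    value = unicodedata.normalize("NFKC", value)  # \xa0 -> ordinary space
    # Remove control/format characters (e.g. zero-width spaces) but keep whitespace
    value = "".join(
        ch for ch in value
        if ch.isspace() or unicodedata.category(ch)[0] != "C"
    )
    return " ".join(value.split())                # collapse runs of whitespace


def clean_events(events):
    """Clean every field, normalise card labels and drop exact duplicates."""
    seen = set()
    cleaned = []
    for event in events:
        row = {k: clean_text(v) for k, v in event.items()}
        card = row.get("card", "").lower()
        row["card"] = CARD_ALIASES.get(card, card)
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned
```

Cleaning before de-duplicating matters: two rows that differ only by a stray `&nbsp;` or a capitalised label only become identical once both are normalised.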
Once the data was in a usable format, I imported it into Power BI to visualise several factors and answer three core questions:
Has the number of cards given to players risen exponentially?
Does this trend correlate with the number of games played?
Do the statistics show any change after recent World Rugby rule changes?
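Power BI handled the visualisation, but the first two questions can also be roughed out in Python as a sanity check. The sketch below assumes every card row and every match row carries a `season` value; that column and the function names are hypothetical. It reports cards per game for each season (if the raw count rises while cards per game stays flat, the extra games explain the rise), the Pearson correlation between cards and games across seasons, and a per-season growth factor from a least-squares fit to the log of the counts.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def season_summary(card_rows, match_rows):
    """Per-season card totals, games played and cards per game.

    card_rows: one dict per card; match_rows: one dict per match.
    Both are assumed (hypothetically) to have a "season" key.
    """
    cards = Counter(r["season"] for r in card_rows)
    games = Counter(r["season"] for r in match_rows)
    summary = []
    for season in sorted(games):
        c, g = cards.get(season, 0), games[season]
        summary.append({"season": season, "cards": c, "games": g,
                        "cards_per_game": c / g})
    return summary


def growth_factor(values):
    """Per-season growth factor from a least-squares line through log(values).

    A factor above 1 means counts grow by roughly that ratio each season
    under a log-linear fit; whether that fit is actually good (i.e. whether
    growth is really exponential) still needs checking. Values must be > 0.
    """
    xs = range(len(values))
    ys = [math.log(v) for v in values]
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n  # mean of 0..n-1 is (n-1)/2
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(slope)
```

With only a handful of seasons the correlation is a rough indicator rather than a test. For the rule-change question, the same summary can be split into seasons before and after the change and the cards-per-game figures compared.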
