Web scraping and data cleaning: rugby eligibility
Tools
Python
Beautiful Soup
Requests
pandas
Approach
A popular rugby podcast invited anyone with a data background to carry out some analysis of players' eligibility for the English, Irish, Scottish and Welsh international rugby teams.
I put the Wikipedia links for each country's team into a list and looped over them with a script that pulled out the players listed in each current squad (bearing in mind that Wikipedia is not always a reliable source). I also extracted the href of each player's personal page, where one existed, and stored these in a Python list.
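A minimal sketch of this first step with Requests and Beautiful Soup follows. It assumes each squad is the first "wikitable" on the team's page whose header row has a "Player" column. The page URLs in TEAM_PAGES, that table layout, and the helper names (fetch_html, parse_squad, scrape_squads) are illustrative assumptions, not the project's original code; real Wikipedia pages may be laid out differently.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumed squad pages (hypothetical choice; the project's exact links are not given).
TEAM_PAGES = {
    "England": "https://en.wikipedia.org/wiki/England_national_rugby_union_team",
    "Ireland": "https://en.wikipedia.org/wiki/Ireland_national_rugby_union_team",
    "Scotland": "https://en.wikipedia.org/wiki/Scotland_national_rugby_union_team",
    "Wales": "https://en.wikipedia.org/wiki/Wales_national_rugby_union_team",
}
BASE_URL = "https://en.wikipedia.org"


def fetch_html(url):
    """Download a page; Wikipedia asks clients to send a descriptive User-Agent."""
    response = requests.get(
        url, headers={"User-Agent": "rugby-eligibility-sketch/0.1"}, timeout=30
    )
    response.raise_for_status()
    return response.text


def parse_squad(html):
    """Return [(player_name, page_url_or_None), ...] from a squad table.

    Assumes (hypothetically) the squad is the first 'wikitable' whose header
    row contains a 'Player' column.
    """
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table", class_="wikitable"):
        header_row = table.find("tr")
        if header_row is None:
            continue
        headers = [th.get_text(strip=True) for th in header_row.find_all("th")]
        if "Player" not in headers:
            continue
        col = headers.index("Player")
        players = []
        for row in table.find_all("tr")[1:]:
            cells = row.find_all(["td", "th"])
            if len(cells) <= col:
                continue  # skip malformed or spacer rows
            cell = cells[col]
            link = cell.find("a", href=True)
            # Red links (class "new") point at pages that do not exist yet.
            if link is not None and "new" in (link.get("class") or []):
                link = None
            name = cell.get_text(" ", strip=True)
            url = urljoin(BASE_URL, link["href"]) if link is not None else None
            players.append((name, url))
        return players
    return []


def scrape_squads(team_pages):
    """Loop over each team's page and collect its squad."""
    return {team: parse_squad(fetch_html(url)) for team, url in team_pages.items()}
```

Calling scrape_squads(TEAM_PAGES) would return a dict mapping each team to its list of (name, URL) pairs; players without a personal page get None instead of a URL, so the next step can skip them.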
A second function then looped over these player pages, extracting further information such as place of birth. If a player's place of birth differed from the country they played for, the function flagged it and filled in a new pandas DataFrame column, making it easy to see which players were born outside the country they represent.
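The sketch below shows one way this second step could work. It assumes each player page has a Wikipedia infobox (table.infobox) with a row headed "Place of birth" or "Born", and it compares only the last comma-separated part of the birthplace (e.g. "Wales" in "Cardiff, Wales") against a set of home names per team. The HOME_NAMES aliases, the comparison rule, the column names and the function names are all hypothetical. fetch_html is passed in as a parameter (for example a thin wrapper around requests.get) so the parsing can run without network access.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed aliases: the Ireland team represents the whole island, so a
# Northern Ireland birthplace is treated as "home" for Ireland.
HOME_NAMES = {
    "England": {"England"},
    "Ireland": {"Ireland", "Northern Ireland", "Republic of Ireland"},
    "Scotland": {"Scotland"},
    "Wales": {"Wales"},
}


def parse_birthplace(html):
    """Return the birthplace text from a player's infobox, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox")
    if infobox is None:
        return None
    for row in infobox.find_all("tr"):
        header, value = row.find("th"), row.find("td")
        if header is None or value is None:
            continue
        if header.get_text(strip=True) not in ("Place of birth", "Born"):
            continue
        for sup in value.find_all("sup"):
            sup.decompose()  # drop footnote markers such as [1]
        # Collapse whitespace and tidy the space left before commas.
        text = " ".join(value.get_text(" ").split()).replace(" ,", ",")
        return text or None
    return None


def is_born_outside(team, birthplace):
    """True/False by comparing the last comma-separated part; None if unknown."""
    if not isinstance(birthplace, str) or not birthplace:
        return None
    country = birthplace.split(",")[-1].strip()
    return country not in HOME_NAMES.get(team, {team})


def build_dataframe(squads, fetch_html):
    """Visit each player page and return a DataFrame with a born_outside flag.

    `squads` maps team -> [(player_name, page_url_or_None), ...].
    """
    records = []
    for team, players in squads.items():
        for name, url in players:
            birthplace = parse_birthplace(fetch_html(url)) if url else None
            records.append({"team": team, "player": name, "birthplace": birthplace})
    df = pd.DataFrame(records, columns=["team", "player", "birthplace"])
    df["born_outside"] = [
        is_born_outside(team, place) for team, place in zip(df["team"], df["birthplace"])
    ]
    return df
```

The last-part rule is deliberately crude: a "Born" row that mixes a date with a place usually still ends in the country, but unusual formats (or a missing infobox) will leave the flag as None, so those rows are worth checking by hand.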
Analysis could then highlight where players were born, which countries had the most "foreign-born" players, and so on.
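As an illustration of that analysis, the sketch below assumes a DataFrame with team, birthplace and born_outside columns (a hypothetical schema) and uses a pandas groupby to count, per team, how many players were flagged as born outside the country, plus value_counts to tally birth countries. Players whose birthplace could not be determined are excluded from the per-team counts.

```python
import pandas as pd


def summarise_by_team(df):
    """Count players flagged as born outside their team's country, per team.

    Rows where born_outside is missing (unknown birthplace) are left out.
    """
    known = df[df["born_outside"].notna()].astype({"born_outside": bool})
    summary = known.groupby("team")["born_outside"].agg(
        foreign_born="sum", players_known="count"
    )
    summary["share"] = summary["foreign_born"] / summary["players_known"]
    return summary.sort_values("foreign_born", ascending=False)


def birth_country_counts(df):
    """Count players by the last comma-separated part of their birthplace."""
    countries = df["birthplace"].dropna().str.split(",").str[-1].str.strip()
    return countries.value_counts()
```

Reporting a share alongside the raw count helps when squads differ in size or when some teams have more players with missing birthplaces.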
