2024: Week 42 - Solution



Solution by Tom Prowse and you can download the workflow here.



Step 1 - Year, Week, & Theme

After inputting the data source we can start to tidy up the year and weeks for each of the show. To calculate each year we can use the requirements as a guide to when the show started: 

Year
IF [Series]<=2
THEN 2004 
ELSE 2002+[Series]
END

Next we can extract the week information to include the number and theme. For this we want to split out the information by using an automatic split which will remove the word 'week. 

Then we can use that split field to extract the week number and theme using a split on the ':' - an automatic split works here as well. 

Finally we want to tidy up the weeks where there is semi-final or final in the brackets. This time we need to use a custom split on the ' ' which will separate the numbers and text and then remove the punctuation from the text field.

After we have parsed the final & semi final text we can merge this field with the original theme field that we split. Then we can rename the fields to Week & Theme and remove any others that we no longer require. 

The final part is to tidy up the quarter finals, semi finals, and finals text. For this we can create a stage field: 

Stage 
IF CONTAINS([Theme],'Quarter')
THEN 'Quarter Final'
ELSEIF CONTAINS([Theme],'Semi')
THEN 'Semi Final'
ELSEIF [Theme]='Final'
THEN 'Final'
END

We can then extract the Quarter Final string within the brackets by using another split on the '('. Then we can remove the original Theme field, rename the Split to Theme and do some final cleaning: 

Theme 
IF CONTAINS(lower([Theme]),'final')
THEN NULL
ELSE [Theme]
END

We can then remove any punctuation and replace the Night text with a blank: 

Theme
REPLACE([Theme],' Night','')

Finally, we can replace the blank values with Null and group together Movie and Hollywood values.

Now we have the Year, Weeks, and Themes tidied up we can remove any of the table headers from the view by excluding all the 'Couple' values from the Couple field.

At this stage our table should look like this: 




Step 2 - Couples & Scores

Next we can focus on cleaning the couples & scores field. First we can remove all numbers from the Couple field using the in-built clean functionality then we need to split the score values based on the '(' separator.

The scores field contains the individual scores from the judges so we want to split these so we have a total and judges score field. After renaming the split fields we can remove the ')' within the judges field: 

Judges Scores 
REPLACE([Judges Scores],')','')

Then replace the blanks with a null value.

Next we want to deal with the group dances that contain multiple couples:

Couple
IF CONTAINS([Dance],'Group') OR ([Dance],'Marathon')
THEN 'Group'
ELSE [Couple]
END


And then update the scores if it was a group dance: 

Total Score
IF [Couple]='Group'
THEN NULL
ELSE [Total Score]
END

Finally we can use an aggregation to ensure that there is only 1 row for each combination. For this we just want to bring all fields in the group by section of the aggregation step: 




Step 3 - Music

Next we can focus on the music fields and ensuring they are clean and in the correct format. First we want to replace all of the '&' with a '|': 

Music
REPLACE()(
REPLACE()(
    REPLACE(
        [Music],',"','|'
        )
        ,'&"','|'
    )
    ,'& "','|'
)


Then we can use the pipe ('|') to split all fields and then remove the original Music field. We should now have 4 different fields split off that contain music. 

We can then bring this together into the same field by using a columns to rows pivot and the split fields: 



Then we can tidy the field by removing the Pivot Names field, excluding any '' (blank) values, and splitting on — to separate the song and artist. After renaming the split fields to Song & Artist we can do some final tidying on the Song field by removing any ": 

Song 
REPLACE([Song],'"','')

The final step is to create a Theme Detail field. For this we can bring together the Film, Broadway musical, Musical, Country, and Celebrating BBC fields by merging them into a single field and then renaming.

After this we are ready to output the table: 



You can view the output here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank