2019: Week 28 Solution

You can view our full solution workflow below and download it here!

Our full solution workflow.

I’ve seen a number of solutions shared with us via Twitter, some more streamlined than this one, so for alternative ways of achieving this outcome just search Twitter for the #PreppinData hashtag! In particular, the solution by @ArseneXie streamlines the pivoting steps and the self-join/aggregation step. For this write-up we’ll discuss our original solution as a base point.

1. Use Data Interpreter to fix up field names.

As our input data is poorly formatted, ticking ‘Use Data Interpreter’ merges our first and second row headers into something meaningful instead of ‘F6’, ‘F7’, ‘F8’, and so on.

2. Use a ‘Wildcard Union’ to pull the date from the sheet name.

Part of the challenge includes getting an accurately formatted date & time field. However, the date itself is contained only in the sheet name. To get around this we can change our input options on the Input tool to ‘Wildcard Union’. If we then change the Sheets ‘Matching Pattern’ to “*2019” we can import just the sheet containing our data. This provides two new fields for us to use: [File Paths] & [Table Names].

We don’t need the file path so we can untick that field in the input tool, but the [Table Names] field means we now have access to the sheet name, i.e. we can now get hold of our date without having to manually type it into a calculation. This is handy in case we have multiple sheets, each with its own unique date.

Data Interpreter & Wildcard Union settings.
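Outside of Prep, the same wildcard-union idea can be sketched in pandas: read every sheet, keep only those matching the “*2019” pattern, and carry the sheet name along as [Table Names]. The file and sheet names below are hypothetical stand-ins for a `pd.read_excel(..., sheet_name=None)` call.

```python
import pandas as pd

# Stand-in for pd.read_excel("input.xlsx", sheet_name=None), which returns
# a {sheet_name: DataFrame} dict. Sheet names here are hypothetical.
sheets = {
    "Notes": pd.DataFrame({"Employee": ["Carl"]}),
    "08/07/2019": pd.DataFrame({"Employee": ["Jenny", "Carl"]}),
}

# Union only the sheets matching "*2019", keeping the sheet name
# as a [Table Names] column — the wildcard-union behaviour.
union = pd.concat(
    [df.assign(**{"Table Names": name})
     for name, df in sheets.items()
     if name.endswith("2019")],
    ignore_index=True,
)
```

With a single matching sheet this just tags every row with its date-bearing sheet name, ready for the date parsing in step 4.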


3. Use 3 Column-to-Row pivots in a row: Interaction, Proximity, Task.

In each of these pivots there should be 3 changes after the pivot:

a. Exclude ‘null’ from [Pivot Values];
b. Rename [Pivot1 Names] to [Interaction]/[Manager Proximity]/[Task Engagement];
c. Delete the [Pivot1 Values] field.

After performing each pivot and the above changes you should be left with 90 rows. You could do this stage instead with 3 calculated fields of the form `IF [Field1] = 'X' THEN 'Field 1' ELSEIF [Field2] = 'X' THEN 'Field 2'` and so on. Pivoting feels like less manual work.

Pivot, PIVOT, PIVOTTT
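Each Columns-to-Rows pivot, plus the three clean-up changes, maps neatly onto a pandas `melt`. The column names below are a hypothetical slice of the input, using the [Interaction] pivot as the example.

```python
import pandas as pd

# Hypothetical slice of the input: one 'X'-marked column per category.
df = pd.DataFrame({
    "Employee": ["Jenny", "Carl"],
    "Solo": ["X", None],
    "Group": [None, "X"],
})

# Columns-to-rows pivot; var_name handles change (b), renaming to [Interaction].
melted = df.melt(id_vars="Employee",
                 var_name="Interaction",
                 value_name="Pivot1 Values")
melted = melted[melted["Pivot1 Values"].notna()]  # (a) exclude nulls
melted = melted.drop(columns="Pivot1 Values")     # (c) delete the values field
```

Repeating this for Proximity and Task leaves one row per employee per pivot, matching the 90-row checkpoint.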


4. Prepare the Date & Time field

Before calculating the precise start time for each observation interval, we need to get the overall observation start time for each employee. To do this, we can firstly remove all letters from [Table Names] using the ‘Remove Letters’ clean option and then convert the field to a Date type, renaming it as [Date] afterwards.

From here we can update the [Observation Start Time] to include the actual date using the following calculation:

[Observation Start Time]

DATEADD(                            // Add…
    'day'                           // in days…
    , DATEDIFF(                     // the difference…
        'day'                       // in days…
        , [Observation Start Time]  // between the current date…
        , [Date]                    // and the actual date…
    )
    , [Observation Start Time]      // to the current date.
)



This tops up the [Observation Start Time] with all the missing days as it had defaulted to 30/12/1899.
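The same day-shifting trick can be sketched with pandas timestamps: take the day-level difference between the defaulted start time and the real date, and add it back on. The values below are illustrative.

```python
import pandas as pd

# [Observation Start Time] parsed as time-only defaults to the 30/12/1899
# epoch; [Date] comes from the sheet name. Values here are illustrative.
start = pd.Timestamp("1899-12-30 09:30:00")
date = pd.Timestamp("2019-07-08")

# Equivalent of DATEADD('day', DATEDIFF('day', start, date), start):
# the whole-day gap is added while the time-of-day component is kept.
shifted = start + (date.normalize() - start.normalize())
```

The time-of-day (09:30) survives the shift while the date jumps from the 1899 epoch to the sheet’s date.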

5. Calculate the Running Total of observation intervals.

We can now use the running total aggregation trick to calculate how many minutes into each employee’s observation each observation interval actually occurred:

1. Left Outer Join on:
       • [Employee] = [Employee]
       • [Observation Interval] > [Observation Interval].
2. Aggregate the results using:
       • GROUP BY [Employee]
       • GROUP BY [Observation Interval]
       • SUM [Observation Length (Mins)] : Rename this to [Mins since observation start time]
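The self-join running total above can be sketched in pandas: join each row to every earlier interval for the same employee, then sum the earlier lengths. The interval values below are hypothetical; note that the first interval has no earlier rows to match, which is why the later calculation wraps the result in ZN().

```python
import pandas as pd

# Hypothetical observation intervals with their lengths in minutes.
obs = pd.DataFrame({
    "Employee": ["Jenny"] * 3,
    "Observation Interval": [1, 2, 3],
    "Observation Length (Mins)": [5, 10, 5],
})

# Self-join on Employee, keeping pairs where the left interval is later...
joined = obs.merge(obs, on="Employee", suffixes=("", " (prev)"))
joined = joined[
    joined["Observation Interval"] > joined["Observation Interval (prev)"]
]

# ...then aggregate: sum the earlier interval lengths per row.
mins = (
    joined.groupby(["Employee", "Observation Interval"])
          ["Observation Length (Mins) (prev)"].sum()
          .rename("Mins since observation start time")
          .reset_index()
)
```

Interval 2 picks up interval 1’s 5 minutes, and interval 3 picks up 5 + 10 = 15; interval 1 matches nothing, the null case ZN() later turns into zero.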

6. Generate final observation interval date & time.

Finally, we now need to add these minute values to their corresponding observation interval start time to get the actual start time for each row. After using an Inner Join on Employee & Observation Interval we can use the following calculation to once again update the [Observation Start Time] field:

[Observation Start Time]

DATEADD(                                      // Add…
    'minute'                                  // in minutes…
    , ZN([Mins since observation start time]) // the mins into each obs. start time (zero if null)…
    , [Observation Start Time]                // to the current obs. start time.
)
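A pandas equivalent of this final calculation, with illustrative values: add the elapsed minutes to the observation start time, treating the first interval’s unmatched null as zero just as ZN() does.

```python
import pandas as pd

# Observation start plus minutes elapsed; the first interval's missing
# value (unmatched by the join) counts as zero, like ZN().
start = pd.Series([pd.Timestamp("2019-07-08 09:30")] * 2)
mins = pd.Series([None, 15], dtype="float")

interval_start = start + pd.to_timedelta(mins.fillna(0), unit="m")
```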


