One of the toughest aspects of dealing with freeform data is that the input layer may not have proper data validation processes to ensure data cleanliness. This can result in very ugly records, including non-text fields that are riddled with incorrectly formatted values.
Take this example dataset:
[Test Data] table | |
RecordID | DurationField |
1 | 00:24:00 |
2 | 00:22:56 |
3 | 00:54 |
4 | 0:30 |
5 | 01 |
6 | 4 |
7 | 2:44 |
8 | 5 MINUTES |
9 | 6/19 |
Those values in the [DurationField]
column are all different! How would we be able to consistently interpret this field as having a Interval
data type?
One of the ways you might be inclined to handle something like this is to use If()
statements. Let’s see an example of that now.
[New Data]:
Load
[DurationField]
, Interval( If(IsNum( Interval#([DurationField], 'hh:mm:ss') )
, Interval#([DurationField], 'hh:mm:ss')
, If(IsNum( Interval#([DurationField], 'mm:ss') )
, Interval#([DurationField], 'mm:ss')
, If(IsNum( Interval#([DurationField], 'mm') )
, Interval#([DurationField], 'mm')
, If(IsNum( Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm') )
, Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm')
, If(IsNum( Interval#(Replace([DurationField], '/', ':'), 'm:s') )
, Interval#(Replace([DurationField], '/', ':'), 'm:s')
, Null()
))))), 'hh:mm:ss') as [New DurationField]
;
Load
Upper(Trim([DurationField])) as [DurationField]
Resident [Test Data];
It’s a mess! Qlik has to evaluate each Interval#()
function twice in order to, first, check to see if the value was properly interpreted as a duration (“interval”) value, and then, second, to actually return the interpreted duration value itself.
One of the nice alternative ways of handling this is to use a different conditional function, like Alt()
. This function achieves the same thing as using the If()
and IsNum()
functions in conjunction. You can use:
Alt(arg1, arg2, arg3)
…Instead of:
If(IsNum(arg1), arg1, If(IsNum(arg2), arg2, If(IsNum(arg3, arg3))))
Let’s see how that may look using our previous example data:
[New Data]:
Load
[DurationField]
, Interval(Alt(
Interval#([DurationField], 'hh:mm:ss')
, Interval#([DurationField], 'mm:ss')
, Interval#([DurationField], 'mm')
, Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm')
, Interval#(Replace([DurationField], '/', ':'), 'm:s')
, Null()
), 'hh:mm:ss') as [New DurationField]
;
Load
Upper(Trim([DurationField])) as [DurationField]
Resident [Test Data];
Basically what’s happening there is:
- The preceding load happening at the bottom of that script is there to do some basic standardization of the
[DurationField]
field so that it’s easier to pattern-match. - In the rest of the script, we’re using the
Alt()
function (Qlik Help page) to check whether its arguments are numeric type of not. Each of its arguments areInterval#()
functions, which are trying to interpret the values of the
field as the provided format, like[DurationField]
'hh:mm:ss'
or'm:s'
. - So it’s basically saying:
If
Interval#(
returns a value interpreted as an Interval, then return that value (for example,[DurationField]
, 'hh:mm:ss')00:24:00
). But if a value couldn’t be interpreted as an Interval (like5 mins
for example, where theInterval#()
function would return a text type), we go to the nextInterval#()
function. IfInterval#(
returns a value…[DurationField]
, 'mm:ss')
This should all result in a table that looks like this:
Success! 🎉