,

Interpreting Formats in a Field When Several Are Possible

We look at how to use the Alt() function for efficient data type interpretation in an ambiguous field.

By.

min read

Arc Analytics – Varying Formats – Thumbail

One of the toughest aspects of dealing with freeform data is that the input layer may not have proper data validation processes to ensure data cleanliness. This can result in very ugly records, including non-text fields that are riddled with incorrectly formatted values.

Take this example dataset:

[Test Data] table
RecordIDDurationField
100:24:00
200:22:56
300:54
40:30
501
64
72:44
85 MINUTES
96/19

Those values in the [DurationField] column are all different! How would we be able to consistently interpret this field as having a Interval data type?

One of the ways you might be inclined to handle something like this is to use If() statements. Let’s see an example of that now.

[New Data]:
Load
    [DurationField]
  , Interval( If(IsNum( Interval#([DurationField], 'hh:mm:ss') )
      , Interval#([DurationField], 'hh:mm:ss')
      , If(IsNum( Interval#([DurationField], 'mm:ss') )
          , Interval#([DurationField], 'mm:ss')
          , If(IsNum( Interval#([DurationField], 'mm') )
              , Interval#([DurationField], 'mm')
              , If(IsNum( Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm') )
                  , Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm')
                  , If(IsNum( Interval#(Replace([DurationField], '/', ':'), 'm:s') )
                      , Interval#(Replace([DurationField], '/', ':'), 'm:s')
                      , Null()
      ))))), 'hh:mm:ss')  as [New DurationField]
;
Load
    Upper(Trim([DurationField])) as [DurationField]
Resident [Test Data];

It’s a mess! Qlik has to evaluate each Interval#() function twice in order to, first, check to see if the value was properly interpreted as a duration (“interval”) value, and then, second, to actually return the interpreted duration value itself.

One of the nice alternative ways of handling this is to use a different conditional function, like Alt(). This function achieves the same thing as using the If() and IsNum() functions in conjunction. You can use:

Alt(arg1, arg2, arg3)

…Instead of:

If(IsNum(arg1), arg1, If(IsNum(arg2), arg2, If(IsNum(arg3, arg3))))

Let’s see how that may look using our previous example data:

[New Data]:
Load
    [DurationField]
  , Interval(Alt(
        Interval#([DurationField], 'hh:mm:ss')
      , Interval#([DurationField], 'mm:ss')
      , Interval#([DurationField], 'mm')
      , Interval#(Left([DurationField], Index([DurationField], 'MIN')-2), 'm')
      , Interval#(Replace([DurationField], '/', ':'), 'm:s')
      , Null()
    ), 'hh:mm:ss') as [New DurationField]
;
Load
    Upper(Trim([DurationField])) as [DurationField]
Resident [Test Data];

Basically what’s happening there is:

  • The preceding load happening at the bottom of that script is there to do some basic standardization of the [DurationField] field so that it’s easier to pattern-match.
  • In the rest of the script, we’re using the Alt() function (Qlik Help page) to check whether its arguments are numeric type of not. Each of its arguments are Interval#() functions, which are trying to interpret the values of the [DurationField] field as the provided format, like 'hh:mm:ss' or 'm:s'.
  • So it’s basically saying:

If Interval#([DurationField], 'hh:mm:ss') returns a value interpreted as an Interval, then return that value (for example, 00:24:00). But if a value couldn’t be interpreted as an Interval (like 5 mins for example, where the Interval#() function would return a text type), we go to the next Interval#() function. If Interval#([DurationField], 'mm:ss') returns a value…

This should all result in a table that looks like this:

Success! 🎉