Data Category as Attribute / Data Category as Tag

In the previous version of ISO 30042, there was only one “style” of TBX, although it was not called a style at the time: Data Category as Attribute (DCA). DCA goes back to the origins of TBX, so it is what anyone familiar with older versions of TBX will most readily recognize as TBX. In 2013, work began on a simplified version of TBX that could serve as a bridge format between TBX and Universal Terminology eXchange (UTX) or other spreadsheet-type terminology formats. This simplified version was an early incarnation of TBX-Min. To bring to XML the ease with which a human can work with a spreadsheet, the traditional DCA style was simplified into a more “modern” Data Category as Tag (DCT) style, which uses the data category as the generic identifier, or “tag”, of an element rather than as an attribute value of a meta element. In the new version of ISO 30042, DCT is no longer restricted to TBX-Min: any TBX dialect may be represented in either DCA or DCT. A dialect can also be converted from one style to the other without data loss, because the two styles are isomorphic.

The Difference:

The most recognizable difference between DCA and DCT is reflected in their names: where the name of the data category appears. In DCA it is the value of the @type attribute on a meta element; in DCT it is the name of the element itself.

DCA

<descrip type="subjectField">finance</descrip>

DCT

<subjectField>finance</subjectField>
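Because the two forms carry the same information, converting a single element between them is mechanical. The following is a minimal Python sketch of that idea, not an official TBX tool. The function names dca_to_dct and dct_to_dca are hypothetical, namespaces are ignored, and only the element text and attributes are handled. Going from DCT back to DCA also requires knowing which meta element a data category belongs to, so that is passed in as a parameter.

```python
import xml.etree.ElementTree as ET


def dca_to_dct(element: ET.Element) -> ET.Element:
    """Turn <descrip type="X">text</descrip> into <X>text</X> (hypothetical helper)."""
    category = element.get("type")
    if category is None:
        raise ValueError("DCA element has no type attribute")
    converted = ET.Element(category)
    converted.text = element.text
    # Carry over every attribute except @type, which has become the tag name.
    for name, value in element.attrib.items():
        if name != "type":
            converted.set(name, value)
    return converted


def dct_to_dca(element: ET.Element, meta_tag: str = "descrip") -> ET.Element:
    """Turn <X>text</X> into <meta_tag type="X">text</meta_tag> (hypothetical helper)."""
    converted = ET.Element(meta_tag)
    converted.set("type", element.tag)
    converted.text = element.text
    # Copy the remaining attributes unchanged.
    for name, value in element.attrib.items():
        if name != "type":
            converted.set(name, value)
    return converted


dca = ET.fromstring('<descrip type="subjectField">finance</descrip>')
dct = dca_to_dct(dca)
print(ET.tostring(dct, encoding="unicode"))
print(ET.tostring(dct_to_dca(dct), encoding="unicode"))
```

The first line printed is the DCT form and the second is the original DCA form again, which is what a lossless round trip looks like for this one element.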

In the newest version of ISO 30042, DCT has been extended to make use of XML namespaces. In DCA style, the TBX Core module can be edited to include a simple list of permitted @type attribute values for the various meta type elements (also known as classification elements); these values correspond to the data categories included in a dialect. In DCT, however, a new element definition would be needed for each data category included in a dialect. Since this practice would destabilise the structure of the Core module, it was decided that the best way to handle data categories in DCT is to import them via the namespace of their parent module. Each DCT element must therefore be introduced via its module's namespace to be considered valid.

The following example shows the data category /subjectField/ (which is a member of the Min module) being introduced via namespace declaration:

<subjectField xmlns="https://www.tbxinfo.net/ns/min">finance</subjectField>

Although declaring the module namespace directly on the element is possible, recommended practice in TBX is to declare it on the root element and associate it with a namespace prefix:

<tbx type="TBX-Min" style="dct" xmlns="urn:iso:std:iso:30042:ed-2" xmlns:min="https://www.tbxinfo.net/ns/min">
  ...
  <min:subjectField>finance</min:subjectField>
  ...
</tbx>
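To a namespace-aware parser, the two ways of declaring the module namespace are equivalent. The Python sketch below illustrates this with the standard library's xml.etree.ElementTree. The document string is a stripped-down stand-in that keeps only the root element and one data category, so it is not a complete or valid TBX-Min file, and the prefix names in the NS mapping are a local choice made for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used for lookups; the prefixes are local to this script.
NS = {
    "tbx": "urn:iso:std:iso:30042:ed-2",
    "min": "https://www.tbxinfo.net/ns/min",
}

# Stand-in document: only the root and one data category, not a full TBX-Min file.
document = """<tbx type="TBX-Min" style="dct"
     xmlns="urn:iso:std:iso:30042:ed-2"
     xmlns:min="https://www.tbxinfo.net/ns/min">
  <min:subjectField>finance</min:subjectField>
</tbx>"""

root = ET.fromstring(document)
print(root.tag)  # {urn:iso:std:iso:30042:ed-2}tbx

# Find every subjectField element from the Min module, wherever it sits.
subject_fields = [el.text for el in root.iterfind(".//min:subjectField", NS)]
print(subject_fields)  # ['finance']

# The inline declaration yields the same qualified name as the prefixed form.
inline = ET.fromstring(
    '<subjectField xmlns="https://www.tbxinfo.net/ns/min">finance</subjectField>'
)
prefixed = root.find("min:subjectField", NS)
print(inline.tag == prefixed.tag)  # True
```

ElementTree stores each tag as the namespace in braces followed by the local name, which is why the prefix used in the file does not matter once the document has been parsed.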

Validating DCT

Because DCT relies on namespaces, the Namespace-based Validation Dispatching Language (NVDL) is recommended for tying together the various schemas needed to validate a dialect document instance in DCT style. Each module has its own schema, which states the module's namespace, its permitted elements (the data categories), and the permitted values for those elements.
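The idea behind namespace-based dispatching can be illustrated with a toy Python sketch: each element is routed, by its namespace, to the rules of the module that owns it. This is not NVDL and does not replace the real schemas. It checks element names only, not structure or values, and the PERMITTED table is an illustrative assumption containing just the two element names that appear in the examples above, not the actual contents of the Core or Min module.

```python
import xml.etree.ElementTree as ET

CORE_NS = "urn:iso:std:iso:30042:ed-2"
MIN_NS = "https://www.tbxinfo.net/ns/min"

# Illustrative assumption: the real sets are defined by each module's schema.
PERMITTED = {
    CORE_NS: {"tbx"},
    MIN_NS: {"subjectField"},
}


def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def check(root: ET.Element) -> list[str]:
    """Return one message per element that no registered module permits."""
    problems = []
    for element in root.iter():
        namespace, local = split_tag(element.tag)
        allowed = PERMITTED.get(namespace)
        if allowed is None:
            problems.append(f"no module registered for namespace {namespace!r} ({local})")
        elif local not in allowed:
            problems.append(f"{local} is not permitted in {namespace}")
    return problems


good = ET.fromstring(
    f'<tbx xmlns="{CORE_NS}" xmlns:min="{MIN_NS}">'
    "<min:subjectField>finance</min:subjectField></tbx>"
)
bad = ET.fromstring(
    f'<tbx xmlns="{CORE_NS}" xmlns:min="{MIN_NS}">'
    "<min:notACategory>finance</min:notACategory></tbx>"
)
print(check(good))
print(check(bad))
```

The first call reports no problems, while the second reports the element that the Min entry in the table does not list. A real NVDL script would instead hand each namespace's section of the document to that module's schema for full validation.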

An example DCT-style TBX-Min file is available as part of the TBX-Min dialect definition repository: https://raw.githubusercontent.com/LTAC-Global/TBX-Min_dialect/master/DCT/Example_Astronomy_DCT_VALID.tbx

Last updated: November 28, 2018 at 11:15 am

© 2018 LTAC Global, see About Us page for details on Licensing