Validating a TBX File
This guide walks you through identifying and validating a TBX file. The validation software described here is Oxygen XML, although any software that can validate against RNG and SCH schemas will work.
Step 1: Identifying a TBX File
Identifying Manually
The easiest way to test whether your file is TBX is to look at its file extension. TBX files usually have the extension “.tbx”. However, this is not always reliable, as the file may be named incorrectly. Alternatively, you can open the file in a text editor. TBX v2 (2008) files always have a “root” element (the highest-level element in angle brackets) of <martif>, and TBX v3 (2019) files have a root element of <tbx>.
Example TBX v2 root element (the <martif> element on the last line):
<?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "TBXBasiccoreStructV02.dtd">
<!-- Created by the CRITI terminology management system. -->
<martif type="TBX-Basic" xml:lang="en-US">
Example TBX v3 root element (the <tbx> element on the last line):
<?xml version="1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBXcoreStructV03_TBX-Basic_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBX-Basic_DCA.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="urn:iso:std:iso:30042:ed-2">
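The manual check above can also be scripted. The following is a minimal sketch using only Python's standard library; the function name tbx_version and the two one-line sample documents are illustrative assumptions, not part of any TBX tool. It only looks at the name of the root element, so it says nothing about whether the file is valid.

```python
import xml.etree.ElementTree as ET

def tbx_version(xml_text):
    """Guess the TBX version from the root element name, or return None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML, so not a usable TBX file
    # ElementTree writes namespaced tags as "{namespace}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return {"martif": "v2", "tbx": "v3"}.get(local_name)

# Hypothetical, empty sample documents (real files contain much more).
v2_sample = '<martif type="TBX-Basic" xml:lang="en-US"></martif>'
v3_sample = ('<tbx style="dca" type="TBX-Basic" xml:lang="en-US" '
             'xmlns="urn:iso:std:iso:30042:ed-2"></tbx>')

print(tbx_version(v2_sample))
print(tbx_version(v3_sample))
```

A return value of None means the root element is neither <martif> nor <tbx>, or the text could not be parsed as XML at all.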
Identifying Using the TBX Spyglass Tool (Recommended)
Upload your file to the TBX Spyglass to see if it is a TBX file.
Step 2: Identifying a TBX Dialect
Now that you know your file is indeed a TBX file, identifying its dialect tells you exactly which data categories you can expect to find in it.
Identify Dialect Manually
Open your TBX file in a text editor and look for the value of the @type attribute on the root element. In TBX v2 files, this is often (unfortunately) simply “TBX”, which is not particularly helpful. However, starting in TBX v3, the value must be a specific dialect name which can then be used to learn exactly what data categories the file will contain.
Example TBX v2 dialect (the type attribute on the last line):
<?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "TBXBasiccoreStructV02.dtd">
<!-- Created by the CRITI terminology management system. -->
<martif type="TBX-Basic" xml:lang="en-US">
Example TBX v3 dialect (the type attribute on the last line):
<?xml version="1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBXcoreStructV03_TBX-Basic_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBX-Basic_DCA.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="urn:iso:std:iso:30042:ed-2">
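Reading the @type attribute can be sketched in a few lines of standard-library Python. The function names declared_dialect and names_specific_dialect are hypothetical, and treating the bare value “TBX” as "no specific dialect" simply follows the remark above about TBX v2 files.

```python
import xml.etree.ElementTree as ET

def declared_dialect(xml_text):
    """Return the value of @type on the root element, or None if absent."""
    return ET.fromstring(xml_text).get("type")

def names_specific_dialect(value):
    """True if the @type value names a dialect rather than just 'TBX'."""
    return value is not None and value.strip() not in ("", "TBX")

# Hypothetical, empty sample documents used only for illustration.
basic = ('<tbx style="dca" type="TBX-Basic" xml:lang="en-US" '
         'xmlns="urn:iso:std:iso:30042:ed-2"></tbx>')
generic = '<martif type="TBX" xml:lang="en-US"></martif>'

for sample in (basic, generic):
    dialect = declared_dialect(sample)
    print(dialect, names_specific_dialect(dialect))
```

This reports only what the file claims to be; whether the content actually conforms to that dialect is what validation is for.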
Identify Dialect Using TBX Spyglass
Upload your file to the TBX Spyglass to not only see if the file is TBX, but also see the dialect it claims to be (if it even has a dialect).
Step 3a: Validate TBX v2 File
Use the TBX Checker to validate a TBX v2 file.
Step 3b: Validate TBX v3 File
Use the Correct Schemas
The schemas used depend on the stated dialect of your TBX file. In some cases, the schemas will already be declared at the top of your TBX file:
<?xml version="1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBXcoreStructV03_TBX-Basic_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBX-Basic_DCA.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="urn:iso:std:iso:30042:ed-2">
In other cases, you will need to add these statements manually. To do this you will need to:
- Identify the schema locations for the TBX dialect you have.
- Insert the schema locations into your TBX file.
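To see which of the two cases applies to a given file, you can list the xml-model instructions it already contains. The sketch below uses regular expressions on the raw text rather than an XML parser, because it only needs the lines before the root element; the function name declared_schemas is an assumption made for illustration.

```python
import re

# Matches one <?xml-model ...?> processing instruction and captures its content.
XML_MODEL = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
# Matches name="value" or name='value' pairs inside the instruction.
PSEUDO_ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)

def declared_schemas(xml_text):
    """Return one dict of pseudo-attributes per xml-model instruction."""
    schemas = []
    for match in XML_MODEL.finditer(xml_text):
        pairs = PSEUDO_ATTR.findall(match.group(1))
        schemas.append({name: value for name, _quote, value in pairs})
    return schemas

base = "https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/"
sample = f'''<?xml version="1.0"?>
<?xml-model href="{base}TBXcoreStructV03_TBX-Basic_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="{base}TBX-Basic_DCA.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="urn:iso:std:iso:30042:ed-2"></tbx>'''

for schema in declared_schemas(sample):
    print(schema["href"])
```

An empty list means the file declares no schemas, so the locations have to be added by hand as described in the following sections.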
Identifying Schema Locations for a TBX Dialect
Once you know the name of the dialect of your TBX file, you can find the correct schemas using the TBX Validation API. The API can identify the schemas for the public dialects of TBX and some of the private dialects.
You will also need to know the style of your TBX file. There are two styles, DCA and DCT. For more information on the difference, see DCA vs. DCT. The style is given by the @style attribute on the <tbx> root element (style="dca" in the example below):
<?xml version="1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBXcoreStructV03_TBX-Basic_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/TBX-Basic_DCA.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="urn:iso:std:iso:30042:ed-2">
For a DCT file, take note of the DCT schemas. For a DCA file, take note of the DCA schemas.
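As a rough illustration, the style lookup can be automated as follows. The mapping from style to schema kinds reflects what this guide says is usually the case (one RNG and one SCH for DCA, one NVDL and one SCH for DCT); the names SCHEMA_KINDS and schema_kinds_for are hypothetical.

```python
import xml.etree.ElementTree as ET

# Usual schema kinds per style, as described in this guide (not a guarantee).
SCHEMA_KINDS = {"dca": ("RNG", "SCH"), "dct": ("NVDL", "SCH")}

def schema_kinds_for(xml_text):
    """Return (style, schema kinds) based on @style of the root element."""
    root = ET.fromstring(xml_text)
    style = (root.get("style") or "").lower()
    if style not in SCHEMA_KINDS:
        raise ValueError(f"missing or unknown style attribute: {style!r}")
    return style, SCHEMA_KINDS[style]

# Hypothetical, empty sample document used only for illustration.
sample = ('<tbx style="dca" type="TBX-Basic" xml:lang="en-US" '
          'xmlns="urn:iso:std:iso:30042:ed-2"></tbx>')
print(schema_kinds_for(sample))
```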
Inserting the Schema Locations into the TBX File
If your TBX file does not have schema locations defined (the <?xml-model ...?> lines shown in the “Use the Correct Schemas” section above), you can use Oxygen XML to insert them.
To do this, open your file in Oxygen XML, then from the menu items select: Document->Schemas->Associate Schema
Then, in the URL input box, paste the path to the schema for your file:
The schema type should be adjusted automatically. If not, you can select the correct schema type manually. Then press “OK”. The schema location should be inserted at the top of your TBX file.
Repeat this for each schema required by your dialect (and the style of your TBX file). For DCA, this usually means one RNG and one SCH. For DCT, this usually means one NVDL and one SCH.
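If you do not have Oxygen XML at hand, the same lines can be added with a short script. The sketch below is an assumption-laden stand-in for the menu action, not a description of what Oxygen does internally: the function name associate_schema is hypothetical, it supports only RNG and SCH (the two schema types whose schematypens values appear in this guide), and it decides the type from the file extension of the URL.

```python
import re

# schematypens values as they appear in the examples of this guide.
SCHEMA_TYPE_NS = {
    "rng": "http://relaxng.org/ns/structure/1.0",
    "sch": "http://purl.oclc.org/dsdl/schematron",
}

def associate_schema(xml_text, href):
    """Return xml_text with an xml-model instruction for href added."""
    extension = href.rsplit(".", 1)[-1].lower()
    if extension not in SCHEMA_TYPE_NS:
        raise ValueError(f"unsupported schema type: {href}")
    instruction = (f'<?xml-model href="{href}" type="application/xml" '
                   f'schematypens="{SCHEMA_TYPE_NS[extension]}"?>')
    # The XML declaration, if present, must stay the very first thing in the file.
    declaration = re.match(r"<\?xml\s[^?]*\?>", xml_text)
    if declaration:
        end = declaration.end()
        return xml_text[:end] + "\n" + instruction + xml_text[end:]
    return instruction + "\n" + xml_text

base = "https://raw.githubusercontent.com/LTAC-Global/TBX-Basic_dialect/master/DCA/"
document = ('<?xml version="1.0"?>\n'
            '<tbx style="dca" type="TBX-Basic" xml:lang="en-US" '
            'xmlns="urn:iso:std:iso:30042:ed-2"></tbx>')
document = associate_schema(document, base + "TBX-Basic_DCA.sch")
document = associate_schema(document, base + "TBXcoreStructV03_TBX-Basic_integrated.rng")
print(document)
```

Each call places the new instruction directly after the XML declaration, so the schema added last ends up first.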
Validating a TBX File with Oxygen XML
Now that you know the dialect and style of your TBX file, and have added the schema locations to your TBX file, you can validate the file simply by pressing Ctrl-Shift-V or by selecting the following from the menu: Document->Validate->Validate.
In some cases you may even have the option to simply press the validation button:
If the file is valid, you should see a green box to the right of your opened file:
If the file is invalid, you should see a red box instead. You may also see red lines indicating the locations of specific errors:
You should also see a window below with a list of the errors:
The error messages will tell you what is wrong. In this case, Oxygen has found an element called “taitle” which is not allowed. It also notices that the required element “title” is missing. With these two clues, we can extrapolate that we simply have a typo. By changing “taitle” to “title”, we should have a valid file.
Another type of error occurs when the value of an element is incorrect:
In this case, I have incorrectly used “proper noun” as a picklist value for the /part of speech/ data category. The validation error helpfully tells me that the permitted values are “noun”, “other”, “verb”, “adverb”, or “adjective”. By changing “proper noun” to “noun”, the error should be fixed.
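A picklist check of this kind can be approximated before running full validation. The sketch below assumes a DCA-style file in which /part of speech/ is written as <termNote type="partOfSpeech">; that representation, the function name invalid_parts_of_speech, and the deliberately incomplete sample fragment are assumptions for illustration. The permitted values are the five listed in the error message above.

```python
import xml.etree.ElementTree as ET

TBX_NS = "urn:iso:std:iso:30042:ed-2"
# Permitted values as reported by the validation error described above.
PERMITTED_POS = {"noun", "other", "verb", "adverb", "adjective"}

def invalid_parts_of_speech(xml_text):
    """Return every /part of speech/ value that is not in the picklist."""
    root = ET.fromstring(xml_text)
    bad = []
    # Assumption: DCA style, i.e. <termNote type="partOfSpeech">value</termNote>.
    for note in root.iter(f"{{{TBX_NS}}}termNote"):
        if note.get("type") == "partOfSpeech":
            value = (note.text or "").strip()
            if value not in PERMITTED_POS:
                bad.append(value)
    return bad

# Incomplete fragment: a real file nests these notes inside term sections.
sample = f'''<tbx style="dca" type="TBX-Basic" xml:lang="en-US" xmlns="{TBX_NS}">
  <termNote type="partOfSpeech">proper noun</termNote>
  <termNote type="partOfSpeech">noun</termNote>
</tbx>'''

print(invalid_parts_of_speech(sample))
```

This is only a quick pre-check for one data category; the schemas remain the authority on which values a dialect permits.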
Yet another type of error is a level placement error. This occurs if a data category is placed in a level (concept, language, or term level) in which it is not permitted. For example, in TBX-Basic, /definition/ may occur only at the concept and language levels, but not at a term level. If my file has a /definition/ appearing at the term level, I will get the following error:
If your file contains rogue data categories which are not permitted in the stated dialect of your file, you will get an error like the following:
In this case, my TBX-Basic file contains a /grammatical number/ data category, which is not permitted in TBX-Basic. The error message tells me which data categories I am permitted to use. If this information is vital to your project, you can address the issue by changing the element to a <note> or by looking into a different dialect that better suits your needs. If you do not need this information and simply want a valid TBX-Basic file, delete the rogue data category element.
Other common errors may be:
- Dates formatted incorrectly. Proper formatting is: YYYY-MM-DD
- External references not using full URIs (e.g., “wikipedia.org” instead of “https://www.wikipedia.org”)
- Internal references not using an id from the target element (e.g., “mouse” instead of the value of @id of the concept which contains “mouse”)
- Default namespace declared incorrectly on the root element or, in DCT, module namespaces declared incorrectly or the elements which belong to them not correctly prefixed
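The first three items in the list above lend themselves to quick checks on individual values. The helpers below are a sketch with hypothetical names; is_full_uri deliberately narrows "full URI" to web-style addresses that have both a scheme and a host, and dangling_targets expects you to have already collected the reference targets and the ids present in the file.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

def is_valid_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # right shape, but not a real date (e.g. February 30)
    return True

def is_full_uri(value):
    """True if value has both a scheme and a host (narrowing assumption)."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)

def dangling_targets(targets, known_ids):
    """Return the reference targets that match no id in the file."""
    return sorted(set(targets) - set(known_ids))

print(is_valid_date("2019-03-15"), is_valid_date("15/03/2019"))
print(is_full_uri("https://www.wikipedia.org"), is_full_uri("wikipedia.org"))
print(dangling_targets(["c1", "mouse"], {"c1", "c2"}))
```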