Writing teasps: Substitutions

Substitutions replace data values found in the MultiTerm termbase with standardized values for TBX. Some kinds of data, such as definitions, do not have a limited, standardized set of values and need no substitution. In these cases, simply note down the substitution "null" (including the quotation marks) and move on. A null substitution means that the values received from MultiTerm will not be changed.

Data fields that require a substitution are most often those that take values from a picklist, such as grammatical gender. A substitution for grammatical gender might look like this:

	{
		"fem" : "feminine",
		"masc" : "masculine",
		"neut" : "neuter"
	}

The values on the left (fem, masc, and neut) are the values we expect to see in the MultiTerm data. Those on the right (feminine, masculine, and neuter) are the standard values in TBX. The surrounding punctuation is JSON syntax that shows how the substitution fits together. Because this substitution is our first large piece of JSON, we will take a moment to describe this syntax.

Each expected or standardized value is placed between double quotation marks, making it a string. Each expected value is followed by a colon and its corresponding standardized value, making a pair. The pairs are separated from one another by commas, and the complete list of pairs is enclosed by curly braces, making a JSON object. Spaces, tabs, and newlines arrange the object for human readability and are recommended, but not required.
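To check that a substitution is well-formed JSON, it can help to run it through any JSON parser. The short Python sketch below is only an illustration and is not part of the converter; the variable names are our own. It parses the gender substitution twice, once laid out for readability and once with all optional whitespace removed, and shows that both yield the same pairs.

```python
import json

# The substitution from the example above, laid out for human readability.
pretty = """
{
    "fem" : "feminine",
    "masc" : "masculine",
    "neut" : "neuter"
}
"""

# The same substitution with the optional spaces, tabs, and newlines removed.
compact = '{"fem":"feminine","masc":"masculine","neut":"neuter"}'

pretty_obj = json.loads(pretty)
compact_obj = json.loads(compact)

# Whitespace outside the strings does not change the parsed object.
same = pretty_obj == compact_obj
print(same)
print(pretty_obj["fem"])
```

If a quotation mark, colon, comma, or brace is missing, json.loads raises an error instead of returning an object, which makes this a quick way to catch syntax slips before using a substitution.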

This example covers only three pairs of values, but a substitution can contain as many pairs as it needs to. It does not need to include pairs where the expected value is the same as the standardized value.
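The effect of a substitution on individual values can be pictured with a small Python sketch. This is not the converter's own code. The function name apply_substitution is hypothetical, and the sketch assumes that a value with no matching pair passes through unchanged, which is what the rule about omitting identical pairs suggests.

```python
def apply_substitution(value, substitution):
    """Return the standardized value for `value`.

    Assumption: a value that has no pair in the substitution
    is passed through unchanged.
    """
    return substitution.get(value, value)


# The gender substitution from the example, as a Python dictionary.
gender = {"fem": "feminine", "masc": "masculine", "neut": "neuter"}

mapped = apply_substitution("fem", gender)            # has a pair
unchanged = apply_substitution("feminine", gender)    # already standard, no pair needed

print(mapped)
print(unchanged)
```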

Three special substitutions are defined for rare needs. These are like the null substitution in that they are requested with a string, rather than a JSON object.

  • The "lowercase" substitution substitutes lowercase letters for capitals.
  • The "camel case" substitution removes spaces from any value, and marks the start of a new word with a capital letter instead.
  • The "category tag" substitution marks the data with the name of the MultiTerm data field. It is intended to distinguish among data fields that might all be mapped to a TBX <note>. The marking is human-readable only. To distinguish among kinds of data in a machine-readable way, map them to different data categories.

In each case (as also with "null"), the quotation marks are required to identify the name of the substitution as a string. Future versions of the converter may recognize additional special substitutions if needed.
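The difference between a substitution given as a string and one given as a JSON object can be sketched in Python as follows. This is an illustration under several assumptions, not the converter's implementation: all function names are hypothetical, the camel case sketch leaves the first word as it is, and the bracketed format used for "category tag" is invented for the example, since the actual marking is not specified here.

```python
def camel_case(value):
    # Remove spaces and capitalize the first letter of each following word.
    # Assumption: the first word is left exactly as received.
    words = value.split()
    if not words:
        return value
    return words[0] + "".join(w[:1].upper() + w[1:] for w in words[1:])


def category_tag(value, field_name):
    # Hypothetical marking format; the real converter's format may differ.
    return "[" + field_name + "] " + value


# Special substitutions, looked up by the string that requests them.
SPECIAL = {
    "null": lambda value, field_name: value,
    "lowercase": lambda value, field_name: value.lower(),
    "camel case": lambda value, field_name: camel_case(value),
    "category tag": category_tag,
}


def substitute(value, substitution, field_name=""):
    # A string names a special substitution; a dictionary is a list of pairs.
    if isinstance(substitution, str):
        return SPECIAL[substitution](value, field_name)
    # Assumption: values without a pair pass through unchanged.
    return substitution.get(value, value)


print(substitute("Noun", "lowercase"))
print(substitute("part of speech", "camel case"))
print(substitute("see also entry 4", "category tag", "Remark"))
```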