Chapter 3. Problematic areas for import: Data file

3.1. Data file inconsistent with the hierarchy of the structure file

There are a few types of problems that might arise when importing your data into LEXUS from the data file. Because a typical data file contains hundreds of entries, it is important to think in advance about the potential issues that you might have to deal with. The solution to all of them is simple: consistency.

In the previous chapters we have stressed the importance of having a clear idea of how you want to organize your markers. Once that has been accomplished, it is crucial to keep that order consistent throughout your data. In practice, as data is added to the lexicon, the order often becomes less systematic. For Toolbox, keeping a strict hierarchy is not as important as it is for LEXUS.

When importing data from Toolbox, problems usually begin when any of your entries contains a sequence of markers and values that contradicts the hierarchy defined in the .typ file. Let us come back to the xv example. The black box in Figure 3.1 presents the structure of the .typ file with the relevant part of the entry description:


Figure 3.1. Hierarchy of markers and an entry that follows this pattern


The translations xn and xe and the sound file sfx are placed under the corresponding example in Tsafiki, xv. If all the entries follow this order, there will be no problems. Notice that the order of xn, xe and sfx does not matter: they are all defined under xv in the structure, and as long as they follow xv in the data file, their relative order is of no relevance.

Let us assume, however, that there is an entry in your data file in which the definition markers and their values have a different order. In Figure 3.2 the structure of the .typ file is shown in the box together with the relevant part of the entry:


Figure 3.2. Hierarchy of markers and an entry that does not follow the pattern


As LEXUS reads the entries linearly, line after line, and fills the structure that the .typ and the data files provide, it will treat such an entry differently. Whenever it encounters a marker that has a certain value, LEXUS checks under which marker this marker was defined in the .typ file. Subsequently, it looks back through the part of the entry that has already been created to see whether this higher marker has already appeared in the structure or not. If it has, then the currently analyzed marker will be linked under it.
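The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code LEXUS actually runs: the DEFINED_UNDER table is a hypothetical excerpt of a .typ hierarchy, the function names are invented for this example, and the entry values are placeholders.

```python
# Hypothetical excerpt of a .typ hierarchy (an assumption for this sketch):
# each marker maps to the marker it is defined under.
DEFINED_UNDER = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def parse_line(line):
    r"""Split a Toolbox-style line such as '\xv some text' into (marker, value)."""
    marker, _, value = line.lstrip("\\").partition(" ")
    return marker, value.strip()


def find_parent(marker, seen_markers):
    """Return the higher marker if it has already appeared in the entry, else None."""
    parent = DEFINED_UNDER.get(marker)
    return parent if parent in seen_markers else None


def link_targets(lines):
    """Read the entry line by line and report under which marker each line is linked."""
    seen = []
    result = []
    for line in lines:
        marker, _value = parse_line(line)
        result.append((marker, find_parent(marker, seen)))
        seen.append(marker)
    return result


# Placeholder entry; raw strings keep the backslashes intact.
entry = [r"\rf 001", r"\xv example sentence", r"\xe English translation"]
print(link_targets(entry))
# [('rf', None), ('xv', 'rf'), ('xe', 'xv')]
```

A result of None means that the higher marker has not yet appeared in the part of the entry read so far, which is exactly the situation that causes trouble during import.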

For the purpose of our example, let us assume that (1) the .typ file and the data file follow the same structure, (2) xv in our structure file is linked under rf (reference group), and (3) rf has already appeared in the file and LEXUS has created a node for it. In this situation (see Figure 3.1) LEXUS will behave in the following way.

When encountering xv in the data file, LEXUS will check in the structure file where this definition marker should be linked: in this case, under rf. As rf already exists in the structure of this entry, xv will be linked under rf. Remember, however, that xv also has other nodes linked under it in the structure file. Therefore, a group node (xv group) will first be created from it. The group node will be linked under rf, and xv will be linked under the group node. The next marker that LEXUS encounters is xe. Here again LEXUS will check whether the marker that is above it in the hierarchy (xv group) already exists in the structure. Since LEXUS has just created it, the xe marker will be linked under the xv group. This operation will be repeated until all the relevant markers are linked under the xv group node. As a result, the following structure will be created for that entry in LEXUS. In this example, this is what we want our lexicon to look like:


Figure 3.3. LEXUS structure for the entry that follows the structure of the .typ file


However, when the order of the markers in the entry is as presented in Figure 3.2 above, the outcome will be different. LEXUS will first encounter sfx, not xv. Then, it will check under which node sfx is defined in the structure file. As we already know, it is defined under xv group. As the xv group node has not appeared in this entry yet (remember that xv is placed after sfx), LEXUS will create an xv group and link sfx under it. Then, LEXUS will encounter xv and create another xv group with xv, xn and xe linked under it. Eventually, the following structure will be the outcome:


Figure 3.4. LEXUS structure for an entry that does not follow structure of the .typ file


This is problematic because the link between the sound file (sfx) and the example with its translations (xv, xe, xn) is now lost: the information is distributed between two different xv groups, one missing the sound file, the other missing the example and its translations. This will always happen when a marker that is higher in the hierarchy of the structure file is placed in the data file below a marker that is defined under it.
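To make the difference between the two orderings concrete, the grouping behaviour described in this section can be imitated with a short Python sketch. It is a deliberate simplification built only from the description in this chapter, not LEXUS's actual implementation: it looks at the xv group alone, assumes that rf already exists, and uses a hypothetical DEFINED_UNDER table and function name.

```python
# Hypothetical excerpt of the .typ hierarchy (an assumption for this sketch):
# each marker maps to the marker it is defined under.
DEFINED_UNDER = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def build_xv_groups(markers):
    """Return the xv groups that would be created for one entry, in order."""
    groups = []
    for marker in markers:
        is_head = marker == "xv"
        is_child = DEFINED_UNDER.get(marker) == "xv"
        if not (is_head or is_child):
            continue  # markers outside the xv group are ignored in this sketch
        # A new group is created when xv itself appears, or when a marker
        # defined under xv appears before any xv group exists.
        if is_head or not groups:
            groups.append([])
        groups[-1].append(marker)
    return groups


# Entry that follows the hierarchy (as in Figure 3.1): one complete group.
print(build_xv_groups(["rf", "xv", "xn", "xe", "sfx"]))
# [['xv', 'xn', 'xe', 'sfx']]

# Entry with sfx placed before xv (as in Figure 3.2): two incomplete groups.
print(build_xv_groups(["rf", "sfx", "xv", "xn", "xe"]))
# [['sfx'], ['xv', 'xn', 'xe']]
```

The second result mirrors the split shown in Figure 3.4: the sound file ends up in a group of its own, separated from the example it belongs to.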

That is why, when you want to import your data into LEXUS, you have to make sure that such a situation does not occur in your data file. As a practical guideline, we therefore suggest following the hierarchy of markers from your .typ file in your data file. This means manually placing each marker below the marker under which it was defined, and never above it.
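With hundreds of entries, checking this by eye is tedious, so a small script can help to find suspicious entries before import. The following Python sketch is not part of LEXUS or Toolbox; the DEFINED_UNDER table is a hypothetical excerpt that you would replace with the hierarchy from your own .typ file, and the function name is invented for this example.

```python
# Hypothetical excerpt of the .typ hierarchy (replace with your own):
# each marker maps to the marker it is defined under.
DEFINED_UNDER = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def find_misplaced(markers):
    """Return (position, marker) pairs for markers that appear in an entry
    before the marker under which they are defined."""
    problems = []
    seen = set()
    for position, marker in enumerate(markers):
        parent = DEFINED_UNDER.get(marker)
        if parent is not None and parent not in seen:
            problems.append((position, marker))
        seen.add(marker)
    return problems


print(find_misplaced(["rf", "xv", "xn", "xe", "sfx"]))  # []
print(find_misplaced(["rf", "sfx", "xv", "xn", "xe"]))  # [(1, 'sfx')]
```

Note that this check only catches a marker that precedes the first occurrence of its higher marker in an entry. If an entry contains several examples, a misplaced marker belonging to a later example would not be reported, so the result should be treated as a first filter rather than a guarantee.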