Thursday, April 9, 2015

Ensuring Xtext parsers respect UTF-8 characters in language files

I hope this piece of knowledge saves someone a night-long, hair-pulling ordeal.

Assume you are parsing an Xtext domain-specific language file and transforming it into, say, HTML. The source files contain UTF-8 text, and you realize to your absolute horror that the non-ASCII characters in the generated artifacts are all garbled.
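To see where this kind of garbling usually comes from, here is a minimal plain-Java sketch with no Xtext involved. It assumes the culprit is a reader that decodes UTF-8 bytes with some other charset; ISO-8859-1 is just a stand-in for whatever your platform default happens to be, and the class name EncodingDemo is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Decode raw bytes with the given charset, the way a file reader would.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an accented e
        // What ends up on disk when the file is saved as UTF-8.
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        // Assumed wrong charset: each byte becomes its own character.
        String wrong = decode(onDisk, StandardCharsets.ISO_8859_1);
        // Correct charset: the two-byte sequence becomes one character again.
        String right = decode(onDisk, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length() + " vs " + right.length()); // 5 vs 4
    }
}
```

The accented character occupies two bytes in UTF-8, so decoding it with a single-byte charset yields two wrong characters instead of one right one. That is the typical shape of the garbage in the generated output.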

The remedy is to set up a custom encoding provider!

Assume your language is called 'LangX' and its grammar file is 'LangX.xtext'. Running the MWE2 workflow generates, among other things, a 'LangXRuntimeModule.java' in the same directory as the grammar file.

Go ahead and create a file called MyEncodingProvider.java in the same directory:

package langx;

import org.eclipse.emf.common.util.URI;
import org.eclipse.xtext.parser.IEncodingProvider;

// Reports UTF-8 for every resource, regardless of its URI.
public class MyEncodingProvider implements IEncodingProvider {

    @Override
    public String getEncoding(URI uri) {
        return "UTF-8";
    }
}

Now open up your LangXRuntimeModule.java and bind the new encoding provider:

package langx;

import org.eclipse.xtext.parser.IEncodingProvider;
import org.eclipse.xtext.service.DispatchingProvider;

import com.google.inject.Binder;

public class LangXRuntimeModule extends AbstractLangXRuntimeModule {

    // Replaces the inherited runtime encoding binding with MyEncodingProvider.
    @Override
    public void configureRuntimeEncodingProvider(Binder binder) {
        binder.bind(IEncodingProvider.class)
              .annotatedWith(DispatchingProvider.Runtime.class)
              .to(MyEncodingProvider.class);
    }
}

Okay, you are all set!
