MUTF8(3) Library Functions Manual MUTF8(3)

NAME

mutf8, MUTF8_INIT, MUTF8_DEINIT, mutf8_err, mutf8_str, mutf8_char, mutf8_sess, mutf8_u2utf8 - minimal UTF-8 library

SYNOPSIS

#include <mutf8.h>

const char *
mutf8_err(enum mutf8_err err);

int
mutf8_str(const char *p, mutf8_cb cb, void *arg);

int
mutf8_char(char c, int *state, int col, int line, mutf8_cb cb, void *arg);

int
mutf8_sess(struct mutf8_ss *sess, mutf8_cb cb, void *arg);

size_t
mutf8_u2utf8(unsigned int cp, char out[7]);

DESCRIPTION

The mutf8 library has a minimal set of routines for verifying UTF-8 data. It's designed to accept streams of char data and verify that UTF-8 encoded multi-byte sequences are correctly represented.

REFERENCE

This is a canonical reference of Data Types, Functions, and Macros in mutf8.

Data Types

enum mutf8_err
Error codes arising during verification. May be expressed as text with mutf8_err().
mutf8_cb
Function type called when an error occurs.
struct mutf8_ss
Read session. Provides getchar, which must supply a single character and return -1 on failure, 0 on end of file, and 1 on success. Optionally, vrfy may be provided; it is handed each verified character, and if it returns 0 the parse fails. Two bit-wise flags may be set: MUTF8_BOM indicates that a BOM should be parsed, and MUTF8_IGNBOM indicates that vrfy will not be called for BOM characters.
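
The tri-state return convention of getchar can be illustrated with a small, self-contained sketch. Everything named sketch_* below is hypothetical: this manual does not show the layout of struct mutf8_ss or the exact call-back signature, so the types and arguments here are assumptions chosen only to demonstrate the -1/0/1 protocol over an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type; not part of mutf8. */
struct sketch_buf {
	const char	*data;	/* bytes to hand out */
	size_t		 len;	/* number of bytes in data */
	size_t		 pos;	/* next byte to return */
};

/*
 * Character source following the convention described above:
 * 1 with *c set on success, 0 on end of input, -1 on failure.
 * The (char *, void *) signature is an assumption.
 */
int
sketch_getchar(char *c, void *arg)
{
	struct sketch_buf *b = arg;

	if (c == NULL || b == NULL || b->data == NULL)
		return -1;
	if (b->pos >= b->len)
		return 0;
	*c = b->data[b->pos++];
	return 1;
}

/*
 * Drain a source, counting the characters it yields.
 * Returns the count, or -1 if the source reports failure.
 */
long
sketch_drain(int (*get)(char *, void *), void *arg)
{
	long	 n = 0;
	char	 c;
	int	 rc;

	assert(get != NULL);
	while ((rc = get(&c, arg)) == 1)
		n++;
	return rc == 0 ? n : -1;
}
```

A consumer written this way can tell a clean end of input (0) apart from a read error (-1), which is the distinction the session interface asks getchar to preserve.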

Functions

mutf8_err()
Get an English-language error string corresponding to err.
mutf8_str()
Verify the nil-terminated const char *p for correctness. Returns 1 on success, 0 on failure.
mutf8_char()
Verify a single character char c in a stream at position col and line. A call to MUTF8_INIT() initialises the stream with token int *state; MUTF8_DEINIT() should be called after the parse to make sure the stream ends in a correct state.
mutf8_sess()
Like mutf8_str(), but accepts a read session sess whose call-back provides the characters.
mutf8_u2utf8()
Convert a Unicode value into a UTF-8 byte-stream. If this function returns a value greater than 0, out is filled with that many UTF-8 bytes followed by a nil terminator. If it returns 0, the input was not a proper Unicode value.
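
The contract described for mutf8_u2utf8() (byte count on success, 0 on an invalid value, nil-terminated output) can be sketched as follows. This is a stand-in written for illustration, not the library's source: the name sketch_u2utf8 is hypothetical, and the sketch assumes the RFC 3629 limits (at most four bytes, surrogates and values above U+10FFFF rejected), which may differ from what the library accepts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, not part of mutf8.
 * Encodes cp as UTF-8 under RFC 3629 limits.  Returns the number
 * of bytes written before the nil terminator, or 0 if cp is a
 * surrogate or lies above U+10FFFF.
 */
size_t
sketch_u2utf8(unsigned int cp, char out[7])
{
	unsigned char	*p = (unsigned char *)out;
	size_t		 n;

	assert(out != NULL);

	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;

	if (cp < 0x80) {
		/* 0xxxxxxx */
		p[0] = (unsigned char)cp;
		n = 1;
	} else if (cp < 0x800) {
		/* 110xxxxx 10xxxxxx */
		p[0] = (unsigned char)(0xC0 | (cp >> 6));
		p[1] = (unsigned char)(0x80 | (cp & 0x3F));
		n = 2;
	} else if (cp < 0x10000) {
		/* 1110xxxx 10xxxxxx 10xxxxxx */
		p[0] = (unsigned char)(0xE0 | (cp >> 12));
		p[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		p[2] = (unsigned char)(0x80 | (cp & 0x3F));
		n = 3;
	} else {
		/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
		p[0] = (unsigned char)(0xF0 | (cp >> 18));
		p[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		p[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		p[3] = (unsigned char)(0x80 | (cp & 0x3F));
		n = 4;
	}
	p[n] = '\0';
	return n;
}
```

For instance, under these assumptions U+20AC yields the three bytes E2 82 AC, and the surrogate U+D800 yields a return value of 0.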

Macros

MUTF8_INIT()
Initialise a state for use by mutf8_char(). Accepts an int as its argument.
MUTF8_DEINIT()
Should be invoked following a sequence of mutf8_char() calls to make sure that a multi-byte sequence is not open at the end of a token stream.
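
The initialise, feed, and finalise pattern described for the macros can be sketched with a small stand-alone state machine. All sketch_* names are hypothetical, and the meaning given to the state integer here (the number of continuation bytes still expected) is an assumption, since this manual does not document what the library stores in it. The sketch is also deliberately simplified: it rejects invalid lead bytes and truncated sequences, but not overlong three- and four-byte forms or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of mutf8. */

/* Start a stream: no continuation bytes are pending. */
void
sketch_init(int *state)
{
	assert(state != NULL);
	*state = 0;
}

/*
 * Feed one byte.  *state holds the number of continuation bytes
 * still expected.  Returns 1 if the byte is acceptable, 0 if not.
 */
int
sketch_char(char c, int *state)
{
	unsigned char b = (unsigned char)c;

	assert(state != NULL);

	if (*state > 0) {
		if ((b & 0xC0) != 0x80)
			return 0;	/* expected 10xxxxxx */
		(*state)--;
		return 1;
	}
	if (b < 0x80)
		return 1;		/* ASCII */
	if (b >= 0xC2 && b <= 0xDF)
		*state = 1;		/* two-byte lead */
	else if (b >= 0xE0 && b <= 0xEF)
		*state = 2;		/* three-byte lead */
	else if (b >= 0xF0 && b <= 0xF4)
		*state = 3;		/* four-byte lead */
	else
		return 0;		/* stray continuation or bad lead */
	return 1;
}

/* End a stream: fails if a multi-byte sequence is still open. */
int
sketch_deinit(const int *state)
{
	assert(state != NULL);
	return *state == 0;
}

/* Verify a nil-terminated string using the three helpers. */
int
sketch_str(const char *p)
{
	int state;

	assert(p != NULL);
	sketch_init(&state);
	for (; *p != '\0'; p++)
		if (!sketch_char(*p, &state))
			return 0;
	return sketch_deinit(&state);
}
```

The final check matters: a stream ending in the bytes E2 82 passes every per-byte test, and only the finalising step reports that the sequence was left open.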

STANDARDS

The mutf8 library verifies UTF-8 data as specified by RFC 3629, STD 63 (2003).

AUTHORS

The mutf8 library was written by Kristaps Dzonsons <kristaps@bsd.lv>.
May 2, 2011 OpenBSD 4.6