File bom-1.0.1.obscpio of Package bom
07070100000000000081A4000003E8000000640000000161830F5E00000029000000000000000000000000000000000000001200000000bom-1.0.1/AUTHORSArchie L. Cobbs <archie.cobbs@gmail.com>
07070100000001000081A4000003E8000000640000000161830F5E000000B0000000000000000000000000000000000000001200000000bom-1.0.1/CHANGESVersion 1.0.1 Released November 3, 2021
- Fixed bug when multi-byte sequence crossed input buffer boundary
Version 1.0.0 Released October 16, 2021
- Initial release
07070100000002000081A4000003E8000000640000000161830F5E00000052000000000000000000000000000000000000001200000000bom-1.0.1/INSTALLSimplified instructions:
1. ./configure
2. make
3. sudo make install
07070100000003000081A4000003E8000000640000000161830F5E00002C5D000000000000000000000000000000000000001200000000bom-1.0.1/LICENSE Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
07070100000004000081A4000003E8000000640000000161830F5E000003F6000000000000000000000000000000000000001600000000bom-1.0.1/Makefile.am#
# bom - Deals with Unicode byte order marks
#
# Copyright (C) 2021 Archie L. Cobbs. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
bin_PROGRAMS= bom
man_MANS= bom.1
docdir= $(datadir)/doc/packages/$(PACKAGE)
doc_DATA= CHANGES LICENSE README.md INSTALL AUTHORS
EXTRA_DIST= CHANGES LICENSE README.md
bom_SOURCES= main.c \
gitrev.c
.PHONY: tests
tests: bom
	cd tests && ./run.sh
gitrev.c:
	printf 'const char *const bom_version = "%s";\n' "`git describe`" > gitrev.c
07070100000005000081A4000003E8000000640000000161830F5E0000142A000000000000000000000000000000000000001400000000bom-1.0.1/README.md**bom** is a simple UNIX command line utility for dealing with Unicode byte order marks (BOM's).
Unicode byte order marks are "magic number" byte sequences that sometimes appear at the beginning of a file to indicate the file's character encoding. They're sometimes helpful but usually they're just annoying.
You can read more about byte order marks [here](https://en.wikipedia.org/wiki/Byte_order_mark).
**bom** operates in one of the following modes:
* `bom --detect` Detect which type of byte order mark is present (if any) and print to standard output
* `bom --strip` Strip off the byte order mark (if any) and output the remainder of the file, optionally also converting to UTF-8
* `bom --print` Output the byte sequence corresponding to a byte order mark (useful for adding them to files)
* `bom --list` List the supported byte order mark types
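
As a rough illustration of what `--detect` has to do, the sketch below matches the start of a buffer against the byte signatures of the supported BOM types. It is a minimal sketch, not the code in `main.c`: the names `bom_sig`, `bom_sigs`, and `detect_bom` are hypothetical, and it assumes the whole prefix is already in memory, whereas the real program reads its input incrementally. The `prefer32` argument mirrors the documented `--prefer32` behavior for the ambiguous sequence `FF FE 00 00`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a BOM name and its byte signature. */
struct bom_sig {
    const char *name;
    const char *bytes;
    size_t len;
};

/* Signatures for the BOM types that bom supports. */
static const struct bom_sig bom_sigs[] = {
    { "UTF-7",    "\x2b\x2f\x76",     3 },
    { "UTF-8",    "\xef\xbb\xbf",     3 },
    { "UTF-16BE", "\xfe\xff",         2 },
    { "UTF-16LE", "\xff\xfe",         2 },
    { "UTF-32BE", "\x00\x00\xfe\xff", 4 },
    { "UTF-32LE", "\xff\xfe\x00\x00", 4 },
    { "GB18030",  "\x84\x31\x95\x33", 4 },
};

/*
 * Return the name of the longest signature that is a prefix of buf,
 * or "NONE" if nothing matches. Unless prefer32 is nonzero, UTF-32LE
 * is never reported, so FF FE 00 00 is treated as UTF-16LE.
 */
static const char *
detect_bom(const unsigned char *buf, size_t len, int prefer32)
{
    const char *best = "NONE";
    size_t best_len = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < sizeof(bom_sigs) / sizeof(bom_sigs[0]); i++) {
        const struct bom_sig *sig = &bom_sigs[i];

        if (!prefer32 && strcmp(sig->name, "UTF-32LE") == 0)
            continue;
        if (sig->len <= len && sig->len > best_len
          && memcmp(buf, sig->bytes, sig->len) == 0) {
            best = sig->name;
            best_len = sig->len;
        }
    }
    return best;
}
```

Calling `detect_bom()` on the first few bytes of a file yields one of the names printed by `bom --list`. Preferring the longest match is a design choice of this sketch; it only matters for `FF FE 00 00`, which is exactly the case `prefer32` decides.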
Here is the man page:
```
BOM(1) BSD General Commands Manual BOM(1)
NAME
bom -- Decode Unicode byte order mark
SYNOPSIS
bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]
bom --detect [--expect types] [--prefer32] [file]
bom --print type
bom --list
bom --help
bom --version
DESCRIPTION
bom decodes, verifies, reports, and/or strips the byte order mark (BOM) at the
start of the specified file, if any.
When no file is specified, or when file is -, read standard input.
OPTIONS
-d, --detect
Report the detected BOM type to standard output and then exit.
See SUPPORTED BOM TYPES for possible values.
-e, --expect types
Expect to find one of the specified BOM types, otherwise exit with an
error.
Multiple types may be specified, separated by commas.
Specifying NONE is acceptable and matches when the file has no (sup-
ported) BOM.
-h, --help
Output command line usage help.
-l, --lenient
Silently ignore any illegal byte sequences encountered when converting
the remainder of the file to UTF-8.
Without this flag, bom will exit immediately with an error if an ille-
gal byte sequence is encountered.
This flag has no effect unless the --utf8 flag is given.
--list List the supported BOM types and exit.
-p, --print type
Output the byte sequence corresponding to the type byte order mark.
--prefer32
Used to disambiguate the byte sequence FF FE 00 00, which can be
either a UTF-32LE BOM or a UTF-16LE BOM followed by a NUL character.
Without this flag, UTF-16LE is assumed; with this flag, UTF-32LE is
assumed.
-s, --strip
Strip the BOM, if any, from the beginning of the file and output the
remainder of the file.
-u, --utf8
Convert the remainder of the file to UTF-8, assuming the character
encoding implied by the detected BOM.
For files with no (supported) BOM, this flag has no effect and the
remainder of the file is copied unmodified.
For files with a UTF-8 BOM, the identity transformation is still
applied, so (for example) illegal byte sequences will be detected.
-v, --version
Output program version and exit.
SUPPORTED BOM TYPES
The supported BOM types are:
NONE No supported BOM was detected.
UTF-7 A UTF-7 BOM was detected.
UTF-8 A UTF-8 BOM was detected.
UTF-16BE
A UTF-16 (Big Endian) BOM was detected.
UTF-16LE
A UTF-16 (Little Endian) BOM was detected.
UTF-32BE
A UTF-32 (Big Endian) BOM was detected.
UTF-32LE
A UTF-32 (Little Endian) BOM was detected.
GB18030
A GB18030 (Chinese National Standard) BOM was detected.
EXAMPLES
To tell what kind of byte order mark a file has:
$ bom --detect file
To normalize files with byte order marks into UTF-8, and pass other files
through unchanged:
$ bom --strip --utf8 file
Same as previous example, but discard illegal byte sequences instead of gener-
ating an error:
$ bom --strip --utf8 --lenient file
To verify a properly encoded UTF-8 or UTF-16 file with a byte-order-mark and
output it as UTF-8:
$ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE file
To just remove any byte order mark and get on with your life:
$ bom --strip file
RETURN VALUES
bom exits with one of the following values:
0 Success.
1 A general error occurred.
2 The --expect flag was given but the detected BOM did not match.
3 An illegal byte sequence was detected (and --lenient was not speci-
fied).
SEE ALSO
iconv(1)
bom: Decode Unicode byte order mark, https://github.com/archiecobbs/bom.
AUTHOR
Archie L. Cobbs <archie.cobbs@gmail.com>
BSD October 14, 2021 BSD
```
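
The `--utf8` option described above relies on the standard `iconv` interface to re-encode the data that follows the BOM. The following is a minimal sketch of that idea for a single in-memory buffer, not the streaming loop used in `main.c`. The helper name `to_utf8` and its signature are hypothetical; `iconv_open()`, `iconv()`, and `iconv_close()` are the POSIX calls. On some platforms linking may require `-liconv`.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes of `in`, encoded as `from`,
 * into UTF-8 in `out`. Returns the number of bytes written, or -1 with
 * errno set by iconv (EILSEQ for an illegal sequence, EINVAL for a
 * truncated one, E2BIG if `out` is too small).
 */
static long
to_utf8(const char *from, const char *in, size_t inlen, char *out, size_t outcap)
{
    char *iptr = (char *)in;        /* iconv() takes a non-const pointer */
    char *optr = out;
    size_t iremain = inlen;
    size_t oremain = outcap;
    iconv_t cd;
    int saved;

    if ((cd = iconv_open("UTF-8", from)) == (iconv_t)-1)
        return -1;

    /* Convert the input, then flush any pending conversion state. */
    if (iconv(cd, &iptr, &iremain, &optr, &oremain) == (size_t)-1
      || iconv(cd, NULL, NULL, &optr, &oremain) == (size_t)-1) {
        saved = errno;
        iconv_close(cd);
        errno = saved;
        return -1;
    }
    iconv_close(cd);
    return (long)(optr - out);
}
```

A caller would pass the encoding name implied by the detected BOM (for example `"UTF-16LE"`) as `from`. An `EILSEQ` failure corresponds to the situation in which `bom` exits with status 3 unless `--lenient` is given.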
07070100000006000081ED000003E8000000640000000161830F5E0000039B000000000000000000000000000000000000001500000000bom-1.0.1/autogen.sh#!/bin/bash
#
# Script to regenerate all the GNU auto* gunk.
# Run this from the top directory of the source tree.
#
# If it looks like I don't know what I'm doing here, you're right.
#
set -e
echo "cleaning up"
rm -rf autom4te*.cache scripts aclocal.m4 configure config.log config.status .deps stamp-h1
rm -f config.h.in config.h.in~ config.h
rm -rf scripts
find . \( -name Makefile -o -name Makefile.in \) -print0 | xargs -0 rm -f
rm -f *.o bom bom.1 bom-*.tar.gz gitrev.c
rm -rf a.out.* tags
if [ "${1}" = '-C' ]; then
exit 0
fi
ACLOCAL="aclocal"
AUTOHEADER="autoheader"
AUTOMAKE="automake"
AUTOCONF="autoconf"
echo "running aclocal"
mkdir scripts
${ACLOCAL} ${ACLOCAL_ARGS} -I scripts
echo "running autoheader"
${AUTOHEADER}
echo "running automake"
${AUTOMAKE} --add-missing -c --foreign
echo "running autoconf"
${AUTOCONF} -f -i
if [ "${1}" = '-c' ]; then
echo "running configure"
./configure
fi
07070100000007000081A4000003E8000000640000000161830F5E000012B1000000000000000000000000000000000000001300000000bom-1.0.1/bom.1.in.\" -*- nroff -*-
.\"
.\" bom - Deals with Unicode byte order marks
.\"
.\" Copyright (C) 2021 Archie L. Cobbs. All rights reserved.
.\"
.\" Licensed under the Apache License, Version 2.0 (the "License");
.\" you may not use this file except in compliance with the License.
.\" You may obtain a copy of the License at
.\"
.\" http://www.apache.org/licenses/LICENSE-2.0
.\"
.\" Unless required by applicable law or agreed to in writing, software
.\" distributed under the License is distributed on an "AS IS" BASIS,
.\" WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.\" See the License for the specific language governing permissions and
.\" limitations under the License.
.\"
.Dd October 14, 2021
.Dt BOM 1
.Os
.Sh NAME
.Nm bom
.Nd Decode Unicode byte order mark
.Sh SYNOPSIS
.Nm
.Fl \-strip
.Op Fl \-expect Ar types
.Op Fl \-lenient
.Op Fl \-prefer32
.Op Fl \-utf8
.Op Ar file
.Nm
.Fl \-detect
.Op Fl \-expect Ar types
.Op Fl \-prefer32
.Op Ar file
.Nm
.Fl \-print Ar type
.Nm
.Fl \-list
.Nm
.Fl \-help
.Nm
.Fl \-version
.Sh DESCRIPTION
.Nm
decodes, verifies, reports, and/or strips the byte order mark (BOM) at the start of the specified file, if any.
.Pp
When no
.Ar file
is specified, or when
.Ar file
is \-, read standard input.
.Sh OPTIONS
.Bl -tag -width Ds
.It Fl d , Fl \-detect
Report the detected BOM type to standard output and then exit.
.Pp
See
.Sx "SUPPORTED BOM TYPES"
for possible values.
.It Fl e , Fl \-expect Ar types
Expect to find one of the specified BOM types, otherwise exit with an error.
.Pp
Multiple types may be specified, separated by commas.
.Pp
Specifying
.Ar NONE
is acceptable and matches when the file has no (supported) BOM.
.It Fl h , Fl \-help
Output command line usage help.
.It Fl l , Fl \-lenient
Silently ignore any illegal byte sequences encountered when converting the remainder of the file to UTF-8.
.Pp
Without this flag,
.Nm
will exit immediately with an error if an illegal byte sequence is encountered.
.Pp
This flag has no effect unless the
.Fl \-utf8
flag is given.
.It Fl \-list
List the supported BOM types and exit.
.It Fl p , Fl \-print Ar type
Output the byte sequence corresponding to the
.Ar type
byte order mark.
.It Fl \-prefer32
Used to disambiguate the byte sequence
.Ar "FF FE 00 00" ,
which can be either a
.Ar UTF-32LE
BOM or a
.Ar UTF-16LE
BOM followed by a NUL character.
.Pp
Without this flag,
.Ar UTF-16LE
is assumed; with this flag,
.Ar UTF-32LE
is assumed.
.It Fl s , Fl \-strip
Strip the BOM, if any, from the beginning of the file and output the remainder of the file.
.It Fl u , Fl \-utf8
Convert the remainder of the file to UTF-8, assuming the character encoding implied by the detected BOM.
.Pp
For files with no (supported) BOM, this flag has no effect and the remainder of the file is copied unmodified.
.Pp
For files with a UTF-8 BOM, the identity transformation is still applied, so (for example) illegal byte sequences will be detected.
.It Fl v , Fl \-version
Output program version and exit.
.El
.Sh SUPPORTED BOM TYPES
The supported BOM types are:
.Bl -tag -width Ds
.It NONE
No supported BOM was detected.
.It UTF-7
A UTF-7 BOM was detected.
.It UTF-8
A UTF-8 BOM was detected.
.It UTF-16BE
A UTF-16 (Big Endian) BOM was detected.
.It UTF-16LE
A UTF-16 (Little Endian) BOM was detected.
.It UTF-32BE
A UTF-32 (Big Endian) BOM was detected.
.It UTF-32LE
A UTF-32 (Little Endian) BOM was detected.
.It GB18030
A GB18030 (Chinese National Standard) BOM was detected.
.El
.Sh EXAMPLES
.Pp
To tell what kind of byte order mark a file has:
.Bd -literal -offset indent
$ bom --detect file
.Ed
.Pp
To normalize files with byte order marks into UTF-8, and pass other files through unchanged:
.Bd -literal -offset indent
$ bom --strip --utf8 file
.Ed
.Pp
Same as previous example, but discard illegal byte sequences instead of generating an error:
.Bd -literal -offset indent
$ bom --strip --utf8 --lenient file
.Ed
.Pp
To verify a properly encoded UTF-8 or UTF-16 file with a byte-order-mark and output it as UTF-8:
.Bd -literal -offset indent
$ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE file
.Ed
.Pp
To just remove any byte order mark and get on with your life:
.Bd -literal -offset indent
$ bom --strip file
.Ed
.Sh RETURN VALUES
.Nm
exits with one of the following values:
.Bl -tag -width Ds
.It 0
Success.
.It 1
A general error occurred.
.It 2
The
.Fl \-expect
flag was given but the detected BOM did not match.
.It 3
An illegal byte sequence was detected (and
.Fl \-lenient
was not specified).
.El
.Sh SEE ALSO
.Xr iconv 1
.Rs
.%T "bom: Decode Unicode byte order mark"
.%O https://github.com/archiecobbs/bom
.Re
.Rs
.%T "Byte order mark (Wikipedia)"
.%O https://en.wikipedia.org/wiki/Byte_order_mark
.Re
.Sh AUTHOR
.An Archie L. Cobbs Aq archie.cobbs@gmail.com
07070100000008000081A4000003E8000000640000000161830F5E00000A07000000000000000000000000000000000000001700000000bom-1.0.1/configure.ac#
# bom - Deals with Unicode byte order marks
#
# Copyright (C) 2021 Archie L. Cobbs. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
AC_INIT([bom Deals with Unicode byte order marks], [1.0.1], [https://github.com/archiecobbs/bom/], [bom])
AC_CONFIG_AUX_DIR(scripts)
AM_INIT_AUTOMAKE
dnl AM_MAINTAINER_MODE
AC_PREREQ(2.59)
AC_PREFIX_DEFAULT(/usr)
AC_PROG_MAKE_SET
[CFLAGS="-g -O3 -pipe -Wall -Waggregate-return -Wcast-align -Wchar-subscripts -Wcomment -Wformat -Wimplicit -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-long-long -Wparentheses -Wpointer-arith -Wredundant-decls -Wreturn-type -Wswitch -Wtrigraphs -Wuninitialized -Wunused -Wwrite-strings -Wshadow -Wstrict-prototypes -Wcast-qual $CFLAGS"]
AC_SUBST(CFLAGS)
# Compile flags for Linux
AC_DEFINE(_DEFAULT_SOURCE, 1, Default functions)
AC_DEFINE(_GNU_SOURCE, 1, GNU functions)
AC_DEFINE(_BSD_SOURCE, 1, BSD functions)
AC_DEFINE(_XOPEN_SOURCE, 500, XOpen functions)
# Compile flags for Mac OS
AC_DEFINE(_DARWIN_C_SOURCE, 1, MacOS functions)
# Check for required programs
AC_PROG_INSTALL
AC_PROG_CC
AC_PATH_PROG([CAT], [cat], [], [])
if test "x${CAT}" = "x"; then
AC_MSG_ERROR([cat not found])
fi
AC_PATH_PROG([SED], [sed], [], [])
if test "x${SED}" = "x"; then
AC_MSG_ERROR([sed not found])
fi
# Check for required libc functions
AC_SEARCH_LIBS([iconv_open], [iconv],,
[if test `uname -o` = 'Cygwin' -a -f /usr/lib/libiconv.a; then LIBS="-liconv ${LIBS}"; else AC_MSG_ERROR([required function iconv_open missing]); fi])
# Check for required header files
AC_HEADER_STDC
AC_CHECK_HEADERS(ctype.h errno.h stdio.h stdlib.h string.h unistd.h sys/stat.h sys/types.h, [],
[AC_MSG_ERROR([required header file '$ac_header' missing])])
# Optional features
AC_ARG_ENABLE(Werror,
AS_HELP_STRING([--enable-Werror],
[enable compilation with -Werror flag (default NO)]),
[test x"$enableval" = "xyes" && CFLAGS="${CFLAGS} -Werror"])
# Generated files
AC_CONFIG_FILES(Makefile)
AC_CONFIG_FILES(bom.1)
AC_CONFIG_HEADERS(config.h)
# Go
AC_OUTPUT
07070100000009000081A4000003E8000000640000000161830F5E000049E6000000000000000000000000000000000000001100000000bom-1.0.1/main.c/*
* bom - Deals with Unicode byte order marks
*
* Copyright (C) 2021 Archie L. Cobbs. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <assert.h>
#include <ctype.h>
#include <err.h>
#include <errno.h>
#include <getopt.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// Copyright character
#define COPYRIGHT "\xc2\xa9"
// Special exit values
#define EX_EXPECT_FAIL 2
#define EX_ILLEGAL_BYTES 3
// Version string
extern const char *const bom_version;
// Command line options that only have long versions
#define FLAG_LIST (-2)
#define FLAG_PREFER_32 (-3)
#define OPT(_letter, _name, _arg) \
{ \
.name= _name, \
.has_arg= _arg, \
.flag= NULL, \
.val= _letter \
}
static const struct option long_options[] = {
OPT('d', "detect", no_argument),
OPT('e', "expect", required_argument),
OPT('h', "help", no_argument),
OPT(FLAG_LIST, "list", no_argument),
OPT('l', "lenient", no_argument),
OPT('p', "print", required_argument),
OPT(FLAG_PREFER_32, "prefer32", no_argument),
OPT('s', "strip", no_argument),
OPT('u', "utf8", no_argument),
OPT('v', "version", no_argument),
OPT(0, NULL, 0)
};
// Execution modes
#define MODE_STRIP 1
#define MODE_DETECT 2
#define MODE_LIST 3
#define MODE_PRINT 4
#define MODE_HELP 5
#define MODE_VERSION 6
// BOM types
struct bom_type {
const char *name;
const char *encoding;
const char *bytes;
const int len;
};
#define BOM_TYPE(_name, _encoding, _bytes) \
{ \
.name= _name, \
.encoding= _encoding, \
.bytes= _bytes, \
.len= sizeof(_bytes) - 1 \
}
static const struct bom_type bom_types[] = {
BOM_TYPE("NONE", NULL, ""),
BOM_TYPE("UTF-7", "UTF-7", "\x2b\x2f\x76"),
BOM_TYPE("UTF-8", "UTF-8", "\xef\xbb\xbf"),
BOM_TYPE("UTF-16BE", "UTF-16BE", "\xfe\xff"),
BOM_TYPE("UTF-16LE", "UTF-16LE", "\xff\xfe"),
BOM_TYPE("UTF-32BE", "UTF-32BE", "\x00\x00\xfe\xff"),
BOM_TYPE("UTF-32LE", "UTF-32LE", "\xff\xfe\x00\x00"),
BOM_TYPE("GB18030", "GB18030", "\x84\x31\x95\x33"),
};
#define BOM_TYPE_NONE 0
#define BOM_TYPE_UTF_7 1
#define BOM_TYPE_UTF_8 2
#define BOM_TYPE_UTF_16BE 3
#define BOM_TYPE_UTF_16LE 4
#define BOM_TYPE_UTF_32BE 5
#define BOM_TYPE_UTF_32LE 6
#define BOM_TYPE_GB18030 7
#define BOM_TYPE_MAX 8
// Input buffer
#define BUFFER_SIZE 1024
struct bom_input {
char buf[BUFFER_SIZE];
int len;
int num_complete;
int num_finished;
int match_state[BOM_TYPE_MAX];
};
#define MATCH_PREFIX 0
#define MATCH_COMPLETE 1
#define MATCH_FAILED 2
// Mode of execution functions
static void bom_detect(FILE *fp, long expect_types, int prefer32);
static void bom_strip(FILE *fp, long expect_types, int lenient, int prefer32, int utf8);
static void bom_list(void);
static void bom_print(int bom_type);
// Helper functions
static int read_bom(FILE *fp, struct bom_input *const input, long expect_types, int prefer32);
static int read_byte(FILE *fp, struct bom_input *input);
static int bom_type_from_name(const char *name);
static void init_bom_input(struct bom_input *const input);
static void set_mode(int *modep, int mode);
static void usage(void);
int
main(int argc, char **argv)
{
const struct option *opt;
char optstring[32];
long expect_types = 0;
int option_index;
int bom_type = -1;
int prefer32 = 0;
int lenient = 0;
FILE *fp = NULL;
int mode = 0;
int utf8 = 0;
char *s;
int ch;
// Build optstring dynamically
s = optstring;
for (opt = long_options; opt->name != NULL; opt++) {
if (opt->val > 0) {
*s++ = (char)opt->val;
if (opt->has_arg)
*s++ = ':';
}
}
*s = '\0';
// Parse command line
while ((ch = getopt_long(argc, argv, optstring, long_options, &option_index)) != -1) {
switch (ch) {
case 'd':
set_mode(&mode, MODE_DETECT);
break;
case 'e':
while ((s = strsep(&optarg, ",")) != NULL) {
if ((size_t)(bom_type = bom_type_from_name(s)) >= sizeof(expect_types) * 8)
errx(1, "internal error: %s", "too many BOM types");
expect_types |= (1L << bom_type);
}
break;
case 'h':
set_mode(&mode, MODE_HELP);
break;
case 'l':
lenient = 1;
break;
case 'p':
bom_type = bom_type_from_name(optarg);
set_mode(&mode, MODE_PRINT);
break;
case 's':
set_mode(&mode, MODE_STRIP);
break;
case 'u':
utf8 = 1;
break;
case 'v':
set_mode(&mode, MODE_VERSION);
break;
case FLAG_PREFER_32:
prefer32 = 1;
break;
case FLAG_LIST:
set_mode(&mode, MODE_LIST);
break;
case '?':
default:
usage();
return 1;
}
}
argv += optind;
argc -= optind;
// Parse remainder of command line
switch (mode) {
case MODE_STRIP:
case MODE_DETECT:
switch (argc) {
case 0:
fp = stdin;
break;
case 1:
if (strcmp(argv[0], "-") == 0) {
fp = stdin;
break;
}
if ((fp = fopen(argv[0], "r")) == NULL)
err(1, "%s", argv[0]);
break;
default:
usage();
return 1;
}
break;
default:
switch (argc) {
case 0:
break;
default:
usage();
return 1;
}
break;
}
// Execute
switch (mode) {
case MODE_STRIP:
bom_strip(fp, expect_types, lenient, prefer32, utf8);
break;
case MODE_DETECT:
bom_detect(fp, expect_types, prefer32);
break;
case MODE_LIST:
bom_list();
break;
case MODE_PRINT:
bom_print(bom_type);
break;
case MODE_HELP:
usage();
break;
case MODE_VERSION:
fprintf(stderr, "bom %s\n", bom_version);
fprintf(stderr, "Copyright %s Archie L. Cobbs. All rights reserved.\n", COPYRIGHT);
break;
default:
usage();
return 1;
}
// Done
return 0;
}
static void
bom_detect(FILE *fp, long expect_types, int prefer32)
{
const struct bom_type *bt;
struct bom_input input;
int bom_type;
// Read BOM
init_bom_input(&input);
bom_type = read_bom(fp, &input, expect_types, prefer32);
bt = &bom_types[bom_type];
// Print its name
printf("%s\n", bt->name);
}
#if DEBUG_ICONV_OPS
#define BYTES_PER_ROW 20
static void
debug_buffer(const size_t base, const void *data, size_t len)
{
size_t offset;
size_t i;
if (data == NULL) {
fprintf(stderr, " NULL\n");
return;
}
for (offset = 0; offset < len; offset += BYTES_PER_ROW) {
fprintf(stderr, "%08u: ", (unsigned int)(base + offset));
for (i = 0; i < BYTES_PER_ROW; i++) {
const int val = offset + i < len ? *((const char *)data + offset + i) & 0xff : -1;
if (i == BYTES_PER_ROW / 2)
fprintf(stderr, " ");
if (val != -1)
fprintf(stderr, " %02x", val);
else
fprintf(stderr, " ");
}
fprintf(stderr, " ");
for (i = 0; i < BYTES_PER_ROW; i++) {
const int val = offset + i < len ? *((const char *)data + offset + i) & 0xff : -1;
if (val != -1)
fprintf(stderr, "%c", isprint(val) ? val : '.');
else
fprintf(stderr, " ");
}
fprintf(stderr, "\n");
}
}
#endif /* DEBUG_ICONV_OPS */
static void
bom_strip(FILE *fp, long expect_types, int lenient, int prefer32, int utf8)
{
const struct bom_type *bt;
struct bom_input input;
char ibuf[BUFFER_SIZE];
char obuf[BUFFER_SIZE];
char tocode[32];
size_t offset;
iconv_t icd = 0;
int done = 0;
int bom_type;
int ilen;
// Read BOM
init_bom_input(&input);
bom_type = read_bom(fp, &input, expect_types, prefer32);
bt = &bom_types[bom_type];
// If BOM type is NONE, then obviously we can't convert to UTF-8
if (bom_type == BOM_TYPE_NONE)
utf8 = 0;
// Initialize iconv conversion engine
if (utf8) {
snprintf(tocode, sizeof(tocode), "%s%s", bom_types[BOM_TYPE_UTF_8].encoding, lenient ? "//IGNORE" : "");
if ((icd = iconv_open(tocode, bt->encoding)) == (iconv_t)-1)
err(1, "iconv: \"%s\" -> \"%s\"", bt->encoding, tocode);
}
// Copy over any bytes we read after the BOM into our input buffer
ilen = input.len - bt->len;
memcpy(ibuf, input.buf + bt->len, ilen);
offset = bt->len;
// Convert remainder of file
while (!done) {
size_t nread;
size_t nwrit;
char *iptr;
char *optr;
size_t iremain;
size_t oremain;
int eof = 0;
size_t r;
// Fill the input buffer
while (ilen < sizeof(ibuf)) {
if ((nread = fread(ibuf + ilen, 1, sizeof(ibuf) - ilen, fp)) == 0) {
if (ferror(fp))
err(1, "read error");
eof = 1;
break;
}
ilen += nread;
}
// When the input buffer is empty and we couldn't add anything more, this is the last round
done = ilen == 0;
// Set up input and output pointers for this round
iptr = ibuf;
optr = obuf;
iremain = ilen;
oremain = sizeof(obuf);
// Convert to UTF-8 or just pass through
if (utf8) {
#if DEBUG_ICONV_OPS
fprintf(stderr, "->iconv@%d: ilen=%d\n", (int)offset, (int)ilen);
debug_buffer(offset, iptr, ilen);
#endif
r = iconv(icd, !done ? &iptr : NULL, &iremain, &optr, &oremain);
#if DEBUG_ICONV_OPS
{
const int errno_save = errno;
fprintf(stderr, "<-iconv@%d: r=%d errno=%d iptr@%d optr@%d\n",
(int)offset, (int)r, errno, (int)(iptr - ibuf), (int)(optr - obuf));
debug_buffer(offset, obuf, optr - obuf);
errno = errno_save;
}
#endif
if (r == (size_t)-1) {
switch (errno) {
case EINVAL: // incomplete multi-byte sequence at the end of the input buffer
if (!done && !eof)
break;
// FALLTHROUGH
case EILSEQ: // an invalid byte sequence was detected
if (lenient) {
iptr += iremain; // avoid an infinite loop on trailing partial multi-byte sequence
iremain = 0;
break;
}
errx(EX_ILLEGAL_BYTES, "invalid %s byte sequence at file offset %lu", bt->name, (unsigned long)(offset + (iptr - ibuf)));
default:
err(1, "iconv");
}
}
} else { // behave like iconv() would but just copy the bytes
memcpy(optr, iptr, ilen);
if (!done)
iptr += ilen;
iremain = 0;
optr += ilen;
oremain -= ilen;
}
// Update file offset
offset += ilen - iremain;
// Shift unprocessed input for next time
memmove(ibuf, iptr, iremain);
ilen = iremain;
// Write output
oremain = optr - obuf;
optr = obuf;
while (oremain > 0 && (nwrit = fwrite(optr, 1, oremain, stdout)) > 0) {
optr += nwrit;
oremain -= nwrit;
}
if (ferror(stdout))
err(1, "write error");
}
if (fflush(stdout) == EOF)
err(1, "write error");
// Close conversion
if (utf8)
(void)iconv_close(icd);
}
static void
bom_list(void)
{
int bom_type;
for (bom_type = 0; bom_type < BOM_TYPE_MAX; bom_type++) {
const struct bom_type *const bt = &bom_types[bom_type];
printf("%s\n", bt->name);
}
}
static void
bom_print(int bom_type)
{
const struct bom_type *const bt = &bom_types[bom_type];
int i;
for (i = 0; i < bt->len; i++) {
if (putchar(bt->bytes[i] & 0xff) == EOF)
err(1, "write error");
}
}
static int
read_bom(FILE *fp, struct bom_input *const input, long expect_types, int prefer32)
{
int bom_type;
// Read bytes until all BOMs are either completely matched or have failed to match
while (read_byte(fp, input)) {
if (input->num_finished == BOM_TYPE_MAX)
break;
}
// Handle the UTF-16LE vs. UTF-32LE ambiguity: FF FE 00 00 is either a UTF-32LE BOM or a UTF-16LE BOM followed by NUL
if (input->match_state[BOM_TYPE_UTF_16LE] == MATCH_COMPLETE
&& input->match_state[BOM_TYPE_UTF_32LE] == MATCH_COMPLETE) {
input->match_state[prefer32 ? BOM_TYPE_UTF_16LE : BOM_TYPE_UTF_32LE] = MATCH_FAILED;
input->num_complete--;
}
// At this point there should be BOM_TYPE_NONE and at most one other match
assert(input->match_state[BOM_TYPE_NONE] == MATCH_COMPLETE);
switch (input->num_complete) {
case 1:
bom_type = BOM_TYPE_NONE;
break;
case 2:
for (bom_type = 0; bom_type < BOM_TYPE_MAX; bom_type++) {
if (bom_type != BOM_TYPE_NONE && input->match_state[bom_type] == MATCH_COMPLETE)
break;
}
if (bom_type < BOM_TYPE_MAX)
break;
// FALLTHROUGH
default:
errx(1, "internal error: %s", ">2 BOM type matches");
}
// Check expected BOM type
if (expect_types != 0 && (expect_types & (1 << bom_type)) == 0)
errx(EX_EXPECT_FAIL, "unexpected BOM type %s", bom_types[bom_type].name);
// Done
return bom_type;
}
static int
bom_type_from_name(const char *name)
{
int bom_type;
for (bom_type = 0; bom_type < BOM_TYPE_MAX; bom_type++) {
if (strcmp(bom_types[bom_type].name, name) == 0)
return bom_type;
}
errx(1, "unknown BOM type \"%s\"", name);
}
static int
read_byte(FILE *fp, struct bom_input *const input)
{
int bom_type;
int ch;
// Read next byte
if ((ch = getc(fp)) == EOF) {
if (ferror(fp))
err(1, "read error");
return 0;
}
// Update state
if (input->len >= sizeof(input->buf))
errx(1, "internal error: %s", "input buffer overflow");
for (bom_type = 0; bom_type < BOM_TYPE_MAX; bom_type++) {
const struct bom_type *const bt = &bom_types[bom_type];
switch (input->match_state[bom_type]) {
case MATCH_PREFIX:
if (bt->bytes[input->len] != (char)ch) {
input->match_state[bom_type] = MATCH_FAILED;
input->num_finished++;
} else if (bt->len == input->len + 1) {
input->match_state[bom_type] = MATCH_COMPLETE;
input->num_finished++;
input->num_complete++;
}
break;
case MATCH_COMPLETE:
case MATCH_FAILED:
break;
default:
errx(1, "internal error: %s", "invalid match state");
}
}
input->buf[input->len++] = (char)ch;
return 1;
}
static void
init_bom_input(struct bom_input *const input)
{
memset(input, 0, sizeof(*input));
input->match_state[BOM_TYPE_NONE] = MATCH_COMPLETE;
input->num_complete = 1;
input->num_finished = 1;
}
static void
set_mode(int *modep, int mode)
{
if (*modep != 0) {
usage();
exit(1);
}
*modep = mode;
}
static void
usage(void)
{
fprintf(stderr, "Usage:\n");
fprintf(stderr, " bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]\n");
fprintf(stderr, " bom --detect [--expect types] [--prefer32] [file]\n");
fprintf(stderr, " bom --list\n");
fprintf(stderr, " bom --print type\n");
fprintf(stderr, " bom --help\n");
fprintf(stderr, " bom --version\n");
fprintf(stderr, "Options:\n");
fprintf(stderr, " -d, --detect Report the detected BOM type and exit\n");
fprintf(stderr, " -e, --expect types Expect the specified BOM type(s) (separated by commas)\n");
fprintf(stderr, " -h, --help Output command line usage summary\n");
fprintf(stderr, " -l, --lenient Skip invalid input byte sequences instead of failing\n");
fprintf(stderr, " --list List the supported BOM types\n");
fprintf(stderr, " -p, --print type Output the byte sequence corresponding to \"type\"\n");
fprintf(stderr, " --prefer32 Prefer UTF-32LE instead of UTF-16LE followed by NUL\n");
fprintf(stderr, " -s, --strip Strip the BOM and output the remainder of the file\n");
fprintf(stderr, " -u, --utf8 Convert the remainder of the file to UTF-8\n");
fprintf(stderr, " -v, --version Output program version and exit\n");
}
0707010000000A000081ED000003E8000000640000000161830F5E00000156000000000000000000000000000000000000001500000000bom-1.0.1/manpage.sh#!/bin/bash
# Bail on error
set -e
NCOLS="83"
MANPAGE="bom.1.in"
sed '/man page/q' < README.md > README.md.NEW
printf '```\n' >> README.md.NEW
groff -r LL=${NCOLS}n -r LT=${NCOLS}n -Tlatin1 -man "${MANPAGE}" \
| sed -r -e 's/.\x08(.)/\1/g' -e 's/\x1b[[0-9]+m//g' \
>> README.md.NEW
printf '```\n' >> README.md.NEW
mv README.md{.NEW,}
0707010000000B000041ED000003E8000000640000000261830F5E00000000000000000000000000000000000000000000001000000000bom-1.0.1/tests0707010000000C000081ED000003E8000000640000000161830F5E00000CEA000000000000000000000000000000000000001700000000bom-1.0.1/tests/run.sh#!/bin/bash
# Bail on error
set -e
# Setup temporary files
TMP_STDOUT_EXPECTED='bom-test-out-expected.tmp'
TMP_STDERR_EXPECTED='bom-test-err-expected.tmp'
TMP_STDOUT_ACTUAL='bom-test-out-actual.tmp'
TMP_STDERR_ACTUAL='bom-test-err-actual.tmp'
TMP_SWAP_FILE='bom-test-hexdump.tmp'
trap "rm -f \
${TMP_STDOUT_EXPECTED} \
${TMP_STDERR_EXPECTED} \
${TMP_STDOUT_ACTUAL} \
${TMP_STDERR_ACTUAL} \
${TMP_SWAP_FILE}" 0 2 3 5 10 13 15
# Convert a file to hexdump version
hexdumpify()
{
FILE="${1}"
hexdump -C < "${FILE}" > "${TMP_SWAP_FILE}"
mv "${TMP_SWAP_FILE}" "${FILE}"
}
# Compare files, on failure set ${DIFF_FAIL}
checkdiff()
{
if [ "${1}" = '-h' ]; then
HEXDUMPIFY='true'
shift
else
HEXDUMPIFY='false'
fi
TESTFILE="${1}"
WHAT="${2}"
EXPECTED="${3}"
ACTUAL="${4}"
if diff -q "${EXPECTED}" "${ACTUAL}" >/dev/null; then
return 0
fi
echo "test: ${TESTFILE}: ${WHAT} mismatch"
echo '------------------------------------------------------'
if [ "${HEXDUMPIFY}" = 'true' ]; then
hexdumpify "${EXPECTED}"
hexdumpify "${ACTUAL}"
fi
diff -u "${EXPECTED}" "${ACTUAL}" || true
echo '------------------------------------------------------'
DIFF_FAIL='true'
}
# Execute one test, on failure set ${TEST_FAIL}
runtest()
{
# Read test data
unset FLAGS
unset STDIN
unset STDOUT
unset STDERR
unset EXITVAL
. "${TESTFILE}"
if [ -z "${FLAGS+x}" \
-o -z "${STDIN+x}" \
-o -z "${STDOUT+x}" \
-o -z "${STDERR+x}" \
-o -z "${EXITVAL+x}" ]; then
echo "test: ${TESTFILE}: invalid test file"
exit 1
fi
# Set up files
echo -en "${STDOUT}" > "${TMP_STDOUT_EXPECTED}"
echo -en "${STDERR}" > "${TMP_STDERR_EXPECTED}"
set +e
echo -en "${STDIN}" | ../bom ${FLAGS} >"${TMP_STDOUT_ACTUAL}" 2>"${TMP_STDERR_ACTUAL}"
ACTUAL_EXITVAL="$?"
set -e
# Special hacks
if [ "${STDERR}" = '!USAGE!' ]; then
../bom --help 2>"${TMP_STDERR_EXPECTED}"
fi
# Check result
DIFF_FAIL='false'
checkdiff -h "${TESTFILE}" "standard output" "${TMP_STDOUT_EXPECTED}" "${TMP_STDOUT_ACTUAL}"
checkdiff "${TESTFILE}" "standard error" "${TMP_STDERR_EXPECTED}" "${TMP_STDERR_ACTUAL}"
if [ "${DIFF_FAIL}" != 'false' ]; then
TEST_FAIL='true'
fi
if [ "${ACTUAL_EXITVAL}" -ne "${EXITVAL}" ]; then
echo "test: ${TESTFILE}: exit value ${ACTUAL_EXITVAL} != ${EXITVAL}"
TEST_FAIL='true'
fi
# Print success or if failure show params
if [ "${TEST_FAIL}" = 'false' ]; then
echo "test: ${TESTFILE}: success"
else
echo "******************************************************"
echo "test: ${TESTFILE} FAILED with:"
echo " FLAGS='${FLAGS}'"
echo " STDIN='${STDIN}'"
echo "******************************************************"
fi
}
# Find all tests and run them
ANY_FAIL='false'
for TESTFILE in `find . -maxdepth 1 -type f -name 'test-*.tst' | sort | sed 's|^\./||'`; do
TEST_FAIL='false'
runtest "${TESTFILE}"
if [ "${TEST_FAIL}" != 'false' ]; then
ANY_FAIL='true'
fi
done
# Exit with error if any test failed
if [ "${ANY_FAIL}" != 'false' ]; then
exit 1
fi
0707010000000D000081A4000003E8000000640000000161830F5E00000040000000000000000000000000000000000000002600000000bom-1.0.1/tests/test-detect-empty.tstFLAGS='--detect'
STDIN=''
STDOUT='NONE\n'
STDERR=''
EXITVAL='0'
0707010000000E000081A4000003E8000000640000000161830F5E00000064000000000000000000000000000000000000002B00000000bom-1.0.1/tests/test-detect-expect-001.tstFLAGS='--detect --expect UTF-8'
STDIN='\xef\xbb\xbfblahblah'
STDOUT='UTF-8\n'
STDERR=''
EXITVAL='0'
0707010000000F000081A4000003E8000000640000000161830F5E00000080000000000000000000000000000000000000002B00000000bom-1.0.1/tests/test-detect-expect-002.tstFLAGS='--detect --expect UTF-16LE'
STDIN='\xef\xbb\xbfblahblah'
STDOUT=''
STDERR='bom: unexpected BOM type UTF-8\n'
EXITVAL='2'
07070100000010000081A4000003E8000000640000000161830F5E00000044000000000000000000000000000000000000002800000000bom-1.0.1/tests/test-detect-partial.tstFLAGS='--detect'
STDIN='\xff'
STDOUT='NONE\n'
STDERR=''
EXITVAL='0'
07070100000011000081A4000003E8000000640000000161830F5E0000007D000000000000000000000000000000000000002400000000bom-1.0.1/tests/test-list-types.tstFLAGS='--list'
STDIN=''
STDOUT='NONE\nUTF-7\nUTF-8\nUTF-16BE\nUTF-16LE\nUTF-32BE\nUTF-32LE\nGB18030\n'
STDERR=''
EXITVAL='0'
07070100000012000081A4000003E8000000640000000161830F5E0000004E000000000000000000000000000000000000002600000000bom-1.0.1/tests/test-prefer32-001.tstFLAGS='-d'
STDIN='\xff\xfe\x00\x00'
STDOUT='UTF-16LE\n'
STDERR=''
EXITVAL='0'
07070100000013000081A4000003E8000000640000000161830F5E00000059000000000000000000000000000000000000002600000000bom-1.0.1/tests/test-prefer32-002.tstFLAGS='-d --prefer32'
STDIN='\xff\xfe\x00\x00'
STDOUT='UTF-32LE\n'
STDERR=''
EXITVAL='0'
07070100000014000081A4000003E8000000640000000161830F5E00000051000000000000000000000000000000000000002700000000bom-1.0.1/tests/test-print-GB18030.tstFLAGS='--print GB18030'
STDIN=''
STDOUT='\x84\x31\x95\x33'
STDERR=''
EXITVAL='0'
07070100000015000081A4000003E8000000640000000161830F5E0000003E000000000000000000000000000000000000002400000000bom-1.0.1/tests/test-print-NONE.tstFLAGS='--print NONE'
STDIN=''
STDOUT=''
STDERR=''
EXITVAL='0'
07070100000016000081A4000003E8000000640000000161830F5E00000062000000000000000000000000000000000000002700000000bom-1.0.1/tests/test-print-UNKNOWN.tstFLAGS='--print UNKNOWN'
STDIN=''
STDOUT=''
STDERR='bom: unknown BOM type "UNKNOWN"\n'
EXITVAL='1'
07070100000017000081A4000003E8000000640000000161830F5E0000004A000000000000000000000000000000000000002800000000bom-1.0.1/tests/test-print-UTF-16BE.tstFLAGS='--print UTF-16BE'
STDIN=''
STDOUT='\xfe\xff'
STDERR=''
EXITVAL='0'
07070100000018000081A4000003E8000000640000000161830F5E0000004A000000000000000000000000000000000000002800000000bom-1.0.1/tests/test-print-UTF-16LE.tstFLAGS='--print UTF-16LE'
STDIN=''
STDOUT='\xff\xfe'
STDERR=''
EXITVAL='0'
07070100000019000081A4000003E8000000640000000161830F5E00000052000000000000000000000000000000000000002800000000bom-1.0.1/tests/test-print-UTF-32BE.tstFLAGS='--print UTF-32BE'
STDIN=''
STDOUT='\x00\x00\xfe\xff'
STDERR=''
EXITVAL='0'
0707010000001A000081A4000003E8000000640000000161830F5E00000052000000000000000000000000000000000000002800000000bom-1.0.1/tests/test-print-UTF-32LE.tstFLAGS='--print UTF-32LE'
STDIN=''
STDOUT='\xff\xfe\x00\x00'
STDERR=''
EXITVAL='0'
0707010000001B000081A4000003E8000000640000000161830F5E0000004B000000000000000000000000000000000000002500000000bom-1.0.1/tests/test-print-UTF-7.tstFLAGS='--print UTF-7'
STDIN=''
STDOUT='\x2b\x2f\x76'
STDERR=''
EXITVAL='0'
0707010000001C000081A4000003E8000000640000000161830F5E0000004B000000000000000000000000000000000000002500000000bom-1.0.1/tests/test-print-UTF-8.tstFLAGS='--print UTF-8'
STDIN=''
STDOUT='\xef\xbb\xbf'
STDERR=''
EXITVAL='0'
0707010000001D000081A4000003E8000000640000000161830F5E0000008E000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-001.tstFLAGS='--strip --utf8'
STDIN='\xef\xbb\xbftest123\xff456'
STDOUT=''
STDERR='bom: invalid UTF-8 byte sequence at file offset 10\n'
EXITVAL='3'
0707010000001E000081A4000003E8000000640000000161830F5E0000006E000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-002.tstFLAGS='--strip --lenient --utf8'
STDIN='\xef\xbb\xbftest123\xff456'
STDOUT='test123456'
STDERR=''
EXITVAL='0'
0707010000001F000081A4000003E8000000640000000161830F5E00000061000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-003.tstFLAGS='--strip'
STDIN='\xef\xbb\xbftest123\xff456'
STDOUT='test123\xff456'
STDERR=''
EXITVAL='0'
07070100000020000081A4000003E8000000640000000161830F5E000000F1000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-004.tst# The input is truncated after 2/3 of a rightwards arrow U2192 -> e2 86 92
FLAGS='--strip --expect UTF-8 --utf8'
STDIN='\xef\xbb\xbfpartial arrow: \xe2\x86'
STDOUT=''
STDERR='bom: invalid UTF-8 byte sequence at file offset 18\n'
EXITVAL='3'
07070100000021000081A4000003E8000000640000000161830F5E000000D6000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-005.tst# The input is truncated after 2/3 of a rightwards arrow U2192 -> e2 86 92
FLAGS='--strip --expect UTF-8 --utf8 --lenient'
STDIN='\xef\xbb\xbfpartial arrow: \xe2\x86'
STDOUT='partial arrow: '
STDERR=''
EXITVAL='0'
07070100000022000081A4000003E8000000640000000161830F5E0000014B000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-strip-006.tst# This has a multi-byte sequence that crosses our input buffer boundary
FLAGS='--strip --expect UTF-8 --utf8'
STDIN_BOM='\xef\xbb\xbf'
STDIN_1023=`yes aaaaaaaaaaaaaaa | tr -d \\\\n | head -c 1023`
STDIN_ARROW='\xe2\x86\x92'
STDIN="${STDIN_BOM}${STDIN_1023}${STDIN_ARROW}"
STDOUT="${STDIN_1023}${STDIN_ARROW}"
STDERR=''
EXITVAL='0'
07070100000023000081A4000003E8000000640000000161830F5E00000039000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-001.tstFLAGS=''
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000024000081A4000003E8000000640000000161830F5E00000049000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-002.tstFLAGS='--strip --detect'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000025000081A4000003E8000000640000000161830F5E00000048000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-003.tstFLAGS='--detect --list'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000026000081A4000003E8000000640000000161830F5E0000004C000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-004.tstFLAGS='--list --print NONE'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000027000081A4000003E8000000640000000161830F5E0000004C000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-005.tstFLAGS='--print NONE --help'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000028000081A4000003E8000000640000000161830F5E00000042000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-006.tstFLAGS='-d --list'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000029000081A4000003E8000000640000000161830F5E0000003D000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-007.tstFLAGS='-sdu'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
0707010000002A000081A4000003E8000000640000000161830F5E00000049000000000000000000000000000000000000002300000000bom-1.0.1/tests/test-usage-008.tstFLAGS='--detect foo bar'
STDIN=''
STDOUT=''
STDERR='!USAGE!'
EXITVAL='1'
07070100000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000B00000000TRAILER!!!114 blocks