#
# spec file for package perl-Algorithm-NeedlemanWunsch
#
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#


Name:           perl-Algorithm-NeedlemanWunsch
Version:        0.04
Release:        0
%define cpan_name Algorithm-NeedlemanWunsch
Summary:        Sequence Alignment with Configurable Scoring
License:        GPL-1.0+ or Artistic-1.0
Group:          Development/Libraries/Perl
Url:            http://search.cpan.org/dist/Algorithm-NeedlemanWunsch/
Source0:        http://www.cpan.org/authors/id/V/VB/VBAR/%{cpan_name}-%{version}.tar.gz
BuildArch:      noarch
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
BuildRequires:  perl
BuildRequires:  perl-macros
%{perl_requires}

%description
Sequence alignment is a way to find commonalities in two (or more) similar
sequences or strings of some items or characters. The standard motivating
example is the comparison of DNA sequences and their functional and
evolutionary similarities and differences, but the problem has much wider
applicability - for example, finding the longest common subsequence (that
is, 'diff') is a special case of sequence alignment.

Conceptually, sequence alignment works by scoring all possible alignments
and choosing the alignment with the maximal score. For example, the
sequences 'a t c t' and 't g a t' may be aligned as

  sequence A: a t c - t
                | |   |
  sequence B: - t g a t

or

  sequence A: - - a t c t
                  | |
  sequence B: t g a t - -

(and exponentially many other ways, of course). Note that Needleman-Wunsch
considers _global_ alignments, over the entire length of both sequences;
each item is either aligned with an item of the other sequence, or
corresponds to a _gap_ (which is always aligned with an item - aligning two
gaps wouldn't help anything). This approach is especially suitable for
comparing sequences that are of comparable length and somewhat similar
along their whole lengths - that is, without long stretches that have
nothing to do with each other. If your sequences don't satisfy these
requirements, consider using local alignment, which, strictly speaking,
isn't Needleman-Wunsch, but is similar enough to be implemented in this
module as well - see the module documentation for details.

In the example above, the second alignment has more gaps than the first,
but perhaps your a's are structurally important and you like them lined up
so much that you'd still prefer the second alignment. Conversely, if c is
"almost the same" as g, it might be the first alignment that matches
better. Needleman-Wunsch formalizes such considerations into a _similarity
matrix_, assigning payoffs to each (ordered, but the matrix is normally
symmetrical so that the order doesn't matter) pair of possible sequence
items, plus a _gap penalty_, quantifying the desirability of a gap in a
sequence. A preference for pairings over gaps is expressed by a low
(relative to the similarity matrix values, normally negative) gap penalty.

The alignment score is then defined as the sum, over the positions where at
least one sequence has an item, of the similarity matrix values indexed by
the first and second item (when both are defined) and gap penalties (for
items aligned with a gap). For example, if 'S' is the similarity matrix and
'g' denotes the gap penalty, the alignment

  sequence A: a a t t c c

  sequence B: a - - - t c

has score 'S[a, a] + 3 * g + S[c, t] + S[c, c]'.
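
The score definition above can be sketched in a few lines of Python. This
is only an illustration, not code from this module: the names
alignment_score and match_score, the match/mismatch values 1 and -1, and
the gap penalty -2 are all assumptions made for the example.

```python
# Illustrative sketch only; none of these names come from
# Algorithm::NeedlemanWunsch.
GAP = "-"


def alignment_score(row_a, row_b, similarity, gap_penalty):
    """Sum similarity values for paired items and penalties for gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            raise ValueError("a gap must be aligned with an item")
        if x == GAP or y == GAP:
            total += gap_penalty          # item aligned with a gap
        else:
            total += similarity(x, y)     # S[x, y] for a paired position
    return total


def match_score(x, y):
    # Assumed similarity values: 1 for a match, -1 for a mismatch.
    return 1 if x == y else -1


# S[a, a] + 3 * g + S[c, t] + S[c, c] with g = -2
score = alignment_score("aattcc", "a---tc", match_score, -2)
```

Passing the similarity as a function rather than a literal matrix keeps
the sketch short; a dictionary keyed by item pairs would serve equally
well.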

When the gap penalty is 0 and the similarity matrix is an identity matrix,
i.e. it assigns 1 to every match and 0 to every mismatch, Needleman-Wunsch
reduces to finding the longest common subsequence.

The algorithm for maximizing the score is a standard application of dynamic
programming, computing the optimal alignment score of empty and 1-item
sequences and building it up until the whole input sequences are taken into
consideration. Once the optimal score is known, the algorithm traces back
to find the gap positions. Note that while the maximal score is obviously
unique, the alignment having it in general isn't; this module's interface
allows the calling application to choose between different optimal
alignments.
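
The dynamic programming scheme described above might look roughly like the
following Python sketch. It is a textbook-style illustration under stated
assumptions, not this module's implementation: the function name
needleman_wunsch, its parameters and the fixed tie-breaking order in the
traceback are all made up for the example.

```python
# Illustrative sketch only; this is not the code of
# Algorithm::NeedlemanWunsch.
def needleman_wunsch(a, b, similarity, gap_penalty):
    """Return (optimal score, aligned row for a, aligned row for b)."""
    n, m = len(a), len(b)
    # best[i][j] holds the optimal score for aligning a[:i] with b[:j].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap_penalty      # a[:i] against an empty sequence
    for j in range(1, m + 1):
        best[0][j] = j * gap_penalty      # empty sequence against b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]),
                best[i - 1][j] + gap_penalty,   # a[i-1] aligned with a gap
                best[i][j - 1] + gap_penalty,   # b[j-1] aligned with a gap
            )
    # Trace back from the full sequences to recover one optimal alignment.
    # Ties are broken in a fixed order: pairing first, then a gap in b.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and best[i][j]
                == best[i - 1][j - 1] + similarity(a[i - 1], b[j - 1])):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap_penalty:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


# Identity similarity and zero gap penalty: the score is the length of
# the longest common subsequence.
lcs_len, top, bottom = needleman_wunsch(
    "atct", "tgat", lambda x, y: 1 if x == y else 0, 0)
```

The sketch always returns a single alignment chosen by its fixed
tie-breaking order, whereas the module leaves the choice between equally
good alignments to the calling application.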

%prep
%setup -q -n %{cpan_name}-%{version}
find . -type f ! -name \*.pl -print0 | xargs -0 chmod 644

%build
%{__perl} Makefile.PL INSTALLDIRS=vendor
%{__make} %{?_smp_mflags}

%check
%{__make} test

%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist

%files -f %{name}.files
%defattr(-,root,root,755)
%doc Changes README

%changelog