File perl-String-REPartition.spec of Package perl-String-REPartition
#
# spec file for package perl-String-REPartition
#
# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#

Name:           perl-String-REPartition
Version:        1.6
Release:        0
%define cpan_name String-REPartition
Summary:        Generates a regex to partition a data set
License:        CHECK(Artistic-1.0 OR GPL-1.0-or-later)
Group:          Development/Libraries/Perl
Url:            https://metacpan.org/release/%{cpan_name}
Source0:        https://cpan.metacpan.org/authors/id/A/AV/AVIF/%{cpan_name}-%{version}.tar.gz
BuildArch:      noarch
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
BuildRequires:  perl
BuildRequires:  perl-macros
BuildRequires:  perl(Test::Pod)
BuildRequires:  perl(Test::Pod::Coverage)
%{perl_requires}

%description
This module exports a single function -- make_partition_re. It takes as
its first argument a number between 0 and 1, representing a percentage,
and as its second argument a reference to a list of strings. It returns a
regular expression which is guaranteed to match the percentage of the
strings in the list represented by the number in the first argument. More
importantly, the regex returned will *not* match the rest of the strings
in the list. That is, if the inputs were '0.6' and a reference to a list
of 100 strings, the returned regex would match 60 of the strings in the
list and not match the other 40.

Keep in mind that, since only integer operations may be performed on these
strings (that is, there cannot be a regex which matches a fraction of a
string), the target number is rounded down. If you have 4 strings in your
list and a ratio of 0.4, the resulting regex will match 1 string, not 1.6
strings. More interestingly, with 4 strings and a ratio of 0.1, the
resulting regex will almost certainly be '/^()$/' -- matching exactly 0 of
the strings in the list. Furthermore, because of this rounding, the
returned regex may not match precisely the number expected by multiplying
the size of the list by the ratio, but instead be off by a small number in
either direction.

make_partition_re() will return undef on a failure, and print a warning to
STDERR if '$^W' is true. Currently, the only errors that can occur relate
directly to the validity of the inputs.

Furthermore, if the strings in the list are not unique, the behaviour of
this function is not defined. For a small amount of repetition the regex
should still work, but it should be clear that a solution cannot be found
if the input list consists only of many copies of the same string.

The function finds its solution in roughly O(N) time -- however, in worst
cases, I think it can get as high as O(N^2). It's also true that certain
types of pathologically constructed data sets can break things and cause
it either to return an invalid regex or to enter an infinite loop. While I
haven't run into any of this in my testing, I'm not confident that I've
tested every possibility.

So why would you want to use this? Well, that's a question you'll have to
answer for yourself, mostly. :) However, the situations I envisaged while
developing the module were sort of like this:

Imagine you have a large set of data, indexed by a correspondingly large
set of keywords. Let's say you want to split this data into two
partitions, perhaps in order to store the data in two separate locations.
Maintaining a complete list of the remote keys could be expensive --
instead, you can simply store a regular expression which matches the keys
you keep remotely and does not match the local ones.

Another interesting feature is that a regex generated from a sufficiently
large subset of your data will, approximately, match the appropriate
percentage of strings from the complete data set. This means that you do
not need to have all of the data before you generate a regex to partition
it. As an example, generating a regex from roughly 10% of the words in
/usr/dict/words (selected randomly) gave me a regex that matched within
.3% of the desired result for all of the words.

%prep
%setup -q -n %{cpan_name}-%{version}
find . -type f ! -name \*.pl -print0 | xargs -0 chmod 644

%build
PERL_USE_UNSAFE_INC=1 perl Makefile.PL INSTALLDIRS=vendor
make %{?_smp_mflags}

%check
make test

%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist

%files -f %{name}.files
%defattr(-,root,root,755)
%doc Changes README

%changelog
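
The rounding rule stated in the package description (the target is the list size times the ratio, rounded down) can be checked with a little arithmetic. The following is a minimal Python sketch of that rule only. It is not the module's Perl implementation and does not generate a partition regex. The helper names target_count and matched_count are hypothetical, introduced here for illustration, and the small epsilon is an assumption added to absorb floating-point error.

```python
import math
import re


def target_count(ratio, strings):
    """Return how many strings a partition regex should match.

    Follows the rule in the description: len(strings) * ratio, rounded down.
    """
    if not 0 <= ratio <= 1:
        raise ValueError("ratio must be between 0 and 1")
    # Assumption: a tiny epsilon so that products such as 0.29 * 100,
    # which floating point stores just below 29, still round to 29.
    return math.floor(ratio * len(strings) + 1e-9)


def matched_count(pattern, strings):
    """Count the strings that a given pattern matches."""
    regex = re.compile(pattern)
    return sum(1 for s in strings if regex.search(s))


if __name__ == "__main__":
    words = ["apple", "banana", "cherry", "date"]
    # 4 strings at ratio 0.4 gives 1.6, which is rounded down.
    print(target_count(0.4, words))
    # 4 strings at ratio 0.1 gives 0.4, which is rounded down.
    print(target_count(0.1, words))
    # The pattern quoted in the description, applied to non-empty strings.
    print(matched_count(r"^()$", words))
```

Comparing matched_count for a generated regex against target_count is one way to see how far a given result is from the ideal split, since the description notes the match count may be off by a small number in either direction.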