devel:languages:perl:CPAN-A
File perl-Algorithm-TicketClusterer.spec of Package perl-Algorithm-TicketClusterer
#
# spec file for package perl-Algorithm-TicketClusterer
#
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#

Name:           perl-Algorithm-TicketClusterer
Version:        1.01
Release:        0
%define cpan_name Algorithm-TicketClusterer
Summary:        Perl module for retrieving Excel-stored past
License:        GPL-1.0+ or Artistic-1.0
Group:          Development/Libraries/Perl
Url:            http://search.cpan.org/dist/Algorithm-TicketClusterer/
Source0:        http://www.cpan.org/authors/id/A/AV/AVIKAK/%{cpan_name}-%{version}.tar.gz
BuildArch:      noarch
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
BuildRequires:  perl
BuildRequires:  perl-macros
BuildRequires:  perl(Spreadsheet::ParseExcel) >= 0.59
BuildRequires:  perl(Spreadsheet::XLSX) >= 0.13
BuildRequires:  perl(Text::Iconv) >= 1.7
BuildRequires:  perl(WordNet::QueryData) >= 1.47
Requires:       perl(Spreadsheet::ParseExcel) >= 0.59
Requires:       perl(Spreadsheet::XLSX) >= 0.13
Requires:       perl(Text::Iconv) >= 1.7
Requires:       perl(WordNet::QueryData) >= 1.47
%{perl_requires}

%description
*Algorithm::TicketClusterer* is a _perl5_ module for retrieving previously processed Excel-stored tickets similar to a new ticket. Routing decisions made for the past similar tickets can be useful in expediting the routing of a new ticket.

Tickets are commonly used in the software services industry and customer support businesses to record requests for service, product complaints, user feedback, and so on.

With regard to the routing of a ticket, you would want each new ticket to be handled by the tech support individual who is most qualified to address the issue raised in the ticket. Identifying the right individual for each new ticket in real-time is no easy task for organizations that man large service centers and helpdesks. So if it were possible to quickly identify the previously processed tickets that are most similar to a new ticket, one could think of constructing semi-automated (or, perhaps, even fully automated) ticket routers.

Identifying old tickets similar to a new ticket is made challenging by the fact that folks who submit tickets often write them quickly and informally. The informal style of writing means that different people may use different colloquial terms to describe the same thing. And the quickness associated with their submission causes the tickets to frequently contain spelling and other errors such as conjoined words, fragmentation of long words, and so on.

This module is an attempt at dealing with these challenges.

The problem of different people using different words to describe the same thing is taken care of by using WordNet to add to each ticket a designated number of synonyms for each word in the ticket. The idea is that after all the tickets are expanded in this manner, they would become grounded in a common vocabulary. The synonym expansion of a ticket takes place only after the negated phrases (that is, the words preceded by 'no' or 'not') are replaced by their antonyms.

Obviously, expanding a ticket by synonyms makes sense only after it is corrected for spelling and other errors. What sort of errors one looks for and corrects would, in general, depend on the application domain of the tickets.
(It is not uncommon for engineering services to use jargon words and acronyms that look like spelling errors to those not familiar with the services.) The module expects to see a file that is supplied through the constructor parameter 'misspelled_words_file' that contains misspelled words in the first column and their corrected versions in the second column. An example of such a file is included in the 'examples' directory. You would need to create your own version of such a file for your application domain. Since conjuring up the misspellings that your ticket submitters are likely to throw at you is futile, you might consider using the following approach which I prefer to actually reading the tickets for such errors: Turn on the debugging options in the constructor for some initially collected spreadsheets and watch what sort of words the WordNet is not able to supply any synonyms for. In a large majority of cases, these would be the misspelled words.

Expanding a ticket with synonyms is made complicated by the fact that some common words have such a large number of synonyms that they can overwhelm the relatively small number of words in a ticket. Adding too many synonyms in relation to the size of a ticket can not only distort the sense of the ticket but it can also increase the computational cost of processing all the tickets. In order to deal with the pros and the cons of using synonyms, the present module strikes a middle ground: You can specify how many synonyms to use for a word (assuming that the number of synonyms supplied by WordNet is larger than the number specified). This allows you to experiment with retrieval precision by altering the number of synonyms used. The retained synonyms are selected randomly from those supplied by WordNet. (A smarter way to select synonyms would be to base them on the context. For example, you would not want to use the synonym `programmer' for the noun `developer' if your application domain is real-estate.
However, such context-dependent selection of synonyms would take us into the realm of ontologies that I have chosen to stay away from in this first version of the module.)

Another issue related to the overall run-time performance of this module is the computational cost of the calls to WordNet through its Perl interface 'WordNet::QueryData'. This module uses what I have referred to as _synset caching_ to make this process as efficient as possible. The result of each WordNet lookup is cached in a database file whose name you supply through the constructor option 'synset_cache_db'. If you are doing a good job of catching spelling errors, the module will carry out a decreasing number of WordNet lookups as the tickets are scanned for expansion with synonyms. In an experiment with a spreadsheet that contained over 1400 real tickets, the last several hundred resulted in hardly any calls to WordNet.

As currently programmed, the synset cache is deleted and then created afresh at every call to the function that extracts information from an Excel spreadsheet. You would want to change this behavior of the module if you are planning to use it in a production environment where the different spreadsheets are likely to deal with the same application domain. To give greater persistence to the synset cache, comment out the 'unlink $self->{_synset_cache_db}' line in the method 'get_tickets_from_excel()'. After a few updates of the synset cache, the module would almost never need to make direct calls to WordNet, which would enhance the speed of the module even further.

The textual content of the tickets, as produced by the preprocessing steps, is used for document modeling, and the doc model thus created is used subsequently for retrieving similar tickets.
The doc modeling is carried out using the Vector Space Model (VSM) in which each ticket is represented by a vector whose size equals the size of the vocabulary used in all the tickets and whose elements represent the word frequencies in the ticket. After such a model is constructed, a query ticket is compared with the other tickets on the basis of the cosine similarity distance between the corresponding vectors. My decision to use the simplest of the text models --- the Vector Space Model --- was based on the work carried out by Shivani Rao at Purdue who has demonstrated that the simpler models are more effective at retrieval from software libraries than the more complex models. (See the paper by Shivani Rao and Avinash Kak at the MSR'11 Conference.) Although tickets, in general, are not the same as software libraries, I have a strong feeling that Shivani's conclusions would extend to other domains as well. Having said that, it is important to mention that there remains the possibility that automated ticket routing for some applications may respond better to more elaborate text models.

The module uses three mechanisms to speed up the retrieval of tickets similar to a query ticket: (1) It uses the inverted index for all the words to construct for each query ticket a candidate pool of only those tickets in the database that have words in common with the query ticket; (2) Only those query-ticket words are used for retrieval whose inverse-document-frequency values exceed a user-specified threshold; and (3) The module uses stemming to reduce the variants of the same word to a common root in order to limit the size of the vocabulary. The stemming used in the current module is rudimentary. However, it would be easy to plug into the module more powerful stemmers through their Perl interfaces. Future versions of this module may do exactly that.

%prep
%setup -q -n %{cpan_name}-%{version}
find . -type f ! -name \*.pl -print0 | xargs -0 chmod 644

%build
%{__perl} Makefile.PL INSTALLDIRS=vendor
%{__make} %{?_smp_mflags}

%check
%{__make} test

%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist

%files -f %{name}.files
%defattr(-,root,root,755)
%doc examples README

%changelog
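
The retrieval scheme outlined in the %description (word-frequency vectors, an inverted index to build a candidate pool, an IDF threshold on query words, and cosine similarity for ranking) can be illustrated with a minimal Python sketch. This is not the module's Perl code: the names tokenize, build_model, cosine, and retrieve, the natural-log IDF formula, and the sample tickets are all assumptions made for illustration. Stemming, spelling correction, and synonym expansion are left out.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace. The stemming, spelling correction,
    # and synonym expansion described above are intentionally omitted.
    return text.lower().split()


def build_model(tickets):
    """Build word-frequency vectors, an inverted index, and IDF values.

    tickets: dict mapping a ticket id to its text (hypothetical layout).
    """
    tf = {tid: Counter(tokenize(text)) for tid, text in tickets.items()}
    inverted = defaultdict(set)
    for tid, counts in tf.items():
        for word in counts:
            inverted[word].add(tid)
    total = len(tickets)
    # Assumed IDF definition: log(N / document frequency).
    idf = {word: math.log(total / len(tids)) for word, tids in inverted.items()}
    return tf, inverted, idf


def cosine(a, b):
    # Cosine similarity between two sparse word-frequency vectors.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_text, model, min_idf=0.0, top_k=3):
    tf, inverted, idf = model
    # Keep only query words whose IDF exceeds the threshold; words unknown
    # to the stored tickets get an IDF of 0.0 and are dropped.
    query = Counter(
        w for w in tokenize(query_text) if idf.get(w, 0.0) > min_idf
    )
    # Candidate pool: tickets sharing at least one retained query word.
    candidates = set()
    for word in query:
        candidates |= inverted.get(word, set())
    scored = [(cosine(query, tf[tid]), tid) for tid in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(tid, score) for score, tid in scored[:top_k]]


if __name__ == "__main__":
    sample = {
        "T1": "printer not responding after driver update",
        "T2": "cannot login to email account",
        "T3": "printer driver install fails",
    }
    model = build_model(sample)
    for tid, score in retrieve("printer driver error", model):
        print(tid, round(score, 3))
```

In this toy run only T1 and T3 enter the candidate pool, and T3 ranks first because it shares the same two query words with fewer other words in its vector. Raising min_idf drops the more common query words before the pool is built, which is the speed-versus-recall trade-off the description alludes to.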