(Grav GitSync) Automatic Commit from exu

exu 2024-09-18 10:58:35 +02:00 committed by GitSync
parent cee18abc8c
commit 18fa208973
150 changed files with 65854 additions and 0 deletions

View File

@ -0,0 +1,15 @@
{
  "root": true,
  "extends": "defaults/configurations/airbnb/es6",
  "rules": {
    "no-empty-label": 0,
    "space-after-keywords": "off",
    "space-return-throw-case": "off",
    "no-param-reassign": 0,
    "indent": [2, 4, { "SwitchCase": 1 }],
    "no-labels": 2,
    "keyword-spacing": [2, {"before": true, "after": true}]
  }
}

2
plugins/tntsearch/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
/node_modules
/.idea

View File

@ -0,0 +1,220 @@
# v3.4.0
## 03/06/2023
1. [](#improved)
* Updated TNTSearch library to `2.9.0`
* Enable Fuzzy search [#123](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/123)
* Add configuration for Levenshtein distance for fuzzy search [#124](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/124)
* Added French translation [#100](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/100)
* Added missing stemmers [#115](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/115) [#116](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/116)
# v3.3.1
## 02/25/2021
1. [](#improved)
* Upgraded to TNTSearch version `2.6.0`
* Added German (de) language [#103](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/103)
1. [](#bugfix)
* Fixed `query` truncation when containing a hash (`#`) and preventing proper search results [#110](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/110)
* Fixed `q` query parameter not working [#111](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/111)
* Fix default stemmer and description [#105](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/105)
* Fixed PHP 8 compatibility issues
# v3.3.0
## 12/02/2020
1. [](#improved)
* Upgraded to TNTSearch version `2.5.0`
* Pass phpstan level 7 tests
1. [](#bugfix)
* Fixed FlexPages events for add+delete
* Fixed running scheduled index job [#104](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/104)
# v3.2.1
## 09/04/2020
1. [](#bugfix)
* Fixed bad `require("history")...` JS warning [#101](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/101)
# v3.2.0
## 06/08/2020
1. [](#new)
* Added support for CLI `bin/plugin index` to index only a single language (`--language=en`)
1. [](#improved)
* Renamed CLI classes to avoid class name conflicts
1. [](#bugfix)
* Fixed non-routable and non-published pages showing up in search results
* Fixed indexing in multi-language sites
* Use CLI command directly in scheduler command to work [#95](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/95)
# v3.1.1
## 02/12/2020
1. [](#improved)
* Search with JS disabled [#75](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/75)
* Added RU 🇷🇺 language [#74](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/74)
* Various JS dependency updates & recompiled production JS
1. [](#bugfix)
* Added missing `search_object_type` to blueprint
# v3.1.0
## 02/11/2020
1. [](#new)
* Require Grav v1.6.21
* Upgraded to TNTSearch version 2.2 (PHP 7.4 fixes)
1. [](#improved)
* Code cleanup
1. [](#bugfix)
* Fixed Grav initialization in CLI
* Work around inconsistencies in page content if page template uses `grav.page` instead of `page`
# v3.0.1
## 02/03/2020
1. [](#bugfix)
* Fixed an issue indexing via Admin with Grav 1.7
# v3.0.0
## 04/14/2019
1. [](#new)
* Added new Grav Scheduler integration
* Added new Multi-Language Support
1. [](#improved)
* Switched to latest TNTSearch version 2.0 (PHP 7.1+)
* Added a new `onFlexObjecSave()` event
* Simplified indexing logic
* Code cleanup
* Minor CSS improvements for search field
* Implemented a unified indexer process that always uses the CLI command for consistency
* Use Grav YAML handler
1. [](#bugfix)
* Use custom search object in query [#63](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/63)
* Fixed issue with Ajax results escaping
* Fixed issues when updating search index
* Set the db index file as a property of `GravTNTSearch` to allow for better overriding
* Put better type checking around `onTNTSearchIndex()` example that indexes `page.header.author`
# v2.0.4
## 09/21/2018
1. [](#new)
* Added new `tntsearch: index: true|false` page header option to skip specific pages
1. [](#bugfix)
* Skip indexing of pages with `redirect` set in page header [#21](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/21)
# v2.0.3
## 08/16/2018
1. [](#new)
* New option to allow disabling of page events, manual updates will be required to pick up changes
1. [](#bugfix)
* Don't remove the X button if `built_in_css` is `false`
# v2.0.2
## 07/20/2018
1. [](#bugfix)
* Ensure that credentials are passed in when searching via `fetch`
* Compressed JS for better performance
# v2.0.1
## 05/21/2018
1. [](#bugfix)
* Potential fix for history conflicts.
# v2.0.0
## 05/11/2018
1. [](#new)
* Refactored TNTSearch to allow core classes to be extensible by other plugins
* Added `phrases` search support [#32](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/32)
1. [](#improved)
* Defaulted TNTSearch to search **all pages** out of the box. This should be tweaked though
* Added auto-focus to search input [#28](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/28)
* Added option to control `powered by` [#34](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/34)
* Added a timer on CLI index command
* Exposing `GravTNTSearch` to the browser for JS manipulation
* Dispatching `tntsearch:start` and `tntsearch:done` events when starting/rendering results
* README.md typo fixes
1. [](#bugfix)
* Implemented options as default values that were being ignored
* Fixed missing `break` in foreach [#33](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/33)
* Add missing `use` statement [#41](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/41)
# v1.2.5
## 03/07/2018
1. [](#improved)
* Only update a page on save if it exists in the current filter and is therefore eligible to be indexed
* Removed Admin dependency, it works fine without admin too, just need to use CLI
# v1.2.4
## 02/14/2018
1. [](#bugfix)
* Fix issue with admin saving 'string' for filter [#25](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/25)
# v1.2.3
## 02/14/2018
1. [](#bugfix)
* Missing comma in Admin JS breaking quick-tray reindexing
# v1.2.2
## 02/09/2018
1. [](#improved)
* Updated TNTSearch to use version `1.3.1` of TNTSearch library for PHP 7.2 compatibility [#24](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/24)
1. [](#bugfix)
* Fixed URI `hash` getting unintentionally removed by TNTSearch [#15](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/15)
* Fixed issue with param separator needed for Windows [#16](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/16)
* Fixed placeholder format in blueprint [#18](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/18)
# v1.2.1
## 01/16/2018
1. [](#new)
* Added `onTNTSearchReIndex()` that you can fire from any plugin to reindex everything
1. [](#bugfix)
* Fixed an XSS exploit in query
# v1.2.0
## 10/29/2017
1. [](#new)
* Reworked JS to VanillaJS [#12](https://github.com/trilbymedia/grav-plugin-tntsearch/pull/12)
* Implemented live URI / history refresh when typing in the field
* Added new 'auto' setting for search_type that automatically detects 'basic' or 'boolean'.
* It is now possible to force a search_type mode whether it's `basic` or `boolean`
* Updated to TNTSearch Library to v1.1.0
1. [](#improved)
* Allow the ability to pass a `placeholder` to the `partials/tntsearch.html.twig` template
* Moved 'fuzzy' option as independent option
1. [](#bugfix)
* Fixed JS issue when at login page
* Fixed results showing on load for drop-downs, instead of in_page only view [#10](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/10)
# v1.1.0
## 08/22/2017
1. [](#new)
* Extensible output JSON support via new `onTNTSearchQuery()` event.
* Added a 'powered-by' link that can be disabled via configuration
* Improved docs by including instructions on how to use CLI to index.
# v1.0.1
## 08/22/2017
1. [](#new)
* Changed cartoon bomb icon with more friendly version (binoculars) [#4](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/4)
* Added the ability to disable CSS and JS independently [#3](https://github.com/trilbymedia/grav-plugin-tntsearch/issues/3)
# v1.0.0
## 08/16/2017
1. [](#new)
* Initial release...

21
plugins/tntsearch/LICENSE Normal file
View File

@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2017 Trilby Media, LLC
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

373
plugins/tntsearch/README.md Normal file
View File

@ -0,0 +1,373 @@
# TNTSearch Plugin
The **TNTSearch** Plugin is for [Grav CMS](http://github.com/getgrav/grav). It is a powerful index-based full-text search engine, powered by the [TNTSearch library](https://github.com/teamtnt/tntsearch), that provides fast Ajax-based Grav content searches. The plugin is highly flexible: it can index arbitrary content data and use custom Twig templates, which makes it possible to index modular and other dynamic page types. TNTSearch provides CLI and Admin-based administration and re-indexing, as well as a built-in Ajax-powered front-end search tool.
> NOTE: TNTSearch version 3.0.0 now requires Grav 1.6.0 or newer to function as it makes use of new functionality not available in previous versions.
![](assets/tntsearch-ajax.gif)
## Installation
Installing the TNTSearch plugin can be done in one of two ways. The GPM (Grav Package Manager) installation method enables you to quickly and easily install the plugin with a simple terminal command, while the manual method enables you to do so via a zip file.
### GPM Installation (Preferred)
The simplest way to install this plugin is via the [Grav Package Manager (GPM)](http://learn.getgrav.org/advanced/grav-gpm) through your system's terminal (also called the command line). From the root of your Grav install type:

    bin/gpm install tntsearch

This will install the TNTSearch plugin into your `/user/plugins` directory within Grav. Its files can be found under `/your/site/grav/user/plugins/tntsearch`.
## Requirements
Other than standard Grav requirements, this plugin does have some extra requirements. Due to the complex nature of a search engine, TNTSearch utilizes a flat-file database to store its wordlist as well as the mapping for content. This is handled automatically by the plugin, but you do need to ensure you have the following installed on your server:
* **SQLite3** Database
* **PHP pdo** Extension
* **PHP pdo_sqlite** Driver
* **PHP pdo_mysql** Driver (only required because library references some MySQL constants, MySQL db is not used)
> PHP by default comes with **PDO** and the vast majority of Linux-based systems already come with SQLite.
### Installation of SQLite on Mac systems
SQLite actually comes pre-installed on your Mac, but you can upgrade it to the latest version with Homebrew:
Install [Homebrew](https://brew.sh/)
```shell
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Install SQLite with Homebrew
```shell
$ brew install sqlite
```
### Installation of SQLite on Windows systems
Download the appropriate version of SQLite from the [SQLite Downloads Page](https://www.sqlite.org/download.html).
Extract the downloaded ZIP file and run the `sqlite3.exe` executable.
## Configuration
Before configuring this plugin, you should copy the `user/plugins/tntsearch/tntsearch.yaml` to `user/config/plugins/tntsearch.yaml` and only edit that copy.
Here is the default configuration and an explanation of available options:
```yaml
enabled: true
search_route: '/search'
query_route: '/s'
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: false
distance: 2
phrases: true
stemmer: 'no'
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: 20
min: 3
snippet: 300
index_page_by_default: true
scheduled_index:
  enabled: false
  at: '0 */3 * * *'
  logs: 'logs/tntsearch-index.out'
filter:
  items:
    - root@.descendants
powered_by: true
search_object_type: Grav
```
The configuration options are as follows:
* `enabled` - enable or disable the plugin instantly
* `search_route` - the route used for the built-in search page
* `query_route` - the route used by the search form to query the search engine
* `built_in_css` - enable or disable the built-in css styling
* `built_in_js` - enable or disable the built-in javascript
* `built_in_search_page` - enable or disable the built-in search page
* `enable_admin_page_events` - enable or disable the page events which occur `on-save` to add/update/remove page in index
* `search_type` - can be one of these types:
  * `basic` - standard string matching
  * `boolean` - supports `or` or `minus`. e.g. `foo -bar`
  * `auto` - automatically detects whether to use `basic` or `boolean`
* `fuzzy` - matches if the words are 'close' but not necessarily exact matches
* `distance` - Levenshtein distance for fuzzy search. It represents the number of characters that need to be changed, removed, or added in a word for it to match the search keyword. Increasing the distance produces more search results but decreases the accuracy of the search.
* `phrases` - automatically handle phrases support
* `stemmer` - can be one of these types:
  * `no` - no stemmer
  * `arabic` - Arabic language
  * `croatian` - Croatian language
  * `german` - German language
  * `italian` - Italian language
  * `porter` - Porter stemmer for English language
  * `portuguese` - Portuguese language
  * `russian` - Russian language
  * `ukrainian` - Ukrainian language
* `display_route` - display the route in the search results
* `display_hits` - display the number of hits in the search results
* `display_time` - display the execution time in the search results
* `live_uri_update` - when `built_in_js` is enabled, live updates the URI bar in the `search_route` page
* `limit` - maximum number of results to be shown
* `min` - minimum number of characters typed before performing a search
* `snippet` - number of characters used to preview a result item
* `index_page_by_default` - should all pages be indexed by default unless frontmatter overrides
* `scheduled_index` - New scheduled index job. Disabled by default, when enabled defaulted to run every 3 hours, and output results to `logs/tntsearch-index.out`
* `filter` - a [Page Collections filter](https://learn.getgrav.org/content/collections#summary-of-collection-options) that lets you pick specific pages to index via a collection query
* `powered_by` - Display the **powered-by TNTSearch** text
* `search_object_type` - Allows custom classes to override the default **Grav Page** support. This allows completely custom searching capabilities for any data type.
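To make the `distance` option more concrete, here is a minimal sketch of the Levenshtein distance itself. It is a textbook implementation written for illustration, not the code the TNTSearch library uses, and `withinDistance` is a hypothetical helper name; how the library applies the threshold internally may differ.

```javascript
// Textbook dynamic-programming Levenshtein distance (illustrative only).
function levenshtein(a, b) {
    // previous[j] = distance between the first i-1 chars of a and the first j chars of b
    let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

    for (let i = 1; i <= a.length; i++) {
        const current = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            current[j] = Math.min(
                previous[j] + 1,        // remove a character
                current[j - 1] + 1,     // add a character
                previous[j - 1] + cost  // change a character
            );
        }
        previous = current;
    }

    return previous[b.length];
}

// Hypothetical helper: is `word` close enough to `keyword` for a given `distance` setting?
function withinDistance(word, keyword, distance) {
    return levenshtein(word, keyword) <= distance;
}

console.log(levenshtein('serach', 'search'));        // 2
console.log(withinDistance('serach', 'search', 2));  // true
console.log(withinDistance('serach', 'search', 1));  // false
```

With the default `distance: 2`, a typo such as `serach` would still be within range of `search`, while lowering the value to `1` would exclude it.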
## Usage
TNTSearch relies on your content being indexed into the SQLite index database before any search queries can be made. This is very similar to other search engines such as ElasticSearch, Solr, Lucene, etc., but it uses a relatively simple PHP search engine library, the [TNTSearch library](https://github.com/teamtnt/tntsearch), to achieve this with little setup and no hassle.
### Indexing
The first step after installing the plugin is to index your content. There are several ways you can accomplish this.
#### CLI Indexing
If you have CLI access, or simply prefer not to use the admin plugin, you can use the built-in CLI command:
```shell
$ bin/plugin tntsearch index
```
This will scan all your pages and index the content. You should see some output like this:
```shell
Re-indexing Search
Added 1 /
Added 2 /blog/classic-modern-architecture
Added 3 /blog/daring-fireball-link
Added 4 /blog/focus-and-blur
Added 5 /blog/just-some-text-today
Added 6 /blog/london-industry
Added 7 /blog/random-thoughts
Added 8 /blog/sunshine-in-the-hills
Added 9 /blog/the-urban-jungle
Total rows 9
Done.
```
This indicates a successful indexing of your content.
#### Admin Plugin Indexing
If you are using the admin plugin you can index your content directly from the plugin. TNTSearch adds a new **quick-tray** icon that lets you create a new index or re-index all your content quickly and conveniently with a single click.
![](assets/tntsearch-quicktray.png)
Alternatively you can navigate to the TNTSearch configuration section and click the `Index Content` button:
![](assets/tntsearch-config.png)
#### Skipping Indexing
> NOTE: Any page that uses a `redirect` page header attribute will be skipped during indexing.
You can explicitly skip a page that is in the index filter by adding this YAML to the page header:
```yaml
tntsearch:
  index: false
```
#### Multi-Language Support
With the new 3.0 version of TNTSearch, support has been added for multiple languages (Grav 1.6 required). Internally, this means that rather than storing the index as `user://data/tntsearch/grav.index`, a separate index is created for each language configured in Grav. For example, if you have set the supported languages to `['en', 'fr', 'de']`, then when you perform an index, you will get three files: `en.index`, `fr.index`, and `de.index`. When querying, the **active language** determines which index is queried. For example, performing the search on a page called `/fr/search` will result in the `fr.index` database being used and French results being returned.
Note: indexing will take longer depending on the number of languages you support, as TNTSearch has to index each page in each language.
> NOTE: While accented characters are supported in this release, there is currently no support in the underlying TNTSearch library to match non-accented characters to accented ones, so exact matches are required.
#### Scheduler Support
One of the great new features of Grav 1.6 is the built in **Scheduler** that allows plugin-provided functionality to be run periodically. TNTSearch is a great use-case for this capability as it allows an indexing job to be scheduled to be run every few hours without the need to manually keep things in sync. There are a few options that allow you to configure this capability.
Note that this scheduler functionality is disabled by default, so you first have to enable it in the TNTSearch plugin settings. After that you can configure how often the indexing job should run. The default is every 3 hours. Lastly, you can configure where any indexing output is logged to.
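The default schedule `0 */3 * * *` reads as "minute 0 of every third hour". The sketch below expands the minute and hour fields to show which times that covers. It is a deliberately tiny illustration, not a cron parser and not part of Grav's Scheduler: `expandCronField` is a hypothetical helper that only understands `*`, `*/n`, and a single number.

```javascript
// Expand one cron field into the concrete values it matches.
// Assumption: only "*", "*/n" and a plain number are handled here.
function expandCronField(field, min, max) {
    const values = [];

    if (field === '*') {
        for (let v = min; v <= max; v++) { values.push(v); }
    } else if (/^\*\/\d+$/.test(field)) {
        const step = Number(field.slice(2));
        if (step < 1) { throw new Error(`Invalid step in "${field}"`); }
        for (let v = min; v <= max; v += step) { values.push(v); }
    } else if (/^\d+$/.test(field)) {
        values.push(Number(field));
    } else {
        throw new Error(`Unsupported cron field "${field}"`);
    }

    return values;
}

const [minute, hour] = '0 */3 * * *'.split(' ');
console.log(expandCronField(minute, 0, 59).join(', ')); // 0
console.log(expandCronField(hour, 0, 23).join(', '));   // 0, 3, 6, 9, 12, 15, 18, 21
```

In other words, with the default `at` value the index job would be due eight times a day, on the hour.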
#### Admin Page CrUD Events
Once you have an index, TNTSearch makes use of admin events to **C**reate, **U**pdate, and **D**elete index entries when you edit pages. If your index ever looks like it's out of sync, you can simply reindex your whole site.
#### Customizing the Search Index
##### Adding Custom Fields
By default the TNTSearch plugin will index the `title` and `content` of your page. This usually suffices for most cases, but there are situations where you might want to index more fields. The plugin provides an example of this by listening to the `onTNTSearchIndex` event:
```php
public static function getSubscribedEvents()
{
    return [
        'onTNTSearchIndex' => ['onTNTSearchIndex', 0]
    ];
}

public function onTNTSearchIndex(Event $e)
{
    $fields = $e['fields'];
    $page = $e['page'];

    // Only index the author when it is set in the page frontmatter
    if (isset($page->header()->author)) {
        $fields->author = $page->header()->author;
    }
}
```
This allows you to add an author to the indexed fields if it is set in the page frontmatter. You can add your own custom fields with a very simple plugin that listens to this event.
##### Providing Custom Render Templates
The TNTSearch plugin generally uses the rendered content to index with. However, there are situations where your page is actually a modular page, or built from other pages where there is no actual content on the page, or the content is not representative of the page as a whole. To combat this situation you can provide custom templates in your theme that TNTSearch can use to render the content before indexing.
For example, say we have a homepage that is built from a few modular sub-pages with a little content below it, it's called `home.md`, so uses a `home.html.twig` file in your theme's `templates/` folder. You can create a simplified version of this template and save it as `templates/tntsearch/home.html.twig`. For this example this template looks like this:
```twig
{% for module in page.collection() %}
    <p>
        {{ module.content|raw }}
    </p>
{% endfor %}

{{ page.content|raw }}
```
As you can see this simply ensures the module pages as defined in the page's collection are displayed, then the actual page content is displayed.
To instruct TNTSearch to index with this template rather than just using the Page content by itself, you just need to add an entry in the `home.md` frontmatter:
```yaml
tntsearch:
  template: 'tntsearch/home'
```
### Searching
TNTSearch plugin for Grav comes with a built-in query page that is accessible via the `/search` route by default. This search page is a simple input field that will perform an Ajax query **as-you-type**. Because TNTSearch is so fast, you get a real-time search response in a similar fashion to a Google search. Also the results are returned already highlighted for matching terms.
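As a rough sketch of what the as-you-type query looks like on the wire, the function below builds the request URL the same way the plugin's bundled `search.js` assembles it (`q`, `l`, `sl`, `search_type`, `ajax`). `buildQueryUrl` is a hypothetical name, and passing the default `query_route` of `/s` as `uri` is an assumption; in the plugin the URI comes from the input's `data-tntsearch` attribute.

```javascript
// Defaults mirrored from the plugin's search.js.
const DEFAULTS = { uri: '', limit: 20, snippet: 300, min: 3, search_type: 'auto' };

// Build the query URL for a search term, or return null when the term is
// shorter than the configured minimum and no request should be sent.
function buildQueryUrl(term, options = {}) {
    const data = { ...DEFAULTS, ...options };
    const value = term.trim();

    if (value.length < data.min) { return null; }

    const params = {
        q: encodeURIComponent(value),
        l: data.limit,
        sl: data.snippet,
        search_type: data.search_type,
        ajax: true,
    };

    const query = Object.keys(params)
        .map((key) => `${key}=${params[key]}`)
        .join('&');

    return `${data.uri}?${query}`;
}

console.log(buildQueryUrl('lorem ipsum', { uri: '/s' }));
// /s?q=lorem%20ipsum&l=20&sl=300&search_type=auto&ajax=true
console.log(buildQueryUrl('ab', { uri: '/s' }));
// null
```

The `null` case corresponds to the `min` option: no request is made until enough characters have been typed.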
You can also test searching with the CLI:
```shell
$ bin/plugin tntsearch query ipsum
{
"number_of_hits": 3,
"execution_time": "2.101 ms",
"hits": [
{
"link": "\/blog\/classic-modern-architecture",
"title": "Classic Modern Architecture",
"content": "...sed a odio. Curabitur ut lectus tortor. Sed <em>ipsum<\/em> eros, egestas ut eleifend non, elementum vitae eros. Mauris felis diam, pellentesque vel lacinia ac, dictum a nunc.\nLorem <em>ipsum<\/em> dolor sit amet, consectetur adipiscing elit. Donec ultricies tristique nulla et mattis. Phasellus id massa eget..."
},
{
"link": "\/blog\/focus-and-blur",
"title": "Focus and Blur",
"content": "...sed a odio. Curabitur ut lectus tortor. Sed <em>ipsum<\/em> eros, egestas ut eleifend non, elementum vitae eros. Mauris felis diam, pellentesque vel lacinia ac, dictum a nunc.\nLorem <em>ipsum<\/em> dolor sit amet, consectetur adipiscing elit. Donec ultricies tristique nulla et mattis. Phasellus id massa eget..."
},
{
"link": "\/blog\/london-industry",
"title": "London Industry at Night",
"content": "...sed a odio. Curabitur ut lectus tortor. Sed <em>ipsum<\/em> eros, egestas ut eleifend non, elementum vitae eros. Mauris felis diam, pellentesque vel lacinia ac, dictum a nunc.\nLorem <em>ipsum<\/em> dolor sit amet, consectetur adipiscing elit. Donec ultricies tristique nulla et mattis. Phasellus id massa eget..."
}
]
}
```
### Customizing the Search Page
If a physical Grav page is found for the `/search` route, TNTSearch will use that rather than the page provided by the plugin. This allows you to easily add content to your search page as you need.
If you wish to customize the actual HTML output, simply copy the `templates/search.html.twig` from the plugin to your theme and customize it.
The input field itself can also be modified as needed: copy the `templates/partials/tntsearch.html.twig` file to your theme and edit it there.
### Customizing Query Data
By default, the TNTSearch plugin for Grav sends the response JSON with the following structure:
```json
{
    "number_of_hits": 3,
    "execution_time": "1.000 ms",
    "hits": [
        {
            "link": "/page-a",
            "title": "Title A",
            "content": "highlighted-summary"
        },
        {
            "link": "/page-b",
            "title": "Title B",
            "content": "highlighted-summary"
        },
        {
            "link": "/page-c",
            "title": "Title C",
            "content": "highlighted-summary"
        }
    ]
}
```
There are instances where this output is not desirable or needs to be changed. TNTSearch actually provides a plugin event to allow you to manipulate this format. An example of this can be seen below:
```php
public static function getSubscribedEvents()
{
    return [
        'onTNTSearchQuery' => ['onTNTSearchQuery', 1000],
    ];
}

public function onTNTSearchQuery(Event $e)
{
    $query = $this->grav['uri']->param('q');

    if ($query) {
        $page = $e['page'];
        $query = $e['query'];
        $options = $e['options'];
        $fields = $e['fields'];

        // Replace the default hit structure with a plain list of routes
        $fields->results[] = $page->route();
        $e->stopPropagation();
    }
}
```
The important thing to note is the `1000` order value, which ensures this event runs before the default event in the `tntsearch.php` plugin file. The event method itself simply adds each page's route to a `results` array on `fields`, resulting in:
```json
{
    "number_of_hits": 3,
    "execution_time": "1.000 ms",
    "results": ["/page-a", "/page-b", "/page-c"]
}
```
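If front-end code has to cope with both response shapes, for example while migrating to a custom format, a small normalizer can hide the difference. This is only a sketch: `extractRoutes` is a hypothetical helper, and it assumes the response has already been parsed from JSON into an object.

```javascript
// Return the list of page routes from a parsed query response, whether it uses
// the default `hits` structure or the customized `results` array.
function extractRoutes(response) {
    if (Array.isArray(response.results)) {
        return response.results.slice();
    }
    if (Array.isArray(response.hits)) {
        return response.hits.map((hit) => hit.link);
    }
    return [];
}

const defaultShape = {
    number_of_hits: 2,
    execution_time: '1.000 ms',
    hits: [
        { link: '/page-a', title: 'Title A', content: 'highlighted-summary' },
        { link: '/page-b', title: 'Title B', content: 'highlighted-summary' },
    ],
};
const customShape = {
    number_of_hits: 2,
    execution_time: '1.000 ms',
    results: ['/page-a', '/page-b'],
};

console.log(extractRoutes(defaultShape).join(', ')); // /page-a, /page-b
console.log(extractRoutes(customShape).join(', '));  // /page-a, /page-b
```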
### Dropdown Search Field
TNTSearch plugin can also be used to render the search as a drop-down rather than in a standard page. To do this you need to `embed` the search partial and override it to fit your needs. You could simply add this to your theme wherever you want to have an Ajax drop-down search box:
```twig
{% embed 'partials/tntsearch.html.twig' with { limit: 10, snippet: 150, min: 3, search_type: 'auto', dropdown: true } %}{% endembed %}
```
Here we embed the default partial, but override the `options` by passing them in the `with` statement. It is important to notice that the `dropdown: true` is required to be set in order to be interpreted as dropdown.
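Since v2.0.0 the plugin dispatches `tntsearch:start` and `tntsearch:done` events on the search input when a query starts and when its results have been rendered. The sketch below shows one way to hook into them, for instance to toggle a loading indicator on a drop-down. `trackSearchState`, the `onChange` callback, and the `is-searching` class are hypothetical names; because the events are dispatched on the input and do not bubble, the listeners are attached to the input itself.

```javascript
// Report when a TNTSearch query starts and finishes for a given search input.
// Returns a function that removes the listeners again.
function trackSearchState(input, onChange) {
    const onStart = () => onChange('searching');
    const onDone = () => onChange('idle');

    input.addEventListener('tntsearch:start', onStart);
    input.addEventListener('tntsearch:done', onDone);

    return () => {
        input.removeEventListener('tntsearch:start', onStart);
        input.removeEventListener('tntsearch:done', onDone);
    };
}

// Browser usage (skipped when no DOM is available, e.g. under Node).
if (typeof document !== 'undefined') {
    const input = document.querySelector('.tntsearch-field');
    if (input) {
        trackSearchState(input, (state) => {
            // `is-searching` is a made-up class name for your own CSS
            input.classList.toggle('is-searching', state === 'searching');
        });
    }
}
```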
## Credits
This plugin would not have been possible without the amazing [TNTSearch library](https://github.com/teamtnt/tntsearch) for PHP. Make sure you **star** that project on GitHub.

View File

@ -0,0 +1,3 @@
import { createBrowserHistory } from 'history';
const history = createBrowserHistory();
export default history;

View File

@ -0,0 +1,61 @@
// polyfills
import 'babel-polyfill';
import domready from 'domready';
import search from './search';
const GravTNTSearch = () => {
/* const uri = new URI(global.location.href, true);
history.replace({
search: global.location.search,
hash: global.location.hash,
state: {
historyValue: uri.query.q || '',
type: 'tntsearch',
},
});*/
const searchForms = document.querySelectorAll('form.tntsearch-form');
[...searchForms].forEach((form) => {
const input = form.querySelector('.tntsearch-field');
const clear = form.querySelector('.tntsearch-clear');
const results = form.querySelector('.tntsearch-results');
if (!input || !results) { return false; }
form.addEventListener('submit', (event) => event.preventDefault());
input.addEventListener('focus', () => search({ input, results }));
input.addEventListener('input', () => {
if (clear) {
clear.style.display = '';
}
search.cancel();
search({ input, results });
});
if (clear) {
clear.addEventListener('click', () => {
if (clear) {
clear.style.display = 'none';
}
input.value = '';
search.cancel();
search({ input, results });
});
}
return this;
});
document.addEventListener('click', (event) => {
[...searchForms].forEach((form) => {
if (!form.querySelector('.tntsearch-dropdown')) { return; }
if (!form.contains(event.target)) {
form.querySelector('.tntsearch-results').style.display = 'none';
}
});
});
};
domready(GravTNTSearch);
window.GravTNTSearch = GravTNTSearch;

View File

@ -0,0 +1,107 @@
import throttle from 'lodash/throttle';
import URI from 'url-parse';
import qs from 'querystringify';
import history from './history';
export const DEFAULTS = {
uri: '',
limit: 20,
snippet: 300,
min: 3,
search_type: 'auto',
in_page: false,
live_update: true,
};
const historyPush = ({ value = false, params = false } = {}) => {
const uri = new URI(global.location.href, true);
if (params === false) {
delete uri.query.q;
} else {
uri.query.q = params;
}
const querystring = qs.stringify(uri.query, '?');
history.push(`${uri.pathname}${querystring}`, {
historyValue: value, type: 'tntsearch',
});
};
const throttling = throttle(async ({ input, results, historyValue = false } = {}) => {
if (!input || !results) { return false; }
const value = historyValue || input.value.trim();
const clear = input.nextElementSibling;
const data = Object.assign({}, DEFAULTS, JSON.parse(input.dataset.tntsearch || '{}'));
if (!value) {
results.style.display = 'none';
if (data.in_page) {
clear.style.display = 'none';
if (historyValue === false && data.live_update) {
historyPush({ value });
}
}
return false;
}
if (value.length < data.min) {
return false;
}
if (data.in_page) {
clear.style.display = '';
}
const params = {
q: encodeURIComponent(value),
l: data.limit,
sl: data.snippet,
search_type: data.search_type,
ajax: true,
};
const startEvent = new Event('tntsearch:start');
const query = Object.keys(params)
.map(k => `${k}=${params[k]}`)
.join('&');
input.dispatchEvent(startEvent);
fetch(`${data.uri}?${query}`, { credentials: 'same-origin' })
.then((response) => response.text())
.then((response) => {
if (data.in_page && data.live_update && !historyValue) {
historyPush({ value, params: params.q });
}
return response;
})
.then((response) => {
const doneEvent = new Event('tntsearch:done');
results.style.display = '';
results.innerHTML = response;
input.dispatchEvent(doneEvent);
return response;
});
return this;
}, 350, { leading: false });
history.listen((location) => {
if (location.state && location.state.type === 'tntsearch') {
location.state.input = document.querySelector('.tntsearch-field-inpage');
location.state.results = document.querySelector('.tntsearch-results-inpage');
if (location.state.input && location.state.results) {
location.state.input.value = location.state.historyValue;
throttling({ ...location.state });
}
}
});
export default throttling;

View File

@ -0,0 +1,34 @@
.index-status {
border: 1px solid transparent;
}
.index-status span {
padding: 0.3rem 0.7rem;
border-radius: 4px;
line-height: 1.7;
vertical-align: middle;
display: inline-block;
}
.index-status .error {
background: #ddd;
color: #c00;
}
.index-status .success {
border: 1px solid #ddd;
color: #999;
}
#admin-main .admin-block .index-status .button.critical {
background: #c00;
color: #fff;
}
#admin-main .admin-block .index-status .button.reindex {
background: #0079BA;
color: #fff;
}
.tntsearch-error-details {
padding: .2rem .5rem;
margin: 1rem 0;
border-radius: 3px;
display: none;
}

View File

@ -0,0 +1,78 @@
((function($) {
$(document).ready(function() {
var Request, Toastr = null;
if (typeof Grav !== 'undefined' && Grav && Grav.default && Grav.default.Utils) {
Request = Grav.default.Utils.request;
Toastr = Grav.default.Utils.toastr;
}
var indexer = $('#tntsearch-index, #admin-nav-quick-tray .tntsearch-reindex'),
current = null, currentTray = null;
if (!indexer.length) { return; }
indexer.on('click', function(e) {
e.preventDefault();
var target = $(e.target),
isTray = target.closest('#admin-nav-quick-tray').length,
status = indexer.siblings('.tntsearch-status'),
errorDetails = indexer.siblings('.tntsearch-error-details');
current = status.clone(true);
if (isTray) {
target = target.is('i') ? target.parent() : target;
currentTray = target.find('i').attr('class');
target.find('i').attr('class', 'fa fa-fw fa-circle-o-notch fa-spin');
}
errorDetails
.hide()
.empty();
status
.removeClass('error success')
.empty()
.html('<i class="fa fa-circle-o-notch fa-spin" />');
$.ajax({
type: 'POST',
url: GravAdmin.config.base_url_relative + '.json/task' + GravAdmin.config.param_sep + 'reindexTNTSearch',
data: { 'admin-nonce': GravAdmin.config.admin_nonce }
}).done(function(done) {
if (done.status === 'success') {
indexer.removeClass('critical').addClass('reindex');
status.removeClass('error').addClass('success');
// Toastr is only available when the Grav admin utilities are loaded.
if (Toastr) { Toastr.success(done.message); }
} else {
indexer.removeClass('reindex').addClass('critical');
status.removeClass('success').addClass('error');
var error = done.message;
if (done.details) {
error += '<br />' + done.details;
errorDetails
.text(done.details)
.show();
status.replaceWith(current);
}
// Toastr is only available when the Grav admin utilities are loaded.
if (Toastr) { Toastr.error(error); }
}
status.html(done.message);
}).fail(function(error) {
if (error.responseJSON && error.responseJSON.error) {
indexer.removeClass('reindex').addClass('critical');
errorDetails
.text(error.responseJSON.error.message)
.show();
status.replaceWith(current);
}
}).always(function() {
// Restore the tray icon only when it was swapped for the spinner above.
if (currentTray) {
target.find('i').attr('class', currentTray);
}
current = null;
currentTray = null;
});
})
});
})(jQuery));

Binary file not shown (image, 755 KiB).

Binary file not shown (image, 19 KiB).

Binary file not shown (image, 67 KiB).

@ -0,0 +1,71 @@
.tntsearch-form .form-input {
height: 2.4rem;
padding-left: 1rem;
}
#tntsearch-wrapper {
position: relative;
}
.tntsearch-clear {
border-radius: 100%;
padding: 0 1rem;
line-height: 1;
position: absolute;
right: 0;
font-size: 2rem;
top: 0;
cursor: pointer;
}
.tntsearch-field-inpage {
width: 100%;
}
.tntsearch-dropdown ~ .tntsearch-results {
position: relative;
margin-bottom: 0;
}
.tntsearch-dropdown ~ .tntsearch-results .row {
position: absolute;
top: 0;
right: 0;
width: 400px;
background: #fff;
box-shadow: 0 2px 20px rgba(0,0,0, 0.1);
padding: 10px;
z-index: 2;
}
.tntsearch-results .info {
color: #999;
font-size: 90%;
}
.tntsearch-results .title {
margin-bottom: 10px;
}
.tntsearch-results .route {
margin-top: 0;
margin-bottom: 10px;
}
.tntsearch-results .row > p {
margin-top: 0;
}
.tntsearch-results em {
font-style: normal;
background-color: #ffff33;
}
.tntsearch-powered-by {
text-align: center;
font-size: 14px;
}
.tntsearch-dropdown ~ .tntsearch-powered-by {
display: none;
}

File diff suppressed because one or more lines are too long

@ -0,0 +1,297 @@
name: TNT Search
type: plugin
slug: tntsearch
version: 3.4.0
testing: false
description: Powerful index-based full-text search engine powered by TNTSearch
icon: binoculars
author:
name: Trilby Media, LLC
email: devs@trilby.media
homepage: https://github.com/trilbymedia/grav-plugin-tntsearch
keywords: grav, plugin, search, search-engine
bugs: https://github.com/trilbymedia/grav-plugin-tntsearch/issues
docs: https://github.com/trilbymedia/grav-plugin-tntsearch/blob/develop/README.md
license: MIT
dependencies:
- { name: grav, version: '>=1.6.21' }
form:
validation: strict
fields:
enabled:
type: toggle
label: Plugin status
highlight: 1
default: 0
options:
1: Enabled
0: Disabled
validate:
type: bool
index_title:
type: spacer
title: Indexer Settings
index_status:
type: indexstatus
label: Search Index Status
enable_admin_page_events:
type: toggle
label: Enable Admin Page Events
help: Disable this if you are having problems with timeouts during page saving
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
scheduled_index.enabled:
type: toggle
label: Enable Index Scheduled Job
help: Use the Grav Scheduler to kick off a background index job
highlight: 0
default: 0
options:
1: Enabled
0: Disabled
validate:
type: bool
scheduled_index.at:
type: cron
label: Scheduled Job Frequency
size: medium
help: Use 'cron' format
default: '0 */3 * * *'
placeholder: '0 */3 * * *'
scheduled_index.logs:
type: text
label: Scheduled Job Log File
placeholder: 'logs/tntsearch-index.out'
size: medium
ui_title:
type: spacer
title: UI Settings
built_in_css:
type: toggle
label: Built-in CSS
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
built_in_js:
type: toggle
label: Built-in JavaScript
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
search_title:
type: spacer
title: Search Settings
built_in_search_page:
type: toggle
label: Built-in Search Page
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
search_route:
type: text
size: medium
label: Search Page Route
help: The route for the built-in search page. Leave empty if you do not want a dedicated search page.
query_route:
type: text
size: medium
label: Query Route
help: The route used to retrieve search results.
search_type:
type: select
size: small
classes: fancy
label: Search Type
help: Configure how TNTSearch will use the search query term
default: auto
options:
auto: Auto
basic: Basic
boolean: Boolean
fuzzy:
type: toggle
label: Fuzzy Search
highlight: 1
default: 0
options:
1: Enabled
0: Disabled
validate:
type: bool
distance:
type: number
size: x-small
label: Levenshtein distance of fuzzy search
help: The number of characters that must be changed, removed, or added in a word for it to match the search keyword. Increasing the distance produces more search results but decreases the accuracy of the search.
default: 2
phrases:
type: toggle
label: Match quoted phrases
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
stemmer:
type: select
size: small
classes: fancy
label: Stemmer
help: An automated process which produces a base string in an attempt to represent related words. If your content is not in the language listed, for best search results it is recommended to disable the stemmer.
default: no
options:
no: Disabled
arabic: Arabic
croatian: Croatian
porter: English
german: German
italian: Italian
portuguese: Portuguese
russian: Russian
ukrainian: Ukrainian
display_route:
type: toggle
label: Display Route
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
live_uri_update:
type: toggle
label: Live URI Update
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
display_hits:
type: toggle
label: Display Hits
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
display_time:
type: toggle
label: Display Time
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
limit:
type: text
label: Results Limit
default: 20
min:
type: text
label: Min Chars Before Search
default: 3
snippet:
type: text
label: Results Text Limit
default: 300
index_page_by_default:
type: toggle
label: Index Every Page
help: 'Index every page by default unless a page specifically declares `tntsearch: process: false`. Disabling this requires a `process: true` declaration to be added to each page that should be indexed.'
highlight: 1
default: 1
options:
1: Enabled
0: Disabled
validate:
type: bool
filter.items:
type: textarea
size: large
rows: 4
label: Search Filter
help: Use a standard collections based filter definition to restrict search to only these pages
yaml: true
placeholder: 'taxonomy@: { category: [news] }'
validate:
type: yaml
powered_by:
type: toggle
label: Powered By
highlight: 1
default: 0
options:
1: Enabled
0: Disabled
validate:
type: bool
adv_title:
type: spacer
title: Advanced Settings
search_object_type:
type: text
label: Search Object Type
help: Allows overriding the default search type with a custom type provided by a plugin.
default: Grav

@ -0,0 +1,87 @@
<?php
namespace Grav\Plugin\TNTSearch;
use Grav\Common\Config\Config;
use Grav\Common\Grav;
use Grav\Common\Yaml;
use Grav\Common\Page\Page;
use Grav\Plugin\TNTSearchPlugin;
use PDO;
class GravConnector extends PDO
{
public function __construct()
{
}
/**
* @param int $attribute
* @return bool
*/
public function getAttribute($attribute): bool
{
return false;
}
/**
* @param string $statement
* @param int|null $fetch_style
* @param mixed ...$extra
* @return GravResultObject
*/
public function query(string $statement, ?int $fetch_style = null, ...$extra): GravResultObject
{
$counter = 0;
$results = [];
/** @var Config $config */
$config = Grav::instance()['config'];
$filter = $config->get('plugins.tntsearch.filter');
$default_process = $config->get('plugins.tntsearch.index_page_by_default');
$gtnt = TNTSearchPlugin::getSearchObjectType();
if ($filter && array_key_exists('items', $filter)) {
if (is_string($filter['items'])) {
$filter['items'] = Yaml::parse($filter['items']);
}
$page = new Page;
$collection = $page->collection($filter, false);
} else {
$collection = Grav::instance()['pages']->all();
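// Keep the filtered result; a bare call may return a new collection and leave this one unchanged.
$collection = $collection->published()->routable();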
}
foreach ($collection as $page) {
$counter++;
$process = $default_process;
$header = $page->header();
$url = $page->url();
if (isset($header->tntsearch['process'])) {
$process = $header->tntsearch['process'];
}
// Only process what's configured
if (!$process) {
echo("Skipped {$counter} {$url}\n");
continue;
}
try {
$fields = $gtnt->indexPageData($page);
$results[] = (array) $fields;
echo("Added {$counter} {$url}\n");
} catch (\Exception $e) {
echo("Skipped {$counter} {$url} - {$e->getMessage()}\n");
continue;
}
}
return new GravResultObject($results);
}
}

@ -0,0 +1,29 @@
<?php
namespace Grav\Plugin\TNTSearch;
class GravResultObject
{
/** @var array */
protected $items;
/** @var int */
protected $counter;
/**
* GravResultObject constructor.
* @param array $items
*/
public function __construct($items)
{
$this->counter = 0;
$this->items = $items;
}
/**
* @param array $options
* @return array|null The next item, or null when no items remain
*/
public function fetch($options)
{
return $this->items[$this->counter++] ?? null;
}
}

@ -0,0 +1,350 @@
<?php
namespace Grav\Plugin\TNTSearch;
use Grav\Common\Config\Config;
use Grav\Common\Grav;
use Grav\Common\Language\Language;
use Grav\Common\Page\Interfaces\PageInterface;
use Grav\Common\Page\Pages;
use Grav\Common\Twig\Twig;
use Grav\Common\Uri;
use Grav\Common\Yaml;
use Grav\Common\Page\Collection;
use Grav\Common\Page\Page;
use RocketTheme\Toolbox\Event\Event;
use RocketTheme\Toolbox\ResourceLocator\UniformResourceLocator;
use TeamTNT\TNTSearch\Exceptions\IndexNotFoundException;
use TeamTNT\TNTSearch\TNTSearch;
class GravTNTSearch
{
/** @var TNTSearch */
public $tnt;
/** @var array */
protected $options;
/** @var string[] */
protected $bool_characters = ['-', '(', ')', 'or'];
/** @var string */
protected $index = 'grav.index';
/** @var false|string */
protected $language;
/**
* GravTNTSearch constructor.
* @param array $options
*/
public function __construct($options = [])
{
/** @var Config $config */
$config = Grav::instance()['config'];
/** @var UniformResourceLocator $locator */
$locator = Grav::instance()['locator'];
$search_type = $config->get('plugins.tntsearch.search_type', 'auto');
$fuzzy = $config->get('plugins.tntsearch.fuzzy', false);
$distance = $config->get('plugins.tntsearch.distance', 2);
$stemmer = $config->get('plugins.tntsearch.stemmer', 'no');
$limit = $config->get('plugins.tntsearch.limit', 20);
$snippet = $config->get('plugins.tntsearch.snippet', 300);
$data_path = $locator->findResource('user://data', true) . '/tntsearch';
/** @var Language $language */
$language = Grav::instance()['language'];
if ($language->enabled()) {
$active = $language->getActive();
$default = $language->getDefault();
$this->language = $active ?: $default;
$this->index = $this->language . '.index';
}
if (!file_exists($data_path)) {
mkdir($data_path);
}
$defaults = [
'json' => false,
'search_type' => $search_type,
'fuzzy' => $fuzzy,
'distance' => $distance,
'stemmer' => $stemmer,
'limit' => $limit,
'as_you_type' => true,
'snippet' => $snippet,
'phrases' => true,
];
$this->options = array_replace($defaults, $options);
$this->tnt = new TNTSearch();
$this->tnt->loadConfig(
[
'storage' => $data_path,
'driver' => 'sqlite',
'charset' => 'utf8'
]
);
}
/**
* @param string $query
* @return object|string
* @throws IndexNotFoundException
*/
public function search($query)
{
/** @var Uri $uri */
$uri = Grav::instance()['uri'];
$type = $uri->query('search_type');
$this->tnt->selectIndex($this->index);
$this->tnt->asYouType = $this->options['as_you_type'];
if (isset($this->options['fuzzy']) && $this->options['fuzzy']) {
$this->tnt->fuzziness = true;
$this->tnt->fuzzy_distance = $this->options['distance'];
}
$limit = (int)$this->options['limit'];
$type = $type ?? $this->options['search_type'];
// TODO: Multiword parameter has been removed from $tnt->search(), please check if this works
$multiword = null;
if (isset($this->options['phrases']) && $this->options['phrases']) {
if (strlen($query) > 2) {
if ($query[0] === '"' && $query[strlen($query) - 1] === '"') {
$multiword = substr($query, 1, -1);
$type = 'basic';
$query = $multiword;
}
}
}
switch ($type) {
case 'basic':
$results = $this->tnt->search($query, $limit);
break;
case 'boolean':
$results = $this->tnt->searchBoolean($query, $limit);
break;
case 'default':
case 'auto':
default:
$guess = 'search';
foreach ($this->bool_characters as $char) {
if (strpos($query, $char) !== false) {
$guess = 'searchBoolean';
break;
}
}
$results = $this->tnt->{$guess}($query, $limit);
}
return $this->processResults($results, $query);
}
/**
* @param array $res
* @param string $query
* @return object|string
*/
protected function processResults($res, $query)
{
$data = new \stdClass();
$data->number_of_hits = $res['hits'] ?? 0;
$data->execution_time = $res['execution_time'];
/** @var Pages $pages */
$pages = Grav::instance()['pages'];
$counter = 0;
foreach ($res['ids'] as $path) {
if ($counter++ >= $this->options['limit']) {
break;
}
$page = $pages->find($path);
if ($page) {
$event = new Event(
[
'page' => $page,
'query' => $query,
'options' => $this->options,
'fields' => $data,
'gtnt' => $this
]
);
Grav::instance()->fireEvent('onTNTSearchQuery', $event);
}
}
if ($this->options['json']) {
return json_encode($data, JSON_PRETTY_PRINT) ?: '';
}
return $data;
}
/**
* @param PageInterface $page
* @return string
*/
public static function getCleanContent($page)
{
$grav = Grav::instance();
$activePage = $grav['page'];
// Set active page in grav to the one we are currently processing.
unset($grav['page']);
$grav['page'] = $page;
/** @var Twig $twig */
$twig = $grav['twig'];
$header = $page->header();
// @phpstan-ignore-next-line
if (isset($header->tntsearch['template'])) {
$processed_page = $twig->processTemplate($header->tntsearch['template'] . '.html.twig', ['page' => $page]);
$content = $processed_page;
} else {
$content = $page->content();
}
$content = strip_tags($content);
$content = preg_replace(['/[ \t]+/', '/\s*$^\s*/m'], [' ', "\n"], $content) ?? $content;
// Restore active page in Grav.
unset($grav['page']);
$grav['page'] = $activePage;
return $content;
}
/**
* @return void
*/
public function createIndex()
{
$this->tnt->setDatabaseHandle(new GravConnector);
$indexer = $this->tnt->createIndex($this->index);
// Disable stemmer for users with older configuration.
if ($this->options['stemmer'] == 'default') {
$indexer->setLanguage('no');
} else {
$indexer->setLanguage($this->options['stemmer']);
}
$indexer->run();
}
/**
* @return void
* @throws IndexNotFoundException
*/
public function selectIndex()
{
$this->tnt->selectIndex($this->index);
}
/**
* @param object $object
* @return void
*/
public function deleteIndex($object)
{
if (!$object instanceof Page) {
return;
}
$this->tnt->setDatabaseHandle(new GravConnector);
try {
$this->tnt->selectIndex($this->index);
} catch (IndexNotFoundException $e) {
return;
}
$indexer = $this->tnt->getIndex();
// Delete existing if it exists
$indexer->delete($object->route());
}
/**
* @param object $object
* @return void
*/
public function updateIndex($object)
{
if (!$object instanceof Page) {
return;
}
$this->tnt->setDatabaseHandle(new GravConnector);
try {
$this->tnt->selectIndex($this->index);
} catch (IndexNotFoundException $e) {
return;
}
$indexer = $this->tnt->getIndex();
// Delete existing if it exists
$indexer->delete($object->route());
$filter = Grav::instance()['config']->get('plugins.tntsearch.filter');
if ($filter && array_key_exists('items', $filter)) {
if (is_string($filter['items'])) {
$filter['items'] = Yaml::parse($filter['items']);
}
$apage = new Page;
/** @var Collection $collection */
$collection = $apage->collection($filter, false);
$path = $object->path();
if ($path && array_key_exists($path, $collection->toArray())) {
$fields = $this->indexPageData($object);
$document = (array) $fields;
// Insert document
$indexer->insert($document);
}
}
}
/**
* @param PageInterface $page
* @return object
*/
public function indexPageData($page)
{
$header = (array) $page->header();
$redirect = (bool) $page->redirect();
if (!$page->published()) {
throw new \RuntimeException('not published...');
}
if (!$page->routable()) {
throw new \RuntimeException('not routable...');
}
if ($redirect || (isset($header['tntsearch']['index']) && $header['tntsearch']['index'] === false )) {
throw new \RuntimeException('redirect only...');
}
$route = $page->route();
$fields = new \stdClass();
$fields->id = $route;
$fields->name = $page->title();
$fields->content = static::getCleanContent($page);
Grav::instance()->fireEvent('onTNTSearchIndex', new Event(['page' => $page, 'fields' => $fields]));
return $fields;
}
}

@ -0,0 +1,97 @@
<?php
namespace Grav\Plugin\Console;
use Grav\Console\ConsoleCommand;
use Grav\Plugin\TNTSearchPlugin;
use Symfony\Component\Console\Input\InputOption;
/**
* Class TNTSearchIndexerCommand
*
* @package Grav\Plugin\Console
*/
class TNTSearchIndexerCommand extends ConsoleCommand
{
/** @var array */
protected $options = [];
/** @var array */
protected $colors = [
'DEBUG' => 'green',
'INFO' => 'cyan',
'NOTICE' => 'yellow',
'WARNING' => 'yellow',
'ERROR' => 'red',
'CRITICAL' => 'red',
'ALERT' => 'red',
'EMERGENCY' => 'magenta'
];
/**
* @return void
*/
protected function configure(): void
{
$this
->setName('index')
->addOption(
'alt',
null,
InputOption::VALUE_NONE,
'alternative output'
)
->addOption(
'language',
'l',
InputOption::VALUE_OPTIONAL,
'optional language to index (multi-language sites only)'
)
->setDescription('TNTSearch Indexer')
->setHelp('The <info>index command</info> re-indexes the search engine');
}
/**
* @return int
*/
protected function serve(): int
{
/** @var string|null $langCode */
$langCode = $this->input->getOption('language');
error_reporting(1);
$this->setLanguage($langCode);
$this->initializePages();
$alt_output = $this->input->getOption('alt') ? true : false;
if ($alt_output) {
$output = $this->doIndex($langCode);
$this->output->write($output);
$this->output->writeln('');
} else {
$this->output->writeln('');
$this->output->writeln('<magenta>Re-indexing</magenta>');
$this->output->writeln('');
$start = microtime(true);
$output = $this->doIndex($langCode);
$this->output->write($output);
$this->output->writeln('');
$end = number_format(microtime(true) - $start,1);
$this->output->writeln('');
$this->output->writeln('Indexed in ' . $end . 's');
}
return 0;
}
/**
* @param string|null $langCode
* @return string
*/
private function doIndex(?string $langCode = null): string
{
[,,$output] = TNTSearchPlugin::indexJob($langCode);
return $output;
}
}

@ -0,0 +1,74 @@
<?php
namespace Grav\Plugin\Console;
use Grav\Console\ConsoleCommand;
use Grav\Plugin\TNTSearchPlugin;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputOption;
/**
* Class TNTSearchQueryCommand
*
* @package Grav\Plugin\Console
*/
class TNTSearchQueryCommand extends ConsoleCommand
{
/** @var array */
protected $options = [];
/** @var array */
protected $colors = [
'DEBUG' => 'green',
'INFO' => 'cyan',
'NOTICE' => 'yellow',
'WARNING' => 'yellow',
'ERROR' => 'red',
'CRITICAL' => 'red',
'ALERT' => 'red',
'EMERGENCY' => 'magenta'
];
/**
* @return void
*/
protected function configure()
{
$this
->setName('query')
->setDescription('TNTSearch Query')
->addArgument(
'query',
InputArgument::REQUIRED,
'The search query you wish to use to test the database'
)
->addOption(
'language',
'l',
InputOption::VALUE_OPTIONAL,
'optional language to search against (multi-language sites only)'
)
->setHelp('The <info>query command</info> allows you to test the search engine')
;
}
/**
* @return int
*/
protected function serve(): int
{
/** @var string|null $langCode */
$langCode = $this->input->getOption('language');
/** @var string $query */
$query = $this->input->getArgument('query');
$this->setLanguage($langCode);
$this->initializePages();
$gtnt = TNTSearchPlugin::getSearchObjectType(['json' => true]);
print_r($gtnt->search($query));
$this->output->newLine();
return 0;
}
}

@ -0,0 +1,34 @@
{
"name": "trilbymedia/grav-plugin-tntsearch",
"type": "grav-plugin",
"description": "TNTSearch plugin for Grav CMS",
"keywords": ["tntsearch","search"],
"homepage": "https://github.com/trilbymedia/grav-plugin-tntsearch",
"license": "MIT",
"authors": [
{
"name": "Team Grav",
"email": "devs@getgrav.org",
"homepage": "http://getgrav.org",
"role": "Developer"
}
],
"require": {
"php": ">=7.1.3",
"ext-json": "*",
"ext-pdo": "*",
"teamtnt/tntsearch": "^2.0"
},
"autoload": {
"psr-4": {
"Grav\\Plugin\\TNTSearch\\": "classes/",
"Grav\\Plugin\\Console\\": "cli/"
},
"classmap": ["tntsearch.php"]
},
"config": {
"platform": {
"php": "7.1.3"
}
}
}

104
plugins/tntsearch/composer.lock generated Normal file
@ -0,0 +1,104 @@
{
"_readme": [
"This file locks the dependencies of your project to a known state",
"Read more about it at https://getcomposer.org/doc/01-basic-usage.md#installing-dependencies",
"This file is @generated automatically"
],
"content-hash": "8fa5f3e8ff0d88b02f744b9dc4cfa420",
"packages": [
{
"name": "teamtnt/tntsearch",
"version": "v2.9.0",
"source": {
"type": "git",
"url": "https://github.com/teamtnt/tntsearch.git",
"reference": "ccedae0cfe21f7831f2dd1f973cf8904dad42d8d"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/teamtnt/tntsearch/zipball/ccedae0cfe21f7831f2dd1f973cf8904dad42d8d",
"reference": "ccedae0cfe21f7831f2dd1f973cf8904dad42d8d",
"shasum": ""
},
"require": {
"ext-mbstring": "*",
"ext-pdo_sqlite": "*",
"ext-sqlite3": "*",
"php": "~7.1|^8"
},
"require-dev": {
"phpunit/phpunit": "7.*|8.*|9.*",
"symfony/var-dumper": "^4|^5.2"
},
"type": "library",
"autoload": {
"files": [
"helper/helpers.php"
],
"psr-4": {
"TeamTNT\\TNTSearch\\": "src"
}
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Nenad Tičarić",
"email": "nticaric@gmail.com",
"homepage": "http://www.tntstudio.us",
"role": "Developer"
}
],
"description": "A fully featured full text search engine written in PHP",
"homepage": "https://github.com/teamtnt/tntsearch",
"keywords": [
"Fuzzy search",
"bm25",
"fulltext",
"geosearch",
"search",
"stemming",
"teamtnt",
"text classification",
"tntsearch"
],
"support": {
"issues": "https://github.com/teamtnt/tntsearch/issues",
"source": "https://github.com/teamtnt/tntsearch/tree/v2.9.0"
},
"funding": [
{
"url": "https://ko-fi.com/nticaric",
"type": "ko_fi"
},
{
"url": "https://opencollective.com/tntsearch",
"type": "open_collective"
},
{
"url": "https://www.patreon.com/nticaric",
"type": "patreon"
}
],
"time": "2022-02-22T10:35:34+00:00"
}
],
"packages-dev": [],
"aliases": [],
"minimum-stability": "stable",
"stability-flags": [],
"prefer-stable": false,
"prefer-lowest": false,
"platform": {
"php": ">=7.1.3",
"ext-json": "*",
"ext-pdo": "*"
},
"platform-dev": [],
"platform-overrides": {
"php": "7.1.3"
},
"plugin-api-version": "2.3.0"
}

@ -0,0 +1,20 @@
en:
PLUGIN_TNTSEARCH:
FOUND_RESULTS: "Found %s results"
FOUND_IN: "in <span>%s</span>"
POWERED_BY: "Powered by %s"
de:
PLUGIN_TNTSEARCH:
FOUND_RESULTS: "Es wurden %s Resultate gefunden"
FOUND_IN: "(<span>%s</span>)"
POWERED_BY: "Powered by %s"
ru:
PLUGIN_TNTSEARCH:
FOUND_RESULTS: "Результатов: %s"
FOUND_IN: "(<span>%s</span>)"
POWERED_BY: "Работает на %s"
fr:
PLUGIN_TNTSEARCH:
FOUND_RESULTS: "Résultats trouvés: %s"
FOUND_IN: "(<span>%s</span>)"
POWERED_BY: "Par %s"

@ -0,0 +1,38 @@
{
"name": "grav-tntsearch",
"version": "1.1.1",
"main": "app/main.js",
"author": "Trilby Media",
"private": true,
"license": "MIT",
"scripts": {
"watch": "webpack --watch --progress --colors --env.NODE_ENV=development --env.dev --config webpack.conf.js",
"dev": "webpack --progress --colors --env.NODE_ENV=development --env.dev --config webpack.conf.js",
"prod": "NODE_ENV=production webpack --env.NODE_ENV=production --env.prod --config webpack.conf.js"
},
"dependencies": {
"domready": "^1.0.8",
"history": "^4.7.2",
"lodash": "^4.17.4",
"querystringify": "^2.0.0",
"url-parse": "^1.4.3",
"whatwg-fetch": "^2.0.3"
},
"devDependencies": {
"babel-core": "^6.26.0",
"babel-eslint": "^8.2.1",
"babel-loader": "^7.1.2",
"babel-polyfill": "^6.26.0",
"babel-preset-es2015": "^6.24.1",
"babel-preset-stage-3": "^6.24.1",
"css-loader": "^0.28.9",
"eslint": "^4.18.2",
"eslint-config-defaults": "^9.0.0",
"eslint-loader": "^1.9.0",
"exports-loader": "^0.6.4",
"imports-loader": "^0.7.1",
"json-loader": "^0.5.7",
"style-loader": "^0.20.1",
"webpack": "^3.10.0"
}
}

@ -0,0 +1,3 @@
---
title: TNTSearch Search
---

@ -0,0 +1,5 @@
---
title: TNTSearch Query
template_format: json
cache_control: no-cache
---

@ -0,0 +1,16 @@
{% extends "forms/field.html.twig" %}
{% block input %}
<div class="form-list-wrapper {{ field.size }}">
<div class="index-status">
{% if tntsearch_index_status.status %}
<button id="tntsearch-index" class="button reindex tntsearch-reindex">Re-Index Content</button>
<span class="tntsearch-status success"><i class="fa fa-book"></i> {{ tntsearch_index_status.msg }}</span>
{% else %}
<button id="tntsearch-index" class="button critical tntsearch-reindex">Index Content</button>
<span class="tntsearch-status error"><i class="fa fa-warning"></i> {{ tntsearch_index_status.msg }}</span>
{% endif %}
<div class="warning tntsearch-error-details"></div>
</div>
</div>
{% endblock %}

@ -0,0 +1,30 @@
{% set url = url|default(base_url|rtrim('/') ~ '/' ~ config.get('plugins.tntsearch.query_route', 's')|trim('/')) %}
{% set limit = limit|default(config.get('plugins.tntsearch.limit', 20)) %}
{% set snippet = snippet|default(config.get('plugins.tntsearch.snippet', 300)) %}
{% set min = min|default(config.get('plugins.tntsearch.min', 3)) %}
{% set search_type = search_type|default(config.get('plugins.tntsearch.search_type', 'auto')) %}
{% set placeholder = placeholder|default('Search...') %}
{% set live_update = in_page ? live_update|default(config.get('plugins.tntsearch.live_uri_update', 1)) : 0 %}
{% set nojs_action = config.get('plugins.tntsearch.search_route', '/search')|trim('/') %}
{% set options = { uri: url, limit: limit, snippet: snippet, min: min, in_page: in_page, live_update: live_update, search_type: search_type } %}
<form role="form" class="tntsearch-form" action="{{ nojs_action }}" method="get">
{% block tntsearch_input %}
<div id="tntsearch-wrapper" class="form-group{{ dropdown ? ' tntsearch-dropdown' : '' }}">
<input type="text" name="q" class="form-control form-input tntsearch-field{{ in_page ? ' tntsearch-field-inpage' : '' }}" data-tntsearch="{{ options|json_encode|e('html_attr') }}" placeholder="{{ placeholder }}" value="{{ not dropdown ? query|e : '' }}" autofocus>
<span class="tntsearch-clear"{{ not query or dropdown ? ' style="display: none;"' : '' }}>&times;</span>
</div>
{% endblock %}
<div class="tntsearch-results{{ in_page ? ' tntsearch-results-inpage' : '' }}">
{% if tntsearch_results is defined and tntsearch_results is not empty and in_page %}
{% include 'tntquery-ajax.html.twig' %}
{% endif %}
</div>
{% if config.get('plugins.tntsearch.powered_by') %}
<p class="tntsearch-powered-by">
{{ "PLUGIN_TNTSEARCH.POWERED_BY"|t("<a href='https://github.com/trilbymedia/grav-plugin-tntsearch' target='_blank'>TNTSearch</a>")|raw }}
</p>
{% endif %}
</form>

@ -0,0 +1,7 @@
{% extends 'partials/base.html.twig' %}
{% block content %}
{{ page.content|raw }}
{% include 'partials/tntsearch.html.twig' with { in_page: true } %}
{% endblock %}

@ -0,0 +1,19 @@
<div class="row">
<p class="info">
{% if config.get('plugins.tntsearch.display_hits') %}
<span class="hits">{{ "PLUGIN_TNTSEARCH.FOUND_RESULTS"|t(tntsearch_results.number_of_hits)|raw }}</span>
{% endif %}
{% if config.get('plugins.tntsearch.display_time') %}
<span class="time">{{ "PLUGIN_TNTSEARCH.FOUND_IN"|t(tntsearch_results.execution_time)|raw }}</span>
{% endif %}
</p>
{% for key, val in tntsearch_results.hits %}
<h5 class="title">
<a href="{{ base_url ~ val.link }}">{{ val.title|raw }}</a>
</h5>
{% if config.get('plugins.tntsearch.display_route') %}
<h6 class="route">{{ val.link|raw }}</h6>
{% endif %}
<p>{{ val.content|raw }}</p>
{% endfor %}
</div>

@ -0,0 +1,8 @@
{% extends 'partials/base.html.twig' %}
{% block content %}
<div class="body-content">
{{ vardump(tntsearch_results) }}
</div>
{% endblock %}

@ -0,0 +1 @@
{{ tntsearch_results|json_encode|raw }}

@ -0,0 +1,523 @@
<?php
namespace Grav\Plugin;
use Composer\Autoload\ClassLoader;
use Grav\Common\Grav;
use Grav\Common\Language\Language;
use Grav\Common\Page\Page;
use Grav\Common\Page\Pages;
use Grav\Common\Plugin;
use Grav\Common\Scheduler\Scheduler;
use Grav\Common\Uri;
use Grav\Plugin\TNTSearch\GravTNTSearch;
use RocketTheme\Toolbox\Event\Event;
use TeamTNT\TNTSearch\Exceptions\IndexNotFoundException;
/**
* Class TNTSearchPlugin
* @package Grav\Plugin
*/
class TNTSearchPlugin extends Plugin
{
/** @var array|object|string */
protected $results = [];
/** @var string */
protected $query;
/** @var bool */
protected $built_in_search_page;
/** @var string */
protected $query_route;
/** @var string */
protected $search_route;
/** @var string */
protected $current_route;
/** @var string */
protected $admin_route;
/**
* @return array
*
* The getSubscribedEvents() gives the core a list of events
* that the plugin wants to listen to. The key of each
* array section is the event that the plugin listens to
* and the value (in the form of an array) contains the
* callable (or function) as well as the priority. The
* higher the number the higher the priority.
*/
public static function getSubscribedEvents(): array
{
return [
'onPluginsInitialized' => [
['autoload', 100000],
['onPluginsInitialized', 0]
],
'onSchedulerInitialized' => ['onSchedulerInitialized', 0],
'onTwigLoader' => ['onTwigLoader', 0],
'onTNTSearchReIndex' => ['onTNTSearchReIndex', 0],
'onTNTSearchIndex' => ['onTNTSearchIndex', 0],
'onTNTSearchQuery' => ['onTNTSearchQuery', 0],
];
}
/**
* [onPluginsInitialized:100000] Composer autoload.
*
* @return ClassLoader
*/
public function autoload(): ClassLoader
{
return require __DIR__ . '/vendor/autoload.php';
}
/**
* Initialize the plugin
*/
public function onPluginsInitialized(): void
{
if ($this->isAdmin()) {
$this->GravTNTSearch();
$route = $this->config->get('plugins.admin.route');
$base = '/' . trim($route, '/');
$this->admin_route = $this->grav['base_url'] . $base;
$this->enable([
'onAdminMenu' => ['onAdminMenu', 0],
'onAdminTaskExecute' => ['onAdminTaskExecute', 0],
'onTwigSiteVariables' => ['onTwigAdminVariables', 0],
'onTwigLoader' => ['addAdminTwigTemplates', 0],
]);
if ($this->config->get('plugins.tntsearch.enable_admin_page_events', true)) {
$this->enable([
'onAdminAfterSave' => ['onObjectSave', 0],
'onAdminAfterDelete' => ['onObjectDelete', 0],
'onFlexObjectSave' => ['onObjectSave', 0],
'onFlexObjectDelete' => ['onObjectDelete', 0],
]);
}
return;
}
$this->enable([
'onPagesInitialized' => ['onPagesInitialized', 1000],
'onTwigSiteVariables' => ['onTwigSiteVariables', 0],
]);
}
/**
* Add index job to Grav Scheduler
* Requires Grav 1.6.0 - Scheduler
*/
public function onSchedulerInitialized(Event $e): void
{
if ($this->config->get('plugins.tntsearch.scheduled_index.enabled')) {
/** @var Scheduler $scheduler */
$scheduler = $e['scheduler'];
$at = $this->config->get('plugins.tntsearch.scheduled_index.at');
$logs = $this->config->get('plugins.tntsearch.scheduled_index.logs');
$job = $scheduler->addCommand('bin/plugin', ['tntsearch', 'index'], 'tntsearch-index');
$job->at($at);
$job->output($logs);
$job->backlink('/plugins/tntsearch');
}
}
/**
* Force a full reindex; other plugins can trigger it by firing the onTNTSearchReIndex event
*/
public function onTNTSearchReIndex(): void
{
$this->GravTNTSearch()->createIndex();
}
/**
* A sample event to show how easy it is to extend the indexing fields
*
* @param Event $e
*/
public function onTNTSearchIndex(Event $e): void
{
$page = $e['page'];
$fields = $e['fields'];
if ($page && $page instanceof Page && isset($page->header()->author)) {
$author = $page->header()->author;
if (is_string($author)) {
$fields->author = $author;
}
}
}
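/**
* Default query handler: builds one hit (link, highlighted title and
* highlighted content snippet) for a matched page and appends it to the results
*
* @param Event $e
*/
public function onTNTSearchQuery(Event $e): void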
{
$page = $e['page'];
$query = $e['query'];
$options = $e['options'];
$fields = $e['fields'];
$gtnt = $e['gtnt'];
$content = $gtnt->getCleanContent($page);
$title = $page->title();
$relevant = $gtnt->tnt->snippet($query, $content, $options['snippet']);
if (strlen($relevant) <= 6) {
$relevant = substr($content, 0, $options['snippet']);
}
$fields->hits[] = [
'link' => $page->route(),
'title' => $gtnt->tnt->highlight($title, $query, 'em', ['wholeWord' => false]),
'content' => $gtnt->tnt->highlight($relevant, $query, 'em', ['wholeWord' => false]),
];
}
/**
* Create pages and perform the search actions
*/
public function onPagesInitialized(): void
{
/** @var Uri $uri */
$uri = $this->grav['uri'];
$options = [];
$this->current_route = $uri->path();
$this->built_in_search_page = $this->config->get('plugins.tntsearch.built_in_search_page');
$this->search_route = $this->config->get('plugins.tntsearch.search_route');
$this->query_route = $this->config->get('plugins.tntsearch.query_route');
$pages = $this->grav['pages'];
$page = $pages->dispatch($this->current_route);
if (!$page) {
if ($this->query_route && $this->query_route === $this->current_route) {
$page = new Page;
$page->init(new \SplFileInfo(__DIR__ . '/pages/tntquery.md'));
$page->slug(basename($this->current_route));
if ($uri->param('ajax') || $uri->query('ajax')) {
$page->template('tntquery-ajax');
}
$pages->addPage($page, $this->current_route);
} elseif ($this->built_in_search_page && $this->search_route == $this->current_route) {
$page = new Page;
$page->init(new \SplFileInfo(__DIR__ . '/pages/search.md'));
$page->slug(basename($this->current_route));
$pages->addPage($page, $this->current_route);
}
}
$this->query = (string)($uri->param('q', null) ?: $uri->query('q') ?: '');
if ($this->query) {
$snippet = $this->getFormValue('sl');
$limit = $this->getFormValue('l');
if ($snippet) {
$options['snippet'] = $snippet;
}
if ($limit) {
$options['limit'] = $limit;
}
$this->grav['tntsearch'] = static::getSearchObjectType($options);
if ($page) {
$this->config->set('plugins.tntsearch', $this->mergeConfig($page));
}
try {
$this->results = $this->GravTNTSearch()->search($this->query);
} catch (IndexNotFoundException $e) {
$this->results = ['number_of_hits' => 0, 'hits' => [], 'execution_time' => 'missing index'];
}
}
}
/**
* Add the Twig template paths to the Twig loader
*/
public function onTwigLoader(): void
{
$this->grav['twig']->addPath(__DIR__ . '/templates');
}
/**
* Add the current template paths to the admin Twig loader
*/
public function addAdminTwigTemplates(): void
{
$this->grav['twig']->addPath($this->grav['locator']->findResource('theme://templates'));
}
/**
* Add results and query to Twig as well as CSS/JS assets
*/
public function onTwigSiteVariables(): void
{
$twig = $this->grav['twig'];
if ($this->query) {
$twig->twig_vars['query'] = $this->query;
$twig->twig_vars['tntsearch_results'] = $this->results;
}
if ($this->config->get('plugins.tntsearch.built_in_css')) {
$this->grav['assets']->addCss('plugin://tntsearch/assets/tntsearch.css');
}
if ($this->config->get('plugins.tntsearch.built_in_js')) {
// $this->grav['assets']->addJs('plugin://tntsearch/assets/tntsearch.js');
$this->grav['assets']->addJs('plugin://tntsearch/assets/tntsearch.js');
}
}
/**
* Handle the Reindex task from the admin
*
* @param Event $e
*/
public function onAdminTaskExecute(Event $e): void
{
if ($e['method'] === 'taskReindexTNTSearch') {
$controller = $e['controller'];
header('Content-type: application/json');
if (!$controller->authorizeTask('reindexTNTSearch', ['admin.configuration', 'admin.super'])) {
$json_response = [
'status' => 'error',
'message' => '<i class="fa fa-warning"></i> Index not created',
'details' => 'Insufficient permissions to reindex the search engine database.'
];
echo json_encode($json_response);
exit;
}
// report fatal run-time errors only (E_ERROR === 1), which suppresses warnings and notices
error_reporting(E_ERROR);
// disable execution time
set_time_limit(0);
list($status, $msg, $output) = static::indexJob();
$json_response = [
'status' => $status ? 'success' : 'error',
'message' => $msg
];
echo json_encode($json_response);
exit;
}
}
/**
* Perform an 'add' or 'update' for index data as needed
*
* @param Event $event
* @return bool
*/
public function onObjectSave($event): bool
{
if (defined('CLI_DISABLE_TNTSEARCH')) {
return true;
}
$obj = $event['object'] ?: $event['page'];
if ($obj) {
$this->GravTNTSearch()->updateIndex($obj);
}
return true;
}
/**
* Perform a 'delete' for index data as needed
*
* @param Event $event
* @return bool
*/
public function onObjectDelete($event): bool
{
if (defined('CLI_DISABLE_TNTSEARCH')) {
return true;
}
$obj = $event['object'] ?: $event['page'];
if ($obj) {
$this->GravTNTSearch()->deleteIndex($obj);
}
return true;
}
/**
* Set some twig vars and load CSS/JS assets for admin
*/
public function onTwigAdminVariables(): void
{
$twig = $this->grav['twig'];
$gtnt = $this->GravTNTSearch();
[$status, $msg] = static::getIndexCount($gtnt);
if ($status === false) {
$message = '<i class="fa fa-binoculars"></i> <a href="/'. trim($this->admin_route, '/') . '/plugins/tntsearch">TNTSearch must be indexed before it will function properly.</a>';
$this->grav['admin']->addTempMessage($message, 'error');
}
$twig->twig_vars['tntsearch_index_status'] = ['status' => $status, 'msg' => $msg];
$this->grav['assets']->addCss('plugin://tntsearch/assets/admin/tntsearch.css');
$this->grav['assets']->addJs('plugin://tntsearch/assets/admin/tntsearch.js');
}
/**
* Add reindex button to the admin QuickTray
*/
public function onAdminMenu(): void
{
$options = [
'authorize' => 'taskReindexTNTSearch',
'hint' => 'reindexes the TNT Search index',
'class' => 'tntsearch-reindex',
'icon' => 'fa-binoculars'
];
$this->grav['twig']->plugins_quick_tray['TNT Search'] = $options;
}
/**
* Wrapper to get the number of documents currently indexed
*
* @param GravTNTSearch $gtnt
* @return array
*/
protected static function getIndexCount($gtnt): array
{
$status = true;
try {
$msg = '';
$gtnt->selectIndex();
$doc_count = $gtnt->tnt->totalDocumentsInCollection();
$language = Grav::instance()['language'];
if ($language->enabled()) {
$msg .= 'Processed ' . count($language->getLanguages()) . ' languages, each with ';
}
$msg .= $doc_count . ' documents reindexed';
} catch (IndexNotFoundException $e) {
$status = false;
$msg = 'Index not created';
}
return [$status, $msg];
}
/**
* Helper function to read form/url values
*
* @param string $val
* @return mixed
*/
protected function getFormValue($val)
{
$uri = $this->grav['uri'];
return $uri->param($val) ?: $uri->query($val) ?: filter_input(INPUT_POST, $val, FILTER_SANITIZE_ENCODED);
}
/**
* @param array $options
* @return GravTNTSearch
*/
public static function getSearchObjectType($options = [])
{
$type = 'Grav\\Plugin\\TNTSearch\\' . Grav::instance()['config']->get('plugins.tntsearch.search_object_type', 'Grav') . 'TNTSearch';
if (class_exists($type)) {
return new $type($options);
}
throw new \RuntimeException('Search class: ' . $type . ' does not exist');
}
/**
* @param string|null $langCode
* @return array
*/
public static function indexJob(?string $langCode = null)
{
$grav = Grav::instance();
$grav['debugger']->enabled(false);
/** @var Pages $pages */
$pages = $grav['pages'];
if (method_exists($pages, 'enablePages')) {
$pages->enablePages();
}
ob_start();
/** @var Language $language */
$language = $grav['language'];
$langEnabled = $language->enabled();
// TODO: can be removed when Grav minimum >= v1.6.22
$hasReset = method_exists($pages, 'reset');
if (!$hasReset && !$langCode) {
$langCode = $language->getActive();
}
if ($langCode && (!$langEnabled || !$language->validate($langCode))) {
$langCode = null;
}
$langCodes = $langCode ? [$langCode] : $language->getLanguages();
if ($langCodes) {
foreach ($langCodes as $lang) {
if ($lang !== $language->getActive()) {
$language->init();
$language->setActive($lang);
// TODO: $hasReset test can be removed (keep reset!) when Grav minimum >= v1.6.22
if ($hasReset) {
$pages->reset();
}
}
echo "\nLanguage: {$lang}\n";
$gtnt = static::getSearchObjectType();
$gtnt->createIndex();
}
} else {
$gtnt = static::getSearchObjectType();
$gtnt->createIndex();
}
$output = ob_get_clean();
// Reset and get index count and status
$gtnt = static::getSearchObjectType();
[$status, $msg] = static::getIndexCount($gtnt);
return [$status, $msg, $output];
}
/**
* Helper to initialize TNTSearch if required
*
* @return GravTNTSearch
*/
protected function GravTNTSearch()
{
if (!isset($this->grav['tntsearch'])) {
$this->grav['tntsearch'] = static::getSearchObjectType();
}
return $this->grav['tntsearch'];
}
}
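
A minimal sketch of how getSearchObjectType() above appears to map the `search_object_type` setting to a class name. The helper name `resolveSearchClass` and the value `My` are hypothetical and used only for illustration; the real method additionally reads the setting from the Grav config, checks `class_exists()`, and throws a `\RuntimeException` when the class is missing.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the plugin): builds the fully qualified
 * class name the same way getSearchObjectType() concatenates it.
 */
function resolveSearchClass(string $objectType = 'Grav'): string
{
    // Namespace prefix + configured type + fixed "TNTSearch" suffix.
    return 'Grav\\Plugin\\TNTSearch\\' . $objectType . 'TNTSearch';
}

// The default setting ("Grav") names the plugin's own GravTNTSearch class.
echo resolveSearchClass(), PHP_EOL;

// An illustrative custom value; a class with this name would have to exist
// and be autoloadable, otherwise the plugin throws a RuntimeException.
echo resolveSearchClass('My'), PHP_EOL;
```

Because the class name is assembled from configuration, a custom search object presumably has to live in the `Grav\Plugin\TNTSearch` namespace and follow the `<Type>TNTSearch` naming pattern.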

View File

@ -0,0 +1,29 @@
enabled: true
search_route: '/search'
query_route: '/s'
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: false
distance: 2
phrases: true
stemmer: default
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: 20
min: 3
snippet: 300
index_page_by_default: true
scheduled_index:
enabled: false
at: '0 */3 * * *'
logs: 'logs/tntsearch-index.out'
filter:
items:
- root@.descendants
powered_by: true
search_object_type: Grav

25
plugins/tntsearch/vendor/autoload.php vendored Normal file
View File

@ -0,0 +1,25 @@
<?php
// autoload.php @generated by Composer
if (PHP_VERSION_ID < 50600) {
if (!headers_sent()) {
header('HTTP/1.1 500 Internal Server Error');
}
$err = 'Composer 2.3.0 dropped support for autoloading on PHP <5.6 and you are running '.PHP_VERSION.', please upgrade PHP or use Composer 2.2 LTS via "composer self-update --2.2". Aborting.'.PHP_EOL;
if (!ini_get('display_errors')) {
if (PHP_SAPI === 'cli' || PHP_SAPI === 'phpdbg') {
fwrite(STDERR, $err);
} elseif (!headers_sent()) {
echo $err;
}
}
trigger_error(
$err,
E_USER_ERROR
);
}
require_once __DIR__ . '/composer/autoload_real.php';
return ComposerAutoloaderInit6693564509f9a3fa6ed2c7bf76fdb017::getLoader();

View File

@ -0,0 +1,585 @@
<?php
/*
* This file is part of Composer.
*
* (c) Nils Adermann <naderman@naderman.de>
* Jordi Boggiano <j.boggiano@seld.be>
*
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*/
namespace Composer\Autoload;
/**
* ClassLoader implements a PSR-0, PSR-4 and classmap class loader.
*
* $loader = new \Composer\Autoload\ClassLoader();
*
* // register classes with namespaces
* $loader->add('Symfony\Component', __DIR__.'/component');
* $loader->add('Symfony', __DIR__.'/framework');
*
* // activate the autoloader
* $loader->register();
*
* // to enable searching the include path (eg. for PEAR packages)
* $loader->setUseIncludePath(true);
*
* In this example, if you try to use a class in the Symfony\Component
* namespace or one of its children (Symfony\Component\Console for instance),
* the autoloader will first look for the class under the component/
* directory, and it will then fallback to the framework/ directory if not
* found before giving up.
*
* This class is loosely based on the Symfony UniversalClassLoader.
*
* @author Fabien Potencier <fabien@symfony.com>
* @author Jordi Boggiano <j.boggiano@seld.be>
* @see https://www.php-fig.org/psr/psr-0/
* @see https://www.php-fig.org/psr/psr-4/
*/
class ClassLoader
{
/** @var \Closure(string):void */
private static $includeFile;
/** @var ?string */
private $vendorDir;
// PSR-4
/**
* @var array[]
* @psalm-var array<string, array<string, int>>
*/
private $prefixLengthsPsr4 = array();
/**
* @var array[]
* @psalm-var array<string, array<int, string>>
*/
private $prefixDirsPsr4 = array();
/**
* @var array[]
* @psalm-var array<string, string>
*/
private $fallbackDirsPsr4 = array();
// PSR-0
/**
* @var array[]
* @psalm-var array<string, array<string, string[]>>
*/
private $prefixesPsr0 = array();
/**
* @var array[]
* @psalm-var array<string, string>
*/
private $fallbackDirsPsr0 = array();
/** @var bool */
private $useIncludePath = false;
/**
* @var string[]
* @psalm-var array<string, string>
*/
private $classMap = array();
/** @var bool */
private $classMapAuthoritative = false;
/**
* @var bool[]
* @psalm-var array<string, bool>
*/
private $missingClasses = array();
/** @var ?string */
private $apcuPrefix;
/**
* @var self[]
*/
private static $registeredLoaders = array();
/**
* @param ?string $vendorDir
*/
public function __construct($vendorDir = null)
{
$this->vendorDir = $vendorDir;
self::initializeIncludeClosure();
}
/**
* @return string[]
*/
public function getPrefixes()
{
if (!empty($this->prefixesPsr0)) {
return call_user_func_array('array_merge', array_values($this->prefixesPsr0));
}
return array();
}
/**
* @return array[]
* @psalm-return array<string, array<int, string>>
*/
public function getPrefixesPsr4()
{
return $this->prefixDirsPsr4;
}
/**
* @return array[]
* @psalm-return array<string, string>
*/
public function getFallbackDirs()
{
return $this->fallbackDirsPsr0;
}
/**
* @return array[]
* @psalm-return array<string, string>
*/
public function getFallbackDirsPsr4()
{
return $this->fallbackDirsPsr4;
}
/**
* @return string[] Array of classname => path
* @psalm-return array<string, string>
*/
public function getClassMap()
{
return $this->classMap;
}
/**
* @param string[] $classMap Class to filename map
* @psalm-param array<string, string> $classMap
*
* @return void
*/
public function addClassMap(array $classMap)
{
if ($this->classMap) {
$this->classMap = array_merge($this->classMap, $classMap);
} else {
$this->classMap = $classMap;
}
}
/**
* Registers a set of PSR-0 directories for a given prefix, either
* appending or prepending to the ones previously set for this prefix.
*
* @param string $prefix The prefix
* @param string[]|string $paths The PSR-0 root directories
* @param bool $prepend Whether to prepend the directories
*
* @return void
*/
public function add($prefix, $paths, $prepend = false)
{
if (!$prefix) {
if ($prepend) {
$this->fallbackDirsPsr0 = array_merge(
(array) $paths,
$this->fallbackDirsPsr0
);
} else {
$this->fallbackDirsPsr0 = array_merge(
$this->fallbackDirsPsr0,
(array) $paths
);
}
return;
}
$first = $prefix[0];
if (!isset($this->prefixesPsr0[$first][$prefix])) {
$this->prefixesPsr0[$first][$prefix] = (array) $paths;
return;
}
if ($prepend) {
$this->prefixesPsr0[$first][$prefix] = array_merge(
(array) $paths,
$this->prefixesPsr0[$first][$prefix]
);
} else {
$this->prefixesPsr0[$first][$prefix] = array_merge(
$this->prefixesPsr0[$first][$prefix],
(array) $paths
);
}
}
/**
* Registers a set of PSR-4 directories for a given namespace, either
* appending or prepending to the ones previously set for this namespace.
*
* @param string $prefix The prefix/namespace, with trailing '\\'
* @param string[]|string $paths The PSR-4 base directories
* @param bool $prepend Whether to prepend the directories
*
* @throws \InvalidArgumentException
*
* @return void
*/
public function addPsr4($prefix, $paths, $prepend = false)
{
if (!$prefix) {
// Register directories for the root namespace.
if ($prepend) {
$this->fallbackDirsPsr4 = array_merge(
(array) $paths,
$this->fallbackDirsPsr4
);
} else {
$this->fallbackDirsPsr4 = array_merge(
$this->fallbackDirsPsr4,
(array) $paths
);
}
} elseif (!isset($this->prefixDirsPsr4[$prefix])) {
// Register directories for a new namespace.
$length = strlen($prefix);
if ('\\' !== $prefix[$length - 1]) {
throw new \InvalidArgumentException("A non-empty PSR-4 prefix must end with a namespace separator.");
}
$this->prefixLengthsPsr4[$prefix[0]][$prefix] = $length;
$this->prefixDirsPsr4[$prefix] = (array) $paths;
} elseif ($prepend) {
// Prepend directories for an already registered namespace.
$this->prefixDirsPsr4[$prefix] = array_merge(
(array) $paths,
$this->prefixDirsPsr4[$prefix]
);
} else {
// Append directories for an already registered namespace.
$this->prefixDirsPsr4[$prefix] = array_merge(
$this->prefixDirsPsr4[$prefix],
(array) $paths
);
}
}
/**
* Registers a set of PSR-0 directories for a given prefix,
* replacing any others previously set for this prefix.
*
* @param string $prefix The prefix
* @param string[]|string $paths The PSR-0 base directories
*
* @return void
*/
public function set($prefix, $paths)
{
if (!$prefix) {
$this->fallbackDirsPsr0 = (array) $paths;
} else {
$this->prefixesPsr0[$prefix[0]][$prefix] = (array) $paths;
}
}
/**
* Registers a set of PSR-4 directories for a given namespace,
* replacing any others previously set for this namespace.
*
* @param string $prefix The prefix/namespace, with trailing '\\'
* @param string[]|string $paths The PSR-4 base directories
*
* @throws \InvalidArgumentException
*
* @return void
*/
public function setPsr4($prefix, $paths)
{
if (!$prefix) {
$this->fallbackDirsPsr4 = (array) $paths;
} else {
$length = strlen($prefix);
if ('\\' !== $prefix[$length - 1]) {
throw new \InvalidArgumentException("A non-empty PSR-4 prefix must end with a namespace separator.");
}
$this->prefixLengthsPsr4[$prefix[0]][$prefix] = $length;
$this->prefixDirsPsr4[$prefix] = (array) $paths;
}
}
/**
* Turns on searching the include path for class files.
*
* @param bool $useIncludePath
*
* @return void
*/
public function setUseIncludePath($useIncludePath)
{
$this->useIncludePath = $useIncludePath;
}
/**
* Can be used to check if the autoloader uses the include path to check
* for classes.
*
* @return bool
*/
public function getUseIncludePath()
{
return $this->useIncludePath;
}
/**
* Turns off searching the prefix and fallback directories for classes
* that have not been registered with the class map.
*
* @param bool $classMapAuthoritative
*
* @return void
*/
public function setClassMapAuthoritative($classMapAuthoritative)
{
$this->classMapAuthoritative = $classMapAuthoritative;
}
/**
* Should class lookup fail if not found in the current class map?
*
* @return bool
*/
public function isClassMapAuthoritative()
{
return $this->classMapAuthoritative;
}
/**
* APCu prefix to use to cache found/not-found classes, if the extension is enabled.
*
* @param string|null $apcuPrefix
*
* @return void
*/
public function setApcuPrefix($apcuPrefix)
{
$this->apcuPrefix = function_exists('apcu_fetch') && filter_var(ini_get('apc.enabled'), FILTER_VALIDATE_BOOLEAN) ? $apcuPrefix : null;
}
/**
* The APCu prefix in use, or null if APCu caching is not enabled.
*
* @return string|null
*/
public function getApcuPrefix()
{
return $this->apcuPrefix;
}
/**
* Registers this instance as an autoloader.
*
* @param bool $prepend Whether to prepend the autoloader or not
*
* @return void
*/
public function register($prepend = false)
{
spl_autoload_register(array($this, 'loadClass'), true, $prepend);
if (null === $this->vendorDir) {
return;
}
if ($prepend) {
self::$registeredLoaders = array($this->vendorDir => $this) + self::$registeredLoaders;
} else {
unset(self::$registeredLoaders[$this->vendorDir]);
self::$registeredLoaders[$this->vendorDir] = $this;
}
}
/**
* Unregisters this instance as an autoloader.
*
* @return void
*/
public function unregister()
{
spl_autoload_unregister(array($this, 'loadClass'));
if (null !== $this->vendorDir) {
unset(self::$registeredLoaders[$this->vendorDir]);
}
}
/**
* Loads the given class or interface.
*
* @param string $class The name of the class
* @return true|null True if loaded, null otherwise
*/
public function loadClass($class)
{
if ($file = $this->findFile($class)) {
$includeFile = self::$includeFile;
$includeFile($file);
return true;
}
return null;
}
/**
* Finds the path to the file where the class is defined.
*
* @param string $class The name of the class
*
* @return string|false The path if found, false otherwise
*/
public function findFile($class)
{
// class map lookup
if (isset($this->classMap[$class])) {
return $this->classMap[$class];
}
if ($this->classMapAuthoritative || isset($this->missingClasses[$class])) {
return false;
}
if (null !== $this->apcuPrefix) {
$file = apcu_fetch($this->apcuPrefix.$class, $hit);
if ($hit) {
return $file;
}
}
$file = $this->findFileWithExtension($class, '.php');
// Search for Hack files if we are running on HHVM
if (false === $file && defined('HHVM_VERSION')) {
$file = $this->findFileWithExtension($class, '.hh');
}
if (null !== $this->apcuPrefix) {
apcu_add($this->apcuPrefix.$class, $file);
}
if (false === $file) {
// Remember that this class does not exist.
$this->missingClasses[$class] = true;
}
return $file;
}
/**
* Returns the currently registered loaders indexed by their corresponding vendor directories.
*
* @return self[]
*/
public static function getRegisteredLoaders()
{
return self::$registeredLoaders;
}
/**
* @param string $class
* @param string $ext
* @return string|false
*/
private function findFileWithExtension($class, $ext)
{
// PSR-4 lookup
$logicalPathPsr4 = strtr($class, '\\', DIRECTORY_SEPARATOR) . $ext;
$first = $class[0];
if (isset($this->prefixLengthsPsr4[$first])) {
$subPath = $class;
while (false !== $lastPos = strrpos($subPath, '\\')) {
$subPath = substr($subPath, 0, $lastPos);
$search = $subPath . '\\';
if (isset($this->prefixDirsPsr4[$search])) {
$pathEnd = DIRECTORY_SEPARATOR . substr($logicalPathPsr4, $lastPos + 1);
foreach ($this->prefixDirsPsr4[$search] as $dir) {
if (file_exists($file = $dir . $pathEnd)) {
return $file;
}
}
}
}
}
// PSR-4 fallback dirs
foreach ($this->fallbackDirsPsr4 as $dir) {
if (file_exists($file = $dir . DIRECTORY_SEPARATOR . $logicalPathPsr4)) {
return $file;
}
}
// PSR-0 lookup
if (false !== $pos = strrpos($class, '\\')) {
// namespaced class name
$logicalPathPsr0 = substr($logicalPathPsr4, 0, $pos + 1)
. strtr(substr($logicalPathPsr4, $pos + 1), '_', DIRECTORY_SEPARATOR);
} else {
// PEAR-like class name
$logicalPathPsr0 = strtr($class, '_', DIRECTORY_SEPARATOR) . $ext;
}
if (isset($this->prefixesPsr0[$first])) {
foreach ($this->prefixesPsr0[$first] as $prefix => $dirs) {
if (0 === strpos($class, $prefix)) {
foreach ($dirs as $dir) {
if (file_exists($file = $dir . DIRECTORY_SEPARATOR . $logicalPathPsr0)) {
return $file;
}
}
}
}
}
// PSR-0 fallback dirs
foreach ($this->fallbackDirsPsr0 as $dir) {
if (file_exists($file = $dir . DIRECTORY_SEPARATOR . $logicalPathPsr0)) {
return $file;
}
}
// PSR-0 include paths.
if ($this->useIncludePath && $file = stream_resolve_include_path($logicalPathPsr0)) {
return $file;
}
return false;
}
/**
* @return void
*/
private static function initializeIncludeClosure()
{
if (self::$includeFile !== null) {
return;
}
/**
* Scope isolated include.
*
* Prevents access to $this/self from included files.
*
* @param string $file
* @return void
*/
self::$includeFile = \Closure::bind(static function($file) {
include $file;
}, null, null);
}
}

View File

@ -0,0 +1,352 @@
<?php
/*
* This file is part of Composer.
*
* (c) Nils Adermann <naderman@naderman.de>
* Jordi Boggiano <j.boggiano@seld.be>
*
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*/
namespace Composer;
use Composer\Autoload\ClassLoader;
use Composer\Semver\VersionParser;
/**
* This class is copied in every Composer installed project and available to all
*
* See also https://getcomposer.org/doc/07-runtime.md#installed-versions
*
* To require its presence, you can require `composer-runtime-api ^2.0`
*
* @final
*/
class InstalledVersions
{
/**
* @var mixed[]|null
* @psalm-var array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>}|array{}|null
*/
private static $installed;
/**
* @var bool|null
*/
private static $canGetVendors;
/**
* @var array[]
* @psalm-var array<string, array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>}>
*/
private static $installedByVendor = array();
/**
* Returns a list of all package names which are present, either by being installed, replaced or provided
*
* @return string[]
* @psalm-return list<string>
*/
public static function getInstalledPackages()
{
$packages = array();
foreach (self::getInstalled() as $installed) {
$packages[] = array_keys($installed['versions']);
}
if (1 === \count($packages)) {
return $packages[0];
}
return array_keys(array_flip(\call_user_func_array('array_merge', $packages)));
}
/**
* Returns a list of all package names with a specific type e.g. 'library'
*
* @param string $type
* @return string[]
* @psalm-return list<string>
*/
public static function getInstalledPackagesByType($type)
{
$packagesByType = array();
foreach (self::getInstalled() as $installed) {
foreach ($installed['versions'] as $name => $package) {
if (isset($package['type']) && $package['type'] === $type) {
$packagesByType[] = $name;
}
}
}
return $packagesByType;
}
/**
* Checks whether the given package is installed
*
* This also returns true if the package name is provided or replaced by another package
*
* @param string $packageName
* @param bool $includeDevRequirements
* @return bool
*/
public static function isInstalled($packageName, $includeDevRequirements = true)
{
foreach (self::getInstalled() as $installed) {
if (isset($installed['versions'][$packageName])) {
return $includeDevRequirements || empty($installed['versions'][$packageName]['dev_requirement']);
}
}
return false;
}
/**
* Checks whether the given package satisfies a version constraint
*
* e.g. If you want to know whether version 2.3+ of package foo/bar is installed, you would call:
*
* Composer\InstalledVersions::satisfies(new VersionParser, 'foo/bar', '^2.3')
*
* @param VersionParser $parser Install composer/semver to have access to this class and functionality
* @param string $packageName
* @param string|null $constraint A version constraint to check for, if you pass one you have to make sure composer/semver is required by your package
* @return bool
*/
public static function satisfies(VersionParser $parser, $packageName, $constraint)
{
$constraint = $parser->parseConstraints($constraint);
$provided = $parser->parseConstraints(self::getVersionRanges($packageName));
return $provided->matches($constraint);
}
/**
* Returns a version constraint representing all the range(s) which are installed for a given package
*
* It is easier to use this via isInstalled() with the $constraint argument if you need to check
* whether a given version of a package is installed, and not just whether it exists
*
* @param string $packageName
* @return string Version constraint usable with composer/semver
*/
public static function getVersionRanges($packageName)
{
foreach (self::getInstalled() as $installed) {
if (!isset($installed['versions'][$packageName])) {
continue;
}
$ranges = array();
if (isset($installed['versions'][$packageName]['pretty_version'])) {
$ranges[] = $installed['versions'][$packageName]['pretty_version'];
}
if (array_key_exists('aliases', $installed['versions'][$packageName])) {
$ranges = array_merge($ranges, $installed['versions'][$packageName]['aliases']);
}
if (array_key_exists('replaced', $installed['versions'][$packageName])) {
$ranges = array_merge($ranges, $installed['versions'][$packageName]['replaced']);
}
if (array_key_exists('provided', $installed['versions'][$packageName])) {
$ranges = array_merge($ranges, $installed['versions'][$packageName]['provided']);
}
return implode(' || ', $ranges);
}
throw new \OutOfBoundsException('Package "' . $packageName . '" is not installed');
}
/**
* @param string $packageName
* @return string|null If the package is being replaced or provided but is not really installed, null will be returned as version, use satisfies or getVersionRanges if you need to know if a given version is present
*/
public static function getVersion($packageName)
{
foreach (self::getInstalled() as $installed) {
if (!isset($installed['versions'][$packageName])) {
continue;
}
if (!isset($installed['versions'][$packageName]['version'])) {
return null;
}
return $installed['versions'][$packageName]['version'];
}
throw new \OutOfBoundsException('Package "' . $packageName . '" is not installed');
}
/**
* @param string $packageName
* @return string|null If the package is being replaced or provided but is not really installed, null will be returned as version, use satisfies or getVersionRanges if you need to know if a given version is present
*/
public static function getPrettyVersion($packageName)
{
foreach (self::getInstalled() as $installed) {
if (!isset($installed['versions'][$packageName])) {
continue;
}
if (!isset($installed['versions'][$packageName]['pretty_version'])) {
return null;
}
return $installed['versions'][$packageName]['pretty_version'];
}
throw new \OutOfBoundsException('Package "' . $packageName . '" is not installed');
}
/**
* @param string $packageName
* @return string|null If the package is being replaced or provided but is not really installed, null will be returned as reference
*/
public static function getReference($packageName)
{
foreach (self::getInstalled() as $installed) {
if (!isset($installed['versions'][$packageName])) {
continue;
}
if (!isset($installed['versions'][$packageName]['reference'])) {
return null;
}
return $installed['versions'][$packageName]['reference'];
}
throw new \OutOfBoundsException('Package "' . $packageName . '" is not installed');
}
/**
* @param string $packageName
* @return string|null If the package is being replaced or provided but is not really installed, null will be returned as install path. Packages of type metapackages also have a null install path.
*/
public static function getInstallPath($packageName)
{
foreach (self::getInstalled() as $installed) {
if (!isset($installed['versions'][$packageName])) {
continue;
}
return isset($installed['versions'][$packageName]['install_path']) ? $installed['versions'][$packageName]['install_path'] : null;
}
throw new \OutOfBoundsException('Package "' . $packageName . '" is not installed');
}
/**
* @return array
* @psalm-return array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}
*/
public static function getRootPackage()
{
$installed = self::getInstalled();
return $installed[0]['root'];
}
/**
* Returns the raw installed.php data for custom implementations
*
* @deprecated Use getAllRawData() instead which returns all datasets for all autoloaders present in the process. getRawData only returns the first dataset loaded, which may not be what you expect.
* @return array[]
* @psalm-return array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>}
*/
public static function getRawData()
{
@trigger_error('getRawData only returns the first dataset loaded, which may not be what you expect. Use getAllRawData() instead which returns all datasets for all autoloaders present in the process.', E_USER_DEPRECATED);
if (null === self::$installed) {
// only require the installed.php file if this file is loaded from its dumped location,
// and not from its source location in the composer/composer package, see https://github.com/composer/composer/issues/9937
if (substr(__DIR__, -8, 1) !== 'C') {
self::$installed = include __DIR__ . '/installed.php';
} else {
self::$installed = array();
}
}
return self::$installed;
}
/**
* Returns the raw data of all installed.php which are currently loaded for custom implementations
*
* @return array[]
* @psalm-return list<array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>}>
*/
public static function getAllRawData()
{
return self::getInstalled();
}
/**
* Lets you reload the static array from another file
*
* This is only useful for complex integrations in which a project needs to use
* this class but then also needs to execute another project's autoloader in process,
* and wants to ensure both projects have access to their version of installed.php.
*
* A typical case would be PHPUnit, where it would need to make sure it reads all
* the data it needs from this class, then call reload() with
* `require $CWD/vendor/composer/installed.php` (or similar) as input to make sure
* the project in which it runs can then also use this class safely, without
* interference between PHPUnit's dependencies and the project's dependencies.
*
* @param array[] $data A vendor/composer/installed.php data set
* @return void
*
* @psalm-param array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>} $data
*/
public static function reload($data)
{
self::$installed = $data;
self::$installedByVendor = array();
}
/**
* @return array[]
* @psalm-return list<array{root: array{name: string, pretty_version: string, version: string, reference: string|null, type: string, install_path: string, aliases: string[], dev: bool}, versions: array<string, array{pretty_version?: string, version?: string, reference?: string|null, type?: string, install_path?: string, aliases?: string[], dev_requirement: bool, replaced?: string[], provided?: string[]}>}>
*/
private static function getInstalled()
{
if (null === self::$canGetVendors) {
self::$canGetVendors = method_exists('Composer\Autoload\ClassLoader', 'getRegisteredLoaders');
}
$installed = array();
if (self::$canGetVendors) {
foreach (ClassLoader::getRegisteredLoaders() as $vendorDir => $loader) {
if (isset(self::$installedByVendor[$vendorDir])) {
$installed[] = self::$installedByVendor[$vendorDir];
} elseif (is_file($vendorDir.'/composer/installed.php')) {
$installed[] = self::$installedByVendor[$vendorDir] = require $vendorDir.'/composer/installed.php';
if (null === self::$installed && strtr($vendorDir.'/composer', '\\', '/') === strtr(__DIR__, '\\', '/')) {
self::$installed = $installed[count($installed) - 1];
}
}
}
}
if (null === self::$installed) {
// only require the installed.php file if this file is loaded from its dumped location,
// and not from its source location in the composer/composer package, see https://github.com/composer/composer/issues/9937
if (substr(__DIR__, -8, 1) !== 'C') {
self::$installed = require __DIR__ . '/installed.php';
} else {
self::$installed = array();
}
}
$installed[] = self::$installed;
return $installed;
}
}

View File

@ -0,0 +1,21 @@
Copyright (c) Nils Adermann, Jordi Boggiano
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,11 @@
<?php
// autoload_classmap.php @generated by Composer
$vendorDir = dirname(__DIR__);
$baseDir = dirname($vendorDir);
return array(
'Composer\\InstalledVersions' => $vendorDir . '/composer/InstalledVersions.php',
'Grav\\Plugin\\TNTSearchPlugin' => $baseDir . '/tntsearch.php',
);

View File

@ -0,0 +1,10 @@
<?php
// autoload_files.php @generated by Composer
$vendorDir = dirname(__DIR__);
$baseDir = dirname($vendorDir);
return array(
'290dd4ba42f11019134caca05dbefe3f' => $vendorDir . '/teamtnt/tntsearch/helper/helpers.php',
);

View File

@ -0,0 +1,9 @@
<?php
// autoload_namespaces.php @generated by Composer
$vendorDir = dirname(__DIR__);
$baseDir = dirname($vendorDir);
return array(
);

View File

@ -0,0 +1,12 @@
<?php
// autoload_psr4.php @generated by Composer
$vendorDir = dirname(__DIR__);
$baseDir = dirname($vendorDir);
return array(
'TeamTNT\\TNTSearch\\' => array($vendorDir . '/teamtnt/tntsearch/src'),
'Grav\\Plugin\\TNTSearch\\' => array($baseDir . '/classes'),
'Grav\\Plugin\\Console\\' => array($baseDir . '/cli'),
);

View File

@ -0,0 +1,50 @@
<?php
// autoload_real.php @generated by Composer
class ComposerAutoloaderInit6693564509f9a3fa6ed2c7bf76fdb017
{
private static $loader;
public static function loadClassLoader($class)
{
if ('Composer\Autoload\ClassLoader' === $class) {
require __DIR__ . '/ClassLoader.php';
}
}
/**
* @return \Composer\Autoload\ClassLoader
*/
public static function getLoader()
{
if (null !== self::$loader) {
return self::$loader;
}
require __DIR__ . '/platform_check.php';
spl_autoload_register(array('ComposerAutoloaderInit6693564509f9a3fa6ed2c7bf76fdb017', 'loadClassLoader'), true, true);
self::$loader = $loader = new \Composer\Autoload\ClassLoader(\dirname(__DIR__));
spl_autoload_unregister(array('ComposerAutoloaderInit6693564509f9a3fa6ed2c7bf76fdb017', 'loadClassLoader'));
require __DIR__ . '/autoload_static.php';
call_user_func(\Composer\Autoload\ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017::getInitializer($loader));
$loader->register(true);
$filesToLoad = \Composer\Autoload\ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017::$files;
$requireFile = \Closure::bind(static function ($fileIdentifier, $file) {
if (empty($GLOBALS['__composer_autoload_files'][$fileIdentifier])) {
$GLOBALS['__composer_autoload_files'][$fileIdentifier] = true;
require $file;
}
}, null, null);
foreach ($filesToLoad as $fileIdentifier => $file) {
$requireFile($fileIdentifier, $file);
}
return $loader;
}
}

View File

@ -0,0 +1,54 @@
<?php
// autoload_static.php @generated by Composer
namespace Composer\Autoload;
class ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017
{
public static $files = array (
'290dd4ba42f11019134caca05dbefe3f' => __DIR__ . '/..' . '/teamtnt/tntsearch/helper/helpers.php',
);
public static $prefixLengthsPsr4 = array (
'T' =>
array (
'TeamTNT\\TNTSearch\\' => 18,
),
'G' =>
array (
'Grav\\Plugin\\TNTSearch\\' => 22,
'Grav\\Plugin\\Console\\' => 20,
),
);
public static $prefixDirsPsr4 = array (
'TeamTNT\\TNTSearch\\' =>
array (
0 => __DIR__ . '/..' . '/teamtnt/tntsearch/src',
),
'Grav\\Plugin\\TNTSearch\\' =>
array (
0 => __DIR__ . '/../..' . '/classes',
),
'Grav\\Plugin\\Console\\' =>
array (
0 => __DIR__ . '/../..' . '/cli',
),
);
public static $classMap = array (
'Composer\\InstalledVersions' => __DIR__ . '/..' . '/composer/InstalledVersions.php',
'Grav\\Plugin\\TNTSearchPlugin' => __DIR__ . '/../..' . '/tntsearch.php',
);
public static function getInitializer(ClassLoader $loader)
{
return \Closure::bind(function () use ($loader) {
$loader->prefixLengthsPsr4 = ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017::$prefixLengthsPsr4;
$loader->prefixDirsPsr4 = ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017::$prefixDirsPsr4;
$loader->classMap = ComposerStaticInit6693564509f9a3fa6ed2c7bf76fdb017::$classMap;
}, null, ClassLoader::class);
}
}

View File

@ -0,0 +1,87 @@
{
"packages": [
{
"name": "teamtnt/tntsearch",
"version": "v2.9.0",
"version_normalized": "2.9.0.0",
"source": {
"type": "git",
"url": "https://github.com/teamtnt/tntsearch.git",
"reference": "ccedae0cfe21f7831f2dd1f973cf8904dad42d8d"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/teamtnt/tntsearch/zipball/ccedae0cfe21f7831f2dd1f973cf8904dad42d8d",
"reference": "ccedae0cfe21f7831f2dd1f973cf8904dad42d8d",
"shasum": ""
},
"require": {
"ext-mbstring": "*",
"ext-pdo_sqlite": "*",
"ext-sqlite3": "*",
"php": "~7.1|^8"
},
"require-dev": {
"phpunit/phpunit": "7.*|8.*|9.*",
"symfony/var-dumper": "^4|^5.2"
},
"time": "2022-02-22T10:35:34+00:00",
"type": "library",
"installation-source": "dist",
"autoload": {
"files": [
"helper/helpers.php"
],
"psr-4": {
"TeamTNT\\TNTSearch\\": "src"
}
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Nenad Tičarić",
"email": "nticaric@gmail.com",
"homepage": "http://www.tntstudio.us",
"role": "Developer"
}
],
"description": "A fully featured full text search engine written in PHP",
"homepage": "https://github.com/teamtnt/tntsearch",
"keywords": [
"Fuzzy search",
"bm25",
"fulltext",
"geosearch",
"search",
"stemming",
"teamtnt",
"text classification",
"tntsearch"
],
"support": {
"issues": "https://github.com/teamtnt/tntsearch/issues",
"source": "https://github.com/teamtnt/tntsearch/tree/v2.9.0"
},
"funding": [
{
"url": "https://ko-fi.com/nticaric",
"type": "ko_fi"
},
{
"url": "https://opencollective.com/tntsearch",
"type": "open_collective"
},
{
"url": "https://www.patreon.com/nticaric",
"type": "patreon"
}
],
"install-path": "../teamtnt/tntsearch"
}
],
"dev": true,
"dev-package-names": []
}

View File

@ -0,0 +1,32 @@
<?php return array(
'root' => array(
'name' => 'trilbymedia/grav-plugin-tntsearch',
'pretty_version' => 'dev-develop',
'version' => 'dev-develop',
'reference' => '60562d62856c114f23c183f7873fe1c809f4c7b5',
'type' => 'grav-plugin',
'install_path' => __DIR__ . '/../../',
'aliases' => array(),
'dev' => true,
),
'versions' => array(
'teamtnt/tntsearch' => array(
'pretty_version' => 'v2.9.0',
'version' => '2.9.0.0',
'reference' => 'ccedae0cfe21f7831f2dd1f973cf8904dad42d8d',
'type' => 'library',
'install_path' => __DIR__ . '/../teamtnt/tntsearch',
'aliases' => array(),
'dev_requirement' => false,
),
'trilbymedia/grav-plugin-tntsearch' => array(
'pretty_version' => 'dev-develop',
'version' => 'dev-develop',
'reference' => '60562d62856c114f23c183f7873fe1c809f4c7b5',
'type' => 'grav-plugin',
'install_path' => __DIR__ . '/../../',
'aliases' => array(),
'dev_requirement' => false,
),
),
);

View File

@ -0,0 +1,26 @@
<?php
// platform_check.php @generated by Composer
$issues = array();
if (!(PHP_VERSION_ID >= 70103)) {
$issues[] = 'Your Composer dependencies require a PHP version ">= 7.1.3". You are running ' . PHP_VERSION . '.';
}
if ($issues) {
if (!headers_sent()) {
header('HTTP/1.1 500 Internal Server Error');
}
if (!ini_get('display_errors')) {
if (PHP_SAPI === 'cli' || PHP_SAPI === 'phpdbg') {
fwrite(STDERR, 'Composer detected issues in your platform:' . PHP_EOL.PHP_EOL . implode(PHP_EOL, $issues) . PHP_EOL.PHP_EOL);
} elseif (!headers_sent()) {
echo 'Composer detected issues in your platform:' . PHP_EOL.PHP_EOL . str_replace('You are running '.PHP_VERSION.'.', '', implode(PHP_EOL, $issues)) . PHP_EOL.PHP_EOL;
}
}
trigger_error(
'Composer detected issues in your platform: ' . implode(' ', $issues),
E_USER_ERROR
);
}

View File

@ -0,0 +1,3 @@
open_collective: tntsearch
patreon: nticaric
ko_fi: nticaric

View File

@ -0,0 +1,18 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 240
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Issues with these labels will never be considered stale
exemptLabels:
- pinned
- security
- PR
# Label to use when marking an issue as stale
staleLabel: wontfix
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

View File

@ -0,0 +1,8 @@
.idea/*
vendor
examples
.DS_Store
composer.lock
coverage
tests/_files/*.index
.phpunit.result.cache

View File

@ -0,0 +1,22 @@
language: php
php:
- 7.1
- 7.2
- 7.3
- 7.4
- 8.0
addons:
code_climate:
repo_token: e43f1f89afb5a2f6acfaea42a6a9ebd8d33538208fafa8636826c173b3f7ec26
script:
- vendor/bin/phpunit
before_script:
- composer self-update
- composer install
after_script:
- vendor/bin/test-reporter

View File

@ -0,0 +1,22 @@
# Changelog
All notable changes to `tntsearch` will be documented in this file.
Updates should follow the [Keep a CHANGELOG](http://keepachangelog.com/) principles.
## NEXT - YYYY-MM-DD
### Added
- Nothing
### Deprecated
- Nothing
### Fixed
- Nothing
### Removed
- Nothing
### Security
- Nothing

View File

@ -0,0 +1,46 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at info@tntstudio.hr. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/

View File

@ -0,0 +1,22 @@
# Contributor Code of Conduct
As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers commit themselves to fairly and consistently applying these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.
This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community in a direct capacity. Personal views, beliefs and values of individuals do not necessarily reflect those of the organisation or affiliated individuals and organisations.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.2.0, available at [http://contributor-covenant.org/version/1/2/0/](http://contributor-covenant.org/version/1/2/0/)

View File

@ -0,0 +1,32 @@
# Contributing
Contributions are **welcome** and will be fully **credited**.
We accept contributions via Pull Requests on [Github](https://github.com/teamtnt/tntsearch).
## Pull Requests
- **[PSR-2 Coding Standard](https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md)** - The easiest way to apply the conventions is to install [PHP Code Sniffer](http://pear.php.net/package/PHP_CodeSniffer).
- **Add tests!** - Your patch won't be accepted if it doesn't have tests.
- **Document any change in behaviour** - Make sure the `README.md` and any other relevant documentation are kept up-to-date.
- **Consider our release cycle** - We try to follow [SemVer v2.0.0](http://semver.org/). Randomly breaking public APIs is not an option.
- **Create feature branches** - Don't ask us to pull from your master branch.
- **One pull request per feature** - If you want to do more than one thing, send multiple pull requests.
- **Send coherent history** - Make sure each individual commit in your pull request is meaningful. If you had to make multiple intermediate commits while developing, please [squash them](http://www.git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Changing-Multiple-Commit-Messages) before submitting.
## Running Tests
``` bash
$ composer test
```
**Happy coding**!

View File

@ -0,0 +1,21 @@
# The MIT License (MIT)
Copyright (c) 2016 Nenad Tičarić <nticaric@gmail.com>
> Permission is hereby granted, free of charge, to any person obtaining a copy
> of this software and associated documentation files (the "Software"), to deal
> in the Software without restriction, including without limitation the rights
> to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> copies of the Software, and to permit persons to whom the Software is
> furnished to do so, subject to the following conditions:
>
> The above copyright notice and this permission notice shall be included in
> all copies or substantial portions of the Software.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> THE SOFTWARE.

View File

@ -0,0 +1,9 @@
# PS4Ware
TNTSearch is PS4Ware: it's free to use, but if it makes it to production
we'd appreciate a PS4 game.
### [Helm und Walter Team](https://helmundwalter.de/)
![The Long Dark](https://user-images.githubusercontent.com/824840/66302347-0e8af800-e8f9-11e9-96d2-4bbf58532f34.png)

View File

@ -0,0 +1,380 @@
[![Latest Version on Packagist][ico-version]][link-packagist]
[![Total Downloads][ico-downloads]][link-downloads]
[![Software License][ico-license]](LICENSE.md)
[![Build Status](https://img.shields.io/travis/teamtnt/tntsearch/master.svg?style=flat-square)](https://travis-ci.org/teamtnt/tntsearch)
[![Slack Status](https://img.shields.io/badge/slack-chat-E01563.svg?style=flat-square)](https://tntsearch.slack.com)
![TNTSearch](https://i.imgur.com/aYKsNYv.png)
# TNTSearch
TNTSearch is a full-text search (FTS) engine written entirely in PHP. A simple configuration allows you to add an amazing search experience in just minutes. Features include:
* Fuzzy search
* Search as you type
* Geo-search
* Text classification
* Stemming
* Custom tokenizers
* Bm25 ranking algorithm
* Boolean search
* Result highlighting
* Dynamic index updates (no need to reindex each time)
* Easily deployable via Packagist.org
We also created some demo pages that show tolerant retrieval with n-grams in action.
The package has a bunch of helper functions like Jaro-Winkler and Cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers aren't enough, the engine lets you easily plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. And please contribute other languages!
Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.
**View** [online demo](http://tntsearch.tntstudio.us/) &nbsp;|&nbsp; **Follow us** on
[Twitter](https://twitter.com/tntstudiohr),
or [Facebook](https://www.facebook.com/tntstudiohr) &nbsp;|&nbsp;
**Visit our sponsors**:
<p align="center">
<a href="https://m.do.co/c/ddfc227b7d18" target="_blank">
<img src="https://images.prismic.io/www-static/49aa0a09-06d2-4bba-ad20-4bcbe56ac507_logo.png?auto=compress,format" width="196.5" height="32">
</a>
</p>
---
## Demo
* [TV Shows Search](http://tntsearch.tntstudio.us/)
* [PHPUnit Documentation Search](http://phpunit.tntstudio.us)
* [City Search with n-grams](http://cities.tnt.studio/)
## Tutorials
* [Solving the search problem with Laravel and TNTSearch](https://tnt.studio/solving-the-search-problem-with-laravel-and-tntsearch)
* [Searching for Users with Laravel Scout and TNTSearch](https://tnt.studio/searching-for-users-with-laravel-scout-and-tntsearch)
## Premium products
If you're using TNT Search and finding it useful, take a look at our premium analytics tool:
[<img src="https://i.imgur.com/ujagviB.png" width="420px" />](https://analytics.tnt.studio)
## Support us on Open Collective
- [TNTSearch](https://opencollective.com/tntsearch)
## Installation
The easiest way to install TNTSearch is via [composer](http://getcomposer.org/):
```
composer require teamtnt/tntsearch
```
## Requirements
Before you proceed, make sure your server meets the following requirements:
* PHP >= 7.1
* PDO PHP Extension
* SQLite PHP Extension
* mbstring PHP Extension
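The requirements above can be checked from PHP itself. The following is a minimal sketch, not part of TNTSearch: the function name `tntsearchMissingRequirements` is hypothetical, and the extension names are the ones listed in the package's `composer.json` (`pdo_sqlite`, `sqlite3`, `mbstring`).

```php
// Hypothetical helper (not part of TNTSearch): lists unmet requirements.
function tntsearchMissingRequirements(): array
{
    $missing = [];
    // PHP >= 7.1 corresponds to PHP_VERSION_ID >= 70100.
    if (PHP_VERSION_ID < 70100) {
        $missing[] = 'php >= 7.1 (running ' . PHP_VERSION . ')';
    }
    // Extension names as required in the package's composer.json.
    foreach (['pdo_sqlite', 'sqlite3', 'mbstring'] as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = 'ext-' . $extension;
        }
    }
    return $missing;
}

$missing = tntsearchMissingRequirements();
echo $missing
    ? 'Missing: ' . implode(', ', $missing) . PHP_EOL
    : 'All requirements met.' . PHP_EOL;
```

Composer performs a similar check at install time, so a script like this is mainly useful for diagnosing a server you did not set up yourself.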
## Examples
### Creating an index
In order to be able to make full text search queries, you have to create an index.
Usage:
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
'driver' => 'mysql',
'host' => 'localhost',
'database' => 'dbname',
'username' => 'user',
'password' => 'pass',
'storage' => '/var/www/tntsearch/examples/',
'stemmer' => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class//optional
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
//$indexer->setLanguage('german');
$indexer->run();
```
Important: the "storage" setting marks the folder where all of your indexes
will be saved, so make sure you have permission to write to this folder; otherwise
the following exception may be thrown:
* [PDOException] SQLSTATE[HY000] [14] unable to open database file *
Note: If your primary key is named something other than `id`, set it like this:
```php
$indexer->setPrimaryKey('article_id');
```
### Making the primary key searchable
By default, the primary key isn't searchable. If you want to make it searchable, simply run:
```php
$indexer->includePrimaryKey();
```
### Searching
Searching for a phrase or keyword is trivial:
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$res = $tnt->search("This is a test search", 12);
print_r($res); //returns an array of 12 document ids that best match your query
// to display the results you need an additional query against your application database
// SELECT * FROM articles WHERE id IN $res ORDER BY FIELD(id, $res);
```
The ORDER BY FIELD clause is important; without it, the database engine will not
return the rows in the order the search ranked them.
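One way to build that follow-up query safely is with placeholders instead of string interpolation. This is a sketch under stated assumptions: the helper name `buildOrderedArticleQuery` is hypothetical, the `articles` table and `id` column come from the example above, the ids are assumed to have already been extracted into a plain array, and `FIELD()` is a MySQL function that other databases may not provide.

```php
// Hypothetical helper: builds the follow-up query for a list of document ids.
function buildOrderedArticleQuery(array $ids): array
{
    if ($ids === []) {
        // Nothing matched: return a query that selects no rows.
        return ['SELECT * FROM articles WHERE 1 = 0', []];
    }
    $ids = array_values(array_map('intval', $ids));
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM articles WHERE id IN ($placeholders) ORDER BY FIELD(id, $placeholders)";
    // Each id is bound twice: once for IN (...), once for FIELD(...).
    return [$sql, array_merge($ids, $ids)];
}

[$sql, $params] = buildOrderedArticleQuery([7, 3, 12]);
echo $sql . PHP_EOL;
// With a PDO connection in $pdo (assumed), the query could then be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

Binding the ids twice keeps the statement parameterised while preserving the ranking order in the result set.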
### Boolean Search
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
//this will return all documents that have romeo in it but not juliet
$res = $tnt->searchBoolean("romeo -juliet");
//returns all documents that have romeo or hamlet in it
$res = $tnt->searchBoolean("romeo or hamlet");
//returns all documents that have either romeo AND juliet or prince AND hamlet
$res = $tnt->searchBoolean("(romeo juliet) or (prince hamlet)");
```
### Fuzzy Search
The fuzziness can be tweaked by setting the following member variables:
```php
public $fuzzy_prefix_length = 2;
public $fuzzy_max_expansions = 50;
public $fuzzy_distance = 2; //represents the Levenshtein distance;
```
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$tnt->fuzziness = true;
//when the fuzziness flag is set to true, the keyword juleit will return
//documents that match the word juliet, the default Levenshtein distance is 2
$res = $tnt->search("juleit");
```
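To see why `juleit` falls within a Levenshtein distance of 2 from `juliet`, PHP's built-in `levenshtein()` function can be used. This only illustrates the distance metric; it is not meant to show how TNTSearch performs fuzzy matching internally.

```php
// levenshtein() is a PHP built-in. The two swapped letters ("ei" vs "ie")
// cost two single-character edits, because plain Levenshtein distance
// has no transposition operation.
$distance = levenshtein('juleit', 'juliet');

var_dump($distance);      // int(2)
var_dump($distance <= 2); // bool(true): within the default distance of 2
```

Lowering `$fuzzy_distance` to 1 would therefore be expected to exclude this particular typo.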
## Updating the index
Once you have created an index, you don't need to rebuild it each time you change
your document collection. TNTSearch supports dynamic index updates.
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$index = $tnt->getIndex();
//to insert a new document to the index
$index->insert(['id' => '11', 'title' => 'new title', 'article' => 'new article']);
//to update an existing document
$index->update(11, ['id' => '11', 'title' => 'updated title', 'article' => 'updated article']);
//to delete the document from index
$index->delete(12);
```
## Custom Tokenizer
First, create your own tokenizer class. It should extend the `AbstractTokenizer` class, define
the word-split `$pattern` value, and implement `TokenizerInterface`:
``` php
use TeamTNT\TNTSearch\Support\AbstractTokenizer;
use TeamTNT\TNTSearch\Support\TokenizerInterface;
class SomeTokenizer extends AbstractTokenizer implements TokenizerInterface
{
static protected $pattern = '/[\s,\.]+/';
public function tokenize($text) {
return preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}
}
```
This tokenizer will split words using spaces, commas and periods.
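The effect of that pattern can be tried on its own, without the library, using the same `preg_split` call as in the `tokenize` method above. The sample sentence is made up for illustration.

```php
// Same pattern and flags as in the tokenizer example above.
$pattern = '/[\s,\.]+/';
$text = 'Romeo, Juliet. Prince  Hamlet';

$tokens = preg_split($pattern, strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens); // romeo, juliet, prince, hamlet
```

Because the character class is followed by `+`, a run such as a comma plus a space counts as one separator, and `PREG_SPLIT_NO_EMPTY` drops any empty pieces at the ends.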
Once the tokenizer is ready, pass it to `TNTIndexer` via the `setTokenizer` method.
``` php
$someTokenizer = new SomeTokenizer;
$indexer = new TNTIndexer;
$indexer->setTokenizer($someTokenizer);
```
Another way would be to pass the tokenizer via config:
```php
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
'driver' => 'mysql',
'host' => 'localhost',
'database' => 'dbname',
'username' => 'user',
'password' => 'pass',
'storage' => '/var/www/tntsearch/examples/',
'stemmer' => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class, //optional
'tokenizer' => \TeamTNT\TNTSearch\Support\SomeTokenizer::class
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
$indexer->run();
```
## Geo Search
### Indexing
```php
$candyShopIndexer = new TNTGeoIndexer;
$candyShopIndexer->loadConfig($config);
$candyShopIndexer->createIndex('candyShops.index');
$candyShopIndexer->query('SELECT id, longitude, latitude FROM candy_shops;');
$candyShopIndexer->run();
```
### Searching
```php
$currentLocation = [
'longitude' => 11.576124,
'latitude' => 48.137154
];
$distance = 2; //km
$candyShopIndex = new TNTGeoSearch();
$candyShopIndex->loadConfig($config);
$candyShopIndex->selectIndex('candyShops.index');
$candyShops = $candyShopIndex->findNearest($currentLocation, $distance, 10);
```
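To get a feel for what a distance of 2 km means for coordinates like the ones above, the great-circle distance between two points can be estimated with the haversine formula. This is an illustrative sketch only: the function name `haversineKm` is hypothetical, and it is an assumption, not a statement about how `TNTGeoSearch` computes distances.

```php
// Hypothetical helper: great-circle distance in kilometres (haversine formula).
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius, an approximation
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    // min() guards against rounding pushing sqrt($a) slightly above 1.
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// A point 0.01 degrees of latitude north of the current location
// is roughly 1.1 km away, so it lies inside a 2 km radius.
printf("%.2f km\n", haversineKm(48.137154, 11.576124, 48.147154, 11.576124));
```

Note the argument order: the helper takes latitude first, whereas the `$currentLocation` array above lists longitude first.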
## Classification
```php
use TeamTNT\TNTSearch\Classifier\TNTClassifier;
$classifier = new TNTClassifier();
$classifier->learn("A great game", "Sports");
$classifier->learn("The election was over", "Not sports");
$classifier->learn("Very clean match", "Sports");
$classifier->learn("A clean but forgettable game", "Sports");
$guess = $classifier->predict("It was a close election");
var_dump($guess['label']); //returns "Not sports"
```
### Saving the classifier
```php
$classifier->save('sports.cls');
```
### Loading the classifier
```php
$classifier = new TNTClassifier();
$classifier->load('sports.cls');
```
## Drivers
* [TNTSearch Driver for Laravel Scout](https://github.com/teamtnt/laravel-scout-tntsearch-driver)
## PS4Ware
You're free to use this package, but if it makes it to your production environment, we would highly appreciate you sending us a PS4 game of your choice. This way you support us to further develop and add new features.
Our address is: TNT Studio, Sv. Mateja 19, 10010 Zagreb, Croatia.
We'll publish all received games [here][link-ps4ware]
[link-ps4ware]: https://github.com/teamtnt/tntsearch/blob/master/PS4Ware.md
## Support [![OpenCollective](https://opencollective.com/tntsearch/backers/badge.svg)](#backers) [![OpenCollective](https://opencollective.com/tntsearch/sponsors/badge.svg)](#sponsors)
<a href='https://ko-fi.com/O4O3K2R9' target='_blank'><img height='36' style='border:0px;height:36px;' src='https://az743702.vo.msecnd.net/cdn/kofi4.png?v=0' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
### Backers
Support us with a monthly donation and help us continue our activities. [[Become a backer](https://opencollective.com/tntsearch#backer)]
### Sponsors
Become a sponsor and get your logo on our README on GitHub with a link to your site. [[Become a sponsor](https://opencollective.com/tntsearch#sponsor)]
## Credits
- [Nenad Tičarić][link-author]
- [All Contributors][link-contributors]
## License
The MIT License (MIT). Please see [License File](LICENSE.md) for more information.
[ico-version]: https://img.shields.io/packagist/v/teamtnt/tntsearch.svg?style=flat-square
[ico-license]: https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square
[ico-downloads]: https://img.shields.io/packagist/dt/teamtnt/tntsearch.svg?style=flat-square
[link-packagist]: https://packagist.org/packages/teamtnt/tntsearch
[link-downloads]: https://packagist.org/packages/teamtnt/tntsearch
[link-author]: https://github.com/nticaric
[link-contributors]: ../../contributors
---
From Croatia with ♥ by TNT Studio ([@tntstudiohr](https://twitter.com/tntstudiohr), [blog](https://tnt.studio))


@ -0,0 +1,42 @@
{
"name": "teamtnt/tntsearch",
"type": "library",
"description": "A fully featured full text search engine written in PHP",
"keywords": [
"teamtnt",
"tntsearch",
"search",
"fulltext",
"geosearch",
"text classification",
"bm25",
"stemming",
"fuzzy search"
],
"homepage": "https://github.com/teamtnt/tntsearch",
"license": "MIT",
"authors": [{
"name": "Nenad Tičarić",
"email": "nticaric@gmail.com",
"homepage": "http://www.tntstudio.us",
"role": "Developer"
}],
"require": {
"php": "~7.1|^8",
"ext-pdo_sqlite": "*",
"ext-sqlite3": "*",
"ext-mbstring": "*"
},
"require-dev": {
"phpunit/phpunit": "7.*|8.*|9.*",
"symfony/var-dumper": "^4|^5.2"
},
"autoload": {
"psr-4": {
"TeamTNT\\TNTSearch\\": "src"
},
"files": [
"helper/helpers.php"
]
}
}


@ -0,0 +1,25 @@
<?php
if (!function_exists('stringEndsWith')) {
function stringEndsWith($haystack, $needle)
{
// search forward starting from end minus needle length characters
return $needle === "" || (($temp = strlen($haystack) - strlen($needle)) >= 0 && strpos($haystack, $needle, $temp) !== false);
}
}
if (!function_exists('fuzzyMatch')) {
function fuzzyMatch($pattern, $items)
{
$fm = new TeamTNT\TNTSearch\TNTFuzzyMatch;
return $fm->fuzzyMatch($pattern, $items);
}
}
if (!function_exists('fuzzyMatchFromFile')) {
function fuzzyMatchFromFile($pattern, $path)
{
$fm = new TeamTNT\TNTSearch\TNTFuzzyMatch;
return $fm->fuzzyMatchFromFile($pattern, $path);
}
}


@ -0,0 +1,15 @@
<?php
/*
|--------------------------------------------------------------------------
| Register The Composer Auto Loader
|--------------------------------------------------------------------------
|
| Composer provides a convenient, automatically generated class loader
| for our application. We just need to utilize it! We'll require it
| into the script here so that we do not have to worry about the
| loading of any of our classes "manually". Feels great to relax.
|
*/
require __DIR__ . '/vendor/autoload.php';


@ -0,0 +1,29 @@
<?xml version="1.0" encoding="UTF-8"?>
<phpunit backupGlobals="false"
backupStaticAttributes="false"
bootstrap="phpunit.php"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="true"
processIsolation="false"
stopOnFailure="true"
>
<testsuites>
<testsuite name="TNTSearch Test Suite">
<directory>./tests/</directory>
</testsuite>
</testsuites>
<!-- <filter>
<whitelist>
<directory suffix=".php">./src/</directory>
</whitelist>
</filter>
<logging>
<log type="coverage-html" target="./coverage" charset="UTF-8"
yui="true" highlight="true"
lowUpperBound="50" highLowerBound="80"/>
<log type="testdox-html" target="./coverage/testdox.html" />
</logging> -->
</phpunit>


@ -0,0 +1,131 @@
<?php
namespace TeamTNT\TNTSearch\Classifier;
use TeamTNT\TNTSearch\Stemmer\NoStemmer;
use TeamTNT\TNTSearch\Support\Tokenizer;
class TNTClassifier
{
public $documents = [];
public $words = [];
public $types = [];
public $tokenizer = null;
public $stemmer = null;
protected $arraySumOfWordType = null;
protected $arraySumOfDocuments = null;
public function __construct()
{
$this->tokenizer = new Tokenizer;
$this->stemmer = new NoStemmer;
}
public function predict($statement)
{
$words = $this->tokenizer->tokenize($statement);
$best_likelihood = -INF;
$best_type = '';
foreach ($this->types as $type) {
$likelihood = log($this->pTotal($type)); // calculate P(Type)
$p = 0;
foreach ($words as $word) {
$word = $this->stemmer->stem($word);
$p += log($this->p($word, $type));
}
$likelihood += $p; // calculate P(word, Type)
if ($likelihood > $best_likelihood) {
$best_likelihood = $likelihood;
$best_type = $type;
}
}
return [
'likelihood' => $best_likelihood,
'label' => $best_type
];
}
public function learn($statement, $type)
{
if (!in_array($type, $this->types)) {
$this->types[] = $type;
}
$words = $this->tokenizer->tokenize($statement);
foreach ($words as $word) {
$word = $this->stemmer->stem($word);
if (!isset($this->words[$type][$word])) {
$this->words[$type][$word] = 0;
}
$this->words[$type][$word]++; // increment the word count for the type
}
if (!isset($this->documents[$type])) {
$this->documents[$type] = 0;
}
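$this->documents[$type]++; // increment the document count for the type
// New training data invalidates the cached sums used by p(), pTotal() and vocabularyCount().
$this->arraySumOfWordType = null;
$this->arraySumOfDocuments = null;
$this->vc = null;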
}
public function p($word, $type)
{
$count = 0;
if (isset($this->words[$type][$word])) {
$count = $this->words[$type][$word];
}
if (!isset($this->arraySumOfWordType[$type])) {
$this->arraySumOfWordType[$type] = array_sum($this->words[$type]);
}
return ($count + 1) / ($this->arraySumOfWordType[$type] + $this->vocabularyCount());
}
public function pTotal($type)
{
if (!isset($this->arraySumOfDocuments)) {
$this->arraySumOfDocuments = array_sum($this->documents);
}
return ($this->documents[$type]) / $this->arraySumOfDocuments;
}
public function vocabularyCount()
{
if (isset($this->vc)) {
return $this->vc;
}
$words = [];
foreach ($this->words as $key => $value) {
foreach ($this->words[$key] as $word => $count) {
$words[$word] = 0;
}
}
$this->vc = count($words);
return $this->vc;
}
public function save($path)
{
$s = serialize($this);
return file_put_contents($path, $s);
}
public function load($name)
{
$s = file_get_contents($name);
$classifier = unserialize($s);
unset($this->vc);
unset($this->arraySumOfDocuments);
unset($this->arraySumOfWordType);
$this->documents = $classifier->documents;
$this->words = $classifier->words;
$this->types = $classifier->types;
$this->tokenizer = $classifier->tokenizer;
$this->stemmer = $classifier->stemmer;
}
}


@ -0,0 +1,77 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use PDO;
class Connector
{
/**
* The default PDO connection options.
*
* @var array
*/
protected $options = [
PDO::ATTR_CASE => PDO::CASE_NATURAL,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_ORACLE_NULLS => PDO::NULL_NATURAL,
PDO::ATTR_STRINGIFY_FETCHES => false,
PDO::ATTR_EMULATE_PREPARES => false,
];
/**
* Get the PDO options based on the configuration.
*
* @param array $config
* @return array
*/
public function getOptions(array $config)
{
return $this->options;
}
/**
* Create a new PDO connection.
*
* @param string $dsn
* @param array $config
* @param array $options
* @return \PDO
*/
public function createConnection($dsn, array $config, array $options)
{
// Credentials are optional; fall back to null when they are not configured.
$username = $config['username'] ?? null;
$password = $config['password'] ?? null;
return new PDO($dsn, $username, $password, $options);
}
/**
* Get the default PDO connection options.
*
* @return array
*/
public function getDefaultOptions()
{
return $this->options;
}
/**
* Set the default PDO connection options.
*
* @param array $options
* @return void
*/
public function setDefaultOptions(array $options)
{
$this->options = $options;
}
}


@ -0,0 +1,14 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
interface ConnectorInterface
{
/**
* Establish a database connection.
*
* @param array $config
* @return \PDO
*/
public function connect(array $config);
}


@ -0,0 +1,21 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use Exception;
class FileSystemConnector extends Connector implements ConnectorInterface
{
/**
* The filesystem driver reads documents directly from disk, so there is
* no database connection to establish and this method returns nothing.
*
* @param array $config
* @return null
*/
public function connect(array $config)
{
}
}


@ -0,0 +1,139 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use PDO;
class MySqlConnector extends Connector implements ConnectorInterface
{
/**
* Establish a database connection.
*
* @param array $config
* @return \PDO
*/
public function connect(array $config)
{
$dsn = $this->getDsn($config);
$options = $this->getOptions($config);
// We need to grab the PDO options that should be used while making the brand
// new connection instance. The PDO options control various aspects of the
// connection's behavior, and some might be specified by the developers.
$connection = $this->createConnection($dsn, $config, $options);
if (! empty($config['database'])) {
$connection->exec("use `{$config['database']}`;");
}
$collation = 'utf8_unicode_ci';
if (! empty($config['collation'])) {
$collation = $config['collation'];
}
// Next we will set the "names" and "collation" on the clients connections so
// a correct character set will be used by this client. The collation also
// is set on the server but needs to be set here on this client objects.
if (isset($config['charset'])) {
$charset = $config['charset'];
$names = "set names '{$charset}'".
(! is_null($collation) ? " collate '{$collation}'" : '');
$connection->prepare($names)->execute();
}
// Next, we will check to see if a timezone has been specified in this config
// and if it has we will issue a statement to modify the timezone with the
// database. Setting this DB timezone is an optional configuration item.
if (isset($config['timezone'])) {
$connection->prepare(
'set time_zone="'.$config['timezone'].'"'
)->execute();
}
$this->setModes($connection, $config);
return $connection;
}
public function getOptions(array $config)
{
return array_merge(parent::getOptions($config), [
PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
]);
}
/**
* Create a DSN string from a configuration.
*
* Chooses socket or host/port based on the 'unix_socket' config value.
*
* @param array $config
* @return string
*/
protected function getDsn(array $config)
{
return $this->configHasSocket($config) ? $this->getSocketDsn($config) : $this->getHostDsn($config);
}
/**
* Determine if the given configuration array has a UNIX socket value.
*
* @param array $config
* @return bool
*/
protected function configHasSocket(array $config)
{
return ! empty($config['unix_socket']);
}
/**
* Get the DSN string for a socket configuration.
*
* @param array $config
* @return string
*/
protected function getSocketDsn(array $config)
{
return "mysql:unix_socket={$config['unix_socket']};dbname={$config['database']}";
}
/**
* Get the DSN string for a host / port configuration.
*
* @param array $config
* @return string
*/
protected function getHostDsn(array $config)
{
extract($config, EXTR_SKIP);
return isset($port)
? "mysql:host={$host};port={$port};dbname={$database}"
: "mysql:host={$host};dbname={$database}";
}
/**
* Set the modes for the connection.
*
* @param \PDO $connection
* @param array $config
* @return void
*/
protected function setModes(PDO $connection, array $config)
{
if (isset($config['modes'])) {
$modes = implode(',', $config['modes']);
$connection->prepare("set session sql_mode='{$modes}'")->execute();
} elseif (isset($config['strict'])) {
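if ($config['strict']) {
// NO_AUTO_CREATE_USER was removed in MySQL 8.0.11, so only request it on older servers.
$version = $connection->getAttribute(PDO::ATTR_SERVER_VERSION);
$noAutoCreateUser = version_compare($version, '8.0.11', '>=') ? '' : 'NO_AUTO_CREATE_USER,';
$connection->prepare("set session sql_mode='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,{$noAutoCreateUser}NO_ENGINE_SUBSTITUTION'")->execute();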
} else {
$connection->prepare("set session sql_mode='NO_ENGINE_SUBSTITUTION'")->execute();
}
}
}
}


@ -0,0 +1,121 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use PDO;
class PostgresConnector extends Connector implements ConnectorInterface
{
/**
* The default PDO connection options.
*
* @var array
*/
protected $options = [
PDO::ATTR_CASE => PDO::CASE_NATURAL,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_ORACLE_NULLS => PDO::NULL_NATURAL,
PDO::ATTR_STRINGIFY_FETCHES => false,
];
/**
* Establish a database connection.
*
* @param array $config
* @return \PDO
*/
public function connect(array $config)
{
// First we'll create the basic DSN and connection instance, connecting to the
// database using the configuration options specified by the developer. We
// also set the character set on the connection, which defaults to UTF-8.
$dsn = $this->getDsn($config);
$options = $this->getOptions($config);
$connection = $this->createConnection($dsn, $config, $options);
$charset = 'utf8';
if (isset($config['charset'])) {
$charset = $config['charset'];
}
$connection->prepare("set names '$charset'")->execute();
// Next, we will check to see if a timezone has been specified in this config
// and if it has we will issue a statement to modify the timezone with the
// database. Setting this DB timezone is an optional configuration item.
if (isset($config['timezone'])) {
$timezone = $config['timezone'];
$connection->prepare("set time zone '$timezone'")->execute();
}
// Unlike MySQL, Postgres allows the concept of "schema" and a default schema
// may have been specified on the connections. If that is the case we will
// set the default schema search paths to the specified database schema.
if (isset($config['schema'])) {
$schema = $this->formatSchema($config['schema']);
$connection->prepare("set search_path to {$schema}")->execute();
}
// Postgres allows an application_name to be set by the user and this name is
// used when monitoring the application with pg_stat_activity. So we'll
// determine if the option has been specified and run a statement if so.
if (isset($config['application_name'])) {
$applicationName = $config['application_name'];
$connection->prepare("set application_name to '$applicationName'")->execute();
}
return $connection;
}
/**
* Create a DSN string from a configuration.
*
* @param array $config
* @return string
*/
protected function getDsn(array $config)
{
// First we will create the basic DSN setup as well as the port if it is in
// the configuration options. This will give us the basic DSN we will
// need to establish the PDO connections and return them back for use.
extract($config, EXTR_SKIP);
$host = isset($host) ? "host={$host};" : '';
$dsn = "pgsql:{$host}dbname={$database}";
// If a port was specified, we will add it to this Postgres DSN connections
// format. Once we have done that we are ready to return this connection
// string back out for usage, as this has been fully constructed here.
if (isset($config['port'])) {
$dsn .= ";port={$port}";
}
if (isset($config['sslmode'])) {
$dsn .= ";sslmode={$sslmode}";
}
return $dsn;
}
/**
* Format the schema for the "set search_path" statement.
*
* @param array|string $schema
* @return string
*/
protected function formatSchema($schema)
{
if (is_array($schema)) {
return '"'.implode('", "', $schema).'"';
} else {
return '"'.$schema.'"';
}
}
}


@ -0,0 +1,40 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use Exception;
class SQLiteConnector extends Connector implements ConnectorInterface
{
protected $options = [];
/**
* Establish a database connection.
*
* @param array $config
* @return \PDO
*
* @throws \Exception
*/
public function connect(array $config)
{
$options = $this->getOptions($config);
// SQLite supports "in-memory" databases that only last as long as the owning
// connection does. These are useful for tests or for short lifetime store
// querying. In-memory databases may only have a single open connection.
if ($config['database'] == ':memory:') {
return $this->createConnection('sqlite::memory:', $config, $options);
}
$path = realpath($config['database']);
// Here we'll verify that the SQLite database exists before going any further
// as the developer probably wants to know if the database exists and this
// SQLite driver will not throw any exception if it does not by default.
if ($path === false) {
throw new Exception("Database ({$config['database']}) does not exist.");
}
return $this->createConnection("sqlite:{$path}", $config, $options);
}
}


@ -0,0 +1,71 @@
<?php
namespace TeamTNT\TNTSearch\Connectors;
use PDO;
class SqlServerConnector extends Connector implements ConnectorInterface {
/**
* The PDO connection options.
*
* @var array
*/
protected $options = array(
PDO::ATTR_CASE => PDO::CASE_NATURAL,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_ORACLE_NULLS => PDO::NULL_NATURAL,
PDO::ATTR_STRINGIFY_FETCHES => false,
);
/**
* Establish a database connection.
*
* @param array $config
* @return PDO
*/
public function connect(array $config)
{
$options = $this->getOptions($config);
return $this->createConnection($this->getDsn($config), $config, $options);
}
/**
* Create a DSN string from a configuration.
*
* @param array $config
* @return string
*/
protected function getDsn(array $config)
{
extract($config, EXTR_SKIP);
// First we will create the basic DSN setup as well as the port if it is in
// the configuration options. This will give us the basic DSN we will
// need to establish the PDO connections and return them back for use.
$port = isset($config['port']) ? ','.$port : '';
if (in_array('dblib', $this->getAvailableDrivers()))
{
return "dblib:host={$host}{$port};dbname={$database}";
}
else
{
$dbName = $database != '' ? ";Database={$database}" : '';
return "sqlsrv:Server={$host}{$port}{$dbName}";
}
}
/**
* Get the available PDO drivers.
*
* @return array
*/
protected function getAvailableDrivers()
{
return PDO::getAvailableDrivers();
}
}


@ -0,0 +1,9 @@
<?php
namespace TeamTNT\TNTSearch\Exceptions;
use Exception;
class IndexNotFoundException extends Exception
{
}


@ -0,0 +1,16 @@
<?php
namespace TeamTNT\TNTSearch\FileReaders;
use SplFileInfo;
interface FileReaderInterface
{
/**
* Read the content of a file
*
* @param SplFileInfo $fileinfo
* @return string
*/
public function read(SplFileInfo $fileinfo);
}


@ -0,0 +1,16 @@
<?php
namespace TeamTNT\TNTSearch\FileReaders;
use SplFileInfo;
class TextFileReader implements FileReaderInterface
{
public $fileMapCallback = null;
public $fileFilterCallback = null;
public function read(SplFileInfo $fileinfo)
{
return file_get_contents($fileinfo);
}
}


@ -0,0 +1,72 @@
<?php
namespace TeamTNT\TNTSearch\Indexer;
use PDO;
class TNTGeoIndexer extends TNTIndexer
{
public function createIndex($indexName)
{
$this->indexName = $indexName;
if (file_exists($this->config['storage'].$indexName)) {
unlink($this->config['storage'].$indexName);
}
$this->index = new PDO('sqlite:'.$this->config['storage'].$indexName);
$this->index->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$this->index->exec("CREATE TABLE IF NOT EXISTS locations (
doc_id INTEGER,
longitude REAL,
latitude REAL,
cos_lat REAL,
sin_lat REAL,
cos_lng REAL,
sin_lng REAL
)");
$this->index->exec("CREATE INDEX location_index ON locations ('longitude', 'latitude');");
$this->index->exec("CREATE TABLE IF NOT EXISTS info (key TEXT, value INTEGER)");
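// Only build a connector when no database handle has been supplied,
// so a handle set via setDatabaseHandle() works without a driver.
if (!$this->dbh) {
$connector = $this->createConnector($this->config);
$this->dbh = $connector->connect($this->config);
}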
return $this;
}
public function processDocument($row)
{
$this->prepareInsertStatement();
$docId = $row->get($this->getPrimaryKey());
$longitude = $row->get('longitude');
$latitude = $row->get('latitude');
$cos_lat = cos($latitude * pi() / 180);
$sin_lat = sin($latitude * pi() / 180);
$cos_lng = cos($longitude * pi() / 180);
$sin_lng = sin($longitude * pi() / 180);
$this->insertStmt->bindParam(":doc_id", $docId);
$this->insertStmt->bindParam(":longitude", $longitude);
$this->insertStmt->bindParam(":latitude", $latitude);
$this->insertStmt->bindParam(":cos_lat", $cos_lat);
$this->insertStmt->bindParam(":sin_lat", $sin_lat);
$this->insertStmt->bindParam(":cos_lng", $cos_lng);
$this->insertStmt->bindParam(":sin_lng", $sin_lng);
$this->insertStmt->execute();
}
public function prepareInsertStatement()
{
if (isset($this->insertStmt)) {
return $this->insertStmt;
}
$this->insertStmt = $this->index->prepare("INSERT INTO locations (doc_id, longitude, latitude, cos_lat, sin_lat, cos_lng, sin_lng)
VALUES (:doc_id, :longitude, :latitude, :cos_lat, :sin_lat, :cos_lng, :sin_lng)");
return $this->insertStmt;
}
}


@ -0,0 +1,695 @@
<?php
namespace TeamTNT\TNTSearch\Indexer;
use Exception;
use PDO;
use RecursiveDirectoryIterator;
use RecursiveIteratorIterator;
use TeamTNT\TNTSearch\Connectors\FileSystemConnector;
use TeamTNT\TNTSearch\Connectors\MySqlConnector;
use TeamTNT\TNTSearch\Connectors\PostgresConnector;
use TeamTNT\TNTSearch\Connectors\SQLiteConnector;
use TeamTNT\TNTSearch\Connectors\SqlServerConnector;
use TeamTNT\TNTSearch\FileReaders\TextFileReader;
use TeamTNT\TNTSearch\Stemmer\CroatianStemmer;
use TeamTNT\TNTSearch\Stemmer\NoStemmer;
use TeamTNT\TNTSearch\Support\Collection;
use TeamTNT\TNTSearch\Support\Tokenizer;
use TeamTNT\TNTSearch\Support\TokenizerInterface;
class TNTIndexer
{
protected $index = null;
protected $dbh = null;
protected $primaryKey = null;
protected $excludePrimaryKey = true;
public $stemmer = null;
public $tokenizer = null;
public $stopWords = [];
public $filereader = null;
public $config = [];
protected $query = "";
protected $wordlist = [];
protected $inMemoryTerms = [];
protected $decodeHTMLEntities = false;
public $disableOutput = false;
public $inMemory = true;
public $steps = 1000;
public $indexName = "";
public $statementsPrepared = false;
public function __construct()
{
$this->stemmer = new NoStemmer;
$this->tokenizer = new Tokenizer;
$this->filereader = new TextFileReader;
}
/**
* @param TokenizerInterface $tokenizer
*/
public function setTokenizer(TokenizerInterface $tokenizer)
{
$this->tokenizer = $tokenizer;
$this->updateInfoTable('tokenizer', get_class($tokenizer));
}
public function setStopWords(array $stopWords)
{
$this->stopWords = $stopWords;
}
/**
* @param array $config
*/
public function loadConfig(array $config)
{
$this->config = $config;
$this->config['storage'] = rtrim($this->config['storage'], '/').'/';
if (!isset($this->config['driver'])) {
$this->config['driver'] = "";
}
if (!isset($this->config['wal'])) {
$this->config['wal'] = true;
}
}
/**
* @return string
*/
public function getStoragePath()
{
return $this->config['storage'];
}
public function getStemmer()
{
return $this->stemmer;
}
/**
* @return string
*/
public function getPrimaryKey()
{
if (isset($this->primaryKey)) {
return $this->primaryKey;
}
return 'id';
}
/**
* @param string $primaryKey
*/
public function setPrimaryKey($primaryKey)
{
$this->primaryKey = $primaryKey;
}
public function excludePrimaryKey()
{
$this->excludePrimaryKey = true;
}
public function includePrimaryKey()
{
$this->excludePrimaryKey = false;
}
public function setStemmer($stemmer)
{
$this->stemmer = $stemmer;
$this->updateInfoTable('stemmer', get_class($stemmer));
}
public function setCroatianStemmer()
{
$this->setStemmer(new CroatianStemmer);
}
/**
* @param string $language - one of: no, arabic, croatian, german, italian, porter, portuguese, russian, ukrainian
*/
public function setLanguage($language = 'no')
{
$class = 'TeamTNT\\TNTSearch\\Stemmer\\'.ucfirst(strtolower($language)).'Stemmer';
$this->setStemmer(new $class);
}
/**
* @param PDO $index
*/
public function setIndex($index)
{
$this->index = $index;
}
public function setFileReader($filereader)
{
$this->filereader = $filereader;
}
public function prepareStatementsForIndex()
{
if (!$this->statementsPrepared) {
$this->insertWordlistStmt = $this->index->prepare("INSERT INTO wordlist (term, num_hits, num_docs) VALUES (:keyword, :hits, :docs)");
$this->selectWordlistStmt = $this->index->prepare("SELECT * FROM wordlist WHERE term like :keyword LIMIT 1");
$this->updateWordlistStmt = $this->index->prepare("UPDATE wordlist SET num_docs = num_docs + :docs, num_hits = num_hits + :hits WHERE term = :keyword");
$this->statementsPrepared = true;
}
}
/**
* @param string $indexName
*
* @return TNTIndexer
*/
public function createIndex($indexName)
{
$this->indexName = $indexName;
if (file_exists($this->config['storage'].$indexName)) {
unlink($this->config['storage'].$indexName);
}
$this->index = new PDO('sqlite:'.$this->config['storage'].$indexName);
$this->index->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
if ($this->config['wal']) {
$this->index->exec("PRAGMA journal_mode=wal;");
}
$this->index->exec("CREATE TABLE IF NOT EXISTS wordlist (
id INTEGER PRIMARY KEY,
term TEXT UNIQUE COLLATE nocase,
num_hits INTEGER,
num_docs INTEGER)");
$this->index->exec("CREATE UNIQUE INDEX 'main'.'index' ON wordlist ('term');");
$this->index->exec("CREATE TABLE IF NOT EXISTS doclist (
term_id INTEGER,
doc_id INTEGER,
hit_count INTEGER)");
$this->index->exec("CREATE TABLE IF NOT EXISTS fields (
id INTEGER PRIMARY KEY,
name TEXT)");
$this->index->exec("CREATE TABLE IF NOT EXISTS hitlist (
term_id INTEGER,
doc_id INTEGER,
field_id INTEGER,
position INTEGER,
hit_count INTEGER)");
$this->index->exec("CREATE TABLE IF NOT EXISTS info (
key TEXT,
value INTEGER)");
$this->index->exec("INSERT INTO info ( 'key', 'value') values ( 'total_documents', 0)");
$this->index->exec("INSERT INTO info ( 'key', 'value') values ( 'stemmer', 'TeamTNT\TNTSearch\Stemmer\NoStemmer')");
$this->index->exec("INSERT INTO info ( 'key', 'value') values ( 'tokenizer', 'TeamTNT\TNTSearch\Support\Tokenizer')");
$this->index->exec("CREATE INDEX IF NOT EXISTS 'main'.'term_id_index' ON doclist ('term_id' COLLATE BINARY);");
$this->index->exec("CREATE INDEX IF NOT EXISTS 'main'.'doc_id_index' ON doclist ('doc_id');");
if (isset($this->config['stemmer'])) {
$this->setStemmer(new $this->config['stemmer']);
}
if (isset($this->config['tokenizer'])) {
$this->setTokenizer(new $this->config['tokenizer']);
}
if (!$this->dbh) {
$connector = $this->createConnector($this->config);
$this->dbh = $connector->connect($this->config);
}
return $this;
}
public function indexBeginTransaction()
{
$this->index->beginTransaction();
}
public function indexEndTransaction()
{
$this->index->commit();
}
/**
* @param array $config
*
* @return FileSystemConnector|MySqlConnector|PostgresConnector|SQLiteConnector|SqlServerConnector
* @throws Exception
*/
public function createConnector(array $config)
{
if (!isset($config['driver'])) {
throw new Exception('A driver must be specified.');
}
switch ($config['driver']) {
case 'mysql':
return new MySqlConnector;
case 'pgsql':
return new PostgresConnector;
case 'sqlite':
return new SQLiteConnector;
case 'sqlsrv':
return new SqlServerConnector;
case 'filesystem':
return new FileSystemConnector;
}
throw new Exception("Unsupported driver [{$config['driver']}]");
}
/**
* @param PDO $dbh
*/
public function setDatabaseHandle(PDO $dbh)
{
$this->dbh = $dbh;
if ($this->dbh->getAttribute(PDO::ATTR_DRIVER_NAME) == 'mysql') {
$this->dbh->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
}
}
public function query($query)
{
$this->query = $query;
}
public function run()
{
if ($this->config['driver'] == "filesystem") {
return $this->readDocumentsFromFileSystem();
}
$result = $this->dbh->query($this->query);
$counter = 0;
$this->index->beginTransaction();
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
$counter++;
$this->processDocument(new Collection($row));
if ($counter % $this->steps == 0) {
$this->info("Processed $counter rows");
}
if ($counter % 10000 == 0) {
$this->index->commit();
$this->index->beginTransaction();
$this->info("Committed");
}
}
$this->index->commit();
$this->updateInfoTable('total_documents', $counter);
$this->info("Total rows $counter");
}
public function readDocumentsFromFileSystem()
{
$exclude = [];
if (isset($this->config['exclude'])) {
$exclude = $this->config['exclude'];
}
$this->index->exec("CREATE TABLE IF NOT EXISTS filemap (
id INTEGER PRIMARY KEY,
path TEXT)");
$path = realpath($this->config['location']);
$objects = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path), RecursiveIteratorIterator::SELF_FIRST);
$this->index->beginTransaction();
$counter = 0;
foreach ($objects as $name => $object) {
$name = str_replace($path.'/', '', $name);
if (is_callable($this->config['extension'])) {
$includeFile = $this->config['extension']($object);
} elseif (is_array($this->config['extension'])) {
$includeFile = in_array($object->getExtension(), $this->config['extension']);
} else {
$includeFile = stringEndsWith($name, $this->config['extension']);
}
if ($includeFile && !in_array($name, $exclude)) {
$counter++;
$file = [
'id' => $counter,
'name' => $name,
'content' => $this->filereader->read($object)
];
$fileCollection = new Collection($file);
if (property_exists($this->filereader, 'fileFilterCallback')
&& is_callable($this->filereader->fileFilterCallback)) {
$fileCollection = $fileCollection->filter($this->filereader->fileFilterCallback);
}
if (property_exists($this->filereader, 'fileMapCallback')
&& is_callable($this->filereader->fileMapCallback)) {
$fileCollection = $fileCollection->map($this->filereader->fileMapCallback);
}
$this->processDocument($fileCollection);
$statement = $this->index->prepare("INSERT INTO filemap ( 'id', 'path') values ( $counter, :object)");
$statement->bindParam(':object', $object);
$statement->execute();
$this->info("Processed $counter $object");
}
}
$this->index->commit();
// createIndex() already inserted a 'total_documents' row, so update it instead of adding a duplicate.
$this->updateInfoTable('total_documents', $counter);
$this->index->exec("INSERT INTO info ( 'key', 'value') values ( 'driver', 'filesystem')");
$this->info("Total rows $counter");
$this->info("Index created: {$this->config['storage']}");
}
public function processDocument($row)
{
$documentId = $row->get($this->getPrimaryKey());
if ($this->excludePrimaryKey) {
$row->forget($this->getPrimaryKey());
}
$stems = $row->map(function ($columnContent, $columnName) {
return $this->stemText($columnContent);
});
$this->saveToIndex($stems, $documentId);
}
public function insert($document)
{
$this->processDocument(new Collection($document));
$total = $this->totalDocumentsInCollection() + 1;
$this->updateInfoTable('total_documents', $total);
}
public function update($id, $document)
{
$this->delete($id);
$this->insert($document);
}
public function delete($documentId)
{
$rows = $this->prepareAndExecuteStatement("SELECT * FROM doclist WHERE doc_id = :documentId;", [
['key' => ':documentId', 'value' => $documentId]
])->fetchAll(PDO::FETCH_ASSOC);
$updateStmt = $this->index->prepare("UPDATE wordlist SET num_docs = num_docs - 1, num_hits = num_hits - :hits WHERE id = :term_id");
foreach ($rows as $document) {
$updateStmt->bindParam(":hits", $document['hit_count']);
$updateStmt->bindParam(":term_id", $document['term_id']);
$updateStmt->execute();
}
$this->prepareAndExecuteStatement("DELETE FROM doclist WHERE doc_id = :documentId;", [
['key' => ':documentId', 'value' => $documentId]
]);
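$this->prepareAndExecuteStatement("DELETE FROM wordlist WHERE num_hits = 0");
// Decrement the total whenever the document was indexed, not only when
// one of its terms disappeared from the wordlist entirely.
if (count($rows) > 0) {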
$total = $this->totalDocumentsInCollection() - 1;
$this->updateInfoTable('total_documents', $total);
}
}
public function updateInfoTable($key, $value)
{
$this->updateInfoTableStmt = $this->index->prepare("UPDATE info SET value = :value WHERE key = :key");
$this->updateInfoTableStmt->bindValue(':key', $key);
$this->updateInfoTableStmt->bindValue(':value', $value);
$this->updateInfoTableStmt->execute();
}
public function stemText($text)
{
$stemmer = $this->getStemmer();
$words = $this->breakIntoTokens($text);
$stems = [];
foreach ($words as $word) {
$stems[] = $stemmer->stem($word);
}
return $stems;
}
public function breakIntoTokens($text)
{
if ($this->decodeHTMLEntities) {
$text = html_entity_decode($text);
}
return $this->tokenizer->tokenize($text, $this->stopWords);
}
public function decodeHtmlEntities($value = true)
{
$this->decodeHTMLEntities = $value;
}
public function saveToIndex($stems, $docId)
{
$this->prepareStatementsForIndex();
$terms = $this->saveWordlist($stems);
$this->saveDoclist($terms, $docId);
$this->saveHitList($stems, $docId, $terms);
}
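/**
* Insert or update every stemmed term in the wordlist table.
*
* @param Collection $stems Stemmed terms, one list per column
*
* @return array Map of term => ['hits' => int, 'docs' => int, 'id' => mixed]
*/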
public function saveWordlist($stems)
{
$terms = [];
$stems->map(function ($column, $key) use (&$terms) {
foreach ($column as $term) {
if (array_key_exists($term, $terms)) {
$terms[$term]['hits']++;
$terms[$term]['docs'] = 1;
} else {
$terms[$term] = [
'hits' => 1,
'docs' => 1,
'id' => 0
];
}
}
});
foreach ($terms as $key => $term) {
try {
$this->insertWordlistStmt->bindParam(":keyword", $key);
$this->insertWordlistStmt->bindParam(":hits", $term['hits']);
$this->insertWordlistStmt->bindParam(":docs", $term['docs']);
$this->insertWordlistStmt->execute();
$terms[$key]['id'] = $this->index->lastInsertId();
if ($this->inMemory) {
$this->inMemoryTerms[$key] = $terms[$key]['id'];
}
} catch (\Exception $e) {
if ($e->getCode() == 23000) {
$this->updateWordlistStmt->bindValue(':docs', $term['docs']);
$this->updateWordlistStmt->bindValue(':hits', $term['hits']);
$this->updateWordlistStmt->bindValue(':keyword', $key);
$this->updateWordlistStmt->execute();
if (!$this->inMemory) {
$this->selectWordlistStmt->bindValue(':keyword', $key);
$this->selectWordlistStmt->execute();
$res = $this->selectWordlistStmt->fetch(PDO::FETCH_ASSOC);
$terms[$key]['id'] = $res['id'];
} else {
$terms[$key]['id'] = $this->inMemoryTerms[$key];
}
} else {
echo "Error while saving wordlist: ".$e->getMessage()."\n";
}
// Statements must be refreshed, because in this state they have an error attached to them.
$this->statementsPrepared = false;
$this->prepareStatementsForIndex();
}
}
return $terms;
}
public function saveDoclist($terms, $docId)
{
$insert = "INSERT INTO doclist (term_id, doc_id, hit_count) VALUES (:id, :doc, :hits)";
$stmt = $this->index->prepare($insert);
foreach ($terms as $key => $term) {
$stmt->bindValue(':id', $term['id']);
$stmt->bindValue(':doc', $docId);
$stmt->bindValue(':hits', $term['hits']);
try {
$stmt->execute();
} catch (\Exception $e) {
//we have a duplicate
echo $e->getMessage();
}
}
}
public function saveHitList($stems, $docId, $termsList)
{
// Hit-list storage is disabled: this early return makes everything below unreachable.
return;
$fieldCounter = 0;
$fields = [];
$insert = "INSERT INTO hitlist (term_id, doc_id, field_id, position, hit_count)
VALUES (:term_id, :doc_id, :field_id, :position, :hit_count)";
$stmt = $this->index->prepare($insert);
foreach ($stems as $field => $terms) {
$fields[$fieldCounter] = $field;
$positionCounter = 0;
$termCounts = array_count_values($terms);
foreach ($terms as $term) {
if (isset($termsList[$term])) {
$stmt->bindValue(':term_id', $termsList[$term]['id']);
$stmt->bindValue(':doc_id', $docId);
$stmt->bindValue(':field_id', $fieldCounter);
$stmt->bindValue(':position', $positionCounter);
$stmt->bindValue(':hit_count', $termCounts[$term]);
$stmt->execute();
}
$positionCounter++;
}
$fieldCounter++;
}
}
public function getWordFromWordList($word)
{
$selectStmt = $this->index->prepare("SELECT * FROM wordlist WHERE term like :keyword LIMIT 1");
$selectStmt->bindValue(':keyword', $word);
$selectStmt->execute();
return $selectStmt->fetch(PDO::FETCH_ASSOC);
}
/**
* @param $word
*
* @return int
*/
public function countWordInWordList($word)
{
$res = $this->getWordFromWordList($word);
if ($res) {
return $res['num_hits'];
}
return 0;
}
/**
* @param $word
*
* @return int
*/
public function countDocHitsInWordList($word)
{
$res = $this->getWordFromWordList($word);
if ($res) {
return $res['num_docs'];
}
return 0;
}
public function buildDictionary($filename, $count = -1, $hits = true, $docs = false)
{
$selectStmt = $this->index->prepare("SELECT * FROM wordlist ORDER BY num_hits DESC;");
$selectStmt->execute();
$dictionary = "";
$counter = 0;
while ($row = $selectStmt->fetch(PDO::FETCH_ASSOC)) {
$dictionary .= $row['term'];
if ($hits) {
$dictionary .= "\t".$row['num_hits'];
}
if ($docs) {
$dictionary .= "\t".$row['num_docs'];
}
$counter++;
if ($counter >= $count && $count > 0) {
break;
}
$dictionary .= "\n";
}
file_put_contents($filename, $dictionary, LOCK_EX);
}
/**
* @return int
*/
public function totalDocumentsInCollection()
{
$query = "SELECT * FROM info WHERE key = 'total_documents'";
$docs = $this->index->query($query);
return $docs->fetch(PDO::FETCH_ASSOC)['value'];
}
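/**
* Build a space-separated list of character trigrams for a keyword,
* padded with two underscores on each side.
*
* @param string $keyword
*
* @return string
*/
public function buildTrigrams($keyword)
{
$t = "__".$keyword."__";
$trigrams = "";
// Use mb_strlen so the loop bound counts characters, matching mb_substr below;
// strlen counts bytes and would emit truncated trailing trigrams for multibyte keywords.
$length = mb_strlen($t);
for ($i = 0; $i < $length - 2; $i++) {
$trigrams .= mb_substr($t, $i, 3)." ";
}
return trim($trigrams);
}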
public function prepareAndExecuteStatement($query, $params = [])
{
$statement = $this->index->prepare($query);
foreach ($params as $param) {
// bindValue copies the value, so it does not depend on the loop variable staying alive.
$statement->bindValue($param['key'], $param['value']);
}
$statement->execute();
return $statement;
}
public function info($text)
{
if (!$this->disableOutput) {
echo $text.PHP_EOL;
}
}
}


@ -0,0 +1,147 @@
<?php
namespace TeamTNT\TNTSearch\KeywordExtraction;
class Rake
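{
/**
* Stop words for the selected language, decoded from its JSON file.
* Declared explicitly because dynamic properties are deprecated since PHP 8.2.
*
* @var array
*/
public $stopwords;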
public function __construct($language = "english")
{
$stopwords = file_get_contents(__DIR__."/../Stopwords/".$language.".json");
$this->stopwords = json_decode($stopwords);
}
public function extractKeywords($text, $includeScores = true)
{
$phraseList = $this->generateCandidateKeywords($text);
$wordScores = $this->calculateWordScores($phraseList);
$phraseScores = $this->calculatePhraseScores($phraseList, $wordScores);
arsort($phraseScores);
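// Keep roughly the top third of the candidates (plus one); ceil() returns a float,
// so cast it before using it as a slice length.
$oneThird = (int) ceil(count($phraseScores) / 3) + 1;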
$phraseScores = array_slice($phraseScores, 0, $oneThird);
if ($includeScores) {
return $phraseScores;
}
return array_keys($phraseScores);
}
public function generateCandidateKeywords($text)
{
$phraseList = [];
$words = $this->tokenize($text);
$phrase = [];
foreach ($words as $word) {
if (in_array($word, $this->stopwords) || ctype_punct($word)) {
if (count($phrase) > 0) {
$phraseList[] = $phrase;
$phrase = [];
}
} else {
$phrase[] = $word;
}
}
if (count($phrase) > 0) {
$phraseList[] = $phrase;
$phrase = [];
}
return $phraseList;
}
public function calculatePhraseScores($phraseList, $wordScores)
{
$result = [];
foreach ($phraseList as $phrase) {
$wordScore = 0;
foreach ($phrase as $word) {
$wordScore += $wordScores[$word];
}
$result[implode(" ", $phrase)] = $wordScore;
}
return $result;
}
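/**
* Score every word as its degree divided by its frequency, where the degree
* sums the length of each phrase the word occurs in (once per occurrence).
*/
public function calculateWordScores($phraseList)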
{
$result = [];
foreach ($phraseList as $phrase) {
foreach ($phrase as $word) {
$wordScore = $this->wordDegree($word, $phraseList) / $this->wordFrequency($word, $phraseList);
$result[$word] = $wordScore;
}
}
return $result;
}
public function wordDegree($word, $phraseList)
{
$count = 0;
foreach ($phraseList as $phrase) {
foreach ($phrase as $p) {
if ($p == $word) {
$count += count($phrase);
}
}
}
return $count;
}
public function wordFrequency($word, $phraseList)
{
$count = 0;
foreach ($phraseList as $phrase) {
foreach ($phrase as $p) {
if ($p == $word) {
$count++;
}
}
}
return $count;
}
public function returnFormatedPharaseList($phraseList)
{
$formatedList = [];
foreach ($phraseList as $phrase) {
$formatedList[] = implode(" ", $phrase);
}
return $formatedList;
}
public function tokenize($str)
{
$str = mb_strtolower($str);
$arr = [];
// for the character classes
// see http://php.net/manual/en/regexp.reference.unicode.php
$pat = '/
([\pZ\pC]*) # match any separator or other
# in sequence
(
[^\pP\pZ\pC]+ | # match a sequence of characters
# that are not punctuation,
# separator or other
. # match punctuations one by one
)
([\pZ\pC]*) # match a sequence of separators
# that follows
/xu';
preg_match_all($pat, $str, $arr);
return $arr[2];
}
}


@ -0,0 +1,114 @@
<?php
namespace TeamTNT\TNTSearch\Spell;
class JaroWinklerDistance
{
private $threshold = 0.7;
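/**
* Jaro-Winkler similarity of two strings: the Jaro score, boosted by the
* length of their common prefix once the score reaches the threshold.
*/
public function getDistance($str1, $str2)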
{
$j = $this->jaro($str1, $str2);
if ($j < $this->threshold) {
return $j;
}
$lengthOfCommonPrefix = 0;
for ($i = 0; $i < min(strlen($str1), strlen($str2)); $i++) {
if ($str1[$i] == $str2[$i]) {
$lengthOfCommonPrefix++;
} else {
break;
}
}
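$maxLength = max(strlen($str1), strlen($str2));
if ($maxLength === 0) {
// Both strings are empty: there is no prefix to reward, and 1 / 0 throws on PHP 8.
return $j;
}
$lp = min(0.1, 1 / $maxLength) * $lengthOfCommonPrefix;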
$jw = $j + ($lp * (1 - $j));
return $jw;
}
public function jaro($str1, $str2)
{
// length of the strings
$str1_len = strlen($str1);
$str2_len = strlen($str2);
// if both strings are empty return 1
// if only one of the strings is empty return 0
if ($str1_len == 0) {
return $str2_len == 0 ? 1 : 0;
}
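// max distance between two chars to be considered matching:
// floor(max length / 2) - 1, clamped at 0 so single-character strings can still match
$match_distance = max(0, (int) floor(max($str1_len, $str2_len) / 2) - 1);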
$str1_matches = array_fill(0, $str1_len, 0);
$str2_matches = array_fill(0, $str2_len, 0);
// number of matches and transpositions
$matches = 0;
$transpositions = 0;
// find the matches
for ($i = 0; $i < $str1_len; $i++) {
// start and end take into account the match distance
$start = (int) max(0, $i - $match_distance);
$end = (int) min($i + $match_distance + 1, $str2_len);
for ($k = $start; $k < $end; $k++) {
// if $str2 already has a match continue
if ($str2_matches[$k]) {
continue;
}
// if the characters of str1 and str2 do not match, continue
if ($str1[$i] != $str2[$k]) {
continue;
}
// otherwise assume there is a match
$str1_matches[$i] = true;
$str2_matches[$k] = true;
$matches++;
break;
}
}
// if there are no matches return 0
if ($matches == 0) {
return 0.0;
}
// count transpositions
$k = 0;
for ($i = 0; $i < $str1_len; $i++) {
// if there are no matches in str1 continue
if (!$str1_matches[$i]) {
continue;
}
// while there is no match in str2 increment k
while (!$str2_matches[$k]) {
$k++;
}
// increment transpositions
if ($str1[$i] != $str2[$k]) {
$transpositions++;
}
$k++;
}
// divide the number of transpositions by two as per the algorithm specs
// this division is valid because the counted transpositions include both
// instances of the transposed characters.
$transpositions /= 2.0;
// return the Jaro distance
return (($matches / $str1_len) +
($matches / $str2_len) +
(($matches - $transpositions) / $matches)) / 3.0;
}
}


@ -0,0 +1,129 @@
<?php
/**
* This is a reimplementation of AR-PHP Arabic stemmer.
* The original author is Khaled Al-Sham'aa <khaled@ar-php.org>
*/
namespace TeamTNT\TNTSearch\Stemmer;
class ArabicStemmer implements Stemmer
{
private static $_verbPre = 'وأسفلي';
private static $_verbPost = 'ومكانيه';
private static $_verbMay;
private static $_verbMaxPre = 4;
private static $_verbMaxPost = 6;
private static $_verbMinStem = 2;
private static $_nounPre = 'ابفكلوأ';
private static $_nounPost = 'اتةكمنهوي';
private static $_nounMay;
private static $_nounMaxPre = 4;
private static $_nounMaxPost = 6;
private static $_nounMinStem = 2;
/**
* Loads initial values
*
* @ignore
*/
public function __construct()
{
self::$_verbMay = self::$_verbPre . self::$_verbPost;
self::$_nounMay = self::$_nounPre . self::$_nounPost;
}
/**
* Get rough stem of the given Arabic word
*
* @param string $word Arabic word you would like to get its stem
*
* @return string Arabic stem of the word
* @author Khaled Al-Sham'aa <khaled@ar-php.org>
*/
public static function stem($word)
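{
// stem() is static, so make sure the combined affix sets exist even when
// no instance has been constructed yet.
if (self::$_verbMay === null) {
self::$_verbMay = self::$_verbPre . self::$_verbPost;
self::$_nounMay = self::$_nounPre . self::$_nounPost;
}
$nounStem = self::roughStem(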
$word, self::$_nounMay, self::$_nounPre, self::$_nounPost,
self::$_nounMaxPre, self::$_nounMaxPost, self::$_nounMinStem
);
$verbStem = self::roughStem(
$word, self::$_verbMay, self::$_verbPre, self::$_verbPost,
self::$_verbMaxPre, self::$_verbMaxPost, self::$_verbMinStem
);
if (mb_strlen($nounStem, 'UTF-8') < mb_strlen($verbStem, 'UTF-8')) {
$stem = $nounStem;
} else {
$stem = $verbStem;
}
return $stem;
}
/**
* Get the rough stem of the given Arabic word (under specific rules)
*
* @param string $word Arabic word whose stem you would like to get
* @param string $notChars Arabic characters that may appear in a prefix or postfix
* (any other character must belong to the stem)
* @param string $preChars Arabic characters that may appear in the prefix
* @param string $postChars Arabic characters that may appear in the postfix
* @param integer $maxPre Max prefix length
* @param integer $maxPost Max postfix length
* @param integer $minStem Min stem length
*
* @return string|null Arabic stem of the word under the given rules, or null if it is too short
* @author Khaled Al-Sham'aa <khaled@ar-php.org>
*/
protected static function roughStem (
$word, $notChars, $preChars, $postChars, $maxPre, $maxPost, $minStem
) {
$right = -1;
$left = -1;
$max = mb_strlen($word, 'UTF-8');
for ($i=0; $i < $max; $i++) {
$needle = mb_substr($word, $i, 1, 'UTF-8');
if (mb_strpos($notChars, $needle, 0, 'UTF-8') === false) {
if ($right == -1) {
$right = $i;
}
$left = $i;
}
}
if ($right > $maxPre) {
$right = $maxPre;
}
if ($max - $left - 1 > $maxPost) {
$left = $max - $maxPost -1;
}
for ($i=0; $i < $right; $i++) {
$needle = mb_substr($word, $i, 1, 'UTF-8');
if (mb_strpos($preChars, $needle, 0, 'UTF-8') === false) {
$right = $i;
break;
}
}
for ($i=$max-1; $i>$left; $i--) {
$needle = mb_substr($word, $i, 1, 'UTF-8');
if (mb_strpos($postChars, $needle, 0, 'UTF-8') === false) {
$left = $i;
break;
}
}
if ($left - $right >= $minStem) {
$stem = mb_substr($word, $right, $left-$right+1, 'UTF-8');
} else {
$stem = null;
}
return $stem;
}
}


@ -0,0 +1,315 @@
<?php
/*
This is a reimplementation in PHP of a simple rule-based stemmer for Croatian
at http://nlp.ffzg.hr/resources/tools/stemmer-for-croatian/ (Python).
The original author is Ivan Pandžić. */
namespace TeamTNT\TNTSearch\Stemmer;
class CroatianStemmer implements Stemmer
{
protected static $stop = ['biti', 'jesam', 'budem', 'sam', 'jesi', 'budeš', 'si', 'jesmo', 'budemo',
'smo', 'jeste', 'budete', 'ste', 'jesu', 'budu', 'su', 'bih', 'bijah', 'bjeh',
'bijaše', 'bi', 'bje', 'bješe', 'bijasmo', 'bismo', 'bjesmo', 'bijaste', 'biste',
'bjeste', 'bijahu', 'biste', 'bjeste', 'bijahu', 'bi', 'biše', 'bjehu', 'bješe',
'bio', 'bili', 'budimo', 'budite', 'bila', 'bilo', 'bile', 'ću', 'ćeš', 'će',
'ćemo', 'ćete', 'želim', 'želiš', 'želi', 'želimo', 'želite', 'žele', 'moram',
'moraš', 'mora', 'moramo', 'morate', 'moraju', 'trebam', 'trebaš', 'treba',
'trebamo', 'trebate', 'trebaju', 'mogu', 'možeš', 'može', 'možemo', 'možete'];
public static function stem($token)
{
if (in_array($token, self::$stop)) {
return $token;
}
return self::korjenuj(self::transformiraj($token));
}
// Mark a syllabic "r" (one with no vowel on either side) by upper-casing it.
public static function istakniSlogotvornoR($niz)
{
return preg_replace('/(^|[^aeiou])r($|[^aeiou])/', '\1R\2', $niz);
}
// Return true if the string contains a vowel or a syllabic "r".
public static function imaSamoglasnik($niz)
{
preg_match('/[aeiouR]/', self::istakniSlogotvornoR($niz), $matches);
if (count($matches) > 0) {
return true;
}
return false;
}
// Replace the first matching irregular ending listed in $transformations.
public static function transformiraj($pojavnica)
{
foreach (self::$transformations as $trazi => $zamijeni) {
if (self::endsWith($pojavnica, $trazi)) {
return substr($pojavnica, 0, -1 * strlen($trazi)) . $zamijeni;
}
}
return $pojavnica;
}
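// Return the stem from the first rule whose match leaves a stem that contains a vowel
// and is longer than one byte; otherwise return the token unchanged.
public static function korjenuj($pojavnica)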
{
foreach (self::$rules as $rule) {
$rules = explode(" ", $rule);
$osnova = $rules[0];
$nastavak = $rules[1];
preg_match("/^(" . $osnova . ")(" . $nastavak . ")$/", $pojavnica, $dioba);
if (!empty($dioba)) {
if (self::imaSamoglasnik($dioba[1]) && strlen($dioba[1]) > 1) {
return $dioba[1];
}
}
}
return $pojavnica;
}
public static function endsWith($haystack, $needle)
{
// search forward starting from end minus needle length characters
return $needle === "" || (($temp = strlen($haystack) - strlen($needle)) >= 0 && strpos($haystack, $needle, $temp) !== false);
}
protected static $transformations = [
'lozi' => 'loga',
'lozima' => 'loga',
'pjesi' => 'pjeh',
'pjesima' => 'pjeh',
'vojci' => 'vojka',
'bojci' => 'bojka',
'jaci' => 'jak',
'jacima' => 'jak',
'čajan' => 'čajni',
'ijeran' => 'ijerni',
'laran' => 'larni',
'ijesan' => 'ijesni',
'anjac' => 'anjca',
'ajac' => 'ajca',
'ajaca' => 'ajca',
'ljaca' => 'ljca',
'ljac' => 'ljca',
'ejac' => 'ejca',
'ejaca' => 'ejca',
'ojac' => 'ojca',
'ojaca' => 'ojca',
'ajaka' => 'ajka',
'ojaka' => 'ojka',
'šaca' => 'šca',
'šac' => 'šca',
'inzima' => 'ing',
'inzi' => 'ing',
'tvenici' => 'tvenik',
'tetici' => 'tetika',
'teticima' => 'tetika',
'nstava' => 'nstva',
'nicima' => 'nik',
'ticima' => 'tik',
'zicima' => 'zik',
'snici' => 'snik',
'kuse' => 'kusi',
'kusan' => 'kusni',
'kustava' => 'kustva',
'dušan' => 'dušni',
'antan' => 'antni',
'bilan' => 'bilni',
'tilan' => 'tilni',
'avilan' => 'avilni',
'silan' => 'silni',
'gilan' => 'gilni',
'rilan' => 'rilni',
'nilan' => 'nilni',
'alan' => 'alni',
'ozan' => 'ozni',
'rave' => 'ravi',
'stavan' => 'stavni',
'pravan' => 'pravni',
'tivan' => 'tivni',
'sivan' => 'sivni',
'atan' => 'atni',
'cenata' => 'centa',
'denata' => 'denta',
'genata' => 'genta',
'lenata' => 'lenta',
'menata' => 'menta',
'jenata' => 'jenta',
'venata' => 'venta',
'tetan' => 'tetni',
'pletan' => 'pletni',
'šave' => 'šavi',
'manata' => 'manta',
'tanata' => 'tanta',
'lanata' => 'lanta',
'sanata' => 'santa',
'ačak' => 'ačka',
'ačaka' => 'ačka',
'ušak' => 'uška',
'atak' => 'atka',
'ataka' => 'atka',
'atci' => 'atka',
'atcima' => 'atka',
'etak' => 'etka',
'etaka' => 'etka',
'itak' => 'itka',
'itaka' => 'itka',
'itci' => 'itka',
'otak' => 'otka',
'otaka' => 'otka',
'utak' => 'utka',
'utaka' => 'utka',
'utci' => 'utka',
'utcima' => 'utka',
'eskan' => 'eskna',
'tičan' => 'tični',
'ojsci' => 'ojska',
'esama' => 'esma',
'metara' => 'metra',
'centar' => 'centra',
'centara' => 'centra',
'istara' => 'istra',
'istar' => 'istra',
'ošću' => 'osti',
'daba' => 'dba',
'čcima' => 'čka',
'čci' => 'čka',
'mac' => 'mca',
'maca' => 'mca',
'naca' => 'nca',
'nac' => 'nca',
'voljan' => 'voljni',
'anaka' => 'anki',
'vac' => 'vca',
'vaca' => 'vca',
'saca' => 'sca',
'sac' => 'sca',
'naca' => 'nca',
'nac' => 'nca',
'raca' => 'rca',
'rac' => 'rca',
'aoca' => 'alca',
'alaca' => 'alca',
'alac' => 'alca',
'elaca' => 'elca',
'elac' => 'elca',
'olaca' => 'olca',
'olac' => 'olca',
'olce' => 'olca',
'njac' => 'njca',
'njaca' => 'njca',
'ekata' => 'ekta',
'ekat' => 'ekta',
'izam' => 'izma',
'izama' => 'izma',
'jebe' => 'jebi',
'baci' => 'baci',
'ašan' => 'ašni',
];
protected static $rules = [
".+(s|š)k ijima|ijega|ijemu|ijem|ijim|ijih|ijoj|ijeg|iji|ije|ija|oga|ome|omu|ima|og|om|im|ih|oj|i|e|o|a|u",
".+(s|š)tv ima|om|o|a|u",
// N
".+(t|m|p|r|g)anij ama|ima|om|a|u|e|i| ",
".+an inom|ina|inu|ine|ima|in|om|u|i|a|e| ",
".+in ima|ama|om|a|e|i|u|o| ",
".+on ovima|ova|ove|ovi|ima|om|a|e|i|u| ",
".+n ijima|ijega|ijemu|ijeg|ijem|ijim|ijih|ijoj|iji|ije|ija|iju|ima|ome|omu|oga|oj|om|ih|im|og|o|e|a|u|i| ",
// Ć
".+(a|e|u)ć oga|ome|omu|ega|emu|ima|oj|ih|om|eg|em|og|uh|im|e|a",
// G
".+ugov ima|i|e|a",
".+ug ama|om|a|e|i|u|o",
".+log ama|om|a|u|e| ",
".+[^eo]g ovima|ama|ovi|ove|ova|om|a|e|i|u|o| ",
// I
".+(rrar|ott|ss|ll)i jem|ja|ju|o| ",
// J
".+uj ući|emo|ete|mo|em|eš|e|u| ",
".+(c|č|ć|đ|l|r)aj evima|evi|eva|eve|ama|ima|em|a|e|i|u| ",
".+(b|c|d|l|n|m|ž|g|f|p|r|s|t|z)ij ima|ama|om|a|e|i|u|o| ",
// L
//.+al inom|ina|inu|ine|ima|om|in|i|a|e
//.+[^(lo|ž)]il ima|om|a|e|u|i|
".+[^z]nal ima|ama|om|a|e|i|u|o| ",
".+ijal ima|ama|om|a|e|i|u|o| ",
".+ozil ima|om|a|e|u|i| ",
".+olov ima|i|a|e",
".+ol ima|om|a|u|e|i| ",
// M
".+lem ama|ima|om|a|e|i|u|o| ",
".+ram ama|om|a|e|i|u|o",
//.+(es|e|u)m ama|om|a|e|i|u|o
// R
//.+(a|d|e|o|u)r ama|ima|om|u|a|e|i|
".+(a|d|e|o)r ama|ima|om|u|a|e|i| ",
// S
".+(e|i)s ima|om|e|a|u",
// Š
".+(t|n|j|k|j|t|b|g|v)aš ama|ima|om|em|a|u|i|e| ",
".+(e|i)š ima|ama|om|em|i|e|a|u| ",
// T
".+ikat ima|om|a|e|i|u|o| ",
".+lat ima|om|a|e|i|u|o| ",
".+et ama|ima|om|a|e|i|u|o| ",
//.+ot ama|ima|om|a|u|e|i|
".+(e|i|k|o)st ima|ama|om|a|e|i|u|o| ",
".+išt ima|em|a|e|u",
//.+ut ovima|evima|ove|ovi|ova|eve|evi|eva|ima|om|a|u|e|i|
// V
".+ova smo|ste|hu|ti|še|li|la|le|lo|t|h|o",
".+(a|e|i)v ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|ama|iji|ije|ija|iju|im|ih|oj|om|og|i|a|u|e|o| ",
".+[^dkml]ov ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|iji|ije|ija|iju|im|ih|oj|om|og|i|a|u|e|o| ",
".+(m|l)ov ima|om|a|u|e|i| ",
// Adjectives (pridjevi)
".+el ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|iji|ije|ija|iju|im|ih|oj|om|og|i|a|u|e|o| ",
".+(a|e|š)nj ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|iji|ije|ija|iju|ega|emu|eg|em|im|ih|oj|om|og|a|e|i|o|u",
".+čin ama|ome|omu|oga|ima|og|om|im|ih|oj|a|u|i|o|e| ",
".+roši vši|smo|ste|še|mo|te|ti|li|la|lo|le|m|š|t|h|o",
".+oš ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|iji|ije|ija|iju|im|ih|oj|om|og|i|a|u|e| ",
".+(e|o)vit ijima|ijega|ijemu|ijem|ijim|ijih|ijoj|ijeg|iji|ije|ija|oga|ome|omu|ima|og|om|im|ih|oj|i|e|o|a|u| ",
//.+tit ijima|ijega|ijemu|ijem|ijim|ijih|ijoj|ijeg|iji|ije|ija|oga|ome|omu|ima|og|om|im|ih|oj|e|o|a|u|i|
".+ast ijima|ijega|ijemu|ijem|ijim|ijih|ijoj|ijeg|iji|ije|ija|oga|ome|omu|ima|og|om|im|ih|oj|i|e|o|a|u| ",
".+k ijemu|ijima|ijega|ijeg|ijem|ijim|ijih|ijoj|oga|ome|omu|ima|iji|ije|ija|iju|im|ih|oj|om|og|i|a|u|e|o| ",
// Verbs (glagoli)
".+(e|a|i|u)va jući|smo|ste|jmo|jte|ju|la|le|li|lo|mo|na|ne|ni|no|te|ti|še|hu|h|j|m|n|o|t|v|š| ",
".+ir ujemo|ujete|ujući|ajući|ivat|ujem|uješ|ujmo|ujte|avši|asmo|aste|ati|amo|ate|aju|aše|ahu|ala|alo|ali|ale|uje|uju|uj|al|an|am|aš|at|ah|ao",
".+ač ismo|iste|iti|imo|ite|iše|eći|ila|ilo|ili|ile|ena|eno|eni|ene|io|im|iš|it|ih|en|i|e",
".+ača vši|smo|ste|smo|ste|hu|ti|mo|te|še|la|lo|li|le|ju|na|no|ni|ne|o|m|š|t|h|n",
//.+ači smo|ste|ti|li|la|lo|le|mo|te|še|m|š|t|h|o|
// Second class (druga vrsta)
".+n uvši|usmo|uste|ući|imo|ite|emo|ete|ula|ulo|ule|uli|uto|uti|uta|em|eš|uo|ut|e|u|i",
".+ni vši|smo|ste|ti|mo|te|mo|te|la|lo|le|li|m|š|o",
// A
".+((a|r|i|p|e|u)st|[^o]g|ik|uc|oj|aj|lj|ak|ck|čk|šk|uk|nj|im|ar|at|et|št|it|ot|ut|zn|zv)a jući|vši|smo|ste|jmo|jte|jem|mo|te|je|ju|ti|še|hu|la|li|le|lo|na|no|ni|ne|t|h|o|j|n|m|š",
".+ur ajući|asmo|aste|ajmo|ajte|amo|ate|aju|ati|aše|ahu|ala|ali|ale|alo|ana|ano|ani|ane|al|at|ah|ao|aj|an|am|aš",
".+(a|i|o)staj asmo|aste|ahu|ati|emo|ete|aše|ali|ući|ala|alo|ale|mo|ao|em|eš|at|ah|te|e|u| ",
".+(b|c|č|ć|d|e|f|g|j|k|n|r|t|u|v)a lama|lima|lom|lu|li|la|le|lo|l",
".+(t|č|j|ž|š)aj evima|evi|eva|eve|ama|ima|em|a|e|i|u| ",
//.+(e|j|k|r|u|v)al ama|ima|om|u|i|a|e|o|
//.+(e|j|k|r|t|u|v)al ih|im
".+([^o]m|ič|nč|uč|b|c|ć|d|đ|h|j|k|l|n|p|r|s|š|v|z|ž)a jući|vši|smo|ste|jmo|jte|mo|te|ju|ti|še|hu|la|li|le|lo|na|no|ni|ne|t|h|o|j|n|m|š",
".+(a|i|o)sta dosmo|doste|doše|nemo|demo|nete|dete|nimo|nite|nila|vši|nem|dem|neš|deš|doh|de|ti|ne|nu|du|la|li|lo|le|t|o",
".+ta smo|ste|jmo|jte|vši|ti|mo|te|ju|še|la|lo|le|li|na|no|ni|ne|n|j|o|m|š|t|h",
".+inj asmo|aste|ati|emo|ete|ali|ala|alo|ale|aše|ahu|em|eš|at|ah|ao",
".+as temo|tete|timo|tite|tući|tem|teš|tao|te|li|ti|la|lo|le",
// I
".+(elj|ulj|tit|ac|ič|od|oj|et|av|ov)i vši|eći|smo|ste|še|mo|te|ti|li|la|lo|le|m|š|t|h|o",
".+(tit|jeb|ar|ed|uš|ič)i jemo|jete|jem|ješ|smo|ste|jmo|jte|vši|mo|še|te|ti|ju|je|la|lo|li|le|t|m|š|h|j|o",
".+(b|č|d|l|m|p|r|s|š|ž)i jemo|jete|jem|ješ|smo|ste|jmo|jte|vši|mo|lu|še|te|ti|ju|je|la|lo|li|le|t|m|š|h|j|o",
".+luč ujete|ujući|ujemo|ujem|uješ|ismo|iste|ujmo|ujte|uje|uju|iše|iti|imo|ite|ila|ilo|ili|ile|ena|eno|eni|ene|uj|io|en|im|iš|it|ih|e|i",
".+jeti smo|ste|še|mo|te|ti|li|la|lo|le|m|š|t|h|o",
".+e lama|lima|lom|lu|li|la|le|lo|l",
".+i lama|lima|lom|lu|li|la|le|lo|l",
// Adjectives, -t type (pridjev_t)
".+at ijega|ijemu|ijima|ijeg|ijem|ijih|ijim|ima|oga|ome|omu|iji|ije|ija|iju|oj|og|om|im|ih|a|u|i|e|o| ",
// Adjectives (pridjev)
".+et avši|ući|emo|imo|em|eš|e|u|i",
".+ ajući|alima|alom|avši|asmo|aste|ajmo|ajte|ivši|amo|ate|aju|ati|aše|ahu|ali|ala|ale|alo|ana|ano|ani|ane|am|aš|at|ah|ao|aj|an",
".+ anje|enje|anja|enja|enom|enoj|enog|enim|enih|anom|anoj|anog|anim|anih|eno|ovi|ova|oga|ima|ove|enu|anu|ena|ama",
".+ nijega|nijemu|nijima|nijeg|nijem|nijim|nijih|nima|niji|nije|nija|niju|noj|nom|nog|nim|nih|an|na|nu|ni|ne|no",
".+ om|og|im|ih|em|oj|an|u|o|i|e|a",
];
}


@ -0,0 +1,693 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
*
* @link http://snowball.tartarus.org/algorithms/french/stemmer.html
* The original author is wamania
*
*/
class FrenchStemmer implements Stemmer
{
/**
* All French vowels
*/
protected static $vowels = ['a', 'e', 'i', 'o', 'u', 'y', 'â', 'à', 'ë', 'é', 'ê', 'è', 'ï', 'î', 'ô', 'û', 'ù'];
protected $word;
/**
* helper, contains stringified list of vowels
* @var string
*/
protected $plainVowels;
/**
* The original word, used to check whether the word has been modified
* @var string
*/
protected $originalWord;
/**
* RV value
* @var string
*/
protected $rv;
/**
* RV index (based on the beginning of the word)
* @var int
*/
protected $rvIndex;
/**
* R1 value
* @var string
*/
protected $r1;
/**
* R1 index (based on the beginning of the word)
* @var int
*/
protected $r1Index;
/**
* R2 value
* @var string
*/
protected $r2;
/**
* R2 index (based on the beginning of the word)
* @var int
*/
protected $r2Index;
public static function stem($word)
{
return (new static)->analyze($word);
}
public function analyze($word)
{
$this->word = mb_strtolower($word);
$this->plainVowels = implode('', static::$vowels);
$this->step0();
$this->rv();
$this->r1();
$this->r2();
// to know if step1, 2a or 2b have altered the word
$this->originalWord = $this->word;
$nextStep = $this->step1();
// Do step 2a if either no ending was removed by step 1, or if one of the endings amment, emment, ment, ments was found.
if (($nextStep == 2) || ($this->originalWord === $this->word) ) {
$modified = $this->step2a();
if (!$modified) {
$this->step2b();
}
}
if ($this->word != $this->originalWord) {
$this->step3();
} else {
$this->step4();
}
$this->step5();
$this->step6();
$this->finish();
return $this->word;
}
/**
* Assume the word is in lower case.
* Then put into upper case u or i preceded and followed by a vowel, and y preceded or followed by a vowel.
* u after q is also put into upper case. For example,
* jouer -> joUer
* ennuie -> ennuIe
* yeux -> Yeux
* quand -> qUand
*/
private function step0()
{
$this->word = preg_replace('#([q])u#u', '$1U', $this->word);
$this->word = preg_replace('#(['.$this->plainVowels.'])y#u', '$1Y', $this->word);
$this->word = preg_replace('#y(['.$this->plainVowels.'])#u', 'Y$1', $this->word);
$this->word = preg_replace('#(['.$this->plainVowels.'])u(['.$this->plainVowels.'])#u', '$1U$2', $this->word);
$this->word = preg_replace('#(['.$this->plainVowels.'])i(['.$this->plainVowels.'])#u', '$1I$2', $this->word);
}
/**
* Step 1
* Search for the longest among the following suffixes, and perform the action indicated.
*
* @return integer Next step number
*/
private function step1()
{
// ance iqUe isme able iste eux ances iqUes ismes ables istes
// delete if in R2
if (($position = $this->search([
'ances', 'iqUes', 'ismes', 'ables', 'istes', 'ance', 'iqUe','isme', 'able', 'iste', 'eux'
])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
}
return 3;
}
// atrice ateur ation atrices ateurs ations
// delete if in R2
// if preceded by ic, delete if in R2, else replace by iqU
if (($position = $this->search(['atrices', 'ateurs', 'ations', 'atrice', 'ateur', 'ation'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
if (($position2 = $this->searchIfInR2(['ic'])) !== false) {
$this->word = mb_substr($this->word, 0, $position2);
} else {
$this->word = preg_replace('#(ic)$#u', 'iqU', $this->word);
}
}
return 3;
}
// logie logies
// replace with log if in R2
if (($position = $this->search(['logies', 'logie'])) !== false) {
if ($this->inR2($position)) {
$this->word = preg_replace('#(logies|logie)$#u', 'log', $this->word);
}
return 3;
}
// usion ution usions utions
// replace with u if in R2
if (($position = $this->search(['usions', 'utions', 'usion', 'ution'])) !== false) {
if ($this->inR2($position)) {
$this->word = preg_replace('#(usion|ution|usions|utions)$#u', 'u', $this->word);
}
return 3;
}
// ence ences
// replace with ent if in R2
if (($position = $this->search(['ences', 'ence'])) !== false) {
if ($this->inR2($position)) {
$this->word = preg_replace('#(ence|ences)$#u', 'ent', $this->word);
}
return 3;
}
// issement issements
// delete if in R1 and preceded by a non-vowel
if (($position = $this->search(['issements', 'issement'])) != false) {
if ($this->inR1($position)) {
$before = $position - 1;
$letter = mb_substr($this->word, $before, 1);
if (! in_array($letter, static::$vowels)) {
$this->word = mb_substr($this->word, 0, $position);
}
}
return 3;
}
// ement ements
// delete if in RV
// if preceded by iv, delete if in R2 (and if further preceded by at, delete if in R2), otherwise,
// if preceded by eus, delete if in R2, else replace by eux if in R1, otherwise,
// if preceded by abl or iqU, delete if in R2, otherwise,
// if preceded by ièr or Ièr, replace by i if in RV
if (($position = $this->search(['ements', 'ement'])) !== false) {
if ($this->inRv($position)) {
$this->word = mb_substr($this->word, 0, $position);
}
if (($position = $this->searchIfInR2(['iv'])) !== false) {
$this->word = mb_substr($this->word, 0, $position);
if (($position2 = $this->searchIfInR2(['at'])) !== false) {
$this->word = mb_substr($this->word, 0, $position2);
}
} elseif (($position = $this->search(['eus'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
} elseif ($this->inR1($position)) {
$this->word = preg_replace('#(eus)$#u', 'eux', $this->word);
}
} elseif (($position = $this->searchIfInR2(['abl', 'iqU'])) !== false) {
$this->word = mb_substr($this->word, 0, $position);
} elseif (($this->searchIfInRv(['ièr', 'Ièr'])) !== false) {
$this->word = preg_replace('#(ièr|Ièr)$#u', 'i', $this->word);
}
return 3;
}
// ité ités
// delete if in R2
// if preceded by abil, delete if in R2, else replace by abl, otherwise,
// if preceded by ic, delete if in R2, else replace by iqU, otherwise,
// if preceded by iv, delete if in R2
if (($position = $this->search(['ités', 'ité'])) !== false) {
// delete if in R2
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
}
// if preceded by abil, delete if in R2, else replace by abl, otherwise,
if (($position = $this->search(['abil'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
} else {
$this->word = preg_replace('#(abil)$#u', 'abl', $this->word);
}
// if preceded by ic, delete if in R2, else replace by iqU, otherwise,
} elseif (($position = $this->search(['ic'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
} else {
$this->word = preg_replace('#(ic)$#u', 'iqU', $this->word);
}
// if preceded by iv, delete if in R2
} elseif (($position = $this->searchIfInR2(['iv'])) !== false) {
$this->word = mb_substr($this->word, 0, $position);
}
return 3;
}
// if ive ifs ives
// delete if in R2
// if preceded by at, delete if in R2 (and if further preceded by ic, delete if in R2, else replace by iqU)
if (($position = $this->search(['ifs', 'ives', 'if', 'ive'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
}
if (($position = $this->searchIfInR2(['at'])) !== false) {
$this->word = mb_substr($this->word, 0, $position);
if (($position2 = $this->search(['ic'])) !== false) {
if ($this->inR2($position2)) {
$this->word = mb_substr($this->word, 0, $position2);
} else {
$this->word = preg_replace('#(ic)$#u', 'iqU', $this->word);
}
}
}
return 3;
}
// eaux
// replace with eau
if (($this->search(['eaux'])) !== false) {
$this->word = preg_replace('#(eaux)$#u', 'eau', $this->word);
return 3;
}
// aux
// replace with al if in R1
if (($position = $this->search(['aux'])) !== false) {
if ($this->inR1($position)) {
$this->word = preg_replace('#(aux)$#u', 'al', $this->word);
}
return 3;
}
// euse euses
// delete if in R2, else replace by eux if in R1
if (($position = $this->search(['euses', 'euse'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
} elseif ($this->inR1($position)) {
$this->word = preg_replace('#(euses|euse)$#u', 'eux', $this->word);
}
return 3;
}
// amment
// replace with ant if in RV
if ( ($position = $this->search(['amment'])) !== false) {
if ($this->inRv($position)) {
$this->word = preg_replace('#(amment)$#u', 'ant', $this->word);
}
return 2;
}
// emment
// replace with ent if in RV
if (($position = $this->search(['emment'])) !== false) {
if ($this->inRv($position)) {
$this->word = preg_replace('#(emment)$#u', 'ent', $this->word);
}
return 2;
}
// ment ments
// delete if preceded by a vowel in RV
if (($position = $this->search(['ments', 'ment'])) !== false) {
$before = $position - 1;
$letter = mb_substr($this->word, $before, 1);
if ($this->inRv($before) && (in_array($letter, static::$vowels)) ) {
$this->word = mb_substr($this->word, 0, $position);
}
return 2;
}
return 2;
}
/**
* Step 2a: Verb suffixes beginning i
* In steps 2a and 2b all tests are confined to the RV region.
* Search for the longest among the following suffixes and if found, delete if preceded by a non-vowel.
* îmes ît îtes i ie ies ir ira irai iraIent irais irait iras irent irez iriez
* irions irons iront is issaIent issais issait issant issante issantes issants isse
* issent isses issez issiez issions issons it
* (Note that the non-vowel itself must also be in RV.)
*/
private function step2a()
{
if (($position = $this->searchIfInRv([
'îmes', 'îtes', 'ît', 'ies', 'ie', 'iraIent', 'irais', 'irait', 'irai', 'iras', 'ira', 'irent', 'irez', 'iriez',
'irions', 'irons', 'iront', 'ir', 'issaIent', 'issais', 'issait', 'issant', 'issantes', 'issante', 'issants',
'issent', 'isses', 'issez', 'isse', 'issiez', 'issions', 'issons', 'is', 'it', 'i'])) !== false) {
$before = $position - 1;
$letter = mb_substr($this->word, $before, 1);
if ( $this->inRv($before) && (!in_array($letter, static::$vowels)) ) {
$this->word = mb_substr($this->word, 0, $position);
return true;
}
}
return false;
}
/**
* Do step 2b if step 2a was done, but failed to remove a suffix.
* Step 2b: Other verb suffixes
*/
private function step2b()
{
// é ée ées és èrent er era erai eraIent erais erait eras erez eriez erions erons eront ez iez
// delete
if (($position = $this->searchIfInRv([
'ées', 'èrent', 'erais', 'erait', 'erai', 'eraIent', 'eras', 'erez', 'eriez',
'erions', 'erons', 'eront', 'era', 'er', 'iez', 'ez','és', 'ée', 'é'])) !== false) {
$this->word = mb_substr($this->word, 0, $position);
return true;
}
// âmes ât âtes a ai aIent ais ait ant ante antes ants as asse assent asses assiez assions
// delete
// if preceded by e, delete
if (($position = $this->searchIfInRv([
'âmes', 'âtes', 'ât', 'aIent', 'ais', 'ait', 'antes', 'ante', 'ants', 'ant',
'assent', 'asses', 'assiez', 'assions', 'asse', 'as', 'ai', 'a'])) !== false) {
$before = $position - 1;
$letter = mb_substr($this->word, $before, 1);
if ( $this->inRv($before) && ($letter === 'e') ) {
$this->word = mb_substr($this->word, 0, $before);
} else {
$this->word = mb_substr($this->word, 0, $position);
}
return true;
}
// ions
// delete if in R2
if (($position = $this->searchIfInRv(['ions'])) !== false) {
if ($this->inR2($position)) {
$this->word = mb_substr($this->word, 0, $position);
}
return true;
}
return false;
}
/**
* Step 3: Replace final Y with i or final ç with c
*/
private function step3()
{
$this->word = preg_replace('#(Y)$#u', 'i', $this->word);
$this->word = preg_replace('#(ç)$#u', 'c', $this->word);
}
/**
* Step 4: Residual suffix
*/
private function step4()
{
//If the word ends s, not preceded by a, i, o, u, è or s, delete it.
if (preg_match('#[^aiouès]s$#u', $this->word)) {
$this->word = mb_substr($this->word, 0, -1);
}
// In the rest of step 4, all tests are confined to the RV region.
// ion
// delete if in R2 and preceded by s or t
if ((($position = $this->searchIfInRv(['ion'])) !== false) && ($this->inR2($position)) ) {
$before = $position - 1;
$letter = mb_substr($this->word, $before, 1);
if ( $this->inRv($before) && (($letter === 's') || ($letter === 't')) ) {
$this->word = mb_substr($this->word, 0, $position);
}
return true;
}
// ier ière Ier Ière
// replace with i
if (($this->searchIfInRv(['ier', 'ière', 'Ier', 'Ière'])) !== false) {
$this->word = preg_replace('#(ier|ière|Ier|Ière)$#u', 'i', $this->word);
return true;
}
// e
// delete
if (($this->searchIfInRv(['e'])) !== false) {
$this->word = mb_substr($this->word, 0, -1);
return true;
}
// ë
// if preceded by gu, delete
if (($position = $this->searchIfInRv(['guë'])) !== false) {
if ($this->inRv($position + 2)) {
$this->word = mb_substr($this->word, 0, -1);
return true;
}
}
return false;
}
/**
* Step 5: Undouble
* If the word ends enn, onn, ett, ell or eill, delete the last letter
*/
private function step5()
{
if ($this->search(['enn', 'onn', 'ett', 'ell', 'eill']) !== false) {
$this->word = mb_substr($this->word, 0, -1);
}
}
/**
* Step 6: Un-accent
* If the word ends é or è followed by at least one non-vowel, remove the accent from the e.
*/
private function step6()
{
$this->word = preg_replace('#(é|è)([^'.$this->plainVowels.']+)$#u', 'e$2', $this->word);
}
/**
* And finally:
* Turn any remaining I, U and Y letters in the word back into lower case.
*/
private function finish()
{
$this->word = str_replace(['I','U','Y'], ['i', 'u', 'y'], $this->word);
}
/**
* If the word begins with two vowels, RV is the region after the third letter,
* otherwise the region after the first vowel not at the beginning of the word,
* or the end of the word if these positions cannot be found.
* (Exceptionally, par, col or tap, at the beginning of a word is also taken to define RV as the region to their right.)
*/
protected function rv()
{
$length = mb_strlen($this->word);
$this->rv = '';
$this->rvIndex = $length;
if ($length < 3) {
return true;
}
// If the word begins with two vowels, RV is the region after the third letter
$first = mb_substr($this->word, 0, 1);
$second = mb_substr($this->word, 1, 1);
if ( (in_array($first, static::$vowels)) && (in_array($second, static::$vowels)) ) {
$this->rv = mb_substr($this->word, 3);
$this->rvIndex = 3;
return true;
}
// (Exceptionally, par, col or tap, at the beginning of a word is also taken to define RV as the region to their right.)
$begin3 = mb_substr($this->word, 0, 3);
if (in_array($begin3, ['par', 'col', 'tap'])) {
$this->rv = mb_substr($this->word, 3);
$this->rvIndex = 3;
return true;
}
// otherwise the region after the first vowel not at the beginning of the word,
for ($i = 1; $i < $length; ++$i) {
$letter = mb_substr($this->word, $i, 1);
if (in_array($letter, static::$vowels)) {
$this->rv = mb_substr($this->word, ($i + 1));
$this->rvIndex = $i + 1;
return true;
}
}
return false;
}
protected function inRv($position)
{
return ($position >= $this->rvIndex);
}
protected function inR1($position)
{
return ($position >= $this->r1Index);
}
protected function inR2($position)
{
return ($position >= $this->r2Index);
}
protected function searchIfInRv($suffixes)
{
return $this->search($suffixes, $this->rvIndex);
}
protected function searchIfInR2($suffixes)
{
return $this->search($suffixes, $this->r2Index);
}
protected function search($suffixes, $offset = 0)
{
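// Illustrative trace, worked by hand rather than taken from the upstream tests:
// with $this->word = 'abusives', search(['ifs', 'ives', 'if', 'ive']) skips 'ifs'
// (not present), then finds 'ives' at index 4; since 4 + mb_strlen('ives') equals 8,
// the word length, it returns 4. The first listed suffix that ends the word wins,
// so the order of $suffixes matters whenever one entry is a tail of another.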
$length = mb_strlen($this->word);
if ($offset > $length) {
return false;
}
foreach ($suffixes as $suffixe) {
if ((($position = mb_strrpos($this->word, $suffixe, $offset)) !== false)
&& ((mb_strlen($suffixe) + $position) == $length)) {
return $position;
}
}
return false;
}
/**
* R1 is the region after the first non-vowel following a vowel, or the end of the word if there is no such non-vowel.
*/
protected function r1()
{
list($this->r1Index, $this->r1) = $this->rx($this->word);
}
/**
* R2 is the region after the first non-vowel following a vowel in R1, or the end of the word if there is no such non-vowel.
*/
protected function r2()
{
list($index, $value) = $this->rx($this->r1);
$this->r2 = $value;
$this->r2Index = $this->r1Index + $index;
}
/**
* Common function for R1 and R2
* Search the region after the first non-vowel following a vowel in $word, or the end of the word if there is no such non-vowel.
* R1 : $in = $this->word
* R2 : $in = R1
*/
protected function rx($in)
{
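// Illustrative trace, worked by hand and assuming static::$vowels contains the plain
// vowels a, e, i and o (the list itself is defined outside this excerpt):
//   rx('animadversion') -> [2, 'imadversion']  // 'a' at 0 is followed by the non-vowel 'n' at 1
//   rx('imadversion')   -> [2, 'adversion']    // 'i' at 0 is followed by the non-vowel 'm' at 1
// r2() adds the two indexes, so for 'animadversion' r1Index is 2 and r2Index is 2 + 2 = 4.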
$length = mb_strlen($in);
// defaults
$value = '';
$index = $length;
// we search all vowels
$vowels = [];
for ($i = 0; $i < $length; ++$i) {
$letter = mb_substr($in, $i, 1);
if (in_array($letter, static::$vowels)) {
$vowels[] = $i;
}
}
// search the non-vowel following a vowel
foreach ($vowels as $position) {
$after = $position + 1;
$letter = mb_substr($in, $after, 1);
if (!in_array($letter, static::$vowels)) {
$index = $after + 1;
$value = mb_substr($in, ($after + 1));
break;
}
}
return [$index, $value];
}
}

@ -0,0 +1,248 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
* Copyright (c) 2013 Aris Buzachis (buzachis.aris@gmail.com)
*
* All rights reserved.
*
* This script is free software.
*
* DISCLAIMER:
*
* IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/**
* Takes a word and reduces it to its German stem using the Porter stemmer algorithm.
*
* References:
* - http://snowball.tartarus.org/algorithms/porter/stemmer.html
* - http://snowball.tartarus.org/algorithms/german/stemmer.html
*
* Usage:
* $stem = GermanStemmer::stem($word);
*
* @author Aris Buzachis <buzachis.aris@gmail.com>
* @author Pascal Landau <kontakt@myseosolution.de>
*/
class GermanStemmer implements Stemmer
{
/**
* R1 and R2 regions (see the Porter algorithm)
*/
private static $R1;
private static $R2;
private static $cache = array();
private static $vowels = array('a', 'e', 'i', 'o', 'u', 'y', 'ä', 'ö', 'ü');
private static $s_ending = array('b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 'r', 't');
private static $st_ending = array('b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 't');
/**
* Gets the stem of $word.
* @param string $word
* @return string
*/
public static function stem($word)
{
$word = mb_strtolower($word);
//check for invalid characters
preg_match("#.#u", $word);
if (preg_last_error() !== 0) {
throw new \InvalidArgumentException("Word '$word' seems to be erroneous. Error code from preg_last_error(): " . preg_last_error());
}
if (!isset(self::$cache[$word])) {
$result = self::getStem($word);
self::$cache[$word] = $result;
}
return self::$cache[$word];
}
/**
* @param string $word
* @return string
*/
private static function getStem($word)
{
$word = self::step0a($word);
$word = self::step1($word);
$word = self::step2($word);
$word = self::step3($word);
$word = self::step0b($word);
return $word;
}
/**
* Replaces to protect some characters
* @param string $word
* @return string
*/
private static function step0a($word)
{
$vstr = implode('', self::$vowels);
$word = preg_replace('#([' . $vstr . '])u([' . $vstr . '])#u', '$1U$2', $word);
$word = preg_replace('#([' . $vstr . '])y([' . $vstr . '])#u', '$1Y$2', $word);
return $word;
}
/**
* Undo the initial replaces
* @param string $word
* @return string
*/
private static function step0b($word)
{
$word = str_replace(array('ä', 'ö', 'ü', 'U', 'Y'), array('a', 'o', 'u', 'u', 'y'), $word);
return $word;
}
private static function step1($word)
{
$word = str_replace('ß', 'ss', $word);
self::getR($word);
$replaceCount = 0;
$arr = array('em', 'ern', 'er');
foreach ($arr as $s) {
self::$R1 = preg_replace('#' . $s . '$#u', '', self::$R1, -1, $replaceCount);
if ($replaceCount > 0) {
$word = preg_replace('#' . $s . '$#u', '', $word);
}
}
$arr = array('en', 'es', 'e');
foreach ($arr as $s) {
self::$R1 = preg_replace('#' . $s . '$#u', '', self::$R1, -1, $replaceCount);
if ($replaceCount > 0) {
$word = preg_replace('#' . $s . '$#u', '', $word);
$word = preg_replace('#niss$#u', 'nis', $word);
}
}
$word = preg_replace('/([' . implode('', self::$s_ending) . '])s$/u', '$1', $word);
return $word;
}
private static function step2($word)
{
self::getR($word);
$replaceCount = 0;
$arr = array('est', 'er', 'en');
foreach ($arr as $s) {
self::$R1 = preg_replace('#' . $s . '$#u', '', self::$R1, -1, $replaceCount);
if ($replaceCount > 0) {
$word = preg_replace('#' . $s . '$#u', '', $word);
}
}
if (strpos(self::$R1, 'st') !== false) {
self::$R1 = preg_replace('#st$#u', '', self::$R1);
$word = preg_replace('#(...[' . implode('', self::$st_ending) . '])st$#u', '$1', $word);
}
return $word;
}
private static function step3($word)
{
self::getR($word);
$replaceCount = 0;
$arr = array('end', 'ung');
foreach ($arr as $s) {
if (preg_match('#' . $s . '$#u', self::$R2)) {
$word = preg_replace('#([^e])' . $s . '$#u', '$1', $word, -1, $replaceCount);
if ($replaceCount > 0) {
self::$R2 = preg_replace('#' . $s . '$#u', '', self::$R2, -1, $replaceCount);
}
}
}
$arr = array('isch', 'ik', 'ig');
foreach ($arr as $s) {
if (preg_match('#' . $s . '$#u', self::$R2)) {
$word = preg_replace('#([^e])' . $s . '$#u', '$1', $word, -1, $replaceCount);
if ($replaceCount > 0) {
self::$R2 = preg_replace('#' . $s . '$#u', '', self::$R2);
}
}
}
$arr = array('lich', 'heit');
foreach ($arr as $s) {
self::$R2 = preg_replace('#' . $s . '$#u', '', self::$R2, -1, $replaceCount);
if ($replaceCount > 0) {
$word = preg_replace('#' . $s . '$#u', '', $word);
} else {
if (preg_match('#' . $s . '$#u', self::$R1)) {
$word = preg_replace('#(er|en)' . $s . '$#u', '$1', $word, -1, $replaceCount);
if ($replaceCount > 0) {
self::$R1 = preg_replace('#' . $s . '$#u', '', self::$R1);
}
}
}
}
$arr = array('keit');
foreach ($arr as $s) {
self::$R2 = preg_replace('#' . $s . '$#u', '', self::$R2, -1, $replaceCount);
if ($replaceCount > 0) {
$word = preg_replace('#' . $s . '$#u', '', $word);
}
}
return $word;
}
/**
* Find R1 and R2
* @param string $word
*/
private static function getR($word)
{
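// Illustrative trace, worked by hand against the pattern below (not an upstream test case):
//   'laufen' -> rest = 'lauf', so R1 = 'en' and R2 = ''
//   'abend'  -> rest = 'ab', raw R1 = 'end'; one letter is cut so that three letters
//               precede R1, giving R1 = 'nd' and R2 = ''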
self::$R1 = "";
self::$R2 = "";
$vowels = implode("", self::$vowels);
$vowelGroup = "[{$vowels}]";
$nonVowelGroup = "[^{$vowels}]";
// R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.
$pattern = "#(?P<rest>.*?{$vowelGroup}{$nonVowelGroup})(?P<r>.*)#u";
if (preg_match($pattern, $word, $match)) {
$rest = $match["rest"];
$r1 = $match["r"];
// [...], but then R1 is adjusted so that the region before it contains at least 3 letters.
$cutOff = 3 - mb_strlen($rest);
if ($cutOff > 0) {
$r1 = mb_substr($r1, $cutOff);
}
self::$R1 = $r1;
}
//R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.
if (preg_match($pattern, self::$R1, $match)) {
self::$R2 = $match["r"];
}
}
}

@ -0,0 +1,451 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/*
* The following code, downloaded from <https://www.drupal.org/project/italianstemmer>,
* was originally written by Roberto Mirizzi (<roberto.mirizzi@gmail.com>,
* <http://sisinflab.poliba.it/mirizzi/>) in February 2007. It was the PHP5 implementation
* of Martin Porter's stemming algorithm for Italian language. This algorithm can be found
* at the address: <http://snowball.tartarus.org/algorithms/italian/stemmer.html>.
*
* It was rewritten in March 2017 for TNTSearch by GaspariLab S.r.l., <dev@gasparilab.it>.
*/
/*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
class ItalianStemmer implements Stemmer
{
private static $cache = [];
private static $vocali = ['a', 'e', 'i', 'o', 'u', 'à', 'è', 'ì', 'ò', 'ù'];
private static $consonanti = [
'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z',
'I', 'U',
];
private static $accenti_acuti = ['á', 'é', 'í', 'ó', 'ú'];
private static $accenti_gravi = ['à', 'è', 'ì', 'ò', 'ù'];
private static $suffissi_step0 = [
'ci', 'gli', 'la', 'le', 'li', 'lo', 'mi', 'ne', 'si', 'ti', 'vi', 'sene',
'gliela', 'gliele', 'glieli', 'glielo', 'gliene', 'mela', 'mele', 'meli', 'melo', 'mene', 'tela', 'tele',
'teli', 'telo', 'tene', 'cela', 'cele', 'celi', 'celo', 'cene', 'vela', 'vele', 'veli', 'velo', 'vene',
];
private static $suffissi_step1_a = [
'anza', 'anze', 'ico', 'ici', 'ica', 'ice', 'iche', 'ichi', 'ismo', 'ismi', 'abile', 'abili', 'ibile',
'ibili', 'ista', 'iste', 'isti', 'istà', 'istè', 'istì', 'oso', 'osi', 'osa', 'ose', 'mente', 'atrice',
'atrici', 'ante', 'anti',
];
private static $suffissi_step1_b = ['azione', 'azioni', 'atore', 'atori'];
private static $suffissi_step1_c = ['logia', 'logie'];
private static $suffissi_step1_d = ['uzione', 'uzioni', 'usione', 'usioni'];
private static $suffissi_step1_e = ['enza', 'enze'];
private static $suffissi_step1_f = ['amento', 'amenti', 'imento', 'imenti'];
private static $suffissi_step1_g = ['amente'];
private static $suffissi_step1_h = ['ità'];
private static $suffissi_step1_i = ['ivo', 'ivi', 'iva', 'ive'];
private static $suffissi_step2 = [
'ammo', 'ando', 'ano', 'are', 'arono', 'asse', 'assero', 'assi', 'assimo', 'ata', 'ate', 'ati', 'ato', 'ava',
'avamo', 'avano', 'avate', 'avi', 'avo', 'emmo', 'enda', 'ende', 'endi', 'endo', 'erà', 'erai', 'eranno',
'ere', 'erebbe', 'erebbero', 'erei', 'eremmo', 'eremo', 'ereste', 'eresti', 'erete', 'erò', 'erono', 'essero',
'ete', 'eva', 'evamo', 'evano', 'evate', 'evi', 'evo', 'Yamo', 'iamo', 'immo', 'irà', 'irai', 'iranno', 'ire',
'irebbe', 'irebbero', 'irei', 'iremmo', 'iremo', 'ireste', 'iresti', 'irete', 'irò', 'irono', 'isca',
'iscano', 'isce', 'isci', 'isco', 'iscono', 'issero', 'ita', 'ite', 'iti', 'ito', 'iva', 'ivamo', 'ivano',
'ivate', 'ivi', 'ivo', 'ono', 'uta', 'ute', 'uti', 'uto', 'ar', 'ir',
];
private static $ante_suff_a = ['ando', 'endo'];
private static $ante_suff_b = ['ar', 'er', 'ir'];
public function __construct()
{
usort(self::$suffissi_step0, function($a,$b) { return mb_strlen($a)>mb_strlen($b) ? -1 : 1; });
usort(self::$suffissi_step1_a, function($a,$b) { return mb_strlen($a)>mb_strlen($b) ? -1 : 1;});
usort(self::$suffissi_step2, function($a,$b) { return mb_strlen($a)>mb_strlen($b) ? -1 : 1;});
}
/**
* Gets the stem of $word.
*
* @param string $word
*
* @return string
*/
public static function stem($word)
{
$word = mb_strtolower($word);
// Check for invalid characters
preg_match('#.#u', $word);
if (preg_last_error() !== 0) {
throw new \InvalidArgumentException('Word "'.$word.'" seems to be erroneous. Error code from preg_last_error(): '.preg_last_error());
}
if (!isset(self::$cache[$word])) {
$result = self::getStem($word);
self::$cache[$word] = $result;
}
return self::$cache[$word];
}
/**
* @param $word
*
* @return string
*/
private static function getStem($word)
{
$str = self::trim($word);
$str = self::toLower($str);
$str = self::replaceAccAcuti($str);
$str = self::putUAfterQToUpper($str);
$str = self::IUBetweenVowToUpper($str);
$step0 = self::step0($str);
$step1 = self::step1($step0);
$step2 = self::step2($step0, $step1);
$step3a = self::step3a($step2);
$step3b = self::step3b($step3a);
$step4 = self::step4($step3b);
return $step4;
}
private static function trim($str)
{
return trim($str);
}
private static function toLower($str)
{
return mb_strtolower($str);
}
private static function replaceAccAcuti($str)
{
return str_replace(self::$accenti_acuti, self::$accenti_gravi, $str); //strtr
}
private static function putUAfterQToUpper($str)
{
return str_replace('qu', 'qU', $str);
}
private static function IUBetweenVowToUpper($str)
{
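// Upper-case only the i or u that sits between two vowels; the lookarounds leave the neighbouring vowels untouched.
$pattern = '/(?<=[aeiouàèìòù])[iu](?=[aeiouàèìòù])/u';
return preg_replace_callback($pattern, function ($matches) {
return strtoupper($matches[0]);
}, $str);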
}
private static function returnRV($str)
{
/*
If the second letter is a consonant, RV is the region after the next following vowel,
or if the first two letters are vowels, RV is the region after the next consonant, and otherwise
(consonant-vowel case) RV is the region after the third letter.
But RV is the end of the word if these positions cannot be found. Example:
m a c h o [ho] o l i v a [va] t r a b a j o [bajo] á u r e o [eo] prezzo sprezzante
*/
if (mb_strlen($str) < 2) {
return '';
} //$str;
if (in_array($str[1], self::$consonanti)) {
$str = mb_substr($str, 2);
$str = strpbrk($str, implode(self::$vocali));
return mb_substr($str, 1); // I believe the offset here should be 1
} elseif (in_array($str[0], self::$vocali) && in_array($str[1], self::$vocali)) {
$str = strpbrk($str, implode(self::$consonanti));
return mb_substr($str, 1);
} elseif (in_array($str[0], self::$consonanti) && in_array($str[1], self::$vocali)) {
return mb_substr($str, 3);
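}
// None of the three cases applied (e.g. a leading digit): RV is the empty region at the end of the word.
return '';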
}
private static function returnR1($str)
{
/*
R1 is the region after the first non-vowel following a vowel, or is the null region at the end
of the word if there is no such non-vowel. Example:
beautiful [iful] beauty [y] beau [NULL] animadversion [imadversion] sprinkled [kled] eucharist [harist]
*/
$pattern = '/['.implode(self::$vocali).']+'.'['.implode(self::$consonanti).']'.'(.*)/';
preg_match($pattern, $str, $matches);
return count($matches) >= 1 ? $matches[1] : '';
}
private static function returnR2($str)
{
/*
R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end
of the word if there is no such non-vowel. Example:
beautiful [ul] beauty [NULL] beau [NULL] animadversion [adversion] sprinkled [NULL] eucharist [ist]
*/
$R1 = self::returnR1($str);
$pattern = '/['.implode(self::$vocali).']+'.'['.implode(self::$consonanti).']'.'(.*)/';
preg_match($pattern, $R1, $matches);
return count($matches) >= 1 ? $matches[1] : '';
}
private static function step0($str)
{
//Step 0: Attached pronoun
//Always do step 0
$str_len = mb_strlen($str);
$rv = self::returnRV($str);
$rv_len = mb_strlen($rv);
$pos = 0;
foreach (self::$suffissi_step0 as $suff) {
if ($rv_len - mb_strlen($suff) < 0) {
continue;
}
$pos = mb_strpos($rv, $suff, $rv_len - mb_strlen($suff));
if ($pos !== false) {
break;
}
}
$ante_suff = mb_substr($rv, 0, $pos);
$ante_suff_len = mb_strlen($ante_suff);
foreach (self::$ante_suff_a as $ante_a) {
if ($ante_suff_len - mb_strlen($ante_a) < 0) {
continue;
}
$pos_a = mb_strpos($ante_suff, $ante_a, $ante_suff_len - mb_strlen($ante_a));
if ($pos_a !== false) {
return mb_substr($str, 0, $pos + $str_len - $rv_len);
}
}
foreach (self::$ante_suff_b as $ante_b) {
if ($ante_suff_len - mb_strlen($ante_b) < 0) {
continue;
}
$pos_b = mb_strpos($ante_suff, $ante_b, $ante_suff_len - mb_strlen($ante_b));
if ($pos_b !== false) {
return mb_substr($str, 0, $pos + $str_len - $rv_len).'e';
}
}
return $str;
}
private static function deleteStuff($arr_suff, $str, $str_len, $where, $ovunque = false)
{
if ($where === 'r2') {
$r = self::returnR2($str);
} elseif ($where === 'rv') {
$r = self::returnRV($str);
} elseif ($where === 'r1') {
$r = self::returnR1($str);
}
$r_len = mb_strlen($r);
if ($ovunque) {
foreach ($arr_suff as $suff) {
if ($str_len - mb_strlen($suff) < 0) {
continue;
}
$pos = mb_strpos($str, $suff, $str_len - mb_strlen($suff));
if ($pos !== false) {
$pattern = '/'.$suff.'$/';
$ret_str = preg_match($pattern, $r) ? mb_substr($str, 0, $pos) : '';
if ($ret_str !== '') {
return $ret_str;
}
break;
}
}
} else {
foreach ($arr_suff as $suff) {
if ($r_len - mb_strlen($suff) < 0) {
continue;
}
$pos = mb_strpos($r, $suff, $r_len - mb_strlen($suff));
if ($pos !== false) {
return mb_substr($str, 0, $pos + $str_len - $r_len);
}
}
}
}
private static function step1($str)
{
// Step 1: Standard suffix removal
// Always do step 1
$str_len = mb_strlen($str);
// Delete if in R1, if preceded by 'iv', delete if in R2 (and if further preceded by 'at', delete if in R2),
// otherwise, if preceded by 'os', 'ic' or 'abil', delete if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_g, $str, $str_len, 'r1'))) {
if (!empty($ret_str1 = self::deleteStuff(['iv'], $ret_str, mb_strlen($ret_str), 'r2'))) {
if (!empty($ret_str2 = self::deleteStuff(['at'], $ret_str1, mb_strlen($ret_str1), 'r2'))) {
return $ret_str2;
} else {
return $ret_str1;
}
} elseif (!empty(
$ret_str1 = self::deleteStuff(['os', 'ic', 'abil'], $ret_str, mb_strlen($ret_str), 'r2')
)) {
return $ret_str1;
} else {
return $ret_str;
}
}
// Delete if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_a, $str, $str_len, 'r2', true))) {
return $ret_str;
}
// Delete if in R2, if preceded by 'ic', delete if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_b, $str, $str_len, 'r2'))) {
if (!empty($ret_str1 = self::deleteStuff(['ic'], $ret_str, mb_strlen($ret_str), 'r2'))) {
return $ret_str1;
} else {
return $ret_str;
}
}
// Replace with 'log' if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_c, $str, $str_len, 'r2'))) {
return $ret_str.'log';
}
// Replace with 'u' if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_d, $str, $str_len, 'r2'))) {
return $ret_str.'u';
}
// Replace with 'ente' if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_e, $str, $str_len, 'r2'))) {
return $ret_str.'ente';
}
// Delete if in RV
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_f, $str, $str_len, 'rv'))) {
return $ret_str;
}
// Delete if in R2, if preceded by 'abil', 'ic' or 'iv', delete if in R2
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_h, $str, $str_len, 'r2'))) {
if (!empty($ret_str1 = self::deleteStuff(['abil', 'ic', 'iv'], $ret_str, mb_strlen($ret_str), 'r2'))) {
return $ret_str1;
} else {
return $ret_str;
}
}
// Delete if in R2, if preceded by 'at', delete if in R2 (and if further preceded by 'ic', delete if in R2)
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step1_i, $str, $str_len, 'r2'))) {
if (!empty($ret_str1 = self::deleteStuff(['at'], $ret_str, mb_strlen($ret_str), 'r2'))) {
if (!empty($ret_str2 = self::deleteStuff(['ic'], $ret_str1, mb_strlen($ret_str1), 'r2'))) {
return $ret_str2;
} else {
return $ret_str1;
}
} else {
return $ret_str;
}
}
return $str;
}
private static function step2($str, $str_step1)
{
//Step 2: Verb suffixes
//Do step 2 if no ending was removed by step 1
if ($str != $str_step1) {
return $str_step1;
}
$str_len = mb_strlen($str);
if (!empty($ret_str = self::deleteStuff(self::$suffissi_step2, $str, $str_len, 'rv'))) {
return $ret_str;
}
return $str;
}
private static function step3a($str)
{
// Step 3a: Delete a final 'a', 'e', 'i', 'o', 'à', 'è', 'ì' or 'ò' if it is in RV,
// and a preceding 'i' if it is in RV ('crocchi' -> 'crocch', 'crocchio' -> 'crocch')
// Always do step 3a
$vocale_finale = ['a', 'e', 'i', 'o', 'à', 'è', 'ì', 'ò'];
$str_len = mb_strlen($str);
if (!empty($ret_str = self::deleteStuff($vocale_finale, $str, $str_len, 'rv'))) {
if (!empty($ret_str1 = self::deleteStuff(['i'], $ret_str, mb_strlen($ret_str), 'rv'))) {
return $ret_str1;
} else {
return $ret_str;
}
}
return $str;
}
private static function step3b($str)
{
// Step 3b: Replace final 'ch' (or 'gh') with 'c' (or 'g') if in 'RV' ('crocch' -> 'crocc')
// Always do step 3b
$rv = self::returnRV($str);
$pattern = '/([cg])h$/';
return mb_substr($str, 0, mb_strlen($str) - mb_strlen($rv))
. preg_replace_callback(
$pattern,
function ($matches) {
// Keep only the captured 'c' or 'g', dropping the trailing 'h'.
return $matches[1];
},
$rv
);
}
private static function step4($str)
{
// Step 4: Finally, turn I and U back into lower case
return strtolower($str);
}
}

@ -0,0 +1,11 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
class NoStemmer implements Stemmer
{
public static function stem($word)
{
return $word;
}
}

@ -0,0 +1,144 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
*
* @link https://github.com/Tutanchamon/pl_stemmer
* Simple stemmer for the Polish language based on pl_stemmer by Błażej Kubiński.
*
*/
class PolishStemmer implements Stemmer
{
public static function removeNouns($word)
{
if (strlen($word) > 7 && in_array(mb_substr($word, -5), array("zacja", "zacją", "zacji"))) {
return mb_substr($word, 0, -4);
}
if (strlen($word) > 6 && in_array(mb_substr($word, -4), array("acja", "acji", "acją", "tach", "anie", "enie", "eniu", "aniu"))) {
return mb_substr($word, 0, -4);
}
if (strlen($word) > 6 && (mb_substr($word, -4) == "tyka")) {
return mb_substr($word, 0, -2);
}
if (strlen($word) > 5 && in_array(mb_substr($word, -3), array("ach", "ami", "nia", "niu", "cia", "ciu"))) {
return mb_substr($word, 0, -3);
}
if (strlen($word) > 5 && in_array(mb_substr($word, -3), array("cji", "cja", "cją"))) {
return mb_substr($word, 0, -2);
}
if (strlen($word) > 5 && in_array(mb_substr($word, -2), array("ce", "ta"))) {
return mb_substr($word, 0, -2);
}
return $word;
}
public static function removeDiminutive($word)
{
if (strlen($word) > 6) {
if (in_array(mb_substr($word, -5), array("eczek", "iczek", "iszek", "aszek", "uszek"))) {
return mb_substr($word, 0, -5);
}
if (in_array(mb_substr($word, -4), array("enek", "ejek", "erek"))) {
return mb_substr($word, 0, -2);
}
}
if (strlen($word) > 4) {
if (in_array(mb_substr($word, -2), array("ek", "ak"))) {
return mb_substr($word, 0, -2);
}
}
return $word;
}
public static function removeAdjectiveEnds($word)
{
if (strlen($word) > 7 && (mb_substr($word, 0, 3) == "naj") && in_array(mb_substr($word, -3), array("sze", "szy"))) {
return mb_substr($word, 3, -3);
}
if (strlen($word) > 7 && (mb_substr($word, 0, 3) == "naj") && (mb_substr($word, -5) == "szych")) {
return mb_substr($word, 3, -5);
}
if (strlen($word) > 6 && (mb_substr($word, -4) == "czny")) {
return mb_substr($word, 0, -4);
}
if (strlen($word) > 5 && in_array(mb_substr($word, -3), array("owy", "owa", "owe", "ych", "ego"))) {
return mb_substr($word, 0, -3);
}
if (strlen($word) > 5 && (mb_substr($word, -2) == "ej")) {
return mb_substr($word, 0, -2);
}
return $word;
}
public static function removeVerbsEnds($word)
{
if (strlen($word) > 5 && (mb_substr($word, -3) == "bym")) {
return mb_substr($word, 0, -3);
}
if (strlen($word) > 5 && in_array(mb_substr($word, -3), array("esz", "asz", "cie", "eść", "aść", "łem", "amy", "emy"))) {
return mb_substr($word, 0, -3);
}
if (strlen($word) > 3 && in_array(mb_substr($word, -3), array("esz", "asz", "eść", "aść", "", ""))) {
return mb_substr($word, 0, -2);
}
if (strlen($word) > 3 && in_array(mb_substr($word, -2), array("aj"))) {
return mb_substr($word, 0, -1);
}
if (strlen($word) > 3 && in_array(mb_substr($word, -2), array("", "em", "am", "", "", "", "ąc"))) {
return mb_substr($word, 0, -2);
}
return $word;
}
public static function removeAdverbsEnds($word)
{
if (strlen($word) > 4 && in_array(mb_substr($word, -3), array("nie", "wie", "rze"))) {
return mb_substr($word, 0, -2);
}
return $word;
}
public static function removePluralForms($word)
{
if (strlen($word) > 4 && in_array(mb_substr($word, -2), array("ów", "om"))) {
return mb_substr($word, 0, -2);
}
if (strlen($word) > 4 && (mb_substr($word, -3) == "ami")) {
return mb_substr($word, 0, -3);
}
return $word;
}
public static function removeGeneralEnds($word)
{
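if (strlen($word) > 4 && in_array(mb_substr($word, -2), array("ia", "ie"))) {
return mb_substr($word, 0, -2);
}
// mb_substr keeps multi-byte endings such as "ą", "ę" and "ł" intact; byte-wise substr could never match them.
if (strlen($word) > 4 && in_array(mb_substr($word, -1), array("u", "ą", "i", "a", "ę", "y", "ł"))) {
return mb_substr($word, 0, -1);
}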
}
return $word;
}
public static function stem($word)
{
$word = mb_strtolower($word);
$stem = $word;
$stem = self::removeNouns($stem);
$stem = self::removeDiminutive($stem);
$stem = self::removeAdjectiveEnds($stem);
$stem = self::removeVerbsEnds($stem);
$stem = self::removeAdverbsEnds($stem);
$stem = self::removePluralForms($stem);
$stem = self::removeGeneralEnds($stem);
return $stem;
}
}

@ -0,0 +1,424 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
* Copyright (c) 2005 Richard Heyes (http://www.phpguru.org/)
*
* All rights reserved.
*
* This script is free software.
*/
/**
* PHP5 Implementation of the Porter Stemmer algorithm. Certain elements
* were borrowed from the (broken) implementation by Jon Abernathy.
*
* Usage:
*
* $stem = PorterStemmer::stem($word);
*
* How easy is that?
*/
class PorterStemmer implements Stemmer
{
/**
* Regex for matching a consonant
* @var string
*/
private static $regex_consonant = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
/**
* Regex for matching a vowel
* @var string
*/
private static $regex_vowel = '(?:[aeiou]|(?<![aeiou])y)';
/**
* Stems a word. Simple huh?
*
* @param string $word Word to stem
* @return string Stemmed word
*/
public static function stem($word)
{
if (strlen($word) <= 2) {
return $word;
}
$word = self::step1ab($word);
$word = self::step1c($word);
$word = self::step2($word);
$word = self::step3($word);
$word = self::step4($word);
$word = self::step5($word);
return $word;
}
/**
* Step 1
* @param string $word
* @return string
*/
private static function step1ab($word)
{
$word = self::doPartA($word);
$word = self::doPartB($word);
return $word;
}
/**
* @param string $word
*/
private static function doPartA($word)
{
if (substr($word, -1) == 's') {
self::replace($word, 'sses', 'ss')
|| self::replace($word, 'ies', 'i')
|| self::replace($word, 'ss', 'ss')
|| self::replace($word, 's', '');
}
return $word;
}
private static function doPartB($word)
{
if (substr($word, -2, 1) != 'e' || !self::replace($word, 'eed', 'ee', 0)) {
// First rule
$v = self::$regex_vowel;
// ing and ed
if (preg_match("#$v+#", substr($word, 0, -3)) && self::replace($word, 'ing', '')
|| preg_match("#$v+#", substr($word, 0, -2)) && self::replace($word, 'ed', '')) {
// Note the use of && and ||, for precedence reasons
// If one of the above two tests is successful
if (!self::replace($word, 'at', 'ate')
&& !self::replace($word, 'bl', 'ble')
&& !self::replace($word, 'iz', 'ize')) {
// Double consonant ending
if (self::doubleConsonant($word)
&& substr($word, -2) != 'll'
&& substr($word, -2) != 'ss'
&& substr($word, -2) != 'zz') {
$word = substr($word, 0, -1);
} else if (self::m($word) == 1 && self::cvc($word)) {
$word .= 'e';
}
}
}
}
return $word;
}
/**
* Step 1c
*
* @param string $word Word to stem
*/
private static function step1c($word)
{
$v = self::$regex_vowel;
if (substr($word, -1) == 'y' && preg_match("#$v+#", substr($word, 0, -1))) {
self::replace($word, 'y', 'i');
}
return $word;
}
/**
* Step 2
*
* @param string $word Word to stem
*/
private static function step2($word)
{
switch (substr($word, -2, 1)) {
case 'a':
self::replace($word, 'ational', 'ate', 0)
|| self::replace($word, 'tional', 'tion', 0);
break;
case 'c':
self::replace($word, 'enci', 'ence', 0)
|| self::replace($word, 'anci', 'ance', 0);
break;
case 'e':
self::replace($word, 'izer', 'ize', 0);
break;
case 'g':
self::replace($word, 'logi', 'log', 0);
break;
case 'l':
self::replace($word, 'entli', 'ent', 0)
|| self::replace($word, 'ousli', 'ous', 0)
|| self::replace($word, 'alli', 'al', 0)
|| self::replace($word, 'bli', 'ble', 0)
|| self::replace($word, 'eli', 'e', 0);
break;
case 'o':
self::replace($word, 'ization', 'ize', 0)
|| self::replace($word, 'ation', 'ate', 0)
|| self::replace($word, 'ator', 'ate', 0);
break;
case 's':
self::replace($word, 'iveness', 'ive', 0)
|| self::replace($word, 'fulness', 'ful', 0)
|| self::replace($word, 'ousness', 'ous', 0)
|| self::replace($word, 'alism', 'al', 0);
break;
case 't':
self::replace($word, 'biliti', 'ble', 0)
|| self::replace($word, 'aliti', 'al', 0)
|| self::replace($word, 'iviti', 'ive', 0);
break;
}
return $word;
}
/**
* Step 3
*
* @param string $word String to stem
*/
private static function step3($word)
{
switch (substr($word, -2, 1)) {
case 'a':
self::replace($word, 'ical', 'ic', 0);
break;
case 's':
self::replace($word, 'ness', '', 0);
break;
case 't':
self::replace($word, 'icate', 'ic', 0)
|| self::replace($word, 'iciti', 'ic', 0);
break;
case 'u':
self::replace($word, 'ful', '', 0);
break;
case 'v':
self::replace($word, 'ative', '', 0);
break;
case 'z':
self::replace($word, 'alize', 'al', 0);
break;
}
return $word;
}
/**
* Step 4
*
* @param string $word Word to stem
*/
private static function step4($word)
{
switch (substr($word, -2, 1)) {
case 'a':
self::replace($word, 'al', '', 1);
break;
case 'c':
self::replace($word, 'ance', '', 1)
|| self::replace($word, 'ence', '', 1);
break;
case 'e':
self::replace($word, 'er', '', 1);
break;
case 'i':
self::replace($word, 'ic', '', 1);
break;
case 'l':
self::replace($word, 'able', '', 1)
|| self::replace($word, 'ible', '', 1);
break;
case 'n':
self::replace($word, 'ant', '', 1)
|| self::replace($word, 'ement', '', 1)
|| self::replace($word, 'ment', '', 1)
|| self::replace($word, 'ent', '', 1);
break;
case 'o':
if (substr($word, -4) == 'tion' || substr($word, -4) == 'sion') {
self::replace($word, 'ion', '', 1);
} else {
self::replace($word, 'ou', '', 1);
}
break;
case 's':
self::replace($word, 'ism', '', 1);
break;
case 't':
self::replace($word, 'ate', '', 1)
|| self::replace($word, 'iti', '', 1);
break;
case 'u':
self::replace($word, 'ous', '', 1);
break;
case 'v':
self::replace($word, 'ive', '', 1);
break;
case 'z':
self::replace($word, 'ize', '', 1);
break;
}
return $word;
}
/**
* Step 5
*
* @param string $word Word to stem
*/
private static function step5($word)
{
// Part a
if (substr($word, -1) == 'e') {
if (self::m(substr($word, 0, -1)) > 1) {
self::replace($word, 'e', '');
} else if (self::m(substr($word, 0, -1)) == 1) {
if (!self::cvc(substr($word, 0, -1))) {
self::replace($word, 'e', '');
}
}
}
// Part b
if (self::m($word) > 1 && self::doubleConsonant($word) && substr($word, -1) == 'l') {
$word = substr($word, 0, -1);
}
return $word;
}
/**
* Replaces the first string with the second, at the end of the string. If the third
* arg is given, the preceding string must have an m count greater than it.
*
* @param string $str String to check
* @param string $check Ending to check for
* @param string $repl Replacement string
* @param int $m Optional m() value that the preceding string must exceed
* @return bool Whether the $check string was at the end
* of the $str string. True does not necessarily mean
* that it was replaced.
*/
private static function replace(&$str, $check, $repl, $m = null)
{
$len = 0 - strlen($check);
if (substr($str, $len) == $check) {
$substr = substr($str, 0, $len);
if (is_null($m) || self::m($substr) > $m) {
$str = $substr.$repl;
}
return true;
}
return false;
}
/**
* What, you mean it's not obvious from the name?
*
* m() measures the number of vowel-consonant (VC) sequences in $str. If c is
* a consonant sequence and v a vowel sequence, and <..> indicates arbitrary
* presence,
*
* <c><v> gives 0
* <c>vc<v> gives 1
* <c>vcvc<v> gives 2
* <c>vcvcvc<v> gives 3
*
* @param string $str The string to return the m count for
* @return int The m count
*/
private static function m($str)
{
$c = self::$regex_consonant;
$v = self::$regex_vowel;
$str = preg_replace("#^$c+#", '', $str);
$str = preg_replace("#$v+$#", '', $str);
preg_match_all("#($v+$c+)#", $str, $matches);
return count($matches[1]);
}
/**
* Returns true/false as to whether the given string contains two
* of the same consonant next to each other at the end of the string.
*
* @param string $str String to check
* @return bool Result
*/
private static function doubleConsonant($str)
{
$c = self::$regex_consonant;
return preg_match("#$c{2}$#", $str, $matches) && $matches[0][0] == $matches[0][1];
}
/**
* Checks for ending CVC sequence where second C is not W, X or Y
*
* @param string $str String to check
* @return bool Result
*/
private static function cvc($str)
{
$c = self::$regex_consonant;
$v = self::$regex_vowel;
$matchFound = preg_match("#($c$v$c)$#", $str, $matches);
$return = false;
if ($matchFound && strlen($matches[1]) == 3) {
$return = true;
if (in_array($matches[1][2], ['w', 'x', 'y'])) {
$return = false;
}
}
return $return;
}
}
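
The measure m() documented in PorterStemmer above can be tried in isolation. The sketch below is a standalone illustration, not part of the library: `porterMeasure` is a hypothetical helper name, and it only reuses the two regexes shown in the class.

```php
<?php
// Hypothetical standalone helper; mirrors the regexes used by PorterStemmer::m().
function porterMeasure(string $str): int
{
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)'; // consonant
    $v = '(?:[aeiou]|(?<![aeiou])y)';                   // vowel
    // Drop the optional leading consonant run and the optional trailing vowel run.
    $str = preg_replace('#^' . $c . '+#', '', $str);
    $str = preg_replace('#' . $v . '+$#', '', $str);
    // Every remaining vowel-run followed by a consonant-run counts once.
    return (int) preg_match_all('#' . $v . '+' . $c . '+#', $str);
}

foreach (['tree', 'trouble', 'troubles', 'private'] as $word) {
    echo $word, ' => ', porterMeasure($word), PHP_EOL;
}
```

For these inputs the measure is 0, 1, 2 and 2 respectively, which matches the examples in Porter's description of the algorithm.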

View File

@ -0,0 +1,727 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/**
* This is a reimplementation of the Porter Stemmer Algorithm for Portuguese.
* This script is based on the implementation found on <https://github.com/wamania/php-stemmer>
* and has been rewritten to work with TNTSearch by Lucas Padilha <https://github.com/LucasPadilha>
*
* Takes a word and reduces it to its Portuguese stem using the Porter stemmer algorithm.
*
* References:
* - http://snowball.tartarus.org/algorithms/porter/stemmer.html
* - http://snowball.tartarus.org/algorithms/portuguese/stemmer.html
*
* Usage:
* $stem = PortugueseStemmer::stem($word);
*
* @author Lucas Padilha <https://github.com/LucasPadilha>
*/
class PortugueseStemmer implements Stemmer
{
/**
* UTF-8 Case lookup table
*
* This lookup table maps lower case letters to their corresponding
* upper case letters in UTF-8
*
* @author Andreas Gohr <andi@splitbrain.org>
*/
private static $utf8_lower_to_upper = array(
0x0061=>0x0041, 0x03C6=>0x03A6, 0x0163=>0x0162, 0x00E5=>0x00C5, 0x0062=>0x0042,
0x013A=>0x0139, 0x00E1=>0x00C1, 0x0142=>0x0141, 0x03CD=>0x038E, 0x0101=>0x0100,
0x0491=>0x0490, 0x03B4=>0x0394, 0x015B=>0x015A, 0x0064=>0x0044, 0x03B3=>0x0393,
0x00F4=>0x00D4, 0x044A=>0x042A, 0x0439=>0x0419, 0x0113=>0x0112, 0x043C=>0x041C,
0x015F=>0x015E, 0x0144=>0x0143, 0x00EE=>0x00CE, 0x045E=>0x040E, 0x044F=>0x042F,
0x03BA=>0x039A, 0x0155=>0x0154, 0x0069=>0x0049, 0x0073=>0x0053, 0x1E1F=>0x1E1E,
0x0135=>0x0134, 0x0447=>0x0427, 0x03C0=>0x03A0, 0x0438=>0x0418, 0x00F3=>0x00D3,
0x0440=>0x0420, 0x0454=>0x0404, 0x0435=>0x0415, 0x0449=>0x0429, 0x014B=>0x014A,
0x0431=>0x0411, 0x0459=>0x0409, 0x1E03=>0x1E02, 0x00F6=>0x00D6, 0x00F9=>0x00D9,
0x006E=>0x004E, 0x0451=>0x0401, 0x03C4=>0x03A4, 0x0443=>0x0423, 0x015D=>0x015C,
0x0453=>0x0403, 0x03C8=>0x03A8, 0x0159=>0x0158, 0x0067=>0x0047, 0x00E4=>0x00C4,
0x03AC=>0x0386, 0x03AE=>0x0389, 0x0167=>0x0166, 0x03BE=>0x039E, 0x0165=>0x0164,
0x0117=>0x0116, 0x0109=>0x0108, 0x0076=>0x0056, 0x00FE=>0x00DE, 0x0157=>0x0156,
0x00FA=>0x00DA, 0x1E61=>0x1E60, 0x1E83=>0x1E82, 0x00E2=>0x00C2, 0x0119=>0x0118,
0x0146=>0x0145, 0x0070=>0x0050, 0x0151=>0x0150, 0x044E=>0x042E, 0x0129=>0x0128,
0x03C7=>0x03A7, 0x013E=>0x013D, 0x0442=>0x0422, 0x007A=>0x005A, 0x0448=>0x0428,
0x03C1=>0x03A1, 0x1E81=>0x1E80, 0x016D=>0x016C, 0x00F5=>0x00D5, 0x0075=>0x0055,
0x0177=>0x0176, 0x00FC=>0x00DC, 0x1E57=>0x1E56, 0x03C3=>0x03A3, 0x043A=>0x041A,
0x006D=>0x004D, 0x016B=>0x016A, 0x0171=>0x0170, 0x0444=>0x0424, 0x00EC=>0x00CC,
0x0169=>0x0168, 0x03BF=>0x039F, 0x006B=>0x004B, 0x00F2=>0x00D2, 0x00E0=>0x00C0,
0x0434=>0x0414, 0x03C9=>0x03A9, 0x1E6B=>0x1E6A, 0x00E3=>0x00C3, 0x044D=>0x042D,
0x0436=>0x0416, 0x01A1=>0x01A0, 0x010D=>0x010C, 0x011D=>0x011C, 0x00F0=>0x00D0,
0x013C=>0x013B, 0x045F=>0x040F, 0x045A=>0x040A, 0x00E8=>0x00C8, 0x03C5=>0x03A5,
0x0066=>0x0046, 0x00FD=>0x00DD, 0x0063=>0x0043, 0x021B=>0x021A, 0x00EA=>0x00CA,
0x03B9=>0x0399, 0x017A=>0x0179, 0x00EF=>0x00CF, 0x01B0=>0x01AF, 0x0065=>0x0045,
0x03BB=>0x039B, 0x03B8=>0x0398, 0x03BC=>0x039C, 0x045C=>0x040C, 0x043F=>0x041F,
0x044C=>0x042C, 0x00FE=>0x00DE, 0x00F0=>0x00D0, 0x1EF3=>0x1EF2, 0x0068=>0x0048,
0x00EB=>0x00CB, 0x0111=>0x0110, 0x0433=>0x0413, 0x012F=>0x012E, 0x00E6=>0x00C6,
0x0078=>0x0058, 0x0161=>0x0160, 0x016F=>0x016E, 0x03B1=>0x0391, 0x0457=>0x0407,
0x0173=>0x0172, 0x00FF=>0x0178, 0x006F=>0x004F, 0x043B=>0x041B, 0x03B5=>0x0395,
0x0445=>0x0425, 0x0121=>0x0120, 0x017E=>0x017D, 0x017C=>0x017B, 0x03B6=>0x0396,
0x03B2=>0x0392, 0x03AD=>0x0388, 0x1E85=>0x1E84, 0x0175=>0x0174, 0x0071=>0x0051,
0x0437=>0x0417, 0x1E0B=>0x1E0A, 0x0148=>0x0147, 0x0105=>0x0104, 0x0458=>0x0408,
0x014D=>0x014C, 0x00ED=>0x00CD, 0x0079=>0x0059, 0x010B=>0x010A, 0x03CE=>0x038F,
0x0072=>0x0052, 0x0430=>0x0410, 0x0455=>0x0405, 0x0452=>0x0402, 0x0127=>0x0126,
0x0137=>0x0136, 0x012B=>0x012A, 0x03AF=>0x038A, 0x044B=>0x042B, 0x006C=>0x004C,
0x03B7=>0x0397, 0x0125=>0x0124, 0x0219=>0x0218, 0x00FB=>0x00DB, 0x011F=>0x011E,
0x043E=>0x041E, 0x1E41=>0x1E40, 0x03BD=>0x039D, 0x0107=>0x0106, 0x03CB=>0x03AB,
0x0446=>0x0426, 0x00FE=>0x00DE, 0x00E7=>0x00C7, 0x03CA=>0x03AA, 0x0441=>0x0421,
0x0432=>0x0412, 0x010F=>0x010E, 0x00F8=>0x00D8, 0x0077=>0x0057, 0x011B=>0x011A,
0x0074=>0x0054, 0x006A=>0x004A, 0x045B=>0x040B, 0x0456=>0x0406, 0x0103=>0x0102,
0x03BB=>0x039B, 0x00F1=>0x00D1, 0x043D=>0x041D, 0x03CC=>0x038C, 0x00E9=>0x00C9,
0x00F0=>0x00D0, 0x0457=>0x0407, 0x0123=>0x0122,
);
private static $vowels = array('a', 'e', 'i', 'o', 'u', 'á', 'é', 'í', 'ó', 'ú', 'â', 'ê', 'ô');
public static function stem($word)
{
// All processing assumes UTF-8 input
if (!self::check($word)) {
throw new \Exception('Word must be in UTF-8');
}
$word = self::strtolower($word);
$word = self::str_replace(array('ã', 'õ'), array('a~', 'o~'), $word);
$rv = '';
$rvIndex = '';
self::rv($word, $rv, $rvIndex);
$r1 = '';
$r1Index = '';
self::r1($word, $r1, $r1Index);
$r2 = '';
$r2Index = '';
self::r2($r1, $r1Index, $r2, $r2Index);
$initialWord = $word;
self::step1($word, $r1Index, $r2Index, $rvIndex);
if ($initialWord == $word) {
self::step2($word, $rvIndex);
}
if ($initialWord != $word) {
self::step3($word, $rvIndex);
} else {
self::step4($word, $rvIndex);
}
self::step5($word, $rvIndex);
self::finish($word);
return $word;
}
/**
* R1 is the region after the first non-vowel following a vowel, or the end of the word if there is no such non-vowel.
*/
private static function r1($word, &$r1, &$r1Index)
{
list($index, $value) = self::rx($word);
$r1 = $value;
$r1Index = $index;
return true;
}
/**
* R2 is the region after the first non-vowel following a vowel in R1, or the end of the word if there is no such non-vowel.
*/
private static function r2($r1, $r1Index, &$r2, &$r2Index)
{
list($index, $value) = self::rx($r1);
$r2 = $value;
$r2Index = $r1Index + $index;
return true;
}
/**
* Common function for R1 and R2
* Search the region after the first non-vowel following a vowel in $word, or the end of the word if there is no such non-vowel.
* R1 : $in = $word
* R2 : $in = R1
*/
private static function rx($in)
{
$length = self::strlen($in);
// Defaults
$value = '';
$index = $length;
// Search all vowels
$vowels = array();
for ($i = 0; $i < $length; $i++) {
$letter = self::substr($in, $i, 1);
if (in_array($letter, static::$vowels)) {
$vowels[] = $i;
}
}
// Search the non-vowel following a vowel
foreach ($vowels as $position) {
$after = $position + 1;
$letter = self::substr($in, $after, 1);
if (!in_array($letter, static::$vowels)) {
$index = $after + 1;
$value = self::substr($in, ($after+1));
break;
}
}
return array($index, $value);
}
/**
* Used by Spanish, Italian, Portuguese, etc. (but not by French)
*
* If the second letter is a consonant, RV is the region after the next following vowel,
* or if the first two letters are vowels, RV is the region after the next consonant,
* and otherwise (consonant-vowel case) RV is the region after the third letter.
* But RV is the end of the word if these positions cannot be found.
*/
private static function rv($word, &$rv, &$rvIndex)
{
$length = self::strlen($word);
if ($length < 3) {
return true;
}
$first = self::substr($word, 0, 1);
$second = self::substr($word, 1, 1);
// If the second letter is a consonant, RV is the region after the next following vowel,
if (!in_array($second, static::$vowels)) {
for ($i = 2; $i < $length; $i++) {
$letter = self::substr($word, $i, 1);
if (in_array($letter, static::$vowels)) {
$rv = self::substr($word, ($i + 1));
$rvIndex = $i + 1;
return true;
}
}
}
// or if the first two letters are vowels, RV is the region after the next consonant,
if ((in_array($first, static::$vowels)) && (in_array($second, static::$vowels))) {
for ($i = 2; $i < $length; $i++) {
$letter = self::substr($word, $i, 1);
if (!in_array($letter, static::$vowels)) {
$rv = self::substr($word, ($i + 1));
$rvIndex = $i + 1;
return true;
}
}
}
// and otherwise (consonant-vowel case) RV is the region after the third letter.
if ((!in_array($first, static::$vowels)) && (in_array($second, static::$vowels))) {
$rv = self::substr($word, 3);
$rvIndex = 3;
return true;
}
return false;
}
private static function inRv($position, $rvIndex)
{
return ($position >= $rvIndex);
}
private static function inR1($position, $r1Index)
{
return ($position >= $r1Index);
}
private static function inR2($position, $r2Index)
{
return ($position >= $r2Index);
}
private static function searchIfInRv($word, $suffixes, $rvIndex)
{
return self::search($word, $suffixes, $rvIndex);
}
private static function searchIfInR2($word, $suffixes, $r2Index)
{
return self::search($word, $suffixes, $r2Index);
}
private static function search($word, $suffixes, $offset = 0)
{
$length = self::strlen($word);
if ($offset > $length) {
return false;
}
foreach ($suffixes as $suffix) {
if ((($position = self::strrpos($word, $suffix, $offset)) !== false) && ((self::strlen($suffix) + $position) == $length)) {
return $position;
}
}
return false;
}
/**
* Step 1: Standard suffix removal
*/
private static function step1(&$word, $r1Index, $r2Index, $rvIndex)
{
// delete if in R2
if (($position = self::search($word, array('amentos', 'imentos', 'adoras', 'adores', 'amento', 'imento', 'adora', 'istas', 'ismos', 'antes', 'ância', 'ezas', 'eza', 'icos', 'icas', 'ismo', 'ável', 'ível', 'ista', 'oso', 'osos', 'osas', 'osa', 'ico', 'ica', 'ador', 'aça~o', 'aço~es' , 'ante'))) !== false) {
if (self::inR2($position, $r2Index)) {
$word = self::substr($word, 0, $position);
}
return true;
}
// replace with log if in R2
if (($position = self::search($word, array('logías', 'logía'))) !== false) {
if (self::inR2($position, $r2Index)) {
$word = preg_replace('#(logías|logía)$#u', 'log', $word);
}
return true;
}
// replace with u if in R2
if (($position = self::search($word, array('uciones', 'ución'))) !== false) {
if (self::inR2($position, $r2Index)) {
$word = preg_replace('#(uciones|ución)$#u', 'u', $word);
}
return true;
}
// replace with ente if in R2
if (($position = self::search($word, array('ências', 'ência'))) !== false) {
if (self::inR2($position, $r2Index)) {
$word = preg_replace('#(ências|ência)$#u', 'ente', $word);
}
return true;
}
// delete if in R1
// if preceded by iv, delete if in R2 (and if further preceded by at, delete if in R2), otherwise,
// if preceded by os, ic or ad, delete if in R2
if (($position = self::search($word, array('amente'))) !== false) {
// delete if in R1
if (self::inR1($position, $r1Index)) {
$word = self::substr($word, 0, $position);
}
// if preceded by iv, delete if in R2 (and if further preceded by at, delete if in R2), otherwise,
if (($position2 = self::searchIfInR2($word, array('iv'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position2);
if (($position3 = self::searchIfInR2($word, array('at'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position3);
}
// if preceded by os, ic or ad, delete if in R2
} elseif (($position4 = self::searchIfInR2($word, array('os', 'ic', 'ad'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position4);
}
return true;
}
// delete if in R2
// if preceded by ante, avel or ível, delete if in R2
if (($position = self::search($word, array('mente'))) !== false) {
// delete if in R2
if (self::inR2($position, $r2Index)) {
$word = self::substr($word, 0, $position);
}
// if preceded by ante, avel or ível, delete if in R2
if (($position2 = self::searchIfInR2($word, array('ante', 'avel', 'ível'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position2);
}
return true;
}
// delete if in R2
// if preceded by abil, ic or iv, delete if in R2
if (($position = self::search($word, array('idades', 'idade'))) !== false) {
// delete if in R2
if (self::inR2($position, $r2Index)) {
$word = self::substr($word, 0, $position);
}
// if preceded by abil, ic or iv, delete if in R2
if (($position2 = self::searchIfInR2($word, array('abil', 'ic', 'iv'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position2);
}
return true;
}
// delete if in R2
// if preceded by at, delete if in R2
if (($position = self::search($word, array('ivas', 'ivos', 'iva', 'ivo'))) !== false) {
// delete if in R2
if (self::inR2($position, $r2Index)) {
$word = self::substr($word, 0, $position);
}
// if preceded by at, delete if in R2
if (($position2 = self::searchIfInR2($word, array('at'), $r2Index)) !== false) {
$word = self::substr($word, 0, $position2);
}
return true;
}
// replace with ir if in RV and preceded by e
if (($position = self::search($word, array('iras', 'ira'))) !== false) {
if (self::inRv($position, $rvIndex)) {
$before = $position - 1;
$letter = self::substr($word, $before, 1);
if ($letter == 'e') {
$word = preg_replace('#(iras|ira)$#u', 'ir', $word);
}
}
return true;
}
return false;
}
/**
* Step 2: Verb suffixes
* Search for the longest among the following suffixes in RV, and if found, delete.
*/
private static function step2(&$word, $rvIndex)
{
if (($position = self::searchIfInRv($word, array('aríamos', 'eríamos', 'iríamos', 'ássemos', 'êssemos', 'íssemos', 'aríeis', 'eríeis', 'iríeis', 'ásseis', 'ésseis', 'ísseis', 'áramos', 'éramos', 'íramos', 'ávamos', 'aremos', 'eremos', 'iremos', 'ariam', 'eriam', 'iriam', 'assem', 'essem', 'issem', 'arias', 'erias', 'irias', 'ardes', 'erdes', 'irdes', 'asses', 'esses', 'isses', 'astes', 'estes', 'istes', 'áreis', 'areis', 'éreis', 'ereis', 'íreis', 'ireis', 'áveis', 'íamos', 'armos', 'ermos', 'irmos', 'aria', 'eria', 'iria', 'asse', 'esse', 'isse', 'aste', 'este', 'iste', 'arei', 'erei', 'irei', 'adas', 'idas', 'aram', 'eram', 'iram', 'avam', 'arem', 'erem', 'irem', 'ando', 'endo', 'indo', 'ara~o', 'era~o', 'ira~o', 'arás', 'aras', 'erás', 'eras', 'irás', 'avas', 'ares', 'eres', 'ires', 'íeis', 'ados', 'idos', 'ámos', 'amos', 'emos', 'imos', 'iras', 'ada', 'ida', 'ará', 'ara', 'erá', 'era', 'irá', 'ava', 'iam', 'ado', 'ido', 'ias', 'ais', 'eis', 'ira', 'ia', 'ei', 'am', 'em', 'ar', 'er', 'ir', 'as', 'es', 'is', 'eu', 'iu', 'ou'), $rvIndex)) !== false) {
$word = self::substr($word, 0, $position);
return true;
}
return false;
}
/**
* Step 3: d-suffixes
*
*/
private static function step3(&$word, $rvIndex)
{
// Delete suffix i if in RV and preceded by c
if (self::searchIfInRv($word, array('i'), $rvIndex) !== false) {
$letter = self::substr($word, -2, 1);
if ($letter == 'c') {
$word = self::substr($word, 0, -1);
}
return true;
}
return false;
}
/**
* Step 4
*/
private static function step4(&$word, $rvIndex)
{
// If the word ends with one of the suffixes "os a i o á í ó" in RV, delete it
if (($position = self::searchIfInRv($word, array('os', 'a', 'i', 'o','á', 'í', 'ó'), $rvIndex)) !== false) {
$word = self::substr($word, 0, $position);
return true;
}
return false;
}
/**
* Step 5
*/
private static function step5(&$word, $rvIndex)
{
// If the word ends with one of "e é ê" in RV, delete it, and if preceded by gu (or ci) with the u (or i) in RV, delete the u (or i).
if (self::searchIfInRv($word, array('e', 'é', 'ê'), $rvIndex) !== false) {
$word = self::substr($word, 0, -1);
if (($position2 = self::search($word, array('gu', 'ci'))) !== false) {
if (self::inRv(($position2 + 1), $rvIndex)) {
$word = self::substr($word, 0, -1);
}
}
return true;
} elseif (self::search($word, array('ç')) !== false) {
$word = preg_replace('#(ç)$#u', 'c', $word);
return true;
}
return false;
}
private static function finish(&$word)
{
// turn the a~ and o~ placeholders introduced in stem() back into ã and õ.
$word = self::str_replace(array('a~', 'o~'), array('ã', 'õ'), $word);
}
/**
* Tries to detect whether a string is valid UTF-8
*
* @author <bmorel@ssi.fr>
* @link http://www.php.net/manual/en/function.utf8-encode.php
*/
private static function check($str)
{
for ($i=0; $i<strlen($str); $i++) {
if (ord($str[$i]) < 0x80) continue; # 0bbbbbbb
elseif ((ord($str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
elseif ((ord($str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
elseif ((ord($str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
elseif ((ord($str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
elseif ((ord($str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
else return false; # Does not match any model
for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == strlen($str)) || ((ord($str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
/**
* Unicode aware replacement for strlen()
*
* Counts UTF-8 characters with a regex rather than utf8_decode(),
* which is deprecated as of PHP 8.2.
*
* @author <chernyshevsky at hotmail dot com>
* @see strlen()
*/
private static function strlen($string)
{
return (int) preg_match_all('/./us', $string);
}
/**
* Unicode aware replacement for substr()
*
* @author lmak at NOSPAM dot iti dot gr
* @link http://www.php.net/manual/en/function.substr.php
* @see substr()
*/
private static function substr($str,$start,$length=null)
{
$ar = array();
preg_match_all("/./u", $str, $ar);
if($length !== null) {
return join("",array_slice($ar[0],$start,$length));
} else {
return join("",array_slice($ar[0],$start));
}
}
/**
* Unicode aware replacement for str_replace()
*
* @author Harry Fuecks <hfuecks@gmail.com>
* @see str_replace()
*/
private static function str_replace($s,$r,$str)
{
if(!is_array($s)){
$s = '!'.preg_quote($s,'!').'!u';
}else{
foreach ($s as $k => $v) {
$s[$k] = '!'.preg_quote($v,'!').'!u';
}
}
return preg_replace($s,$r,$str);
}
/**
* This is a unicode aware replacement for strtolower()
*
* Uses mb_string extension if available
*
* @author Andreas Gohr <andi@splitbrain.org>
* @see strtolower()
* @see utf8_strtoupper()
*/
private static function strtolower($string)
{
if(!defined('UTF8_NOMBSTRING') && function_exists('mb_strtolower'))
return mb_strtolower($string,'utf-8');
//global $utf8_upper_to_lower;
$utf8_upper_to_lower = array_flip(self::$utf8_lower_to_upper);
$uni = self::utf8_to_unicode($string);
$cnt = count($uni);
for ($i=0; $i < $cnt; $i++){
if(isset($utf8_upper_to_lower[$uni[$i]])){
$uni[$i] = $utf8_upper_to_lower[$uni[$i]];
}
}
return self::unicode_to_utf8($uni);
}
/**
* This function returns any UTF-8 encoded text as a list of
* Unicode values:
*
* @author Scott Michael Reynen <scott@randomchaos.com>
* @link http://www.randomchaos.com/document.php?source=php_and_unicode
* @see unicode_to_utf8()
*/
private static function utf8_to_unicode( &$str )
{
$unicode = array();
$values = array();
$looking_for = 1;
for ($i = 0; $i < strlen( $str ); $i++ ) {
$this_value = ord( $str[ $i ] );
if ( $this_value < 128 ) $unicode[] = $this_value;
else {
if ( count( $values ) == 0 ) $looking_for = ( $this_value < 224 ) ? 2 : 3;
$values[] = $this_value;
if ( count( $values ) == $looking_for ) {
$number = ( $looking_for == 3 ) ?
( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ):
( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
$unicode[] = $number;
$values = array();
$looking_for = 1;
}
}
}
return $unicode;
}
/**
* This function converts a Unicode array back to its UTF-8 representation
*
* @author Scott Michael Reynen <scott@randomchaos.com>
* @link http://www.randomchaos.com/document.php?source=php_and_unicode
* @see utf8_to_unicode()
*/
private static function unicode_to_utf8( &$str )
{
if (!is_array($str)) return '';
$utf8 = '';
foreach( $str as $unicode ) {
if ( $unicode < 128 ) {
$utf8.= chr( $unicode );
} elseif ( $unicode < 2048 ) {
$utf8.= chr( 192 + ( ( $unicode - ( $unicode % 64 ) ) / 64 ) );
$utf8.= chr( 128 + ( $unicode % 64 ) );
} else {
$utf8.= chr( 224 + ( ( $unicode - ( $unicode % 4096 ) ) / 4096 ) );
$utf8.= chr( 128 + ( ( ( $unicode % 4096 ) - ( $unicode % 64 ) ) / 64 ) );
$utf8.= chr( 128 + ( $unicode % 64 ) );
}
}
return $utf8;
}
/**
* This is a Unicode aware replacement for strrpos
*
* Uses mb_string extension if available
*
* @author Harry Fuecks <hfuecks@gmail.com>
* @see strrpos()
*/
private static function strrpos($haystack, $needle, $offset=0)
{
if(!defined('UTF8_NOMBSTRING') && function_exists('mb_strrpos'))
return mb_strrpos($haystack, $needle, $offset, 'utf-8');
if (!$offset) {
$ar = self::explode($needle, $haystack);
$count = count($ar);
if ( $count > 1 ) {
return self::strlen($haystack) - self::strlen($ar[($count-1)]) - self::strlen($needle);
}
return false;
} else {
if ( !is_int($offset) ) {
trigger_error('Offset must be an integer', E_USER_WARNING);
return false;
}
$str = self::substr($haystack, $offset);
if ( false !== ($pos = self::strrpos($str, $needle))){
return $pos + $offset;
}
return false;
}
}
/**
* Unicode aware replacement for explode
*
* @author Harry Fuecks <hfuecks@gmail.com>
* @see explode();
*/
private static function explode($sep, $str)
{
if ( $sep == '' ) {
trigger_error('Empty delimiter',E_USER_WARNING);
return FALSE;
}
return preg_split('!'.preg_quote($sep,'!').'!u',$str);
}
}
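
The R1/R2 definition used by PortugueseStemmer above ("the region after the first non-vowel following a vowel") can be sketched on its own. This is an illustrative rewrite under assumptions, not the library's code: `regionAfterVowelConsonant` is a hypothetical name, and unlike `rx()` it only inspects vowels that actually have a following letter.

```php
<?php
// Hypothetical helper: returns [index, region], where region starts right after
// the first non-vowel that follows a vowel, or [length, ''] if there is none.
function regionAfterVowelConsonant(string $in): array
{
    $vowels = ['a', 'e', 'i', 'o', 'u', 'á', 'é', 'í', 'ó', 'ú', 'â', 'ê', 'ô'];
    // Split into UTF-8 characters so accented letters count as one position.
    $letters = preg_split('//u', $in, -1, PREG_SPLIT_NO_EMPTY);
    $length = count($letters);
    for ($i = 0; $i + 1 < $length; $i++) {
        if (in_array($letters[$i], $vowels, true) && !in_array($letters[$i + 1], $vowels, true)) {
            return [$i + 2, implode('', array_slice($letters, $i + 2))];
        }
    }
    return [$length, ''];
}

[$r1Index, $r1] = regionAfterVowelConsonant('nacional'); // R1 of the word
[$offset, $r2] = regionAfterVowelConsonant($r1);         // R2 is computed inside R1
$r2Index = $r1Index + $offset;
echo "R1=$r1 ($r1Index), R2=$r2 ($r2Index)", PHP_EOL;
```

For "nacional" this gives R1 = "ional" starting at index 3 and R2 = "al" starting at index 6, so a suffix such as "al" would count as being in R2.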

View File

@ -0,0 +1,83 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
* Simple stemmer for the Russian language
*/
class RussianStemmer implements Stemmer
{
private static $VOWEL = '/аеиоуыэюя/u';
private static $PERFECTIVEGROUND = '/((ив|ивши|ившись|ыв|ывши|ывшись)|((?<=[ая])(в|вши|вшись)))$/u';
private static $REFLEXIVE = '/(с[яь])$/u';
private static $ADJECTIVE = '/(ее|ие|ые|ое|ими|ыми|ей|ий|ый|ой|ем|им|ым|ом|его|ого|ему|ому|их|ых|ую|юю|ая|яя|ою|ею)$/u';
private static $PARTICIPLE = '/((ивш|ывш|ующ)|((?<=[ая])(ем|нн|вш|ющ|щ)))$/u';
private static $VERB = '/((ила|ыла|ена|ейте|уйте|ите|или|ыли|ей|уй|ил|ыл|им|ым|ен|ило|ыло|ено|ят|ует|уют|ит|ыт|ены|ить|ыть|ишь|ую|ю)|((?<=[ая])(ла|на|ете|йте|ли|й|л|ем|н|ло|но|ет|ют|ны|ть|ешь|нно)))$/u';
private static $NOUN = '/(а|ев|ов|ие|ье|е|иями|ями|ами|еи|ии|и|ией|ей|ой|ий|й|иям|ям|ием|ем|ам|ом|о|у|ах|иях|ях|ы|ь|ию|ью|ю|ия|ья|я)$/u';
private static $RVRE = '/^(.*?[аеиоуыэюя])(.*)$/u';
private static $DERIVATIONAL = '/[^аеиоуыэюя][аеиоуыэюя]+[^аеиоуыэюя]+[аеиоуыэюя].*(?<=о)сть?$/u';
private static function s(&$s, $re, $to)
{
$orig = $s;
$s = preg_replace($re, $to, $s);
return $orig !== $s;
}
private static function m($s, $re)
{
return preg_match($re, $s);
}
public static function stem($word)
{
$word = mb_strtolower($word);
$word = str_replace('ё', 'е', $word);
$stem = $word;
do {
if (!preg_match(self::$RVRE, $word, $p)) {
break;
}
$start = $p[1];
$RV = $p[2];
if (!$RV) {
break;
}
// Step 1
if (!self::s($RV, self::$PERFECTIVEGROUND, '')) {
self::s($RV, self::$REFLEXIVE, '');
if (self::s($RV, self::$ADJECTIVE, '')) {
self::s($RV, self::$PARTICIPLE, '');
} else {
if (!self::s($RV, self::$VERB, '')) {
self::s($RV, self::$NOUN, '');
}
}
}
// Step 2
self::s($RV, '/и$/u', '');
// Step 3
if (self::m($RV, self::$DERIVATIONAL)) {
self::s($RV, '/ость?$/u', '');
}
// Step 4
if (!self::s($RV, '/ь$/u', '')) {
self::s($RV, '/ейше?$/u', '');
self::s($RV, '/нн$/u', 'н');
}
$stem = $start . $RV;
} while (FALSE);
return $stem;
}
}
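
RussianStemmer above only rewrites the RV region, which the `$RVRE` pattern splits off after the first vowel. A minimal sketch of that split, assuming a UTF-8 source file; `splitRv` is a hypothetical helper name and is not part of the library.

```php
<?php
// Hypothetical helper: splits a lowercase word into [prefix up to and including
// the first vowel, RV region], or returns null when the word has no vowel.
function splitRv(string $word): ?array
{
    // Same pattern as RussianStemmer::$RVRE.
    if (!preg_match('/^(.*?[аеиоуыэюя])(.*)$/u', $word, $p)) {
        return null;
    }
    return [$p[1], $p[2]];
}

[$start, $rv] = splitRv('книги');
echo $start, ' | ', $rv, PHP_EOL;
```

For "книги" the prefix is "кни" and RV is "ги", so only the trailing "ги" is eligible for suffix removal.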

View File

@ -0,0 +1,6 @@
<?php namespace TeamTNT\TNTSearch\Stemmer;
interface Stemmer
{
public static function stem($word);
}

View File

@ -0,0 +1,83 @@
<?php
namespace TeamTNT\TNTSearch\Stemmer;
/**
* Simple stemmer for the Ukrainian language
*/
class UkrainianStemmer implements Stemmer
{
private static $VOWEL = '/аеиоуюяіїє/u';
/* http://uk.wikipedia.org/wiki/Голосний_звук */
// var $PERFECTIVEGROUND = '/((ив|ивши|ившись|ыв|ывши|ывшись((?<=[ая])(в|вши|вшись)))$/';
private static $PERFECTIVEGROUND = '/(ив|ивши|ившись|ів|івши|івшись((?<=[ая|я])(в|вши|вшись)))$/u';
private static $REFLEXIVE = '/(с[яьи])$/u'; // http://uk.wikipedia.org/wiki/Рефлексивнеієслово
private static $ADJECTIVE = '/(ими|ій|ий|а|е|ова|ове|ів|є|їй|єє|еє|я|ім|ем|им|ім|их|іх|ою|йми|іми|у|ю|ого|ому|ої)$/u'; //http://uk.wikipedia.org/wiki/Прикметник + http://wapedia.mobi/uk/Прикметник
private static $PARTICIPLE = '/(ий|ого|ому|им|ім|а|ій|у|ою|ій|і|их|йми|их)$/u'; //http://uk.wikipedia.org/wiki/Дієприкметник
private static $VERB = '/(сь|ся|ив|ать|ять|у|ю|ав|али|учи|ячи|вши|ши|е|ме|ати|яти|є)$/u'; //http://uk.wikipedia.org/wiki/Дієслово
private static $NOUN = '/(а|ев|ов|е|ями|ами|еи|и|ей|ой|ий|й|иям|ям|ием|ем|ам|ом|о|у|ах|иях|ях|ы|ь|ию|ью|ю|ия|ья|я|і|ові|ї|ею|єю|ою|є|еві|ем|єм|ів|їв|\'ю)$/u'; //http://uk.wikipedia.org/wiki/Іменник
private static $RVRE = '/^(.*?[аеиоуюяіїє])(.*)$/u';
private static $DERIVATIONAL = '/[^аеиоуюяіїє][аеиоуюяіїє]+[^аеиоуюяіїє]+[аеиоуюяіїє].*(?<=о)сть?$/u';
private static function s(&$s, $re, $to)
{
$orig = $s;
$s = preg_replace($re, $to, $s);
return $orig !== $s;
}
private static function m($s, $re)
{
return preg_match($re, $s);
}
public static function stem($word)
{
$word = mb_strtolower($word);
$stem = $word;
do {
if (!preg_match(self::$RVRE, $word, $p)) {
break;
}
$start = $p[1];
$RV = $p[2];
if (!$RV) {
break;
}
// Step 1
if (!self::s($RV, self::$PERFECTIVEGROUND, '')) {
self::s($RV, self::$REFLEXIVE, '');
if (self::s($RV, self::$ADJECTIVE, '')) {
self::s($RV, self::$PARTICIPLE, '');
} else {
if (!self::s($RV, self::$VERB, '')) {
self::s($RV, self::$NOUN, '');
}
}
}
// Step 2
self::s($RV, '/[иі]$/u', '');
// Step 3
if (self::m($RV, self::$DERIVATIONAL)) {
self::s($RV, '/сть?$/u', '');
}
// Step 4
if (!self::s($RV, '/ь$/u', '')) {
self::s($RV, '/ейше?$/u', '');
self::s($RV, '/нн$/u', 'н');
}
$stem = $start . $RV;
} while (FALSE);
return $stem;
}
}

View File

@ -0,0 +1 @@
["a", "ako", "ali", "bi", "bih", "bila", "bili", "bilo", "bio", "bismo", "biste", "biti", "bumo", "da", "do", "duž", "ga", "hoće", "hoćemo", "hoćete", "hoćeš", "hoću", "i", "iako", "ih", "ili", "iz", "ja", "je", "jedna", "jedne", "jedno", "jer", "jesam", "jesi", "jesmo", "jest", "jeste", "jesu", "jim", "joj", "još", "ju", "kada", "kako", "kao", "koja", "koje", "koji", "kojima", "koju", "kroz", "li", "me", "mene", "meni", "mi", "mimo", "moj", "moja", "moje", "mu", "na", "nad", "nakon", "nam", "nama", "nas", "naš", "naša", "naše", "našeg", "ne", "nego", "neka", "neki", "nekog", "neku", "nema", "netko", "neće", "nećemo", "nećete", "nećeš", "neću", "nešto", "ni", "nije", "nikoga", "nikoje", "nikoju", "nisam", "nisi", "nismo", "niste", "nisu", "njega", "njegov", "njegova", "njegovo", "njemu", "njezin", "njezina", "njezino", "njih", "njihov", "njihova", "njihovo", "njim", "njima", "njoj", "nju", "no", "o", "od", "odmah", "on", "ona", "oni", "ono", "ova", "pa", "pak", "po", "pod", "pored", "prije", "s", "sa", "sam", "samo", "se", "sebe", "sebi", "si", "smo", "ste", "su", "sve", "svi", "svog", "svoj", "svoja", "svoje", "svom", "ta", "tada", "taj", "tako", "te", "tebe", "tebi", "ti", "to", "toj", "tome", "tu", "tvoj", "tvoja", "tvoje", "u", "uz", "vam", "vama", "vas", "vaš", "vaša", "vaše", "već", "vi", "vrlo", "za", "zar", "će", "ćemo", "ćete", "ćeš", "ću", "što", "tijekom"]

View File

@ -0,0 +1 @@
["one", "also", "lets", "get", "still", "vs", "re", "our", "their", "couldn", "hadn't", "for", "these", "not", "themselves", "your", "won't", "which", "just", "o", "you're", "can", "shouldn't", "we", "at", "had", "and", "myself", "but", "you've", "having", "my", "was", "ve", "during", "it", "y", "she", "how", "haven't", "other", "aren't", "there", "doesn't", "he", "do", "you'll", "d", "where", "a", "hers", "are", "both", "i", "or", "itself", "while", "over", "have", "me", "him", "ain", "haven", "that", "down", "theirs", "shan", "what", "shan't", "them", "all", "mightn", "from", "when", "won", "then", "most", "wouldn", "now", "again", "why", "only", "by", "too", "don't", "herself", "wasn't", "with", "each", "above", "whom", "ll", "until", "her", "so", "who", "needn't", "ours", "after", "m", "isn't", "they", "weren't", "aren", "will", "doesn", "the", "any", "hasn't", "isn", "were", "his", "up", "yourself", "on", "out", "as", "off", "below", "own", "s", "into", "some", "t", "hasn", "between", "here", "should", "of", "in", "being", "mightn't", "mustn", "ourselves", "shouldn", "does", "an", "than", "mustn't", "yourselves", "to", "no", "about", "its", "more", "hadn", "himself", "further", "you", "is", "against", "once", "this", "should've", "nor", "did", "wasn", "she's", "weren", "has", "those", "been", "wouldn't", "don", "yours", "if", "few", "didn", "be", "needn", "couldn't", "that'll", "didn't", "same", "before", "ma", "because", "it's", "such", "very", "you'd", "doing", "through", "under", "am"]

View File

@ -0,0 +1 @@
["auront", "votre", "ils", "\u00e9tions", "et", "\u00e9tais", "avec", "elle", "nos", "\u00e9taient", "\u00e9tait", "soyez", "seront", "sommes", "eussions", "eus", "eurent", "aient", "ont", "ai", "tu", "aurais", "e\u00fbmes", "serais", "eu", "avait", "ce", "aie", "ayant", "avez", "aurez", "je", "serons", "sont", "aurons", "s", "\u00e9t\u00e9es", "soit", "\u00eates", "e\u00fbtes", "par", "qui", "y", "avaient", "ne", "vos", "auriez", "tes", "serai", "seraient", "\u00e9tiez", "te", "fus", "\u00e9tant", "fussions", "\u00e9t\u00e9s", "mon", "e\u00fbt", "d", "ayants", "avions", "f\u00fbmes", "eues", "eusses", "la", "n", "c", "lui", "est", "ayantes", "nous", "aies", "que", "aurions", "ces", "avons", "mes", "un", "le", "sa", "fusse", "aura", "leur", "eut", "eussent", "se", "les", "m", "ton", "\u00e9tantes", "serait", "ses", "t", "\u00e9t\u00e9", "une", "f\u00fbt", "fusses", "pas", "aux", "vous", "ayez", "ayons", "\u00e9tants", "es", "m\u00eame", "fut", "auraient", "eusse", "toi", "suis", "aviez", "aurai", "ayante", "seras", "ta", "sois", "f\u00fbtes", "auras", "qu", "\u00e9tante", "serions", "seriez", "pour", "ma", "on", "dans", "serez", "\u00e0", "son", "\u00e9t\u00e9e", "furent", "des", "l", "fussent", "ait", "notre", "sera", "me", "soyons", "il", "mais", "du", "en", "sur", "fussiez", "as", "ou", "avais", "de", "soient", "eue", "eux", "aurait", "eussiez", "au", "moi", "j"]

View File

@ -0,0 +1 @@
["anderer", "unseres", "keinem", "jener", "jenes", "keiner", "jedem", "anders", "da", "nichts", "sehr", "unseren", "den", "kein", "wie", "zu", "meine", "sondern", "ihm", "bei", "einige", "wollen", "denn", "ihres", "werde", "viel", "wenn", "eines", "uns", "welchem", "habe", "k\u00f6nnen", "mich", "und", "euren", "anderr", "dazu", "jedes", "kann", "an", "wir", "diesem", "was", "eure", "ihre", "wieder", "dann", "unser", "eurer", "in", "deine", "doch", "ist", "um", "demselben", "nach", "waren", "weil", "manchen", "dem", "ihn", "anderes", "ohne", "einen", "wollte", "jenem", "einiger", "seinen", "dessen", "jede", "mir", "keinen", "dasselbe", "k\u00f6nnte", "es", "hat", "oder", "\u00fcber", "deines", "ihr", "wird", "desselben", "vor", "meiner", "seines", "manchem", "hatten", "einigem", "anderen", "einmal", "diese", "meines", "ich", "also", "derselben", "hinter", "solchem", "war", "damit", "einem", "deiner", "aus", "seinem", "aller", "anderm", "sein", "nur", "einig", "dieselbe", "solchen", "weg", "haben", "hin", "deinen", "dass", "einigen", "da\u00df", "solche", "alle", "diesen", "im", "einiges", "du", "nicht", "zwischen", "w\u00fcrden", "das", "andere", "jenen", "sind", "die", "jetzt", "so", "dein", "vom", "bist", "dieser", "am", "dies", "des", "manche", "ihnen", "w\u00e4hrend", "allem", "indem", "aber", "musste", "dieselben", "eures", "gewesen", "ihrer", "welcher", "derselbe", "euer", "andern", "seiner", "dich", "denselben", "sie", "welchen", "dieses", "eurem", "unserem", "bis", "hier", "allen", "mancher", "wo", "einer", "auch", "gegen", "alles", "weiter", "nun", "keines", "keine", "meinen", "werden", "zwar", "der", "warst", "zur", "eine", "wirst", "ihren", "auf", "dir", "soll", "anderem", "als", "deinem", "durch", "von", "meinem", "jene", "ein", "mit", "unter", "zum", "bin", "hab", "derer", "jeden", "sollte", "w\u00fcrde", "welches", "ander", "er", "etwas", "sich", "manches", "welche", "seine", "jeder", "ins", "f\u00fcr", "solcher", "solches", "ihrem", "unsere", "will", 
"ob", "dort", "hatte", "mein", "sonst", "man", "muss", "noch", "machen", "selbst", "euch"]

View File

@ -0,0 +1 @@
["gli", "dove", "a", "fossero", "stiano", "alle", "avevano", "hanno", "mie", "sar\u00f2", "suoi", "stai", "questo", "un", "nei", "anche", "facessimo", "starebbe", "stemmo", "questa", "stesse", "sua", "dov", "o", "dallo", "ero", "dell", "starei", "stando", "negl", "fossi", "all", "sarai", "di", "suo", "far\u00f2", "tu", "si", "stavate", "facciano", "degli", "vostra", "avreste", "foste", "avranno", "ha", "facevo", "quelli", "sareste", "loro", "in", "degl", "come", "stanno", "ad", "lo", "avremo", "facciate", "avessi", "dalla", "vostro", "coi", "sugl", "con", "una", "quelle", "avuti", "eri", "eravamo", "eravate", "sono", "fanno", "stessero", "abbiamo", "chi", "sia", "alla", "nello", "tra", "nostra", "nostre", "avemmo", "sar\u00e0", "saremmo", "col", "al", "dei", "da", "facevano", "faceste", "mi", "facesse", "i", "avete", "\u00e8", "siate", "dai", "tuoi", "dal", "avevo", "farete", "avute", "allo", "avr\u00e0", "avuto", "farei", "io", "tua", "avevate", "negli", "l", "la", "faremo", "vostri", "saresti", "stette", "stavo", "avendo", "sarete", "stavamo", "fosse", "faranno", "perch\u00e9", "staremo", "voi", "delle", "noi", "stareste", "stava", "dagl", "se", "avrete", "quanto", "della", "nella", "sull", "sulle", "vi", "facesti", "li", "faceva", "facciamo", "miei", "sul", "fui", "avrai", "avessero", "avuta", "stiamo", "del", "stavi", "agl", "avevi", "erano", "uno", "abbiate", "stessi", "quanta", "staresti", "fosti", "sue", "stettero", "faremmo", "vostre", "nostri", "avevamo", "avrei", "abbia", "sulla", "le", "sarebbero", "quale", "quante", "quella", "ed", "nell", "tue", "far\u00e0", "fossimo", "farebbero", "siano", "aveste", "siamo", "saranno", "star\u00e0", "feci", "sugli", "lui", "fummo", "fai", "stetti", "ebbi", "ebbero", "furono", "ne", "non", "farai", "faccio", "pi\u00f9", "dagli", "avrebbe", "mio", "avesse", "era", "stia", "questi", "starai", "su", "il", "ho", "dalle", "nelle", "sui", "tutto", "ti", "star\u00f2", "fareste", "dello", "stesti", "facessero", "tuo", "aveva", 
"avessimo", "siete", "essendo", "staranno", "nostro", "ma", "c", "avresti", "stiate", "per", "queste", "stavano", "ci", "ebbe", "sto", "starete", "starebbero", "cui", "nel", "facevate", "fecero", "facendo", "e", "farebbe", "avr\u00f2", "quello", "avrebbero", "dall", "saremo", "ai", "avremmo", "fu", "fece", "stessimo", "contro", "sarebbe", "facevamo", "steste", "avesti", "faccia", "facessi", "agli", "quanti", "abbiano", "facevi", "sta", "facemmo", "faresti", "hai", "sei", "staremmo", "sullo", "mia", "sarei", "lei", "che", "tutti"]

View File

@ -0,0 +1 @@
["больше","может","много","более","ее","со","она","к","потому","и","хорошо","надо","не","же","по","есть","раз","конечно","у","нельзя","быть","кто","под","в","во","об","лучше","какой","даже","ему","до","я","почти","тем","вдруг","как","вы","них","да","но","вас","вам","сам","свою","там","нее","один","то","было","ну","эту","два","того","никогда","этот","чтобы","чего","нет","всего","меня","при","впрочем","этого","такой","после","нас","что","перед","ни","ведь","когда","им","ним","между","ж","а","из","наконец","вот","нибудь","куда","чуть","иногда","все","с","тогда","ты","тоже","ничего","себе","так","уже","они","тут","был","над","эти","какая","опять","этой","можно","совсем","него","ней","была","на","чем","для","еще","без","от","моя","потом","их","сейчас","этом","он","другой","про","здесь","три","были","будто","разве","только","всегда","уж","или","всех","мы","том","чтоб","если","где","за","тот","хоть","ей","зачем","через","о","себя","бы","мне","ли","всю","будет","мой","теперь","тебя","его"]

View File

@ -0,0 +1 @@
["hubierais", "sentido", "suya", "fu\u00e9semos", "estuvi\u00e9semos", "estar\u00e1s", "fuerais", "ha", "estar\u00e1n", "tuvi\u00e9ramos", "t\u00fa", "estuvi\u00e9ramos", "tuviesen", "habido", "hube", "os", "pero", "sentida", "habr\u00e9", "hayan", "otros", "sin", "suyos", "estuviste", "tanto", "tendr\u00eda", "tuvieron", "tuya", "lo", "hubieras", "que", "fueran", "estar\u00edamos", "sobre", "qu\u00e9", "se\u00e1is", "m\u00ed", "haya", "vosotras", "tuvierais", "\u00e9l", "tenidas", "ser\u00edas", "poco", "quien", "m\u00edas", "ti", "esto", "tiene", "hay\u00e1is", "otro", "estar\u00eda", "seremos", "suyas", "como", "ser\u00e9is", "me", "ni", "habr\u00e1", "tu", "algo", "una", "tenemos", "hab\u00edamos", "ten\u00edan", "estuvisteis", "sean", "hubieran", "la", "tuvieran", "tuvo", "soy", "era", "estadas", "estar\u00e1", "mucho", "tendr\u00e1", "estuviera", "fuiste", "fuese", "tendr\u00e9", "estos", "fu\u00e9ramos", "no", "ellas", "cual", "todo", "durante", "para", "est\u00e9", "los", "hemos", "habr\u00e9is", "contra", "habr\u00edas", "fuera", "ten\u00e9is", "estuvo", "con", "habr\u00eda", "cuando", "estad", "las", "estamos", "a", "tendr\u00e1s", "est\u00e1is", "nosotros", "estada", "esa", "tuvieseis", "hubisteis", "tened", "estaremos", "vuestras", "habr\u00edais", "fuesen", "te", "yo", "habr\u00edamos", "hubiesen", "habr\u00e1s", "y", "nosotras", "estuvieses", "tendr\u00e9is", "fueras", "m\u00edos", "vuestra", "estar\u00e9", "quienes", "tengas", "tuvi\u00e9semos", "entre", "mi", "hubiese", "desde", "tuviera", "ser\u00e1", "tendremos", "hubieron", "son", "estuvieseis", "estuvieras", "estando", "has", "tenido", "este", "teng\u00e1is", "muy", "un", "ten\u00eda", "est\u00e9is", "habr\u00edan", "tuve", "fui", "ten\u00edamos", "por", "tuvieras", "tuvieses", "estuvieran", "vuestro", "ser\u00e1n", "tambi\u00e9n", "porque", "nuestro", "ser\u00edan", "estuvimos", "fuimos", "estados", "se", "donde", "nuestra", "hubiera", "fueron", "somos", "est\u00e1n", "habidas", "sentidas", 
"m\u00edo", "todos", "esta", "fueses", "hayas", "tuvimos", "sois", "hab\u00edais", "algunos", "hubieses", "es", "hab\u00e9is", "de", "ella", "hab\u00edas", "teniendo", "del", "est\u00e1s", "ese", "est\u00e9n", "tus", "otra", "tuviese", "nada", "nuestras", "sus", "habr\u00e1n", "e", "hasta", "fue", "otras", "estuvierais", "est\u00e9s", "esas", "hubiste", "tienen", "ten\u00edas", "uno", "estemos", "tuyo", "tendr\u00edas", "su", "estuviese", "tendr\u00edais", "estaba", "eras", "fuisteis", "habiendo", "tuvisteis", "siente", "estuvieron", "vosotros", "tenidos", "estar\u00edais", "tengamos", "tendr\u00edamos", "hayamos", "hab\u00eda", "tenga", "estar\u00e9is", "estar", "mis", "hay", "tuyos", "sentidos", "ante", "estar\u00edan", "estuviesen", "tendr\u00edan", "habremos", "vuestros", "eso", "tengan", "estabas", "hubimos", "fueseis", "seamos", "hubieseis", "esos", "tenida", "sea", "el", "hubi\u00e9ramos", "en", "habidos", "he", "hubo", "hab\u00edan", "al", "estar\u00edas", "unos", "tuviste", "sintiendo", "ser\u00eda", "tendr\u00e1n", "ten\u00edais", "algunas", "estuve", "s\u00ed", "nuestros", "o", "est\u00e1bamos", "eres", "habida", "nos", "hubi\u00e9semos", "antes", "estaban", "eran", "m\u00e1s", "han", "\u00e9ramos", "estabais", "tuyas", "seas", "les", "ser\u00e1s", "tengo", "ellos", "ser\u00e9", "sentid", "ser\u00edamos", "estas", "muchos", "erais", "estoy", "suyo", "est\u00e1", "tienes", "le", "ser\u00edais", "estado", "m\u00eda", "ya"]

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,16 @@
<?php
namespace TeamTNT\TNTSearch\Support;
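abstract class AbstractTokenizer
{
// Regex used to split text into tokens; concrete tokenizers must override it.
protected static $pattern = '';
public function getPattern()
{
// Late static binding: read the subclass's pattern, not this class's default.
if (empty(static::$pattern)) {
throw new \LogicException("Tokenizer must define split \$pattern value");
}
return static::$pattern;
}
}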

View File

@ -0,0 +1,11 @@
<?php
namespace TeamTNT\TNTSearch\Support;
class BigramTokenizer extends AbstractTokenizer implements TokenizerInterface
{
public function tokenize($text, $stopwords = [])
{
$ngramTokenizer = new NGramTokenizer(2, 2);
return $ngramTokenizer->tokenize($text, $stopwords);
}
}

Some files were not shown because too many files have changed in this diff