Building a full-text index for a database with Lucene 1.3


  Test code I wrote after reading lnboy's article "Building full-text search for the Monopoly forum with Lucene".

  indexdb.jsp: builds the full-text index for the database cwb.mdb

  <%@ page import="org.apache.lucene.analysis.standard.*" %>
  <%@ page import="org.apache.lucene.index.*" %>
  <%@ page import="org.apache.lucene.document.*" %>
  <%@ page import="java.sql.*" %>
  <%@ page import="lucene.*" %>
  <%@ page contentType="text/html; charset=GBK" %>
<%
  long start = System.currentTimeMillis();
  // The index is created in the "index" directory under the web application root
  String aa = getServletContext().getRealPath("/") + "index";
  // true = create a new index, overwriting any existing one
  IndexWriter writer = new IndexWriter(aa, new StandardAnalyzer(), true);
  try {
    // Load the JDBC-ODBC bridge driver used to reach the Access database
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
    String url = "jdbc:odbc:driver={Microsoft Access Driver (*.mdb)};"
        + "DBQ=d:\\Tomcat 5.0\\webapps\\zz3zcwbwebhome\\WEB-INF\\cwb.mdb";
    Connection conn = DriverManager.getConnection(url);
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(
        "select Article_id, Article_name, Article_intro from Article");
    // One Lucene Document per article row (built by mydocument below)
    while (rs.next()) {
      writer.addDocument(mydocument.Document(rs.getString("Article_id"),
          rs.getString("Article_name"), rs.getString("Article_intro")));
    }
    rs.close();
    stmt.close();
    conn.close();

    out.println("indexing finished");
    writer.optimize();
    writer.close();
    out.print(System.currentTimeMillis() - start);
    out.println(" total milliseconds");
  } catch (Exception e) {
    out.println("error: " + e.getClass() +
        "\n error message: " + e.getMessage());
  }
%>

  aftsearch.jsp: displays the query results
  <%@ page import="org.apache.lucene.search.*" %>
  <%@ page import="org.apache.lucene.document.*" %>
  <%@ page import="lucene.*" %>
  <%@ page import="org.apache.lucene.analysis.standard.*" %>
  <%@ page import="org.apache.lucene.queryParser.QueryParser" %>
  <%@ page contentType="text/html; charset=GBK" %>
<%
  String keyword = request.getParameter("keyword");
  // Re-decode the parameter, which the container decoded as ISO-8859-1
  keyword = new String(keyword.getBytes("ISO8859_1"));
  out.println(keyword);
  try {
    String aa = getServletContext().getRealPath("/") + "index";
    Searcher searcher = new IndexSearcher(aa);
    // Lucene 1.x static form: parse(query, default field, analyzer)
    Query query = QueryParser.parse(keyword, "Article_name", new StandardAnalyzer());

    out.println("Query: " + query.toString("Article_name") + "<br>");
    Hits hits = searcher.search(query);
    System.out.println(hits.length() + " total matching documents");
    java.text.NumberFormat format = java.text.NumberFormat.getNumberInstance();
    for (int i = 0; i < hits.length(); i++) {
      // Output each query result
      Document doc = hits.doc(i);
      out.println(doc.get("Article_id"));
      out.println("relevance: " + format.format(hits.score(i) * 100.0) + "%");
      out.println(doc.get("Article_name") + "<br>");
      // out.println(doc.get("Article_intro"));
    }
  } catch (Exception e) {
    out.println("error: " + e.getClass() + "\n error message: " + e.getMessage());
  }
%>
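  Both JSPs recover the keyword with new String(keyword.getBytes("ISO8859_1")). That only works when the JVM's default charset happens to match the page's charset, which is GBK here. A slightly more explicit sketch names the page charset instead of relying on the platform default. It assumes the container decoded the parameter as ISO-8859-1, which is the servlet default when no request encoding is set. ParamDecoder and redecode are hypothetical names, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Re-decodes a request parameter that the container decoded as ISO-8859-1.
class ParamDecoder {
    static String redecode(String misDecoded, String pageCharset) {
        // ISO-8859-1 maps each char 0-255 back to exactly one byte,
        // so this recovers the bytes the browser actually sent.
        byte[] raw = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset of the page that submitted the form (e.g. "GBK").
        return new String(raw, Charset.forName(pageCharset));
    }
}
```

  In the JSP this would read keyword = ParamDecoder.redecode(request.getParameter("keyword"), "GBK");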

  Supporting class:
  package lucene;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  public class mydocument {
    // Builds a Lucene Document from one row of the Article table
    public static Document Document(String Article_id, String Article_name, String Article_intro) {
      Document doc = new Document();
      doc.add(Field.Keyword("Article_id", Article_id));     // indexed and stored, not tokenized
      doc.add(Field.Text("Article_name", Article_name));    // tokenized, indexed and stored
      doc.add(Field.Text("Article_intro", Article_intro));  // tokenized, indexed and stored
      return doc;
    }
    public mydocument() {
    }
  }

An example Lucene search page


  First, the keyword entry page lucene.html

<body>
  <form name="form1" method="post" action="search.jsp">
  Enter keyword: <input type="text" name="keyword">
  <input type="submit" name="Submit" value="Submit">
  </form>
  </body>

  Figure:

  Second, search.jsp: runs the search and displays the results
  <%@ page contentType="text/html; charset=gb2312" %>
  <%@ page import="java.util.*" %>
  <%@ page import="java.text.SimpleDateFormat" %>
  <%@ page import="org.apache.lucene.analysis.standard.StandardAnalyzer" %>
  <%@ page import="org.apache.lucene.index.IndexReader" %>
  <%@ page import="org.apache.lucene.document.Document" %>
  <%@ page import="org.apache.lucene.search.IndexSearcher" %>
  <%@ page import="org.apache.lucene.search.Hits" %>
  <%@ page import="org.apache.lucene.search.Query" %>
  <%@ page import="page.Pagination" %>
  <%@ page import="org.apache.lucene.queryParser.QueryParser" %>
  <%@ page import="org.apache.lucene.analysis.Analyzer" %>
<%
  String queryString = request.getParameter("keyword");
  if (queryString == null || queryString.length() == 0) {
    out.println("The search keyword cannot be empty");
  } else {
    queryString = new String(queryString.getBytes("ISO8859_1"));
    String indexPath = getServletContext().getRealPath("/") + "index";
    boolean error = false;
    Document doc;
    IndexSearcher searcher = null;
    Query query = null;
    Hits hits = null;
    try {
      searcher = new IndexSearcher(IndexReader.open(indexPath));
    } catch (Exception e) {
      out.print("Index files not found! ");
      out.print(e.getMessage());
      error = true;
    }
    if (!error) {
      Analyzer analyzer = new StandardAnalyzer();
      try {
        query = QueryParser.parse(queryString, "Article_name", analyzer);
      } catch (Exception e) {
        out.print(e.getMessage());
        error = true;
      }
    }
    if (!error && searcher != null) {
      hits = searcher.search(query);
      if (hits.length() == 0) {
        out.print("Sorry! No matching resources were found.");
        error = true;
      }
    }
    if (!error && searcher != null) {
      out.print("Search keywords: " + queryString + " ");
      // Pagination is the downloaded paging class and expects a Vector, so the hits
      // are copied into one. This could be changed; as written the results are traversed twice.
      Vector list = new Vector();
      for (int i = 0; i < hits.length(); i++) {
        doc = hits.doc(i);
        list.add(doc);
      }
      out.print("Found the following resources:");
      Pagination pagination = null;
      String pageNumber = request.getParameter("pageNumber");
      int showItemNumber = 10;
      if (pageNumber == null) {
        pageNumber = "1";
      }
      String HTML = "";
      if (list != null && list.size() > 0) {
        pagination = new Pagination();
        pagination.setPageNumber(Integer.parseInt(pageNumber));
        pagination.setShowItemNumber(showItemNumber);
        pagination.setVisitPageURL("search.jsp?keyword=" + queryString);
        list = (Vector) pagination.interceptListByStarItemNumber(list);
        for (int i = 0; i < list.size(); i++) {
          doc = (Document) list.get(i);
          String A_id = doc.get("Article_id");
          String doctitle = doc.get("Article_name");
          // File_name must have been stored in the index at indexing time
          String url = doc.get("File_name") + "?id=" + A_id;
          out.print("<a href='http://127.0.0.1:8080/cwbwebhome/" + url + "'>(★) " + doctitle + "</a><br>");
        }
        HTML = pagination.buildHTML("600");
        out.print(HTML);
      }
    }
  }
%>

  Figure (the resulting page):

  Third, the paging class Pagination.java (available for download)
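  The Pagination class itself is only offered as a download, so its code is not shown here. Any such class has to turn a 1-based page number and a page size into a slice of the result list. The following is a hypothetical stand-in showing only that arithmetic; PageSlice is not the downloadable class and does not reproduce its API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in (not the downloadable Pagination class): maps a 1-based
// page number and a page size onto a slice of the result list.
class PageSlice {
    // Number of pages needed to show totalItems items, pageSize per page (rounded up).
    static int pageCount(int totalItems, int pageSize) {
        return (totalItems + pageSize - 1) / pageSize;
    }

    // Items shown on the given 1-based page; empty if the page is out of range.
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        if (pageNumber < 1) return Collections.emptyList();
        int from = (pageNumber - 1) * pageSize;
        if (from >= items.size()) return Collections.emptyList();
        int to = Math.min(from + pageSize, items.size());
        return new ArrayList<>(items.subList(from, to));
    }
}
```

  With Hits, the same from/to bounds could drive hits.doc(i) directly for just the visible page. This avoids copying every matching Document into a Vector first, which is the double traversal mentioned in the search.jsp comment.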


Getting to know Lucene


  Lucene overview
  Lucene is a Java-based toolkit for text information retrieval. It is not a complete search application. Instead, it provides indexing and search functions for your application. Lucene is currently an open-source project in the Apache Jakarta family and the most popular open-source, Java-based full-text retrieval toolkit.

  Many applications already build their search function on Lucene, for example the search in the Eclipse help system. Lucene indexes textual data, so it can index and search any data you are able to convert into text. For example, to index HTML documents and PDF files, you first convert them to text and hand that text to Lucene for indexing. The resulting index files are saved to disk or memory, and the conditions the end user enters are then looked up in those index files. Because Lucene does not dictate the format of the documents being indexed, it can be used in almost any search application.

  Figure 1 shows the relationship between a search application and Lucene. It also reflects the process of building a search application with Lucene:


  Figure 1. Relationship between a search application and Lucene

  Indexing and search: indexing is the core of a modern search engine. The indexing process turns the source data into index files that are very convenient to query. Why is an index so important? Imagine you need to find, among a large number of files, the documents that contain a certain keyword. Without an index, you would have to read every document into memory in turn and check whether it contains the word. That takes a long time, yet search engines return results in milliseconds. This is why indexes exist. You can think of an index as a data structure that gives fast random access to the keywords stored in it and, from each keyword, to the documents associated with it. Lucene uses an inverted index. An inverted index maintains a table of words and phrases, and for each word or phrase a linked list records which documents contain it. With this structure, results for the user's query can be found very quickly. Part two of this series covers Lucene's indexing mechanism in detail. Because Lucene provides a simple, easy-to-use API, even readers who do not yet understand how full-text indexing works can easily use Lucene to index their documents.

  Once the documents are indexed, searches can be run against the index. The search engine first analyzes the search keywords and then looks them up in the index. Finally, it returns the documents associated with the keywords the user entered.
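  A toy version in plain Java, with no Lucene involved, makes the inverted-index idea concrete. It keeps a map from each word to the sorted set of ids of the documents that contain it, plus an AND query that intersects those sets. It only illustrates the data structure. Lucene's real index files also store term frequencies and positions and are laid out on disk quite differently. The tokenization here (lower-case, split on non-word characters) is an assumption that only suits ASCII text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: each term maps to the sorted set of ids of the documents containing it.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize naively (lower-case, split on non-word characters) and record docId under each term.
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;  // split() can yield a leading empty string
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Single-term lookup: one map access instead of scanning every document.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new TreeSet<Integer>() : Collections.unmodifiableSortedSet(docs);
    }

    // AND query: intersect the posting sets of all terms.
    SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = search(term);
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```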

  An overview of Lucene's packages

  Lucene is released as a JAR file. Below we look at the main Java packages inside that JAR to give readers a preliminary understanding of them.

  Package: org.apache.lucene.document 

  This package provides the classes needed to wrap the documents to be indexed, such as Document and Field. Each document is ultimately wrapped in a Document object.

  Package: org.apache.lucene.analysis 

  The main job of this package is to tokenize (segment) documents. A document must be split into words before it can be indexed, so this package can be regarded as preparation for indexing.

  Package: org.apache.lucene.index 

  This package provides classes that help create indexes and update existing ones. Its two basic classes are IndexWriter and IndexReader. IndexWriter creates indexes and adds documents to them, and IndexReader is used to delete documents from an index.

  Package: org.apache.lucene.search 

  This package provides the classes needed to search an index that has already been built, for example IndexSearcher and Hits. IndexSearcher defines the methods for searching a given index, and Hits holds the search results.

  A simple search application 

  Suppose a directory on the computer contains many text documents and we need to find the documents that contain a certain word. To do this, we first use Lucene to index the documents in the directory and then search the index for the documents we want. This example should give readers a clear understanding of how to build their own search application with Lucene.

  Indexing

  To index documents, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer and Directory. Their uses are introduced below:

  Document 

  Document describes a document, which here can be an HTML page, an e-mail message or a text file. A Document object is made up of several Field objects. You can think of a Document object as a record in a database, with each Field object as one field of that record.

  Field 

  A Field object describes one attribute of a document. For example, the title and the body of an e-mail can be described by two Field objects.

  Analyzer 

  Before a document is indexed, its content must first be tokenized, and this is the job of the Analyzer. Analyzer is an abstract class with several implementations, and you should choose the one that suits your language and application. The Analyzer hands the tokenized content to IndexWriter, which builds the index.
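  What an Analyzer does can be sketched in a few lines of plain Java: split the text into tokens, lower-case them and drop stop words. This is a deliberately simplified stand-in, not StandardAnalyzer's actual grammar, and the stop-word list is a small hypothetical sample. A run of Chinese characters with no spaces comes out of this sketch as a single token, which is why Chinese text needs a dedicated segmenter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an Analyzer: tokenize, lower-case, drop stop words.
class ToyAnalyzer {
    // Small hypothetical stop-word sample, not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a Unicode letter or decimal digit.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

  Query words must go through the same analysis. A TermQuery is not analyzed, which is why Listing 2 lower-cases "lucene" itself before building the Term.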

  IndexWriter 

  IndexWriter is the core class Lucene uses to create an index. Its role is to add Document objects to the index one at a time.

  Directory 

  This class represents the location where a Lucene index is stored. It is an abstract class with two implementations at present. The first is FSDirectory, which represents an index stored in the file system. The second is RAMDirectory, which represents an index stored in memory.

  Now that we know the classes needed for indexing, we can index a directory of text files. Listing 1 gives the source code for indexing the text files in a directory.


  Listing 1. Indexing text files

  import java.io.File;
  import java.io.FileReader;
  import java.io.Reader;
  import java.util.Date;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class TxtFileIndexer {
    public static void main(String[] args) throws Exception {
      File indexDir = new File("D:\\luceneIndex");   // directory where the index is stored
      File dataDir = new File("D:\\luceneData");     // directory holding the .txt files to index
      Analyzer luceneAnalyzer = new StandardAnalyzer();
      File[] dataFiles = dataDir.listFiles();
      IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
      long startTime = new Date().getTime();
      for (int i = 0; i < dataFiles.length; i++) {
        if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
          System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
          Document document = new Document();
          Reader txtReader = new FileReader(dataFiles[i]);
          document.add(Field.Text("path", dataFiles[i].getCanonicalPath()));
          document.add(Field.Text("contents", txtReader));
          indexWriter.addDocument(document);
          txtReader.close();   // the Reader has been consumed by addDocument
        }
      }
      indexWriter.optimize();
      indexWriter.close();
      long endTime = new Date().getTime();
      System.out.println("It takes " + (endTime - startTime)
          + " milliseconds to create index for the files in directory " + dataDir.getPath());
    }
  }
  In Listing 1, note that the IndexWriter constructor takes three parameters. The first specifies where the index will be stored; it can be a File object, or an FSDirectory or RAMDirectory object. The second specifies the Analyzer implementation, i.e. which analyzer tokenizes the text being indexed. The third is a boolean: true creates a new index, while false works on top of an existing index. The program then iterates over all the text files in the directory and creates a Document object for each one. It adds the file's two properties, path and contents, as two Field objects to the Document. Finally, it adds the Document to the index with IndexWriter's addDocument method. That completes the index, and next we search it.

  Searching documents

  Searching an index with Lucene is just as convenient. In the previous part we indexed a directory of text documents, and now we search that index for documents containing a keyword or phrase. Lucene provides several basic classes for this: IndexSearcher, Term, Query, TermQuery and Hits. Their functions are introduced below.

  Query 

  Query is an abstract class with several implementations, such as TermQuery, BooleanQuery and PrefixQuery. Its purpose is to wrap the user's query string into a query that Lucene can process.

  Term 

  Term is the basic unit of search, and a Term object has two String fields. A Term object can be created with a statement such as Term term = new Term("fieldName", "queryWord"); the first argument names the document Field to search and the second is the word to look for.

  TermQuery 

  TermQuery is a subclass of the abstract class Query and the most basic query class Lucene supports. A TermQuery object is created with the statement TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); its constructor takes a single parameter, a Term object.

  IndexSearcher 

  IndexSearcher searches an index that has already been built. It can only open an index read-only, and several IndexSearcher instances can operate on the same index.

  Hits 

  Hits holds the results of a search.

  With the search classes introduced, we can now search the index built earlier. Listing 2 gives the code that performs the search.

  Listing 2. Searching the index

  package TestLucene;

  import java.io.File;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.store.FSDirectory;

  public class TxtFileSearcher {
    public static void main(String[] args) throws Exception {
      String queryStr = "lucene";
      File indexDir = new File("D:\\luceneIndex");
      // Check for the index before opening it; opening a missing index throws
      if (!indexDir.exists()) {
        System.out.println("The Lucene index does not exist");
        return;
      }
      FSDirectory directory = FSDirectory.getDirectory(indexDir, false);
      IndexSearcher searcher = new IndexSearcher(directory);
      // TermQuery is not analyzed, so lower-case the word to match StandardAnalyzer's output
      Term term = new Term("contents", queryStr.toLowerCase());
      TermQuery luceneQuery = new TermQuery(term);
      Hits hits = searcher.search(luceneQuery);
      for (int i = 0; i < hits.length(); i++) {
        Document document = hits.doc(i);
        System.out.println("File: " + document.get("path"));
      }
      searcher.close();
    }
  }

  In Listing 2, the IndexSearcher constructor accepts a Directory object. Directory is an abstract class with two subclasses at present, FSDirectory and RAMDirectory. Our program passes in an FSDirectory object, representing an index stored on disk. Once the constructor returns, the IndexSearcher has opened the index read-only. The program then constructs a Term that asks for documents whose contents field contains the keyword "lucene". This Term is used to build a TermQuery, which is passed to the IndexSearcher's search method, and the results come back in a Hits object. Finally, a loop prints the paths of the matching files. That completes our search application. As you can see, developing a search application with Lucene is quite simple.

  Summary

  This article introduced some basic Lucene concepts. It then developed a demo application showing how to build an index with Lucene and search it. We hope it helps readers learn Lucene.


A problem with Lucene



  I downloaded Lucene 2.0, ran the luceneweb example that ships with it, and got the following error. Could one of the experts please help?



  Exception report

  Message

  Description The server encountered an internal error () that prevented it from fulfilling this request.

  Exception

  org.apache.jasper.JasperException: Unable to compile class for JSP

  An error occurred at line: 60 in the jsp file: /results.jsp
  Generated servlet error:
  D:/Tomcat/work/Catalina/localhost/luceneweb/org/apache/jsp/results_jsp.java:169: parse(java.lang.String) in org.apache.lucene.queryParser.QueryParser cannot be applied to (java.lang.String,java.lang.String,org.apache.lucene.analysis.Analyzer)
  query = QueryParser.parse(queryString, "contents", analyzer); //parse the
  ^


  An error occurred at line: 60 in the jsp file: /results.jsp
  Generated servlet error:
  Note: D:/Tomcat/work/Catalina/localhost/luceneweb/org/apache/jsp/results_jsp.java uses or overrides a deprecated API.
  Note: Recompile with -deprecation for details.
  1 error



  org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:84)
  org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:332)
  org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:412)
  org.apache.jasper.compiler.Compiler.compile(Compiler.java:472)
  org.apache.jasper.compiler.Compiler.compile(Compiler.java:451)
  org.apache.jasper.compiler.Compiler.compile(Compiler.java:439)
  org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:511)
  org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:295)
  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
  javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

  Note The full stack trace of the root cause is available in the Apache Tomcat/5.0.28 logs.
  Apache Tomcat/5.0.28



  Isn't the error clear enough? In Lucene 2.0, QueryParser.parse() takes a single String parameter, but the code passes three. Use a QueryParser instance instead:


  query = new QueryParser("contents", analyzer).parse(queryString);


Take a look at the movie search engine I built with Lucene



  This is not an advertisement; we mainly want to discuss the technology.
  Address: http://search.mdbchina.com

  The core is Lucene 2.0. The Chinese word segmentation is my own work, using a segmentation algorithm I designed myself. Features include pinyin search, Traditional Chinese search, typo correction, search suggestions, and related searches with related search keywords.

  Here are the main classes of my segmenter:
  ChineseAnalyzer: not part of Lucene; I wrote it myself
  ChineseTokenizer: not part of Lucene; also written by me
  ChineseTokenizerConstants: self-explanatory
  ChineseTokenizerTokenManager: a "patch" to StandardTokenizerTokenManager
  ChineseSimplificationFilter: a filter that converts Traditional characters to Simplified

  I did not follow the usual pattern of plugging a Chinese word segmenter into an existing Analyzer. Instead I went to a lower level and wrote my own ChineseAnalyzer. Please give it a try.
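  The post does not say how ChineseSimplificationFilter is implemented. One simple approach, sketched here purely as an assumption, is a per-character lookup table from Traditional to Simplified characters, such as 國→国, 學→学, 電→电, 書→书. The same table also gives a crude way to judge whether text contains Traditional characters: any character that appears as a key. The table below holds only those four sample pairs. A usable filter needs a complete table, and a few characters cannot be converted correctly without context. In a Lucene filter, this mapping would be applied to each token's text.

```java
import java.util.HashMap;
import java.util.Map;

// Per-character Traditional -> Simplified lookup; the table is only a small sample.
class SimplifiedMapper {
    private static final Map<Character, Character> T2S = new HashMap<>();
    static {
        T2S.put('\u570B', '\u56FD'); // guo ("country")
        T2S.put('\u5B78', '\u5B66'); // xue ("study")
        T2S.put('\u96FB', '\u7535'); // dian ("electric")
        T2S.put('\u66F8', '\u4E66'); // shu ("book")
    }

    // Replaces every Traditional character found in the table; others pass through unchanged.
    static String toSimplified(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = T2S.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    // Crude check: true if any character of the text is a key of the Traditional table.
    static boolean containsTraditional(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (T2S.containsKey(text.charAt(i))) return true;
        }
        return false;
    }
}
```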



  Up 



  Nice work, and it's very fast.



  Not bad, bumping this.



  Very good! 



  Not bad, but from a purely business perspective pinyin search seems unnecessary. It would be better to add options for year, actor, director, film title and genre.



  Pinyin search is also very important. Many actors have names that are hard to read, and in that case typing the pinyin still finds them.



  Year, actor, director, film title and genre are already covered by the search, and the engine detects them automatically. If you enter 2006, it lists the films released in 2006. If you enter an actor's name, it lists that actor's films. If you enter "action", it lists all action films. Directors and distributors work the same way.
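  How the site detects these query types is not stated. A minimal sketch, assuming a year is any four-digit number starting with 19 or 20 and that actor names and genres come from dictionaries, could pick the target field like this. The field names ("year", "actor", "genre", "title") and the QueryRouter class are hypothetical.

```java
import java.util.Set;

// Hypothetical router: picks the field a single search-box query most likely targets.
class QueryRouter {
    private final Set<String> actors;  // assumed dictionary of known actor names
    private final Set<String> genres;  // assumed dictionary of film types

    QueryRouter(Set<String> actors, Set<String> genres) {
        this.actors = actors;
        this.genres = genres;
    }

    String targetField(String query) {
        String q = query.trim();
        if (q.matches("(19|20)\\d\\d")) return "year";   // e.g. "2006"
        if (actors.contains(q)) return "actor";
        if (genres.contains(q)) return "genre";
        return "title";                                    // default: search film titles
    }
}
```

  The chosen field name would then be used to build the actual query, for example new TermQuery(new Term(field, value)) in Lucene.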



  Yes, bumping to show support.



  http://jf.jf.cn



  I'd really like to read the original poster's code!



  Pretty impressive!




  ChineseSimplificationFilter: Traditional filters to Simplified 

  A question for the original poster: how do you tell whether a character is Simplified or Traditional?


-----
  www.ruansou.com: a small search engine



  With a dictionary; it probably has a little over 2,000 characters.



  Bumping to show support.



  Liuguangshui@163.com 

  Thanks for the support!



In Lucene search results the "summary" shows the beginning of the article. How can I make the summary show the text around the keyword instead? Points offered, please bump!



  As the title says.



UP



  Why has nobody answered?
  Is it too hard for anyone, or too simple to bother answering?



  Never done this myself, sorry.



  Two methods (assuming you are using JSP):
  Method 1, on the server side: read the article into a String and use the string API to locate the keyword (probably with indexOf() or similar; I haven't worked out the details). Then return the text near that position for output, cutting it to the desired length with substring() or a similar method. The disadvantage is that this takes some server-side processing and increases the load on the server.

  Method 2, in the page: send the entire article to the page and use JavaScript's string methods, which are much like Java's, to cut out and output the desired part. The disadvantage is that the whole article is sent to the browser, which increases network traffic.

  I think those are basically the two options.
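  Method 1 can be sketched in Java: find the keyword with indexOf(), then cut out a window of text around it with substring(). The window size (context characters on each side) and the "..." markers are arbitrary choices. Matching is case-sensitive. If the keyword is absent, the sketch falls back to the start of the article, which is what the original poster currently sees.

```java
// Cuts a window of text around the first occurrence of a keyword.
class SnippetExtractor {
    static String snippet(String text, String keyword, int context) {
        int pos = text.indexOf(keyword);  // case-sensitive search
        if (pos < 0) {
            // Keyword not found: fall back to the beginning of the article.
            int end = Math.min(text.length(), 2 * context);
            return text.substring(0, end) + (end < text.length() ? "..." : "");
        }
        int start = Math.max(0, pos - context);
        int end = Math.min(text.length(), pos + keyword.length() + context);
        return (start > 0 ? "..." : "") + text.substring(start, end)
                + (end < text.length() ? "..." : "");
    }
}
```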




  Thanks to the people above, but these methods may waste too many resources.
  I'd like to know how it works for those of you who use Lucene.
  In your query results, is the summary the beginning of the article or a passage around the keyword?
  Please reply either way, "yes" or "no"!



  Up 



  Of course it's the passage around the keyword, specifically the passage where the keyword occurs most often.
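  This reply can be sketched under simple assumptions. Cut the article into fixed-size fragments, count keyword occurrences in each, and return the fragment with the most hits, taking the first one on ties. Fixed-size cutting is a simplification: an occurrence that straddles a fragment boundary is not counted, and real highlighters usually break fragments on token or sentence boundaries instead.

```java
import java.util.List;

// Picks the fixed-size fragment of text containing the most keyword occurrences.
class BestFragment {
    static String best(String text, List<String> keywords, int fragmentSize) {
        if (fragmentSize <= 0) throw new IllegalArgumentException("fragmentSize must be positive");
        String best = "";
        int bestScore = -1;
        for (int start = 0; start < text.length(); start += fragmentSize) {
            String fragment = text.substring(start, Math.min(start + fragmentSize, text.length()));
            int score = 0;
            for (String kw : keywords) score += count(fragment, kw);
            if (score > bestScore) {  // strict '>' keeps the earliest fragment on ties
                bestScore = score;
                best = fragment;
            }
        }
        return best;
    }

    // Number of non-overlapping occurrences of kw in s.
    private static int count(String s, String kw) {
        if (kw.isEmpty()) return 0;
        int n = 0;
        for (int i = s.indexOf(kw); i >= 0; i = s.indexOf(kw, i + kw.length())) n++;
        return n;
    }
}
```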



  To journay (When you gaze into the night, the night also gazes deeply back at you):
=============
  Hello, and first of all thank you for your replies.
  I have tested with both lucene 1.4 and lucene 2.0, and in both cases the summary in the search results is the beginning of the article. I don't know where the problem is. I've been working on it for several days and still can't find a solution.
  Please help me out and point me in the right direction.
  Thank you! Thank you! Thank you!



  Bump, up, to the top!



  Bump!



  Up 



  Lucene search shows the beginning of the article first, and after that the text around the keyword. There seems to be a class for this whose configuration parameters let you set the size of the range.



  I don't know how either; just here for the points.



  Have you actually got that working? lucene1.4.3 doesn't support it!



  To Ruanjiantaotao (Taotao):
=======
  Which version supports it?
  I also tested with lucene2.0, and I still get the beginning of the article.



Does Lucene's Highlighter not support multiple keywords?



  As the title says.
  I use Highlighter to highlight keywords. Searching for a single keyword works, but if I enter several keywords in the input box, such as "Internet today", I get an error.
  Is my code wrong, or does Highlighter itself not support highlighting multiple keywords?



  Up 



  Sorry, I'm not familiar with Highlighter.

  To highlight keywords, I use:
  .split(" ");
  .replaceAll();
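  The split/replaceAll approach extends naturally to several space-separated keywords. The following minimal sketch does not use Lucene's Highlighter at all. It splits the query on whitespace and quotes each keyword with Pattern.quote, so characters such as + or ( are not treated as regex syntax. It then replaces all keywords in a single pass, so a tag inserted for one keyword is never matched by another. The <b> tag is an arbitrary choice, and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Wraps every occurrence of any space-separated query keyword in <b>...</b>.
class KeywordHighlighter {
    static String highlight(String text, String query) {
        List<String> keywords = new ArrayList<>();
        for (String kw : query.trim().split("\\s+")) {
            if (!kw.isEmpty()) keywords.add(kw);
        }
        if (keywords.isEmpty()) return text;
        // Longest keywords first, so "Internet" wins over "Inter" at the same position.
        keywords.sort((a, b) -> Integer.compare(b.length(), a.length()));
        List<String> quoted = new ArrayList<>();
        for (String kw : keywords) quoted.add(Pattern.quote(kw));
        // One alternation pattern applied in a single pass; $0 re-inserts the matched text.
        Pattern p = Pattern.compile(String.join("|", quoted));
        return p.matcher(text).replaceAll("<b>$0</b>");
    }
}
```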



  When you search with Highlighter, do you separate multiple keywords with spaces?




  To cocoysy (forgotten Aizelashi):
  When you search with Highlighter, do you separate multiple keywords with spaces?
==========
  Yes, I separate the keywords with spaces and get an error. Could it be that Highlighter doesn't support spaces?





  Iwlk (6th century):
  To highlight keywords, I use:
  .split(" ");
  .replaceAll();
===========
  That method works too, but I think it has a drawback where summaries are concerned. By default Lucene only returns the beginning of the article as the summary, so if the keywords do not appear in that opening passage, the highlighting has no effect.
  This is only my personal understanding.






  To Iwlk (6th century):
=========
  Your split(" ") gave me a great idea, thank you!
  Closing the thread!



Lucene: a full-text search API

  Lucene is a full-text search API. Many articles introduce it and show examples of its use; see the Lucene references.
  This study is hands-on and covers, first, a simple application; second, a web application; third, Chinese support; and fourth, other applications (in the Lucene Sandbox on the Lucene home page).

  0. Preparation: download the current stable release, lucene-1.2.tar.gz, from the Lucene home page and unpack it. Put the two jar files in the lucene-1.2 directory, lucene-1.2.jar and lucene-demos-1.2.jar, into a suitable directory and add them to the CLASSPATH environment variable.
  tar zxvf lucene-1.2.tar.gz    <- unpack
  cd lucene-1.2
  cp *.jar $DP    <- $DP is the directory where you keep jar files; replace it with your actual directory
  CLASSPATH=$CLASSPATH:$DP/lucene-1.2.jar:$DP/lucene-demos-1.2.jar; export CLASSPATH
  If you don't want to type this at every login, add the last line above to the end of /etc/profile or of .profile in your home directory. On Windows, right-click "My Computer" on the desktop and choose "Properties" -> "Advanced" -> "Environment Variables". Select CLASSPATH -> "Edit" and add the full path names of the jar files in the input box, noting that the separator is a semicolon (;). See the figure on the right.

  1, running demo 
  $ Java org.apache.lucene.demo.IndexFiles / usr/local/man/man1 /    <- Man document indexing 
  Adding / usr/local/man/man1/mysql.1 
………..
  Adding / usr/local/man/man1/cvs.1 
  1614 total milliseconds 
  $ Java org.apache.lucene.demo.SearchFiles    <- Retrieval 
  Query: password 
  Searching for: password 
  7 total matching documents 
  0. / Usr/local/man/man1/mysql.1 
……
  6. / Usr/local/man/man1/mysqlshow.1 
  Query:
  OK! The Lucene demo ran without problems. The main API calls made by the demo programs are:
  /* main indexing calls (IndexFiles) */
  File file = new File(argv[0]);   // argv[0]: the directory or file to index
  IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);   // true = create a new index

  Document doc = new Document();
  doc.add(Field.Text("path", file.getPath()));
  doc.add(Field.Keyword("modified", DateField.timeToString(file.lastModified())));
  FileInputStream is = new FileInputStream(file);
  Reader reader = new BufferedReader(new InputStreamReader(is));
  doc.add(Field.Text("contents", reader));   // a Reader field is tokenized and indexed, not stored

  writer.addDocument(doc);

  writer.optimize();
  writer.close();

  /* main search calls (SearchFiles) */
  Searcher searcher = new IndexSearcher("index");
  Analyzer analyzer = new StandardAnalyzer();
  Query query = QueryParser.parse(lineforsearch, "contents", analyzer);   // lineforsearch: the query text typed by the user
  Hits hits = searcher.search(query);
  for (int i = start; i < hits.length(); i++) {   // start: index of the first hit to print
    Document doc = hits.doc(i);
    String path = doc.get("path");
    System.out.println(i + ". " + path);
  }
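  The calls above build an inverted index (term -> documents) and then look a term up in it. The toy class below is a minimal, in-memory sketch of that flow for illustration only; ToyIndex, its crude tokenizer and the sample contents are hypothetical and say nothing about how Lucene actually stores or scores its index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy in-memory inverted index, for illustration only. It is NOT how
// Lucene stores its index; it only mirrors the addDocument / search flow.
public class ToyIndex {
    private final List<String> paths = new ArrayList<>();               // doc id -> "path" field
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc ids

    // Rough stand-in for an analyzer: lower-case, split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Analogous to writer.addDocument(doc): store the path, index the contents.
    void addDocument(String path, String contents) {
        int docId = paths.size();
        paths.add(path);
        for (String term : tokenize(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Analogous to searcher.search(query) for a one-word query:
    // returns matching paths in doc-id order (no relevance scoring).
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        List<String> terms = tokenize(word);
        if (terms.isEmpty()) {
            return hits;
        }
        for (int id : postings.getOrDefault(terms.get(0), Collections.emptySet())) {
            hits.add(paths.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Hypothetical sample contents, not real man page text.
        index.addDocument("/usr/local/man/man1/mysql.1", "mysql - command-line tool; --password option");
        index.addDocument("/usr/local/man/man1/cvs.1", "cvs - Concurrent Versions System");
        List<String> hits = index.search("password");
        System.out.println(hits.size() + " total matching documents");
        for (int i = 0; i < hits.size(); i++) {
            System.out.println(i + ". " + hits.get(i));
        }
    }
}
```

  The output has the same "N. path" shape as SearchFiles; real Lucene adds analysis, scoring and on-disk index segments on top of this basic idea.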


  3. Running LuceneWeb
  Assume Tomcat is installed in the $TOMCATHOME directory; replace $TOMCATHOME with the actual directory in your setup.
  cd $TOMCATHOME/webapps
  mkdir lucenedb
  cd lucenedb
  java org.apache.lucene.demo.IndexHTML -create -index $TOMCATHOME/webapps/lucenedb ../examples   <- the files to index are given with a relative path starting with "..", because the index is also used to build the URLs of the documents shown in the results, and the search JSPs live in the luceneweb subdirectory. examples can be replaced with any other real directory name
  cd ..
  cp ~/lucene-1.2/luceneweb.war .   <- luceneweb.war is in the lucene-1.2 directory produced when you decompressed the archive
  ../bin/shutdown.sh
  ../bin/startup.sh
  Then visit http://yourdomain.com:8080/luceneweb from a client browser; if all goes well, the browser shows the content in the figure on the right. Then, on the server:
  cd luceneweb
  vi configuration.jsp   <- set the value of indexLocation to "$TOMCATHOME/webapps/lucenedb"
  cd ..
  jar -uf luceneweb.war luceneweb
  Then go back to the client, refresh the page, and enter a word to search. Unfortunately, only English words can be searched this way, and if the title of a matching HTML page is in Chinese, it is displayed incorrectly as well (see figure).
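  Why the relative path works: the paths stored by IndexHTML (such as ../examples/...) end up as links on the results page inside luceneweb, and a browser resolves a relative link against that page's URL. The sketch below shows that resolution with java.net.URI; it assumes the results page links the stored path directly, and the page name results.jsp and the file index.html are hypothetical.

```java
import java.net.URI;

public class RelativeUrlDemo {
    // Resolves a path stored in the index against the URL of the page that
    // displays it, the same way a browser resolves a relative link.
    static String resolve(String pageUrl, String storedPath) {
        return URI.create(pageUrl).resolve(storedPath).toString();
    }

    public static void main(String[] args) {
        // results.jsp and index.html are hypothetical names for illustration.
        System.out.println(resolve("http://yourdomain.com:8080/luceneweb/results.jsp",
                                   "../examples/index.html"));
    }
}
```

  With these inputs the link resolves to http://yourdomain.com:8080/examples/index.html, that is, into Tomcat's examples application, which is why ../examples is indexed from a directory that sits next to luceneweb.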

  IndexHTML can index documents of type htm, html and txt. Apart from using an HTMLParser, it is basically the same as the previous demo (IndexFiles).
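  The file selection described above can be sketched as a recursive directory walk that keeps only .htm, .html and .txt files. This is a hypothetical helper for illustration, not the IndexHTML source; in the real demo each selected file would then be turned into a Document and added to the index.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HtmlFileCollector {
    // The file types the text says IndexHTML handles: .htm, .html and .txt.
    static boolean isIndexable(String name) {
        return name.endsWith(".htm") || name.endsWith(".html") || name.endsWith(".txt");
    }

    // Walks `dir` recursively and adds every indexable file to `out`.
    static void collect(File dir, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, does not exist, or cannot be read
        }
        Arrays.sort(children); // deterministic order for repeatable output
        for (File f : children) {
            if (f.isDirectory()) {
                collect(f, out);
            } else if (isIndexable(f.getName())) {
                out.add(f);
            }
        }
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        collect(new File(args.length > 0 ? args[0] : "."), files);
        for (File f : files) {
            System.out.println("would index " + f.getPath());
        }
    }
}
```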

Inside Lucene: learning the ultra-popular search engine (0) - Preface


  Abstract: preface

  The Super Girl contest has ended, and "China Search" has followed suit by running the first online "Super Search" contest. Setting aside whether it has any technical content, one website's headline, "the IT bubble returns: a search contest that gives away cars", surprised us for the first time. I still remember the ecstasy of winning 10,000 yuan of IT prize money a few years ago; now the IT wave has money everywhere, and of course the rich and powerful search-engine veterans hand out cars as prizes. Many sources say the concept that has recently caught fire is "search engines".

  In fact, someone spending money is not necessarily a bad thing; at the very least it has made search engines very popular among Internet users. A larger audience will bring better, more personalized services and innovative applications, which is of course good for users. For ITers, the more there is to eat, the better, and as for Lucene, it has every reason to smile.

  Anyone using Lucene will find, more or less, that although it is powerful, it is ultimately a high-level generalization of what most people need, so in some details it often leaves you in an awkward spot. Sometimes you cannot find a way that fully satisfies you, yet you don't want to build everything from scratch. This is partly because Lucene is comfortable enough to use, and partly because you have not thoroughly mastered it. What really frustrates an ITer is this: wanting to make Lucene work for yourself, you go Googling, and a simple search for "Lucene" fills the screen with "how to use / FAQ" material, which is rather pointless, plus excerpted translations of the Lucene manual and of Lucene in Action (as the title suggests, a usage guide). Reading these articles feels like watching kindergarten children discuss the war in Iraq: tedious, yet you feel bad interrupting. The most amusing are the articles that spend a full page explaining what an inverted index is, as if written for students who have never come across an index in IT. Building a Google is not impossible, but not at this level.

  Just as people cannot work out how the brain works, one cannot expect Google to hand over the answer on how a search engine works. I have gone through the Lucene source code countless times and understand it at least somewhat. While I still remember it, I am writing it down bit by bit, so that the thinking of these past few days does not vanish like a fleeting dream. I hope this series succeeds (it is my maiden work).
  If you are a Lucene user or developer and are equally interested in how search engines are implemented (not the principles, which are too general a topic), please contact me; I look forward to any help or corrections.

  Keywords: Lucene, search engine, index, implementation

  Valuable resources: 

  •   If you have never used Lucene, start here - a brief introduction to using Lucene (Che Dong)
  •   Download / Documentation - Lucene Home 
  •   Advanced Topics - Tuning Lucene 


Lucene search engine: a super-quick start

  The latest version of Lucene supports Chinese out of the box and makes sorting easy

  Download the latest lucene-1.4-final-src.tar.gz. After unpacking it there are directories such as docs, lib and src; lib contains junit-3.8.1.jar, which is needed to compile the source. You can build with Ant 1.6, or with Eclipse.

  src contains a demo package. Reading the documents in the docs directory shows that its two classes, IndexFiles and SearchFiles, are used respectively to build a text index and to query it.

  IndexFiles takes one argument: the absolute path of the directory to index (all files under that directory are traversed and indexed). SearchFiles takes no arguments; run it directly and it prompts with Query: for what you want to search. By default the generated index files go into the index directory under src.

  The demo package contains only these two very simple demos. There is also a classic, more complicated demo, IndexHTML, which interested readers can study carefully. Below I talk about how to sort.

  Searching Google for Lucene, you almost always find Che Dong's article on handling Chinese in Lucene, which mentions a class called CJKAnalyzer (Chinese, Japanese, Korean). This Analyzer analyzes double-byte characters such as Chinese, Japanese and Korean, and the solution was also included in the Jakarta Lucene project. Heh, Che Dong is quite famous.
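  CJKAnalyzer is documented as splitting runs of double-byte CJK characters into overlapping two-character tokens (bigrams), so that, for example, 全文检索 ("full-text search") yields 全文, 文检 and 检索. The sketch below illustrates that idea in plain Java; it is not the CJKAnalyzer source, the class name is hypothetical, and it covers only a few Unicode blocks and ignores characters outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CjkBigrams {
    // True for characters in the CJK blocks this sketch handles
    // (Han ideographs, Hiragana, Katakana, Hangul syllables).
    static boolean isCjk(char c) {
        Character.UnicodeBlock b = Character.UnicodeBlock.of(c);
        return b == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || b == Character.UnicodeBlock.HIRAGANA
            || b == Character.UnicodeBlock.KATAKANA
            || b == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Runs of CJK characters become overlapping two-character tokens
    // (an isolated CJK character stays a single token); runs of other
    // letters/digits become lower-cased words; everything else separates.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                int start = i;
                while (i < text.length() && isCjk(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    tokens.add(text.substring(start, i));
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))
                        && !isCjk(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
            } else {
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Lucene" followed by the four Han characters for "full-text search".
        System.out.println(tokenize("Lucene\u5168\u6587\u68c0\u7d22"));
    }
}
```

  The Han characters are written as Unicode escapes so the file compiles regardless of the platform's default source encoding.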

  What puzzled me is that the source currently available for download contains no trace of the CJKAnalyzer mentioned above. With this doubt I went to the official Lucene website and read changes.txt, carefully going through the upgrade history from 1.3 to 1.4, and found that version 1.4 actually supports Chinese by default (in changes.txt, press CTRL+F and search for "chinese" to jump straight to the relevant note). Then I noticed that sorting can be done on arbitrary fields? Skeptical, I looked at the javadoc for Searcher#search, read it, and ran a small experiment: it really works! You can sort on several fields at once, and sorting is not limited to content stored in the index.

  Sorting: just create a Sort object, e.g. Sort sort = new Sort("username", true);
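  In Lucene 1.4 the Sort object is passed to the searcher, e.g. searcher.search(query, sort), and new Sort("username", true) orders the hits by the username field in reverse (descending) order; a Sort built from an array of SortField objects orders by several fields in priority order. The plain-Java sketch below only mimics that ordering with Comparators to show what reverse and multi-field sorting mean; FieldSortDemo, its map-based hit model and the sample values are hypothetical, and this is not how Lucene implements sorting internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldSortDemo {
    // Hypothetical model of a hit: field name -> field value.
    static Map<String, String> hit(String username, String date) {
        Map<String, String> fields = new HashMap<>();
        fields.put("username", username);
        fields.put("date", date);
        return fields;
    }

    // Orders hits by fields[0], then fields[1], ...; reverse[i] makes field i descending.
    // Assumes every hit has a value for every field (no null handling).
    static List<Map<String, String>> sortBy(List<Map<String, String>> hits,
                                            String[] fields, boolean[] reverse) {
        Comparator<Map<String, String>> cmp = (a, b) -> 0;
        for (int i = 0; i < fields.length; i++) {
            final String field = fields[i];
            Comparator<Map<String, String>> byField = Comparator.comparing(h -> h.get(field));
            if (reverse[i]) {
                byField = byField.reversed();
            }
            cmp = cmp.thenComparing(byField);
        }
        List<Map<String, String>> sorted = new ArrayList<>(hits);
        sorted.sort(cmp); // stable: ties keep their original (e.g. relevance) order
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = Arrays.asList(
                hit("alice", "20040701"), hit("carol", "20040615"), hit("bob", "20040701"));
        // Same ordering idea as new Sort("username", true): username, descending.
        for (Map<String, String> h : sortBy(hits, new String[] {"username"}, new boolean[] {true})) {
            System.out.println(h.get("username"));
        }
    }
}
```

  The main method prints carol, bob, alice. Sorting on date descending and then username ascending would instead put alice before bob among the hits that share a date, with carol last.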

  I'm too lazy to write more; see the Searcher documentation for the specifics.

  Note: Lucene is simple to use, and it has already been ported to .NET. I hope this article helps you get started with Lucene quickly, build an index of your own document collection and search it easily; it feels great.
