David Fraser
davidf at sjsoft.com
Wed Aug 17 11:04:59 EDT 2005
I see you've also asked this on pylucene-dev - Andi Vadja's answer there is good, and that's probably the best place to continue the discussion... (I'm on that list too)

Manjeet Chaudhary wrote:
> We are working on indexing and searching of Arabic text.
>
> A brief description of our application:
>
> 1. It's a web based indexing and searching application.
> 2. It should index Arabic/English text.
> 3. It should search Arabic/English text.
>
> PyLucene is the best option for indexing using Python, but there is no
> Arabic Analyzer available for Python.
> Aramorph is a Java based Arabic Analyzer. How can I use
> ArabicAnalyzer.jar in my case, using Python and mod_python?
>
> I want to pull Java Lucene into Python: PyLucene. I have an
> "ArabicAnalyzer.jar" file in Java and I want to use it in Python. I have
> gone through documentation on how to convert Java into Python using
> "gcj" and "swig", but I am confused about how to go about it.
>
> So please help me and tell me the exact procedure to follow to pull
> that jar file into Python.
>
> Thank you
> M Chaudhary
>
>
> David Fraser wrote:
>
>> We use PyLucene too and it's fantastic, but just be aware that if you
>> run Apache in multi-threaded mode (e.g. under Windows) there are
>> conflicts with the gcj threading (for which there are also
>> workarounds, mainly running a separate process for PyLucene)
>>
>> Julien wrote:
>>
>>> Why not use PyLucene? http://pylucene.osafoundation.org/
>>>
>>> It's a SWIG Python port of Java Lucene, and it works perfectly with
>>> mod_python (we use it every day here at work)
>>>
>>> On Tue, 2005-08-16 at 15:53 +0530, Manjeet Chaudhary wrote:
>>>
>>>
>>>> Hello All
>>>>
>>>> I am using mod_python and Lucene for searching Arabic text from
>>>> an index created using Lucene+Java.
>>>> I am facing some problems running the "java" command from mod_python.
>>>>
>>>> 1. I am trying to run a Java program using the "os.system" command.
>>>>    Check = os.system("java ..........")
>>>>
>>>> 2. The command returns a value of 256,
>>>>    i.e. Check = 256
>>>>
>>>> 3. I am not able to figure out why that Java program is not
>>>>    being executed properly.
>>>>
>>>> 4. For the Java program I have set the classpath, and I compile
>>>>    and run that program using the classpath.
>>>>
>>>> export CLASSPATH =
>>>> /home/ajinkyan/cgi-bin/infoviewer/Jar_Files/lucene-1.4.3.jar
>>>> :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/PDFBox-0.6.7a.jar
>>>> :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/log4j-1.2.8.jar
>>>> :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/tm-extractors-0.4.jar
>>>> :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/ArabicAnalyzer-1.0b.jar
>>>> :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/commons-collections-3.1.jar
>>>>
>>>> javac -classpath $CLASSPATH:. SearchFiles.java
>>>>
>>>> java -classpath $ClASSPATH:. SearchFiles
>>>>
>>>> 5. The SearchFiles program runs properly if I run it from the shell.
>>>>
>>>> So please help me and tell me where I am going wrong.
>>>> I have attached a copy of the code at the end.
>>>>
>>>>
>>>> Thank you
>>>> Manjeet Chaudhary
>>>>
>>>>
>>>>
>>>> #!/usr/bin/env python
>>>> """
>>>> #-------------------------------------------------------------------------------------------------------
>>>>
>>>> #Name: ar_search.py
>>>> #Version: 1.0
>>>> #Purpose: Searching for Infoviewer
>>>> #Author: Alok Khandelwal
>>>> #Created: 10-08-2005
>>>> #Licence: Infogrid Pacific EULA
>>>> #--------------------------------------------------------------------------------------------------------
>>>>
>>>> #Revision History
>>>> #--------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> #----------------Different modules required are imported here-------------------
>>>> """
>>>> import cgitb,cgi,sys
>>>> cgitb.enable()
>>>> import os
>>>> from os.path import join, abspath
>>>> from mod_python import apache,util,Session
>>>> """
>>>> #----------------End of importing modules----------------------------------------
>>>> """
>>>>
>>>>
>>>> def search(req):
>>>>
>>>>     """This part takes the Arabic keyword from the req variable and writes
>>>>     that keyword into a temp file. This temp file will be opened by the
>>>>     Java program, which will take that value and search for it in the
>>>>     index file.
>>>>     The Java program will store the result in the results.txt file"""
>>>>
>>>>     The_Form = util.FieldStorage(req)
>>>>     for name in The_Form.keys():
>>>>         #--------Checking for the selected Fields and turning those flag on--------
>>>>         if name == 'search_contents':
>>>>             value=The_Form[name]
>>>>             value.encode('utf-8')
>>>>             f=open("/home/ajinkyan/cgi-bin/infoviewer/temp.txt","w")
>>>>             f.write(value)
>>>>             f.close()
>>>>
>>>>     try:
>>>>         command = "export CLASSPATH=/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/lucene-1.4.3.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/PDFBox-0.6.7a.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/log4j-1.2.8.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/tm-extractors-0.4.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/ArabicAnalyzer-1.0b.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/commons-collections-3.1.jar"
>>>>
>>>>         import os
>>>>         os.system(command)
>>>>         command = "java -classpath $CLASSPATH:/home/ajinkyan/cgi-bin/infoviewer SearchFiles"
>>>>         check=os.system(command)
>>>>         req.write("\ncheck = " + str(check))
>>>>         f = open("/home/ajinkyan/cgi-bin/infoviewer/results.txt","r")
>>>>         result=f.read()
>>>>         req.write(result)
>>>>     except:
>>>>
>>>>         req.write("\n")
>>>>         req.write(str(sys.exc_type))
>>>>         req.write("\n")
>>>>         req.write(str(sys.exc_value))
>>>>
>>>>
>>>>
>>>> ____________________________________________________________________________________________________________________________
>>>>
>>>>
>>>> # After executing the os.system command, the value returned in check
>>>> # is usually 256, or something near 32000.
>>>>
>>>> # The Java program runs independently without any errors. Only when
>>>> # it is called using os.system does it fail to run.
>>>>
>>>> # The Java program is as follows
>>>>
>>>> ___________________________________________________________________________________________________________________________
>>>>
>>>>
>>>>
>>>> /**
>>>>  * @Name : SearchFiles.java
>>>>  * @Version: 1.0
>>>>  * @Author : Alok Khandelwal
>>>>  * @Created: 2 August 2005
>>>>  * @Purpose: Searching in index for arabic contents
>>>>  * @Copyright:(c)2004 by Infogrid Pacific Pte. Ltd.
>>>>  * @Licence :Infogrid Pacific Eula
>>>>  */
>>>>
>>>>
>>>>
>>>>
>>>> import org.apache.lucene.search.Searcher;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.Query;
>>>> import org.apache.lucene.search.Hits;
>>>> import org.apache.lucene.queryParser.QueryParser;
>>>> import gpl.pierrick.brihaye.aramorph.lucene.ArabicStemAnalyzer;
>>>> import java.io.*;
>>>> import org.apache.lucene.document.Document;
>>>> class SearchFiles
>>>> {
>>>>     public static void main(String[] args)
>>>>     {
>>>>         try
>>>>         {
>>>>             FileInputStream fis = new FileInputStream(new File(
>>>>                 "/home/ajinkyan/cgi-bin/infoviewer/temp.txt"));
>>>>             BufferedReader in = new BufferedReader(new InputStreamReader(fis,"UTF8"));
>>>>             char [] buf = new char[80];
>>>>             int numRead;
>>>>             numRead=in.read(buf,0,80);
>>>>             String field="\"";
>>>>             for(int k=0;k<numRead;k++)
>>>>             {
>>>>                 field=field+buf[k];
>>>>             }
>>>>             field=field+"\"";
>>>>             FileOutputStream fos = new FileOutputStream(new File(
>>>>                 "/home/ajinkyan/cgi-bin/infoviewer/results.txt"));
>>>>             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos,"UTF8"));
>>>>             Searcher searcher = new IndexSearcher("/home/ajinkyan/cgi-bin/infoviewer/index_arabic");
>>>>             try
>>>>             {
>>>>                 Query query = QueryParser.parse(field, "contents", new ArabicStemAnalyzer());
>>>>                 Hits hits = searcher.search(query);
>>>>                 String output="Total match document = \n";
>>>>                 for(int j=0;j<hits.length();j++)
>>>>                 {
>>>>                     Document doc = hits.doc(j);
>>>>                     // System.out.print(hits.score(j)+" ");
>>>>                     output = output +"\n"+ doc.get("filename");
>>>>                 }
>>>>                 out.write(output);
>>>>                 out.write("\n");
>>>>             }
>>>>             catch(Exception e)
>>>>             {
>>>>                 out.write("Parser Error");
>>>>                 out.write("\n");
>>>>
>>>>             }
>>>>             searcher.close();
>>>>             out.close();
>>>>         }
>>>>         catch (Exception e)
>>>>         {
>>>>             String s= " caught a " + e.getClass() +
>>>>                 "\n with message: " + e.getMessage();
>>>>         }
>>>>     }//end of main
>>>> }//end of SearchFiles
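The 256 in the quoted post is not an exit code. On Unix, os.system() returns the shell's wait status, with the exit code in the high byte, so 256 means the command exited with code 1. A value "near 32000" may well be 32512, i.e. exit code 127, which POSIX shells use for "command not found" (for example `java` missing from the PATH of the Apache process); the post doesn't give the exact number, so that reading is a guess. A minimal Unix-only sketch for decoding such values (the function name is made up for illustration):

```python
import os


def describe_wait_status(status: int) -> str:
    """Explain the raw value returned by os.system() on Unix.

    os.system() returns the wait status of the shell, not the exit code:
    the exit code sits in the high byte, so 256 means "exited with code 1".
    """
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "unrecognised status %d" % status


if __name__ == "__main__":
    for status in (0, 256, 32512):
        print(status, "->", describe_wait_status(status))
```

This prints "exited with code 0", "exited with code 1" and "exited with code 127" for the three values. On Python 3.9 and later, os.waitstatus_to_exitcode() performs the same conversion in a single call.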
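One likely cause, though the thread doesn't confirm it: each os.system() call starts a fresh /bin/sh, so the `export CLASSPATH=...` from the first call is gone by the time the second call runs, and `$CLASSPATH` expands to an empty string. The JVM then starts without the Lucene and Aramorph jars. A missing class typically surfaces as a NoClassDefFoundError, which is not an Exception and so escapes the `catch (Exception e)` blocks, and `java` exits with code 1, which matches the 256. From an interactive shell the exported variable is still set, which would explain why the program runs there. The sketch below assumes Python 3.7+ and treats the paths from the post as placeholders. It passes the classpath explicitly to `subprocess.run` with no shell involved, and writes the keyword as UTF-8 explicitly, because `value.encode('utf-8')` in the quoted code returns a new value that is then discarded. The helper names are hypothetical.

```python
import os
import subprocess
from typing import List, Sequence, Tuple

# Paths copied from the quoted post; placeholders for your own install.
APP_DIR = "/home/ajinkyan/cgi-bin/infoviewer"
JAR_DIR = os.path.join(APP_DIR, "Jar_Files")
JARS = [
    "lucene-1.4.3.jar",
    "PDFBox-0.6.7a.jar",
    "log4j-1.2.8.jar",
    "tm-extractors-0.4.jar",
    "ArabicAnalyzer-1.0b.jar",
    "commons-collections-3.1.jar",
]


def build_classpath(jar_dir: str, jars: Sequence[str], extra_dirs: Sequence[str]) -> str:
    """Join jar paths and class directories with the platform separator (':' on Unix)."""
    entries = [os.path.join(jar_dir, jar) for jar in jars] + list(extra_dirs)
    return os.pathsep.join(entries)


def java_command(java: str = "java") -> List[str]:
    """Build the argument list; no shell runs, so nothing depends on $CLASSPATH."""
    classpath = build_classpath(JAR_DIR, JARS, [APP_DIR])
    return [java, "-classpath", classpath, "SearchFiles"]


def write_keyword(path: str, keyword: str) -> None:
    """Write the search term as UTF-8 (str.encode() returns a copy; it never converts in place)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(keyword)


def run_search(java: str = "java") -> Tuple[int, str]:
    """Run SearchFiles and return (exit code, stderr) so the JVM's error text is visible.

    If Apache's PATH lacks java, subprocess.run raises FileNotFoundError;
    passing the full path to the java binary avoids that.
    """
    result = subprocess.run(java_command(java), capture_output=True, text=True)
    return result.returncode, result.stderr
```

In the handler, writing `stderr` to the response whenever the exit code is non-zero would show the actual JVM message instead of a bare 256.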
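The quoted Java program can also hide failures. Its outer `catch (Exception e)` builds the message string `s` but never prints it, and `main` then returns normally, so an I/O error there still gives exit code 0. If that happens before results.txt is opened (for example, temp.txt is missing), the file left over from a previous search is still on disk and would be shown as if it were the current result. A minimal sketch, assuming Python 3 and the output layout visible in the quoted Java code: a `Total match document =` header, a blank line, then one filename per line, or `Parser Error` from the inner catch. It deletes the old file before each run and rejects output that doesn't match. The function names are illustrative.

```python
import os
from typing import List

# Header line written by the quoted SearchFiles.java on success.
HEADER = "Total match document ="


def clear_stale_results(path: str) -> None:
    """Remove results from an earlier run so they cannot be mistaken for new ones."""
    if os.path.exists(path):
        os.remove(path)


def parse_results(text: str) -> List[str]:
    """Return the matched filenames, or raise ValueError for any other output.

    On success the Java code writes the header, a blank line and one
    filename per line; its inner catch writes "Parser Error" instead.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("search failed or gave unexpected output: %r" % text[:80])
    return [line for line in lines[1:] if line.strip()]


def read_results(path: str) -> List[str]:
    """Read results.txt as UTF-8, matching the Java OutputStreamWriter, then parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_results(f.read())
```

A plausible order in the handler: call clear_stale_results(), write the keyword, run Java, then call read_results() inside a try that reports ValueError and OSError to the user.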