Manjeet Chaudhary
manjeet at infogridpacific.com
Wed Aug 17 07:09:40 EDT 2005
We are working on indexing and searching Arabic text. In brief, our
application:

1. Is a web-based indexing and searching application.
2. Should index Arabic/English text.
3. Should search Arabic/English text.

PyLucene seems to be the best option for indexing from Python, but there
is no Arabic analyzer available for Python. Aramorph is a Java-based
Arabic analyzer. How can I use ArabicAnalyzer.jar in my case, from Python
and mod_python?

I want to pull Java Lucene into Python through PyLucene. I have an
"ArabicAnalyzer.jar" file in Java and I want to use it from Python. I have
gone through the documentation on how to convert Java into Python using
"gcj" and "swig", but I am confused about how to go about it. So please
help me and tell me the exact procedure to follow to pull that jar file
into Python.

Thank you
M Chaudhary

David Fraser wrote:

> We use PyLucene too and it's fantastic, but just be aware that if you
> run Apache in multi-threaded mode (e.g. under Windows) there are
> conflicts with the gcj threading (for which there are also
> workarounds, mainly running a separate process for PyLucene)
>
> Julien wrote:
>
>> Why not use PyLucene? http://pylucene.osafoundation.org/
>>
>> It's a SWIG Python port of Java Lucene, and it works perfectly with
>> mod_python (we use it every day here at work)
>>
>> On Tue, 2005-08-16 at 15:53 +0530, Manjeet Chaudhary wrote:
>>
>>> Hello All
>>>
>>> I am using mod_python and Lucene for searching Arabic text from
>>> an index created using Lucene+Java.
>>> I am facing some problems in running the "java" command using mod_python.
>>>
>>> 1. I am trying to run a java program using the "os.system" command.
>>>    Check = os.system("java ..........")
>>>
>>> 2. The command returns me a value of 256.
>>>    i.e. Check = 256
>>>
>>> 3. I am not able to figure out why that java program is not
>>>    being executed properly.
>>>
>>> 4. For the java program I have set the classpath and I am compiling and
>>>    running that program using the classpath.
>>>
>>>    export CLASSPATH =
>>>    /home/ajinkyan/cgi-bin/infoviewer/Jar_Files/lucene-1.4.3.jar
>>>    :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/PDFBox-0.6.7a.jar
>>>    :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/log4j-1.2.8.jar
>>>    :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/tm-extractors-0.4.jar
>>>    :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/ArabicAnalyzer-1.0b.jar
>>>    :/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/commons-collections-3.1.jar
>>>
>>>    javac -classpath $CLASSPATH:. SearchFiles.java
>>>
>>>    java -classpath $ClASSPATH:. SearchFiles
>>>
>>> 5. The SearchFiles program runs properly if I run it from the shell.
>>>
>>> So please help me and tell me where I am going wrong.
>>> I have attached a copy of the code at the end.
>>>
>>> Thank you
>>> Manjeet Chaudhary
>>>
>>>
>>> #!/usr/bin/env python
>>> """
>>> #----------------------------------------------------------------------
>>> #Name: ar_search.py
>>> #Version: 1.0
>>> #Purpose: Searching for Infoviewer
>>> #Author: Alok Khandelwal
>>> #Created: 10-08-2005
>>> #Licence: Infogrid Pacific EULA
>>> #----------------------------------------------------------------------
>>> #Revision History
>>> #----------------------------------------------------------------------
>>>
>>> #----------------Different modules required are imported here-------------------
>>> """
>>> import cgitb,cgi,sys
>>> cgitb.enable()
>>> import os
>>> from os.path import join, abspath
>>> from mod_python import apache,util,Session
>>> """
>>> #----------------End of importing modules----------------------------------------
>>>
>>>
>>> def search(req):
>>>
>>>     """This part takes Arabic keyword from req variable and write
>>>     that keyword into a temp file. This temp file will be opened by java
>>>     program and that program will take that value and search for the
>>>     variable in the index file.
>>>     Java program will store the result in results.txt file"""
>>>
>>>     The_Form = util.FieldStorage(req)
>>>     for name in The_Form.keys():
>>>         #--------Checking for the selected Fields and turning those flag on--------
>>>         if name == 'search_contents':
>>>             value=The_Form[name]
>>>             value.encode('utf-8')
>>>             f=open("/home/ajinkyan/cgi-bin/infoviewer/temp.txt","w")
>>>             f.write(value)
>>>             f.close()
>>>
>>>     try:
>>>         command = "export CLASSPATH=/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/lucene-1.4.3.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/PDFBox-0.6.7a.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/log4j-1.2.8.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/tm-extractors-0.4.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/ArabicAnalyzer-1.0b.jar:/home/ajinkyan/cgi-bin/infoviewer/Jar_Files/commons-collections-3.1.jar"
>>>
>>>         import os
>>>         os.system(command)
>>>         command = "java -classpath $CLASSPATH:/home/ajinkyan/cgi-bin/infoviewer SearchFiles"
>>>         check=os.system(command)
>>>         req.write("\ncheck = " + str(check))
>>>         f = open("/home/ajinkyan/cgi-bin/infoviewer/results.txt","r")
>>>         result=f.read()
>>>         req.write(result)
>>>     except:
>>>
>>>         req.write("\n")
>>>         req.write(str(sys.exc_type))
>>>         req.write("\n")
>>>         req.write(str(sys.exc_value))
>>>
>>>
>>> ______________________________________________________________________
>>>
>>> # After executing the os.system command, the value returned in check
>>> is most of the time 256, or something near 32000.
>>>
>>> # The Java program runs independently without any errors. Only
>>> when it is called using os.system does it fail to run.
>>>
>>> # The Java program is as follows
>>>
>>> ______________________________________________________________________
>>>
>>>
>>> /**
>>>  * @Name : SearchFiles.java
>>>  * @Version: 1.0
>>>  * @Author : Alok Khandelwal
>>>  * @Created: 2 August 2005
>>>  * @Purpose: Searching in index for arabic contents
>>>  * @Copyright:(c)2004 by Infogrid Pacific Pte. Ltd.
>>>  * @Licence :Infogrid Pacific Eula
>>>  */
>>>
>>>
>>> import org.apache.lucene.search.Searcher;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>> import org.apache.lucene.search.Hits;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import gpl.pierrick.brihaye.aramorph.lucene.ArabicStemAnalyzer;
>>> import java.io.*;
>>> import org.apache.lucene.document.Document;
>>> class SearchFiles
>>> {
>>>     public static void main(String[] args)
>>>     {
>>>         try
>>>         {
>>>             FileInputStream fis = new FileInputStream(new File (
>>>                 "/home/ajinkyan/cgi-bin/infoviewer/temp.txt"));
>>>             BufferedReader in = new BufferedReader(new
>>>                 InputStreamReader(fis,"UTF8"));
>>>             char [] buf = new char[80];
>>>             int numRead;
>>>             numRead=in.read(buf,0,80);
>>>             String field="\"";
>>>             for(int k=0;k<numRead;k++)
>>>             {
>>>                 field=field+buf[k];
>>>             }
>>>             field=field+"\"";
>>>             FileOutputStream fos = new FileOutputStream(new File (
>>>                 "/home/ajinkyan/cgi-bin/infoviewer/results.txt"));
>>>             BufferedWriter out = new BufferedWriter(new
>>>                 OutputStreamWriter(fos,"UTF8"));
>>>             Searcher searcher = new
>>>                 IndexSearcher("/home/ajinkyan/cgi-bin/infoviewer/index_arabic");
>>>             try
>>>             {
>>>                 Query query = QueryParser.parse(field,
>>>                     "contents", new ArabicStemAnalyzer());
>>>                 Hits hits = searcher.search(query);
>>>                 String output="Total match document = \n";
>>>                 for(int j=0;j<hits.length();j++)
>>>                 {
>>>                     Document doc = hits.doc(j);
>>>                     // System.out.print(hits.score(j)+" ");
>>>                     output = output +"\n"+ doc.get("filename");
>>>                 }
>>>                 out.write(output);
>>>                 out.write("\n");
>>>             }
>>>             catch(Exception e)
>>>             {
>>>                 out.write("Parser Error");
>>>                 out.write("\n");
>>>
>>>             }
>>>             searcher.close();
>>>             out.close();
>>>         }
>>>         catch (Exception e)
>>>         {
>>>             String s= " caught a " + e.getClass() +
>>>                 "\n with message: " + e.getMessage();
>>>         }
>>>     }//end of main
>>> }//end of SearchFiles
>>> _______________________________________________
>>> Mod_python mailing list
>>> Mod_python at modpython.org
>>> http://mailman.modpython.org/mailman/listinfo/mod_python
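A note on the value 256 reported in the quoted message above: on Unix, os.system() returns the raw wait status rather than the plain exit code, so 256 most likely means the command exited with code 1. If the other value seen was 32512, that would decode to 127, which the shell uses for "command not found". Separately, every os.system() call starts its own shell, so a CLASSPATH exported in one call is no longer set in the next. The following is a minimal sketch of both points, assuming Python 3.7 or later on Unix; decode_system_status and run_with_classpath are hypothetical helper names, not part of mod_python or PyLucene, and this has not been tried under mod_python.

```python
import os
import subprocess
import sys


def decode_system_status(status):
    """Convert the raw value returned by os.system() on Unix to an exit code.

    The raw value is a wait status: the exit code sits in the high byte,
    so a raw 256 corresponds to exit code 1.
    """
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was ended by a signal, so there is no exit code


def run_with_classpath(argv, classpath):
    """Run argv as one child process with CLASSPATH set in its environment.

    Returns (exit code, stdout, stderr) so that any error message from the
    child can be shown instead of being lost.
    """
    env = dict(os.environ, CLASSPATH=classpath)
    result = subprocess.run(argv, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # The export below only affects the short-lived shell of this one call.
    os.system("export DEMO_VAR=hello")
    # A second call gets a fresh shell; the test fails if DEMO_VAR is unset.
    status = os.system('test -n "$DEMO_VAR"')
    print("raw status:", status, "exit code:", decode_system_status(status))

    # Harmless stand-in for the java command: a child that echoes CLASSPATH.
    child = [sys.executable, "-c", "import os; print(os.environ['CLASSPATH'])"]
    print(run_with_classpath(child, "/tmp/a.jar:/tmp/b.jar"))
```

For the case in the thread, the call would presumably look like run_with_classpath(["java", "SearchFiles"], classpath), with the directory holding SearchFiles.class included in classpath. Writing the returned stderr text to the response should show why the java command fails when started from Apache, for example a missing class or a java binary that is not on the PATH of the Apache user.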