Bug #238

Microsoft .xlsx files cause a Java heap space out-of-memory error

Added by Ulrik Kautsky 822 days ago. Updated 803 days ago.

Status: New
Start: 11/04/2009
Priority: Normal
Due date:
Assigned to: -
% Done: 0%
Category: -
Target version: -

Description

While scanning a quite large repository, the error below occurs repeatedly for Excel .xlsx files larger than about 10 MB.
It runs on Windows XP with 3 GB of memory, and there seems to be physical memory left. It seems to make no difference when I close all unnecessary files and processes. If the memory management is difficult to change, at least a trap for the error that simply skips these large files would make it easy to continue scanning the repository.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3039)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3060)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3250)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4678)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:14?0)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:343?)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1?70)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1?57)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:126)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:118)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:201)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:?64)
at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:48)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:100)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:86)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:47)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:10?)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:?8)
at com.soebes.supose.scan.document.ScanExcelDocument.scan(ScanExcelDocument.java:78)
at com.soebes.supose.scan.document.ScanExcelDocument.indexDocument(ScanExcelDocument.java:64)
at com.soebes.supose.scan.FileExtensionHandler.execute(FileExtensionHandler.java:61)
at com.soebes.supose.scan.ScanRepository.indexFile(ScanRepository.java:?56)
at com.soebes.supose.scan.ScanRepository.workOnChangeSet(ScanRepository.java:204)
at com.soebes.supose.scan.ScanRepository.scan(ScanRepository.java:136)
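
The "trap" asked for in the description could look roughly like the sketch below. It is only an illustration: `GuardedScan` and `runGuarded` are hypothetical names, not part of the scanner's code, and the `Runnable` stands in for whatever indexes a single file. Catching `OutOfMemoryError` is a best-effort measure, since the JVM may not recover cleanly once the heap is exhausted.

```java
public class GuardedScan {

    /**
     * Runs the indexing step for one file (hypothetical stand-in: any Runnable).
     * Returns true on success and false if the JVM ran out of heap space,
     * so the caller can log the file and move on to the next one.
     */
    public static boolean runGuarded(String fileName, Runnable indexStep) {
        try {
            indexStep.run();
            return true;
        } catch (OutOfMemoryError e) {
            // Best effort only: report the file and let the scan continue.
            System.err.println("Skipping " + fileName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean small = runGuarded("small.xlsx", () -> { });
        boolean large = runGuarded("large.xlsx", () -> {
            // Simulates the failure from the stack trace above.
            throw new OutOfMemoryError("Java heap space");
        });
        System.out.println("small indexed: " + small + ", large indexed: " + large);
    }
}
```

The guard wraps a single file, so one failing spreadsheet would not abort the rest of the repository scan.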

History

Updated by Karl Heinz Marbaise 822 days ago

Hi Ulrik,

what is your configuration? Changes made in the batch file in bin folder? Did you use -Xms or -Xmx options in any way?

Kind regards
Karl Heinz Marbaise

Updated by Ulrik Kautsky 820 days ago

what is your configuration? Changes made in the batch file in bin folder? Did you use -Xms or -Xmx options in any way?

When I created the issue it was the default -Xmx of 1024.
Then I increased this to 1536. I didn't dare to go further, as I don't really know the upper limits of Windows XP 32-bit. But it didn't change the result. I looked at the memory usage and realised that Java at least used all of that space.
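
One way to confirm that a changed -Xmx value in the batch file actually reaches the JVM is to print the maximum heap from inside Java. The class below is a minimal sketch with a hypothetical name (`HeapInfo`); it relies only on the standard `Runtime.maxMemory()` call, which reports the limit in bytes.

```java
public class HeapInfo {

    /** Maximum heap the JVM will attempt to use, converted from bytes to megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it with the same options as the scanner, for example `java -Xmx1536m HeapInfo`, should print a value close to the configured limit; the exact number can be slightly lower depending on the JVM.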

Updated by Ulrik Kautsky 803 days ago

what is your configuration? Changes made in the batch file in bin folder? Did you use -Xms or -Xmx options in any way?

We also tested on Solaris with a heap of 2008kb and had the same trouble. If there is no fix for this at present, maybe just add an option to skip to the next file when an .xlsx file is above a certain size.
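
The size option suggested here could be sketched as follows. The class name `SizeFilter`, the method `shouldSkip` and the 10 MB threshold are assumptions for illustration, with the threshold taken from the file size mentioned in the description; none of this is an existing setting of the scanner.

```java
import java.util.Locale;

public class SizeFilter {

    /**
     * Returns true when the file is an .xlsx file larger than the given limit.
     * Other file types are never skipped by this check.
     */
    public static boolean shouldSkip(String fileName, long sizeInBytes, long maxXlsxBytes) {
        boolean isXlsx = fileName.toLowerCase(Locale.ROOT).endsWith(".xlsx");
        return isXlsx && sizeInBytes > maxXlsxBytes;
    }

    public static void main(String[] args) {
        long limit = 10L * 1024 * 1024; // assumed threshold of 10 MB
        System.out.println(shouldSkip("report.xlsx", 12L * 1024 * 1024, limit));
        System.out.println(shouldSkip("notes.xlsx", 200L * 1024, limit));
    }
}
```

Checking the size before the file is handed to the parser avoids the allocation entirely, which is more predictable than recovering after the heap is already exhausted.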
