SPSS .net plugin is awfully slow - Performance issues

Nov 21, 2012 at 4:25 AM
Edited Nov 21, 2012 at 4:26 AM

Hi All,

The plugin works fine with .NET 4.0 and SPSS v21.0

However, when I read an SPSS file with, say, 10,000 rows and 300 columns, it takes more than a minute to load the data. Is this normal?

' Code segment

Dim dt As DataTable = SpssConvert.ToDataTable("Your SAV file")

I have noticed that SPSS reads the same file within a couple of seconds, as does another program called Q.

Therefore, why would the .NET plugin take such a long time to read large .sav files in a .NET application?

Any help will be much appreciated.

Thanks

SPSSnetUser

Nov 21, 2012 at 5:10 AM
I wonder if it was our recent transition to the IBM spssio32.dll. Maybe it's not as fast as the built-in SPSS one? You might try getting a version of the SPSS project from a few changes back in history (before the change to IBM) and see if that one is any faster. Please report back here.
Nov 21, 2012 at 5:14 AM

Can you suggest the exact version that I should be looking at?

Nov 21, 2012 at 6:07 AM
This version:
Nov 21, 2012 at 6:21 AM

Here are my results for a sample SPSS file with 10,000 rows and 300 columns.

The version suggested by AArnott took 35 seconds

The latest version available on the site (v2.0) took 1 minute and 15 seconds.

Even with the suggested version it is still taking a while.

http://spss.codeplex.com/SourceControl/changeset/view/45d0ec3eca78#

I know for a fact that SPSS and a program called "Q" read the same SPSS file within a couple of seconds. Is there any other version (perhaps a much older one) that will read massive files more quickly?

Thanks AArnott for the speedy replies!

 

Nov 21, 2012 at 7:11 AM
That's good to hear. At this point we might say that the IBM dll is slower, or we might say that some accompanying changes in the managed code slowed it down. I believe SPSS itself uses the same spssio32.dll file to read data that your 35-second run used. So I suspect the rest of the time is from poorly performing code within this SPSS.NET project itself. The good news there is that it can probably be fixed.
If you run perf tools to find the problems and fix them, please send a pull request.
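
As a first step before a full perf trace, a rough wall-clock measurement can confirm the numbers reported above. What follows is only a minimal sketch in VB.NET: `TimeLoad` and `BuildSampleTable` are hypothetical names introduced here, not part of SPSS.NET, and the stand-in loader exists only so the sketch runs without the library.

```vbnet
Imports System
Imports System.Data
Imports System.Diagnostics

Module ReadTiming
    ' Hypothetical helper: runs a loader once and reports the wall-clock time.
    Function TimeLoad(load As Func(Of DataTable)) As TimeSpan
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim dt As DataTable = load()
        sw.Stop()
        Console.WriteLine("{0} rows x {1} columns in {2:F1} s",
                          dt.Rows.Count, dt.Columns.Count, sw.Elapsed.TotalSeconds)
        Return sw.Elapsed
    End Function

    ' Stand-in loader (an assumption for illustration) that builds an
    ' in-memory table of the size discussed in this thread.
    Function BuildSampleTable(rows As Integer, cols As Integer) As DataTable
        Dim dt As New DataTable()
        For c As Integer = 0 To cols - 1
            dt.Columns.Add("V" & c.ToString(), GetType(Double))
        Next
        For r As Integer = 0 To rows - 1
            Dim values(cols - 1) As Object
            For c As Integer = 0 To cols - 1
                values(c) = CDbl(r + c)
            Next
            dt.Rows.Add(values)
        Next
        Return dt
    End Function

    Sub Main()
        ' To time the real reader, swap in:
        '   TimeLoad(Function() SpssConvert.ToDataTable("Your SAV file"))
        TimeLoad(Function() BuildSampleTable(10000, 300))
    End Sub
End Module
```

A stopwatch only gives the total elapsed time. Finding which calls inside the managed code account for it still needs a profiler.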

--
Andrew Arnott
"I [may] not agree with what you have to say, but I'll defend to the death your right to say it." - S. G. Tallentyre





Nov 22, 2012 at 1:02 AM

Thanks for the post, Arnott. The older version has its own issues - it does not read all SPSS files. For some files it returns a data table with zero rows.

So it would be safer to use the latest version in any event (it always reads the data), even with the performance issues. Just out of curiosity, why do you reckon SPSS and other software (like Q) read SPSS files so quickly? Do you reckon they are using some other plugin for reading large SPSS data files, such as the R or Python plugins?

Unfortunately, the current performance issues with the .NET plugin are a big concern.

Nov 22, 2012 at 1:06 AM
Again, unless someone takes perf traces of the scenario you are describing, we won't know what the problem is. A perf trace will tell us very quickly what the problem is, so speculating would be just guesswork. I don't have any basis on which to guess.

--
Andrew Arnott
"I [may] not agree with what you have to say, but I'll defend to the death your right to say it." - S. G. Tallentyre




Nov 22, 2012 at 1:14 AM

Is it possible for me to send you a trace file?

Nov 22, 2012 at 1:36 AM
You could, but honestly I have a lot of demands on my time lately, including a few other open source projects, so I don't think I'll take time to work on it in the near future.
But if you make perf fixes, I'm happy to accept pull requests (so long as the unit tests pass).
Sorry.

--
Andrew Arnott
"I [may] not agree with what you have to say, but I'll defend to the death your right to say it." - S. G. Tallentyre

