dnetc_cuda
log
rev 32:
(0)
tip
age
author
description
2 years
paul
Added a declaration for the OGR_cuda array.
default
tip
2 years
paul
Added support for properly handling "state" and "pnodes" that are passed into ogr_cycle. Still untested... 1 thread block, 1 thread...
2 years
paul
Added a define for OGR_DEBUG.
2 years
paul
Changed the default OGROPT_STRENGTH_REDUCE_CHOOSE to 1.
2 years
paul
Changed OGROPT_ALTERNATE_CYCLE to 2.
2 years
paul
Moved the ogr_first_blank_8bit to the end of the file.
2 years
paul
Modified ogr_cycle to properly handle the return value. Not tested on actual hardware... Still 1 thread block, 1 thread.
2 years
paul
Removed the unneeded "inline" in the function declaration since device functions are inlined. The core is still only 1 thread block, 1 thread and has NOT been tested on actual hardware.
2 years
paul
Modified the configure file to no longer pass -deviceemu to nvcc.
2 years
paul
Ignore vim .swp files
2 years
paul
Pulled in support from ogr.cpp for both the register sparse and dense macros.
2 years
paul
Modified the code so that it builds cleanly without the -deviceemu flag. This is still not ready to be run on the target hardware.
2 years
paul
Removed the unused check_* function arguments from the cuda_core function. Indentation clean-ups.
2 years
paul
Initial commit of the cuda ogr core. This is a direct port of the core from ansi/ogr.cpp. This will only build and execute properly if the -deviceemu flag is passed to nvcc. Only 1 gpu thread is instantiated.
2 years
paul
Added support for building the cuda ogr core. Set the -deviceemu flag for nvcc so that the ogr core builds and can be debugged.
2 years
paul
Added support for calling the cuda ogr core.
2 years
paul
Modified the core early exit code to use "return" instead of goto.
2 years
paul
Modified the initialization of the S[] to be directly from the KEY_INIT macro. Net gain of 2Mkeys/sec.
2 years
paul
Removed unnecessary include.
2 years
paul
Optimized the processing of the results[] to only occur if some kind of match was found. Now seeing key rates of 144 MKeys/sec (8800 GTX) on actual W/Us.
2 years
paul
Fixed the initialization of grid_dim that prevented "-stress" from executing properly.
2 years
paul
Restored the original opts* settings as the modified values caused the "-stress" option to fail.
2 years
paul
Changed the pipeline_count from 8 to 1.
2 years
paul
Optimized host side result check loop. We now get ~ 124 MKeys/sec on an 8800 GTX. Added proper malloc and cuda error handling. Cleaned up unused code.
2 years
paul
Modified the cuda core.
2 years
paul
Added a makefile target for building cuda ptx files. This is a dump of the GPU assembly output.
2 years
paul
Added support for ignoring .ptx files in the output directory.
2 years
paul
Cleanups and commenting the rc5 cuda code. No functional or performance changes.
2 years
paul
Hacked in support for an RC5-72 CUDA core. This is extremely rough around the edges... Totally unoptimized... Linux only...
2 years
Paul Kurucz
Ignore dnetc and dnetc.1
2 years
Paul Kurucz
Ignore the Makefile
2 years
Paul Kurucz
Corrected the hgignore.
2 years
Paul Kurucz
dnetc public snapshot: June 14, 2006