Go to the source code of this file.
$Id: csr.c 13437 2013-01-11 21:54:10Z karypis $
Definition in file csr.c.
gk_csr_t* gk_csr_Create | ( | ) |
Allocate memory for a CSR matrix and initializes it
Definition at line 19 of file csr.c.
References gk_csr_Init(), gk_csr_t, and gk_malloc().
Referenced by gk_csr_Dup(), gk_csr_ExtractPartition(), gk_csr_ExtractRows(), gk_csr_ExtractSubmatrix(), gk_csr_LowFilter(), gk_csr_Prune(), gk_csr_Read(), gk_csr_Split(), gk_csr_TopKPlusFilter(), gk_csr_ZScoreFilter(), gk_find_frequent_itemsets(), and itemsets_project_matrix().
void gk_csr_Init | ( | gk_csr_t * | mat | ) |
Initializes the matrix
mat | is the matrix to be initialized. |
Definition at line 36 of file csr.c.
References gk_csr_t.
Referenced by gk_csr_Create().
void gk_csr_Free | ( | gk_csr_t ** | mat | ) |
Frees all the memory allocated for matrix.
mat | is the matrix to be freed. |
Definition at line 48 of file csr.c.
References gk_csr_FreeContents(), and gk_free().
Referenced by gk_csr_LowFilter(), gk_csr_Prune(), gk_csr_TopKPlusFilter(), gk_csr_ZScoreFilter(), gk_find_frequent_itemsets(), itemsets_find_frequent_itemsets(), and main().
void gk_csr_FreeContents | ( | gk_csr_t * | mat | ) |
Frees only the memory allocated for the matrix's different fields and sets them to NULL.
mat | is the matrix whose contents will be freed. |
Definition at line 63 of file csr.c.
References gk_free().
Referenced by gk_csr_Free().
Returns a copy of a matrix.
mat | is the matrix to be duplicated. |
Definition at line 80 of file csr.c.
References gk_csr_Create(), and gk_csr_t.
Returns a submatrix containint a set of consecutive rows.
mat | is the original matrix. | |
rstart | is the starting row. | |
nrows | is the number of rows from rstart to extract. |
Definition at line 135 of file csr.c.
References gk_csr_Create(), and gk_csr_t.
Returns a submatrix containing a certain set of rows.
mat | is the original matrix. | |
nrows | is the number of rows to extract. | |
rind | is the set of row numbers to extract. |
Definition at line 191 of file csr.c.
References gk_csr_Create(), and gk_csr_t.
Returns a submatrix corresponding to a specified partitioning of rows.
mat | is the original matrix. | |
part | is the partitioning vector of the rows. | |
pid | is the partition ID that will be extracted. |
Definition at line 230 of file csr.c.
References gk_csr_Create(), and gk_csr_t.
Splits the matrix into multiple sub-matrices based on the provided color array.
mat | is the original matrix. | |
color | is an array of size equal to the number of non-zeros in the matrix (row-wise structure). The matrix is split into as many parts as the number of colors. For meaningfull results, the colors should be numbered consecutively starting from 0. |
Definition at line 277 of file csr.c.
References gk_csr_Create(), gk_csr_t, and gk_malloc().
Reads a CSR matrix from the supplied file and stores it the matrix's forward structure.
filename | is the file that stores the data. | |
format | is either GK_CSR_FMT_METIS, GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL specifying the type of the input format. The GK_CSR_FMT_CSR does not contain a header line, whereas the GK_CSR_FMT_BINROW is a binary format written by gk_csr_Write() using the same format specifier. | |
readvals | is either 1 or 0, indicating if the CSR file contains values or it does not. It only applies when GK_CSR_FMT_CSR is used. | |
numbering | is either 1 or 0, indicating if the numbering of the indices start from 1 or 0, respectively. If they start from 1, they are automatically decreamented during input so that they will start from 0. It only applies when GK_CSR_FMT_CSR is used. |
Definition at line 349 of file csr.c.
References errexit(), float, gk_csr_Create(), gk_csr_t, gk_errexit(), gk_fclose(), gk_fexists(), gk_fopen(), gk_free(), gk_getfilestats(), gk_getline(), int, int32_t, PUP::l, and ncon.
Referenced by main().
Writes the row-based structure of a matrix into a file.
mat | is the matrix to be written, | |
filename | is the name of the output file. | |
format | is one of: GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL. | |
writevals | is either 1 or 0 indicating if the values will be written or not. This is only applicable when GK_CSR_FMT_CSR is used. | |
numbering | is either 1 or 0 indicating if the internal 0-based numbering will be shifted by one or not during output. This is only applicable when GK_CSR_FMT_CSR is used. |
Definition at line 591 of file csr.c.
References gk_errexit(), gk_fclose(), gk_fopen(), and int32_t.
Prunes certain rows/columns of the matrix. The prunning takes place by analyzing the row structure of the matrix. The prunning takes place by removing rows/columns but it does not affect the numbering of the remaining rows/columns.
mat | the matrix to be prunned, | |
what | indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned, | |
minf | is the minimum number of rows (columns) that a column (row) must be present in order to be kept, | |
maxf | is the maximum number of rows (columns) that a column (row) must be present at in order to be kept. |
Definition at line 669 of file csr.c.
References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), and gk_free().
Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight entries whose sum accounts for a certain fraction of the overall weight of the row/column.
mat | the matrix to be prunned, | |
what | indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned, | |
norm | indicates the norm that will be used to aggregate the weights and possible values are 1 or 2, | |
fraction | is the fraction of the overall norm that will be retained by the kept entries. |
Definition at line 759 of file csr.c.
References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), gk_fkvsortd(), and gk_free().
Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight top-K entries along each row/column and those entries whose weight is greater than a specified value.
mat | the matrix to be prunned, | |
what | indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned, | |
topk | is the number of the highest weight entries to keep. | |
keepval | is the weight of a term above which will be kept. This is used to select additional terms past the first topk. |
Definition at line 901 of file csr.c.
References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), gk_fkvsortd(), gk_free(), and key.
Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the terms whose contribution to the total length of the document is greater than a user-splied multiple over the average.
This routine assumes that the vectors are normalized to be unit length.
mat | the matrix to be prunned, | |
what | indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned, | |
zscore | is the multiplicative factor over the average contribution to the length of the document. |
Definition at line 1031 of file csr.c.
References gk_csr_Create(), gk_csr_Free(), gk_csr_t, and gk_errexit().
void gk_csr_CompactColumns | ( | gk_csr_t * | mat | ) |
Compacts the column-space of the matrix by removing empty columns. As a result of the compaction, the column numbers are renumbered. The compaction operation is done in place and only affects the row-based representation of the matrix. The new columns are ordered in decreasing frequency.
mat | the matrix whose empty columns will be removed. |
Definition at line 1098 of file csr.c.
References gk_free(), gk_ikvsortd(), and key.
Sorts the indices in increasing order
mat | the matrix itself, | |
what | is either GK_CSR_ROW or GK_CSR_COL indicating which set of indices to sort. |
Definition at line 1146 of file csr.c.
References gk_errexit(), gk_free(), gk_ikvsorti(), n, and nn.
Creates a row/column index from the column/row data.
mat | the matrix itself, | |
what | is either GK_CSR_ROW or GK_CSR_COL indicating which index will be created. |
Definition at line 1223 of file csr.c.
References gk_errexit(), and gk_free().
Referenced by gk_find_frequent_itemsets(), and main().
Normalizes the rows/columns of the matrix to be unit length.
mat | the matrix itself, | |
what | indicates what will be normalized and is obtained by specifying GK_CSR_ROW, GK_CSR_COL, GK_CSR_ROW|GK_CSR_COL. | |
norm | indicates what norm is to normalize to, 1: 1-norm, 2: 2-norm |
Definition at line 1319 of file csr.c.
References n.
Applies different row scaling methods.
mat | the matrix itself, | |
type | indicates the type of row scaling. Possible values are: GK_CSR_MAXTF, GK_CSR_SQRT, GK_CSR_LOG, GK_CSR_IDF, GK_CSR_MAXTF2. |
Definition at line 1389 of file csr.c.
References gk_errexit(), and gk_free().
Computes the sums of the rows/columns
mat | the matrix itself, | |
what | is either GK_CSR_ROW or GK_CSR_COL indicating which sums to compute. |
Definition at line 1600 of file csr.c.
References gk_errexit(), gk_free(), and n.
Computes the squared of the norms of the rows/columns
mat | the matrix itself, | |
what | is either GK_CSR_ROW or GK_CSR_COL indicating which squared norms to compute. |
Definition at line 1646 of file csr.c.
References gk_errexit(), gk_free(), and n.
Computes the similarity between two rows/columns
mat | the matrix itself. The routine assumes that the indices are sorted in increasing order. | |
i1 | is the first row/column, | |
i2 | is the second row/column, | |
what | is either GK_CSR_ROW or GK_CSR_COL indicating the type of objects between the similarity will be computed, | |
simtype | is the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN |
Definition at line 1697 of file csr.c.
References gk_errexit().
int gk_csr_GetSimilarRows | ( | gk_csr_t * | mat, | |
int | nqterms, | |||
int * | qind, | |||
float * | qval, | |||
int | simtype, | |||
int | nsim, | |||
float | minsim, | |||
gk_fkv_t * | hits, | |||
int * | i_marker, | |||
gk_fkv_t * | i_cand | |||
) |
Finds the n most similar rows (neighbors) to the query using cosine similarity.
mat | the matrix itself | |
nqterms | is the number of columns in the query | |
qind | is the list of query columns | |
qval | is the list of correspodning query weights | |
simtype | is the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN | |
nsim | is the maximum number of requested most similar rows. If -1 is provided, then everything is returned unsorted. | |
minsim | is the minimum similarity of the requested most similar rows | |
hits | is the result set. This array should be at least of length nsim. | |
i_marker | is an array of size equal to the number of rows whose values are initialized to -1. If NULL is provided then this array is allocated and freed internally. | |
i_cand | is an array of size equal to the number of rows. If NULL is provided then this array is allocated and freed internally. |
Definition at line 1869 of file csr.c.
References gk_dfkvkselect(), gk_errexit(), gk_fkvsortd(), gk_free(), and key.