PPL Logo

libs/ck-libs/metis/GKlib/csr.c File Reference

Various routines with dealing with CSR matrices. More...

Go to the source code of this file.

Functions

gk_csr_tgk_csr_Create ()
void gk_csr_Init (gk_csr_t *mat)
void gk_csr_Free (gk_csr_t **mat)
void gk_csr_FreeContents (gk_csr_t *mat)
gk_csr_tgk_csr_Dup (gk_csr_t *mat)
gk_csr_tgk_csr_ExtractSubmatrix (gk_csr_t *mat, int rstart, int nrows)
gk_csr_tgk_csr_ExtractRows (gk_csr_t *mat, int nrows, int *rind)
gk_csr_tgk_csr_ExtractPartition (gk_csr_t *mat, int *part, int pid)
gk_csr_t ** gk_csr_Split (gk_csr_t *mat, int *color)
gk_csr_tgk_csr_Read (char *filename, int format, int readvals, int numbering)
void gk_csr_Write (gk_csr_t *mat, char *filename, int format, int writevals, int numbering)
gk_csr_tgk_csr_Prune (gk_csr_t *mat, int what, int minf, int maxf)
gk_csr_tgk_csr_LowFilter (gk_csr_t *mat, int what, int norm, float fraction)
gk_csr_tgk_csr_TopKPlusFilter (gk_csr_t *mat, int what, int topk, float keepval)
gk_csr_tgk_csr_ZScoreFilter (gk_csr_t *mat, int what, float zscore)
void gk_csr_CompactColumns (gk_csr_t *mat)
void gk_csr_SortIndices (gk_csr_t *mat, int what)
void gk_csr_CreateIndex (gk_csr_t *mat, int what)
void gk_csr_Normalize (gk_csr_t *mat, int what, int norm)
void gk_csr_Scale (gk_csr_t *mat, int type)
void gk_csr_ComputeSums (gk_csr_t *mat, int what)
void gk_csr_ComputeSquaredNorms (gk_csr_t *mat, int what)
float gk_csr_ComputeSimilarity (gk_csr_t *mat, int i1, int i2, int what, int simtype)
int gk_csr_GetSimilarRows (gk_csr_t *mat, int nqterms, int *qind, float *qval, int simtype, int nsim, float minsim, gk_fkv_t *hits, int *i_marker, gk_fkv_t *i_cand)


Detailed Description

Various routines with dealing with CSR matrices.

Author:
George Karypis
Version:
$Id: csr.c 13437 2013-01-11 21:54:10Z karypis $ 

Definition in file csr.c.


Function Documentation

gk_csr_t* gk_csr_Create (  ) 

Allocate memory for a CSR matrix and initializes it

Returns:
the allocated matrix. The various fields are set to NULL.

Definition at line 19 of file csr.c.

References gk_csr_Init(), gk_csr_t, and gk_malloc().

Referenced by gk_csr_Dup(), gk_csr_ExtractPartition(), gk_csr_ExtractRows(), gk_csr_ExtractSubmatrix(), gk_csr_LowFilter(), gk_csr_Prune(), gk_csr_Read(), gk_csr_Split(), gk_csr_TopKPlusFilter(), gk_csr_ZScoreFilter(), gk_find_frequent_itemsets(), and itemsets_project_matrix().

Here is the call graph for this function:

Here is the caller graph for this function:

void gk_csr_Init ( gk_csr_t mat  ) 

Initializes the matrix

Parameters:
mat is the matrix to be initialized.

Definition at line 36 of file csr.c.

References gk_csr_t.

Referenced by gk_csr_Create().

Here is the caller graph for this function:

void gk_csr_Free ( gk_csr_t **  mat  ) 

Frees all the memory allocated for matrix.

Parameters:
mat is the matrix to be freed.

Definition at line 48 of file csr.c.

References gk_csr_FreeContents(), and gk_free().

Referenced by gk_csr_LowFilter(), gk_csr_Prune(), gk_csr_TopKPlusFilter(), gk_csr_ZScoreFilter(), gk_find_frequent_itemsets(), itemsets_find_frequent_itemsets(), and main().

Here is the call graph for this function:

Here is the caller graph for this function:

void gk_csr_FreeContents ( gk_csr_t mat  ) 

Frees only the memory allocated for the matrix's different fields and sets them to NULL.

Parameters:
mat is the matrix whose contents will be freed.

Definition at line 63 of file csr.c.

References gk_free().

Referenced by gk_csr_Free().

Here is the call graph for this function:

Here is the caller graph for this function:

gk_csr_t* gk_csr_Dup ( gk_csr_t mat  ) 

Returns a copy of a matrix.

Parameters:
mat is the matrix to be duplicated.
Returns:
the newly created copy of the matrix.

Definition at line 80 of file csr.c.

References gk_csr_Create(), and gk_csr_t.

Here is the call graph for this function:

gk_csr_t* gk_csr_ExtractSubmatrix ( gk_csr_t mat,
int  rstart,
int  nrows 
)

Returns a submatrix containint a set of consecutive rows.

Parameters:
mat is the original matrix.
rstart is the starting row.
nrows is the number of rows from rstart to extract.
Returns:
the row structure of the newly created submatrix.

Definition at line 135 of file csr.c.

References gk_csr_Create(), and gk_csr_t.

Here is the call graph for this function:

gk_csr_t* gk_csr_ExtractRows ( gk_csr_t mat,
int  nrows,
int rind 
)

Returns a submatrix containing a certain set of rows.

Parameters:
mat is the original matrix.
nrows is the number of rows to extract.
rind is the set of row numbers to extract.
Returns:
the row structure of the newly created submatrix.

Definition at line 191 of file csr.c.

References gk_csr_Create(), and gk_csr_t.

Here is the call graph for this function:

gk_csr_t* gk_csr_ExtractPartition ( gk_csr_t mat,
int part,
int  pid 
)

Returns a submatrix corresponding to a specified partitioning of rows.

Parameters:
mat is the original matrix.
part is the partitioning vector of the rows.
pid is the partition ID that will be extracted.
Returns:
the row structure of the newly created submatrix.

Definition at line 230 of file csr.c.

References gk_csr_Create(), and gk_csr_t.

Here is the call graph for this function:

gk_csr_t** gk_csr_Split ( gk_csr_t mat,
int color 
)

Splits the matrix into multiple sub-matrices based on the provided color array.

Parameters:
mat is the original matrix.
color is an array of size equal to the number of non-zeros in the matrix (row-wise structure). The matrix is split into as many parts as the number of colors. For meaningfull results, the colors should be numbered consecutively starting from 0.
Returns:
an array of matrices for each supplied color number.

Definition at line 277 of file csr.c.

References gk_csr_Create(), gk_csr_t, and gk_malloc().

Here is the call graph for this function:

gk_csr_t* gk_csr_Read ( char *  filename,
int  format,
int  readvals,
int  numbering 
)

Reads a CSR matrix from the supplied file and stores it the matrix's forward structure.

Parameters:
filename is the file that stores the data.
format is either GK_CSR_FMT_METIS, GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL specifying the type of the input format. The GK_CSR_FMT_CSR does not contain a header line, whereas the GK_CSR_FMT_BINROW is a binary format written by gk_csr_Write() using the same format specifier.
readvals is either 1 or 0, indicating if the CSR file contains values or it does not. It only applies when GK_CSR_FMT_CSR is used.
numbering is either 1 or 0, indicating if the numbering of the indices start from 1 or 0, respectively. If they start from 1, they are automatically decreamented during input so that they will start from 0. It only applies when GK_CSR_FMT_CSR is used.
Returns:
the matrix that was read.

Definition at line 349 of file csr.c.

References errexit(), float, gk_csr_Create(), gk_csr_t, gk_errexit(), gk_fclose(), gk_fexists(), gk_fopen(), gk_free(), gk_getfilestats(), gk_getline(), int, int32_t, PUP::l, and ncon.

Referenced by main().

Here is the call graph for this function:

Here is the caller graph for this function:

void gk_csr_Write ( gk_csr_t mat,
char *  filename,
int  format,
int  writevals,
int  numbering 
)

Writes the row-based structure of a matrix into a file.

Parameters:
mat is the matrix to be written,
filename is the name of the output file.
format is one of: GK_CSR_FMT_CLUTO, GK_CSR_FMT_CSR, GK_CSR_FMT_BINROW, GK_CSR_FMT_BINCOL.
writevals is either 1 or 0 indicating if the values will be written or not. This is only applicable when GK_CSR_FMT_CSR is used.
numbering is either 1 or 0 indicating if the internal 0-based numbering will be shifted by one or not during output. This is only applicable when GK_CSR_FMT_CSR is used.

Definition at line 591 of file csr.c.

References gk_errexit(), gk_fclose(), gk_fopen(), and int32_t.

Here is the call graph for this function:

gk_csr_t* gk_csr_Prune ( gk_csr_t mat,
int  what,
int  minf,
int  maxf 
)

Prunes certain rows/columns of the matrix. The prunning takes place by analyzing the row structure of the matrix. The prunning takes place by removing rows/columns but it does not affect the numbering of the remaining rows/columns.

Parameters:
mat the matrix to be prunned,
what indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
minf is the minimum number of rows (columns) that a column (row) must be present in order to be kept,
maxf is the maximum number of rows (columns) that a column (row) must be present at in order to be kept.
Returns:
the prunned matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 669 of file csr.c.

References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), and gk_free().

Here is the call graph for this function:

gk_csr_t* gk_csr_LowFilter ( gk_csr_t mat,
int  what,
int  norm,
float  fraction 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight entries whose sum accounts for a certain fraction of the overall weight of the row/column.

Parameters:
mat the matrix to be prunned,
what indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
norm indicates the norm that will be used to aggregate the weights and possible values are 1 or 2,
fraction is the fraction of the overall norm that will be retained by the kept entries.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 759 of file csr.c.

References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), gk_fkvsortd(), and gk_free().

Here is the call graph for this function:

gk_csr_t* gk_csr_TopKPlusFilter ( gk_csr_t mat,
int  what,
int  topk,
float  keepval 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight top-K entries along each row/column and those entries whose weight is greater than a specified value.

Parameters:
mat the matrix to be prunned,
what indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
topk is the number of the highest weight entries to keep.
keepval is the weight of a term above which will be kept. This is used to select additional terms past the first topk.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 901 of file csr.c.

References gk_csr_Create(), gk_csr_Free(), gk_csr_t, gk_errexit(), gk_fkvsortd(), gk_free(), and key.

Here is the call graph for this function:

gk_csr_t* gk_csr_ZScoreFilter ( gk_csr_t mat,
int  what,
float  zscore 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the terms whose contribution to the total length of the document is greater than a user-splied multiple over the average.

This routine assumes that the vectors are normalized to be unit length.

Parameters:
mat the matrix to be prunned,
what indicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
zscore is the multiplicative factor over the average contribution to the length of the document.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.

Definition at line 1031 of file csr.c.

References gk_csr_Create(), gk_csr_Free(), gk_csr_t, and gk_errexit().

Here is the call graph for this function:

void gk_csr_CompactColumns ( gk_csr_t mat  ) 

Compacts the column-space of the matrix by removing empty columns. As a result of the compaction, the column numbers are renumbered. The compaction operation is done in place and only affects the row-based representation of the matrix. The new columns are ordered in decreasing frequency.

Parameters:
mat the matrix whose empty columns will be removed.

Definition at line 1098 of file csr.c.

References gk_free(), gk_ikvsortd(), and key.

Here is the call graph for this function:

void gk_csr_SortIndices ( gk_csr_t mat,
int  what 
)

Sorts the indices in increasing order

Parameters:
mat the matrix itself,
what is either GK_CSR_ROW or GK_CSR_COL indicating which set of indices to sort.

Definition at line 1146 of file csr.c.

References gk_errexit(), gk_free(), gk_ikvsorti(), n, and nn.

Here is the call graph for this function:

void gk_csr_CreateIndex ( gk_csr_t mat,
int  what 
)

Creates a row/column index from the column/row data.

Parameters:
mat the matrix itself,
what is either GK_CSR_ROW or GK_CSR_COL indicating which index will be created.

Definition at line 1223 of file csr.c.

References gk_errexit(), and gk_free().

Referenced by gk_find_frequent_itemsets(), and main().

Here is the call graph for this function:

Here is the caller graph for this function:

void gk_csr_Normalize ( gk_csr_t mat,
int  what,
int  norm 
)

Normalizes the rows/columns of the matrix to be unit length.

Parameters:
mat the matrix itself,
what indicates what will be normalized and is obtained by specifying GK_CSR_ROW, GK_CSR_COL, GK_CSR_ROW|GK_CSR_COL.
norm indicates what norm is to normalize to, 1: 1-norm, 2: 2-norm

Definition at line 1319 of file csr.c.

References n.

void gk_csr_Scale ( gk_csr_t mat,
int  type 
)

Applies different row scaling methods.

Parameters:
mat the matrix itself,
type indicates the type of row scaling. Possible values are: GK_CSR_MAXTF, GK_CSR_SQRT, GK_CSR_LOG, GK_CSR_IDF, GK_CSR_MAXTF2.

Definition at line 1389 of file csr.c.

References gk_errexit(), and gk_free().

Here is the call graph for this function:

void gk_csr_ComputeSums ( gk_csr_t mat,
int  what 
)

Computes the sums of the rows/columns

Parameters:
mat the matrix itself,
what is either GK_CSR_ROW or GK_CSR_COL indicating which sums to compute.

Definition at line 1600 of file csr.c.

References gk_errexit(), gk_free(), and n.

Here is the call graph for this function:

void gk_csr_ComputeSquaredNorms ( gk_csr_t mat,
int  what 
)

Computes the squared of the norms of the rows/columns

Parameters:
mat the matrix itself,
what is either GK_CSR_ROW or GK_CSR_COL indicating which squared norms to compute.

Definition at line 1646 of file csr.c.

References gk_errexit(), gk_free(), and n.

Here is the call graph for this function:

float gk_csr_ComputeSimilarity ( gk_csr_t mat,
int  i1,
int  i2,
int  what,
int  simtype 
)

Computes the similarity between two rows/columns

Parameters:
mat the matrix itself. The routine assumes that the indices are sorted in increasing order.
i1 is the first row/column,
i2 is the second row/column,
what is either GK_CSR_ROW or GK_CSR_COL indicating the type of objects between the similarity will be computed,
simtype is the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN
Returns:
the similarity between the two rows/columns.

Definition at line 1697 of file csr.c.

References gk_errexit().

Here is the call graph for this function:

int gk_csr_GetSimilarRows ( gk_csr_t mat,
int  nqterms,
int qind,
float qval,
int  simtype,
int  nsim,
float  minsim,
gk_fkv_t *  hits,
int i_marker,
gk_fkv_t *  i_cand 
)

Finds the n most similar rows (neighbors) to the query using cosine similarity.

Parameters:
mat the matrix itself
nqterms is the number of columns in the query
qind is the list of query columns
qval is the list of correspodning query weights
simtype is the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN, GK_CSR_AMIN
nsim is the maximum number of requested most similar rows. If -1 is provided, then everything is returned unsorted.
minsim is the minimum similarity of the requested most similar rows
hits is the result set. This array should be at least of length nsim.
i_marker is an array of size equal to the number of rows whose values are initialized to -1. If NULL is provided then this array is allocated and freed internally.
i_cand is an array of size equal to the number of rows. If NULL is provided then this array is allocated and freed internally.
Returns:
the number of identified most similar rows, which can be smaller than the requested number of nnbrs in those cases in which there are no sufficiently many neighbors.

Definition at line 1869 of file csr.c.

References gk_dfkvkselect(), gk_errexit(), gk_fkvsortd(), gk_free(), and key.

Here is the call graph for this function:


Generated on Mon Sep 21 08:08:01 2020 for Charm++ by  doxygen 1.5.5