
Friday, March 26, 2021

Market/Volume Profile and Matrix Profile

A quick preview of what I am currently working on: using Matrix Profile, via the R tsmp package, to search for time series motifs. The exact motifs I'm looking for are the various "initial balance" setups of Market Profile charts.

To do so, I'm concentrating the investigation around both the London and New York opening times, with a custom annotation vector (av). Below is a simple R function to set up this custom av, which is produced separately in Octave and then loaded into R.

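mp_adjusted_by_custom_av <- function( mp_object , custom_av ){
  ## https://stackoverflow.com/questions/66726578/custom-annotation-vector-with-tsmp-r-package
  ## attach the custom annotation vector to the matrix profile object
  mp_object$av <- custom_av
  ## mark the object as carrying an annotation vector ( update_class is a non-exported tsmp function )
  class( mp_object ) <- tsmp:::update_class( class( mp_object ) , "AnnotationVector" )
  ## apply the annotation vector to get the adjusted matrix profile
  mp_adjusted_by_custom_av <- tsmp::av_apply( mp_object )
  return( mp_adjusted_by_custom_av )
}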
This animated GIF shows plots of short, exemplar adjusted matrix profile objects, highlighting the London only, New York only and combined results of the relevant annotation vectors.
This is currently a work in progress and so I shall report results in due course.
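
As a rough illustration of the kind of custom annotation vector described above, here is a minimal base R sketch. It is not the Octave code actually used: the helper name make_session_av, the opening times (8:00 and 13:30, in minutes after midnight) and the one hour tolerance are all placeholder assumptions. The only things taken as given are that an annotation vector has one value per subsequence (series length minus window size plus one) and that its values lie between 0 and 1.

```r
## hypothetical helper, NOT part of the tsmp package
## bar_minutes_of_day : minutes after midnight for each bar in the series
## window_size        : matrix profile window size
## open_minutes       : assumed session open times ( placeholders only )
make_session_av <- function( bar_minutes_of_day , window_size ,
                             open_minutes = c( 8 * 60 , 13 * 60 + 30 ) ,
                             tolerance_minutes = 60 ) {
  n_sub <- length( bar_minutes_of_day ) - window_size + 1  ## one av value per subsequence
  starts <- bar_minutes_of_day[ seq_len( n_sub ) ]        ## start time of each subsequence
  av <- numeric( n_sub )                                   ## 0 == ignore by default
  for ( open in open_minutes ) {
    d <- abs( starts - open )
    d <- pmin( d , 1440 - d )                              ## distance around the 24 hour clock
    av[ d <= tolerance_minutes ] <- 1                      ## 1 == keep subsequences near an open
  }
  av
}

## two days of 10 minute bars and a 6 bar window
bar_minutes <- rep( seq( from = 0 , to = 1430 , by = 10 ) , 2 )
custom_av <- make_session_av( bar_minutes , window_size = 6 )
```

A vector built like this could then be passed as custom_av to the function above, provided its length matches that of the matrix profile.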

Saturday, January 30, 2021

Temporal Clustering Times on Forex Majors Pairs

In the following code box there are the results from the temporal clustering routine of my last few posts on the four forex majors pairs of EUR_USD, GBP_USD, USD_CHF and USD_JPY.

###### EUR_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##
## Delta turning point filter only ##
## "Normal" turning point filter only

###################### Monday ##############################################
K_opt == 8, ix values == 13 38 63 89 112 135 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive
0:00 4:10 8:20 12:40 16:30 20:20 00:50 4:50

K_opt == 8, ix values == 13 39 64 89 112 135 161 186 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 5, ix_values == 21 60 97 134 175 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )
K == 6, ix values == 21 59 94 125 158 184

K_opt == 11, ix values == 9 26 43 60 78 95 113 132 151 169 185 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 8, ix values == 13 36 61 86 111 136 161 186 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 8, ix values == 13 34 61 87 110 137 164 187 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 8, ix values == 13 38 63 88 112 137 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 10, ix values == 10 31 52 72 91 112 131 150 169 188 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 8, ix values == 12 35 62 88 112 137 164 187 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt == 6, ix values == 131 169 206 244 283 322 ## averaged over all 15 n_bars 1 to 15 inclusive
19:40 02:00 8:10 14:30 21:00 03:30

K_opt == 6, ix values == 131 170 207 245 284 323 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 7, ix values == 131 168 206 243 274 305 330 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 11, ix values == 124 143 164 184 205 226 247 268 289 310 331 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 124 144 164 185 204 225 246 267 288 309 332 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 7, ix values == 133 169 206 241 273 304 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 127 152 175 202 228 253 278 305 330 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 9, ix values == 127 152 177 202 228 253 278 304 329 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 7, ix values == 132 168 205 242 273 304 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt == 6, ix values == 275 312 351 389 426 465 ## averaged over all 15 n_bars 1 to 15 inclusive
19:40 01:50 08:20 14:40 20:50 03:20

K_opt == 6, ix values == 275 313 352 391 428 466 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 6, ix values == 274 312 350 389 424 463 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 272 299 322 347 372 397 422 449 474 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 268 288 308 329 348 369 390 411 432 453 476 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 6, ix values == 275 312 351 388 424 463 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 272 297 322 348 373 398 423 449 474 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 9, ix values == 271 297 322 348 373 398 423 448 473 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 6, ix values == 276 311 350 389 426 465 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

####################### Thursday ###########################################
K_opt == 6, ix values == 420 457 495 532 570 609 ## averaged over all 15 n_bars 1 to 15 inclusive
19:50 02:00 08:20 14:30 20:50 03:20

K_opt == 6, ix values == 420 457 494 531 570 610 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 6, ix values == 420 457 495 532 568 607 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 416 443 466 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 10, ix values == 414 437 460 483 506 527 550 573 596 619 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 9, ix values == 416 443 466 493 520 543 568 595 618 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 415 440 465 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 9, ix values == 415 440 465 492 518 543 568 593 618 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 7, ix values == 420 457 494 529 561 592 617 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

####################### Friday #############################################
K_opt == 5, ix values == 564 599 635 670 703 ## averaged over all 15 n_bars 1 to 15 inclusive
19:50 01:40 07:40 13:30 19:00

K_opt == 6, ix values == 563 596 627 654 680 707 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )
K == 5, ix values == 564 599 635 668 703

K_opt == 5, ix values == 564 601 639 674 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 556 575 595 614 633 652 672 691 711 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 11, ix values == 554 570 587 602 619 634 651 667 682 698 713 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 9, ix values == 556 575 595 614 633 652 671 691 711 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 9, ix values == 556 575 596 613 634 652 672 691 711 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

K_opt == 9, ix values == 556 575 594 613 633 652 672 691 710 ## averaged over all 15 n_bars 1 to 15 inclusive

K_opt == 9, ix values == 556 575 594 613 634 653 672 691 710 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt == 5, ix values == 564 600 637 674 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

###### GBP_USD 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 8, ix_values = 13 36 61 86 111 136 162 186 ## averaged over all 15 n_bars 1 to 15 inclusive
0:00 3:50 8:00 12:10 16:20 20:30 0:50 4:50

K_opt = 9, ix_values = 12 34 56 78 99 120 141 164 187 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 8, ix_values = 12 35 61 86 110 136 163 186 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 12, ix_values = 124 143 162 180 199 216 235 254 274 293 312 332 ## averaged over all 15 n_bars 1 to 15 inclusive
18:30 21:40 0:50 3:50 7:00 9:50 13:00 16:10 19:30 22:40 1:50 5:10

K_opt = 11, ix_values = 124 143 164 185 206 227 248 269 290 311 332 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 9, ix_values = 128 154 177 205 230 254 279 307 330 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 11, ix_values = 269 290 311 331 352 373 394 415 434 455 476 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 22:10 1:40 5:00 8:30 12:00 15:30 19:00 22:10 1:40 5:10

K_opt = 11, ix_values = 269 289 310 330 351 372 393 413 434 455 476 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 8, ix_values = 275 310 341 367 394 422 451 475 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 9, ix_values = 415 440 465 492 517 542 568 594 618 ## averaged over all 15 n_bars 1 to 15 inclusive
19:00 23:10 3:20 7:50 12:00 16:10 20:30 0:50 4:50

K_opt = 9, ix_values = 415 440 465 491 517 542 568 593 618 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 9, ix_values = 416 441 464 492 519 542 569 596 619 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 9, ix_values = 557 576 595 614 633 652 671 690 711 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:00 4:10 7:20 10:30 13:40 16:50 20:20

K_opt = 9, ix_values = 557 576 595 614 633 652 671 691 711 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 8, ix_values = 557 576 599 621 642 665 686 709 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

###### USD_CHF 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 11, ix_values = 8 25 42 61 79 96 113 131 150 169 188 ## averaged over all 15 n_bars 1 to 15 inclusive
23:10 2:00 4:50 8:00 11:00 13:50 16:40 19:40 22:50 2:00 5:10

K_opt = 11, ix_values = 9 26 43 60 79 96 114 133 151 170 189 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 13 38 66 99 127 157 184 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 9, ix_values = 127 152 177 202 228 253 279 306 330 ## averaged over all 15 n_bars 1 to 15 inclusive
19:00 23:10 3:20 7:30 11:50 16:00 20:20 0:50 4:50

K_opt = 11, ix_values = 124 144 165 185 204 225 246 267 288 309 331 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 133 170 205 240 270 301 328 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 10, ix_values = 270 293 316 342 365 388 411 432 454 475 ## averaged over all 15 n_bars 1 to 15 inclusive
18:50 22:40 2:30 6:50 10:40 14:30 18:20 21:50 1:30 5:00

K_opt = 12, ix_values = 268 287 308 327 346 365 384 401 420 439 458 477 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 276 313 349 383 414 444 471 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 11, ix_values = 413 432 452 471 491 512 533 554 575 598 619 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:10 4:20 7:40 11:10 14:40 18:10 21:40 1:30 5:00

K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 563 582 601 621 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 9, ix_values = 415 440 463 491 518 543 570 597 619 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 9, ix_values = 557 576 596 615 634 653 672 691 710 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 21:50 1:10 4:20 7:30 10:40 13:50 17:00 20:10

K_opt = 9, ix_values = 556 575 595 614 633 652 671 690 710 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 558 579 602 629 652 677 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

###### USD_JPY 10 minute bars #######
## In the following order
## Both Delta turning point filter and "normal" TPF combined ##

###################### Monday ##############################################
K_opt = 12, ix_values = 8 24 41 58 73 90 107 124 141 158 173 190 ## averaged over all 15 n_bars 1 to 15 inclusive
23:10 1:50 4:40 7:30 10:00 12:50 15:40 18:30 21:20 0:10 2:40 5:30

K_opt = 12, ix_values = 8 24 41 56 73 90 107 124 141 158 173 190 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 5, ix_values = 20 60 99 136 175 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Tuesday #############################################
K_opt = 9, ix_values = 128 154 179 204 229 254 279 306 331 ## averaged over all 15 n_bars 1 to 15 inclusive
19:10 23:30 3:40 7:50 12:00 16:10 20:20 0:50 5:00

K_opt = 9, ix_values = 128 153 178 203 228 254 279 305 330 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 133 168 205 240 271 302 329 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Wednesday ###########################################
K_opt = 11, ix_values = 269 289 310 331 352 373 394 414 433 454 476 ## averaged over all 15 n_bars 1 to 15 inclusive
18:40 22:00 1:30 5:00 8:30 12:00 15:30 18:50 22:00 1:30 5:10

K_opt = 9, ix_values = 272 297 322 348 374 399 424 449 474 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 10, ix_values = 269 288 309 331 352 376 398 423 450 475 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Thursday ############################################
K_opt = 9, ix_values = 416 442 467 492 518 543 568 593 618 ## averaged over all 15 n_bars 1 to 15 inclusive
19:10 23:30 3:40 7:50 12:10 16:20 20:30 0:40 4:50

K_opt = 12, ix_values = 412 431 450 469 488 507 526 545 564 583 602 621 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 7, ix_values = 420 455 492 527 560 591 618 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

###################### Friday ##############################################
K_opt = 7, 8 or 9
ix_values 7 = 561 588 613 638 663 686 709 ## averaged over all 15 n_bars 1 to 15 inclusive
ix_values 8 = 557 578 599 622 643 666 687 710
ix_values 9 = 557 576 596 616 635 653 672 691 711 ## timings are for this bottom row
18:40 21:50 1:10 4:30 7:40 10:40 13:50 17:00 20:20

K_opt = 8, ix_values = 558 579 600 621 644 665 687 709 ## averaged over n_bars 1 to 6 inclusive ( upto and include 1 hour )

K_opt = 6, ix_values = 563 594 621 646 676 705 ## averaged over n_bars 7 to 15 inclusive ( over 1 hour )

############################################################################

These results are based on 10 minute bars over the last year or so. Readers should read my previous few posts for background.

The first set of results, for EUR_USD, is what the charts of my previous posts were based on. It includes the combined results of my "Delta Turning Point Filter" and "Normal Turning Point Filter" as well as the results for each filter separately. Since there don't appear to be significant differences between these, the other three pairs' results are the combined filter results only.

The K_opt variable is the optimal number of clusters (see my temporal-clustering-part-3 post for how "optimal" is decided) and the ix_values are also described in that post. For convenience, the first set of ix_values per day has the relevant times annotated underneath, so it is a simple matter to count forwards/backwards in 10 minute increments to assign times to the other ix_values. The variable n_bars is an input to the turning point filter functions and essentially indicates the lookback/lookforward period (n_bars == 2 would mean 2 x 10 minute periods) used for determining a local high/low according to each function's logic.
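
The counting in 10 minute increments can be sketched in a few lines of R. This is only an illustration: the function name ix_to_time is made up here, and the anchor point (ix 13 corresponding to 0:00) is read off the annotated Monday rows in the code box above rather than taken from the original Octave code.

```r
## hypothetical helper: convert ix values to clock times
## assumes ix == ref_ix corresponds to ref_minutes after midnight ( ix 13 <-> 0:00 ,
## inferred from the annotated rows ) and that each ix step is one 10 minute bar
ix_to_time <- function( ix , ref_ix = 13 , ref_minutes = 0 , bar_minutes = 10 ) {
  mins <- as.integer( ( ref_minutes + ( ix - ref_ix ) * bar_minutes ) %% 1440 )  ## wrap around midnight
  sprintf( "%d:%02d" , mins %/% 60L , mins %% 60L )
}

## GBP_USD Monday , all 15 n_bars
ix_to_time( c( 13 , 36 , 61 , 86 , 111 , 136 , 162 , 186 ) )
## "0:00" "3:50" "8:00" "12:10" "16:20" "20:30" "0:50" "4:50"
```

The output reproduces the times annotated under that row, and the same call can be used on any of the un-annotated rows.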

As to how to interpret this, a typical sequence of times per day might look like this:

18:40 22:00 1:30 5:00 8:30 12:00 15:30 18:50 22:00 1:30 5:10

where the highlighted times represent the BST times for the period covering the London session open to the New York session close for one day. The preceding and following times are the two "book-ending" Asian sessions. 

Close inspection of these results reveals some surprising regularities. Even in just the single example above (an actual copy and paste from the code box) there appear to be definite times per day at which a local high/low occurs. Hopefully I will be able to incorporate this into some type of chart for a nice visual presentation of the data.

More in due course. Enjoy.

Saturday, November 14, 2020

Temporal Clustering, Part 3

Continuing with the subject matter of my last post, the code box below contains R code which is a straightforward refactoring of the Octave code in the second code box of that post. This code is my implementation of the cross validation routine described in the paper Cluster Validation by Prediction Strength, adapted for use in the one dimensional case. I have refactored it into R so that I can use the Ckmeans.1d.dp package for optimal, one dimensional clustering.

library( Ckmeans.1d.dp )

## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )

## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; adjust = 1 ; sum_seq_l = as.numeric( length( sum_seq ) )## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 259 , to = 486 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday ( Tuesday range shifted by 144 bars , i.e. one day )
##sum_seq = seq( from = 403 , to = 630 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday ( Tuesday range shifted by 288 bars , i.e. two days )
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; adjust = 2 ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday

## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )

upper_tri_mask = 1 * upper.tri( matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l ) , diag = FALSE )
no_sample_iters = 1000
max_K = 20
all_k_ps = matrix( 0L , nrow = 1 , ncol = max_K )

for ( iters in 1 : no_sample_iters ) {

  ## sample the data in data by rows
  train_ix = sample( nrow( data ) , size = round( nrow( data ) / 2 ) , replace = FALSE )
  train_data = data[ train_ix , sum_seq ] ## extract training data using train_ix rows of data
  train_data_sum = colSums( train_data ) ## sum down the columns of train_data
  test_data = data[ -train_ix , sum_seq ] ## extract test data using NOT train_ix rows of data
  test_data_sum = colSums( test_data ) ## sum down the columns of test_data
  ## adjust for weekend if necessary
  if ( adjust == 1 ) { ## Monday, so correct artifacts of weekend gap
    train_data_sum[ 1 : 5 ] = mean( train_data_sum[ 1 : 48 ] )
    test_data_sum[ 1 : 5 ] = mean( test_data_sum[ 1 : 48 ] )
  } else if ( adjust == 2 ) { ## Friday, so correct artifacts of weekend gap
    train_data_sum[ ( sum_seq_l - 4 ) : sum_seq_l ] = mean( train_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] )
    test_data_sum[ ( sum_seq_l - 4 ) : sum_seq_l ] = mean( test_data_sum[ ( sum_seq_l - 47 ) : sum_seq_l ] )
  }

  for ( k in 1 : max_K ) {

    ## K segment train_data_sum
    train_res = Ckmeans.1d.dp( sum_seq , k , train_data_sum )
    train_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )

    ## K segment test_data_sum
    test_res = Ckmeans.1d.dp( sum_seq , k , test_data_sum )
    test_out_pairs_mat = matrix( 0L , nrow = sum_seq_l , ncol = sum_seq_l )

    for ( ii in 1 : length( train_res$centers ) ) {
      ix = which( train_res$cluster == ii )
      train_out_pairs_mat[ ix , ix ] = 1
      ix = which( test_res$cluster == ii )
      test_out_pairs_mat[ ix , ix ] = 1
    }
    ## coerce to upper triangular matrix
    train_out_pairs_mat = train_out_pairs_mat * upper_tri_mask
    test_out_pairs_mat = test_out_pairs_mat * upper_tri_mask

    ## get minimum co-membership cluster proportion
    sample_min_vec = matrix( 0L , nrow = 1 , ncol = length( test_res$centers ) )
    for ( ii in 1 : length( test_res$centers ) ) {
      ix = which( test_res$cluster == ii )
      test_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] )
      train_cluster_sum = sum( test_out_pairs_mat[ ix , ix ] * train_out_pairs_mat[ ix , ix ] )
      sample_min_vec[ , ii ] = train_cluster_sum / test_cluster_sum
    }

    ## get min of sample_min_vec
    min_val = min( sample_min_vec[ !is.nan( sample_min_vec ) ] ) ## removing any NaN
    all_k_ps[ , k ] = all_k_ps[ , k ] + min_val

  } ## end of K for loop

} ## end of sample loop

all_k_ps = all_k_ps / no_sample_iters ## average values
plot( 1 : length( all_k_ps ) , all_k_ps , "b" , xlab = "Number of Clusters K" , ylab = "Prediction Strength Value" )
abline( h = 0.8 , col = "red" )
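
The core quantity computed inside the K loop above can be illustrated on a toy example. The sketch below is only a restatement for clarity, not a replacement for the routine: the function name toy_prediction_strength and the two label vectors are invented for illustration. For each test cluster it takes the proportion of pairs of members that are also placed together by the training clustering, and then returns the minimum over the test clusters.

```r
## hypothetical helper: minimum co-membership proportion between two clusterings
## train_cluster , test_cluster : integer cluster labels over the same ix positions
toy_prediction_strength <- function( train_cluster , test_cluster ) {
  props <- sapply( unique( test_cluster ) , function( cl ) {
    ix <- which( test_cluster == cl )
    if ( length( ix ) < 2 ) return( NA_real_ )  ## a single member cluster has no pairs
    pairs <- combn( ix , 2 )                     ## all pairs of members of this test cluster
    ## proportion of these pairs that the training clustering also puts together
    mean( train_cluster[ pairs[ 1 , ] ] == train_cluster[ pairs[ 2 , ] ] )
  } )
  min( props , na.rm = TRUE )
}

train_labels <- c( 1 , 1 , 1 , 2 , 2 , 2 )
test_labels  <- c( 1 , 1 , 2 , 2 , 2 , 2 )
toy_prediction_strength( train_labels , test_labels )
## 0.5
```

Here the first test cluster is fully reproduced by the training labels, while only 3 of the 6 pairs in the second test cluster are, so the minimum is 0.5.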

The purpose of the cross validation routine is to select the number of clusters K, in the model selection sense, that is best supported by the available data. The above linked paper suggests that the optimal number of clusters is the highest K that has a prediction strength value over some given threshold (e.g. 0.8 or 0.9). The last part of the code plots the number of clusters K (x-axis) against the averaged prediction strength (y-axis), along with the threshold value of 0.8 in red. For the particular set of data in question, it can be seen that the optimal K value for the number of clusters is 8.
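
The selection rule just described can also be applied programmatically rather than read off the plot. This is a minimal sketch: the function name select_k is made up, and the prediction strength values in the example are invented purely to show the behaviour, not results from my data.

```r
## hypothetical helper: highest K whose prediction strength is over the threshold
## ps : vector ( or 1 row matrix such as all_k_ps ) where element k is the value for K == k
select_k <- function( ps , threshold = 0.8 ) {
  ok <- which( as.numeric( ps ) > threshold )
  if ( length( ok ) == 0 ) return( NA_integer_ )  ## no K passes the threshold
  max( ok )
}

## made up values for K = 1 to 5
example_ps <- c( 1.00 , 0.95 , 0.60 , 0.85 , 0.70 )
select_k( example_ps , threshold = 0.8 )
## 4
```

Note that the rule picks the highest K over the threshold even if an intermediate K, such as K = 3 in this example, falls below it.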

This second code box shows code, re-using some of the above code, to visualise the clusters for a given K,
library( Ckmeans.1d.dp )

## load the training data from Octave output (comment out as necessary )
data = read.csv( "~/path/to/all_data_matrix" , header = FALSE )
data_sum = colSums( data ) ## sum down the columns of data
data_sum[ 1 : 5 ] = mean( data_sum[ 1 : 48 ] ) ## correct artifacts of weekend gap
data_sum[ 716 : 720 ] = mean( data_sum[ 673 : 720 ] ) ## correct artifacts of weekend gap, using the last 48 values as in the first code box

## comment out as necessary
adjust = 0 ## default adjust value
sum_seq = seq( from = 1 , to = 198 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Monday
##sum_seq = seq( from = 115 , to = 342 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Tuesday
##sum_seq = seq( from = 259 , to = 486 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Wednesday ( Tuesday range shifted by 144 bars , i.e. one day )
##sum_seq = seq( from = 403 , to = 630 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Thursday ( Tuesday range shifted by 288 bars , i.e. two days )
##sum_seq = seq( from = 547 , to = 720 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) ) ## Friday

## intraday --- comment out or adjust as necessary
##sum_seq = seq( from = 25 , to = 100 , by = 1 ) ; sum_seq_l = as.numeric( length( sum_seq ) )

k = 8
res = Ckmeans.1d.dp( sum_seq , k , data_sum[ sum_seq ] )

plot( sum_seq , data_sum[ sum_seq ] , main = "Cluster centres. Cluster centre ix is a predicted turning point" ,
      col = res$cluster ,
      pch = res$cluster , type = "h" , xlab = "Count from beginning ix at ix = 1" ,
      ylab = "Total Counts per ix" )

abline( v = res$centers , col = "chocolate" , lty = "dashed" )

text( res$centers , max( data_sum[ sum_seq ] ) * 0.95 , cex = 0.75 , font = 2 ,
      paste( round( res$centers ) ) )
a typical plot for which is shown below.
The above plot can be thought of as a clustering at a particular scale, and one can go down in scale by selecting smaller ranges of the data. For example, taking all the data in the 3 clusters centred at x-axis ix values 38, 63 and 89, and re-running the code in the first code box on just this data, gives this prediction strength plot, which suggests a K value of 6.
Re-running the code in the second code box plots these 6 clusters thus.

Looking at this last plot, it can be seen that there is a cluster at x-axis ix value 58, which corresponds to 7.30 a.m. London time, and within this green cluster there are 2 distinct peaks which correspond to 7.00 a.m. and 8.00 a.m. A similar, visual analysis of the far right cluster, centre ix = 94, shows a peak at the time of the New York open.

My hypothesis is that by clustering in the above manner it will be possible to identify distinct, intraday times at which the probability of a market turn is greater than at other times. More in due course.