News Sources Feb 12 2020
“http://www.rt.com”, “http://www.cnn.com”, “https://www.rte.ie/”,
“https://www.breitbart.com/”, “https://www.bbc.com/”,
“https://usatoday.com/”, “http://www.xinhuanet.com/english/”,
“https://mexiconewsdaily.com/”, “https://www.cbc.ca/”,
“https://www.huffpost.com/”, “https://www.nytimes.com/”,
“https://www.foxnews.com/”,
“https://www.theguardian.com/international”, “https://www.wsj.com/”,
“https://www.latimes.com/”, “https://www.aljazeera.com/”
Sample Word Vector
Text is extracted from the news websites and turned into a variety of word lists.
The words are then turned into numerical vectors called features.
TF-IDF or other algorithms convert the textual word lists into numerical vectors with thousands of columns.
{0., -0.00336273, -0.0213353, -0.00224182, -0.0100882, 0.,
-0.000560454, 0., 0., -0.00574844, 0., 0., 0., 0., -0.00171913,
-0.071092, 0., -0.00952772, 0., 0., 0., -0.000859564, -0.117617,
-0.0322764, 0., -0.0443117, -0.000938294, 0., 0., 0., 0.,
-0.00171913, 0., 0., -0.00392318, -0.00765881, 0., 0., -0.0156927,
0., -0.00172453, -0.00107288, -0.00347522, -0.00711175, -0.00437646,
0., -0.00171913, 0., 0., 0., -0.00492352, 0., 0., 0., 0., 0.,
-0.0257117, 0., 0., -0.00177116, 0., -0.00107288, 0., -0.0235235, 0.,
0., 0., -0.000760377, 0., 0., 0., -0.000859564, 0., 0., 0.,
-0.000859564, 0., -0.0151323, 0., 0., 0., -0.0100882, 0., 0.,
-0.006165, -0.000938294, 0., 0., 0., -0.00129249, -0.000938294, 0.,
0., 0., -0.000938294, 0., 0., -0.0289941, -0.14646, -0.134029,
-0.211672, 0., 0., -0.00875293, 0., 0., -0.00114969, 0.,
-0.000646245, 0., 0., -0.00129249, -0.000803705, 0., -0.0158647, 0.,
0., -0.00168136, -0.0231541, -0.00448363, 0., 0., -0.00295193,
-0.141141, 0., 0., -0.00109412, -0.000938294, 0., -0.0251647,
-0.0443117, -0.000803705, 0., -0.0160588, 0., -0.000574844, 0.,
-0.151805, -0.00187659, -0.00448363, 0., -0.0114882, -0.0224294, 0.,
0., -0.000760377, 0., 0., -0.0530646, 0., -0.000938294, 0.,
-0.00114969, -0.056347, 0., -0.00118077, -0.0082654, -0.00107288,
-0.00214577, -0.0158647, 0., -0.0134509, -0.00347522, 0.,
-0.00187659, 0., 0., 0., -0.000938294, 0., 0., -0.00224182,
-0.626381, 0., 0., -0.165759, -0.000859564, 0., 0., 0., 0., 0., 0.,
0., -0.00765881, 0., -0.000574844, -0.00437646, -0.00171913, 0.,
-0.131294, 0., 0., -0.000859564, -0.0112091, 0., 0., 0., -0.00336273,
-0.00751018, 0., 0., 0., 0., 0., 0., 0., 0., -0.0128904, 0.,
-0.000938294, 0., 0., 0., 0., 0., -0.00448363, 0., 0., -0.00784636,
0., 0., -0.028447, 0., 0., 0., -0.00152075, -0.0164117, -0.00672545,
0., -0.0694884, 0., 0., -0.00214577, 0., 0., 0., 0., -0.00382941, 0.,
0., 0., 0., 0., 0., 0., -0.00107288, 0., -0.000938294, -0.00515738,
-0.0798704}
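As a rough illustration of the word-list-to-vector step, the sketch below computes a plain TF-IDF weighting (term frequency times the log of inverse document frequency) in Python. The function name tfidf_vectors, the source names, and the tiny word lists are all made up for the example, and this is only one common TF-IDF variant, not necessarily the weighting that produced the sample vector above.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps a source name to its list of words.

    Returns (vocab, vectors) where vectors maps each source name to a
    list of TF-IDF weights, one column per vocabulary word.
    """
    # One column per distinct word across all sources.
    vocab = sorted({w for words in docs.values() for w in words})
    n_docs = len(docs)
    # Document frequency: in how many sources does each word appear?
    df = Counter(w for words in docs.values() for w in set(words))
    vectors = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty lists
        vectors[name] = [
            (counts[w] / total) * math.log(n_docs / df[w]) for w in vocab
        ]
    return vocab, vectors

# Hypothetical word lists standing in for text extracted from three sites.
docs = {
    "source_a": "election vote senate vote".split(),
    "source_b": "election virus outbreak".split(),
    "source_c": "virus outbreak travel ban".split(),
}
vocab, vectors = tfidf_vectors(docs)
for name, vec in vectors.items():
    print(name, [round(x, 3) for x in vec])
```

With real pages the vocabulary grows to thousands of words, which is why each vector has thousands of columns and most entries are zero.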
Dendrograms
https://en.wikipedia.org/wiki/Dendrogram
The Similarity Matrix below is annotated with Dendrograms along its axes.
Each leaf of a Dendrogram is a news URL. A black tile (pixel) in the Similarity Matrix indicates that two sources are very dissimilar, a white tile indicates that they are very similar, and shades of gray indicate intermediate degrees.
In the visualization, Breitbart appears quite dissimilar from all the other news sources.
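One way such a matrix and its dendrogram ordering can be derived is sketched below in Python. It assumes cosine similarity between feature vectors and single-linkage agglomerative clustering; the original does not state which similarity measure or linkage was used, so both are assumptions, as are the function names and the three toy vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def similarity_matrix(vectors):
    """Return (names, matrix) with matrix[i][j] = similarity of names[i] and names[j]."""
    names = list(vectors)
    matrix = [[cosine_similarity(vectors[a], vectors[b]) for b in names]
              for a in names]
    return names, matrix

def single_linkage_merges(names, sim):
    """Repeatedly merge the two most similar clusters.

    Returns the merges in order as (cluster_a, cluster_b, similarity);
    this merge sequence is the information a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: cluster similarity is the best pairwise similarity.
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merges.append((tuple(names[i] for i in clusters[x]),
                       tuple(names[i] for i in clusters[y]), s))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Toy feature vectors (hypothetical); source_c is the outlier.
vectors = {
    "source_a": [1.0, 0.9, 0.0],
    "source_b": [0.9, 1.0, 0.1],
    "source_c": [0.0, 0.1, 1.0],
}
names, sim = similarity_matrix(vectors)
merges = single_linkage_merges(names, sim)
for a, b, s in merges:
    print(a, "+", b, "at similarity", round(s, 3))
```

In this toy data the two similar sources merge first and the outlier joins last, at a much lower similarity. That is the pattern a dendrogram shows for a source that is dissimilar from all the others.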
Nearest Neighbor Graph (K-NN-G)
https://en.wikipedia.org/wiki/Nearest_neighbor_graph
An element elem_j is a k-nearest neighbor of an element elem_i whenever the distance from elem_i to elem_j is among the k smallest distances from elem_i to any other element.
The following K-NN-G graph uses k=2.
Each vertex is a news source. An arrow from vertex A to vertex B indicates that B is one of the two sources most similar to A.
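The construction of such a graph can be sketched in a few lines of Python. The function name knn_graph, the four source names, and the distance values are hypothetical; the sketch assumes a precomputed matrix of pairwise distances, where a smaller distance means more similar.

```python
def knn_graph(names, dist, k=2):
    """Return directed edges (A, B) where B is among the k nearest neighbors of A.

    dist[i][j] is the distance between names[i] and names[j].
    """
    edges = []
    for i, src in enumerate(names):
        # Rank every other element by its distance from the source element.
        others = sorted((j for j in range(len(names)) if j != i),
                        key=lambda j: dist[i][j])
        for j in others[:k]:
            edges.append((src, names[j]))
    return edges

# Hypothetical distances between four sources; "d" is far from everything.
names = ["a", "b", "c", "d"]
dist = [
    [0.0, 0.1, 0.4, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.4, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
edges = knn_graph(names, dist, k=2)
for src, dst in edges:
    print(src, "->", dst)
```

Every vertex has exactly k outgoing arrows, but the relation is not symmetric: here "d" points to "c" and "b", while no vertex points back to "d". A vertex with few or no incoming arrows is one that no other source counts among its nearest neighbors.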