News Similarity: Biases and Similarities of News Sources


News Sources Feb 12 2020

“http://www.rt.com”, “http://www.cnn.com”, “https://www.rte.ie/”,
“https://www.breitbart.com/”, “https://www.bbc.com/”,
“https://usatoday.com/”, “http://www.xinhuanet.com/english/”,
“https://mexiconewsdaily.com/”, “https://www.cbc.ca/”,
“https://www.huffpost.com/”, “https://www.nytimes.com/”,
“https://www.foxnews.com/”,
“https://www.theguardian.com/international”, “https://www.wsj.com/”,
“https://www.latimes.com/”, “https://www.aljazeera.com/”

Sample Word Vector

Text is extracted from each news website and turned into a variety of word lists.

 

 

Words turned into numerical vectors called Features

TF-IDF or other algorithms convert the textual word lists into numerical vectors with thousands of components (columns). One such vector is shown here:

{0., -0.00336273, -0.0213353, -0.00224182, -0.0100882, 0.,
-0.000560454, 0., 0., -0.00574844, 0., 0., 0., 0., -0.00171913,
-0.071092, 0., -0.00952772, 0., 0., 0., -0.000859564, -0.117617,
-0.0322764, 0., -0.0443117, -0.000938294, 0., 0., 0., 0.,
-0.00171913, 0., 0., -0.00392318, -0.00765881, 0., 0., -0.0156927,
0., -0.00172453, -0.00107288, -0.00347522, -0.00711175, -0.00437646,
0., -0.00171913, 0., 0., 0., -0.00492352, 0., 0., 0., 0., 0.,
-0.0257117, 0., 0., -0.00177116, 0., -0.00107288, 0., -0.0235235, 0.,
0., 0., -0.000760377, 0., 0., 0., -0.000859564, 0., 0., 0.,
-0.000859564, 0., -0.0151323, 0., 0., 0., -0.0100882, 0., 0.,
-0.006165, -0.000938294, 0., 0., 0., -0.00129249, -0.000938294, 0.,
0., 0., -0.000938294, 0., 0., -0.0289941, -0.14646, -0.134029,
-0.211672, 0., 0., -0.00875293, 0., 0., -0.00114969, 0.,
-0.000646245, 0., 0., -0.00129249, -0.000803705, 0., -0.0158647, 0.,
0., -0.00168136, -0.0231541, -0.00448363, 0., 0., -0.00295193,
-0.141141, 0., 0., -0.00109412, -0.000938294, 0., -0.0251647,
-0.0443117, -0.000803705, 0., -0.0160588, 0., -0.000574844, 0.,
-0.151805, -0.00187659, -0.00448363, 0., -0.0114882, -0.0224294, 0.,
0., -0.000760377, 0., 0., -0.0530646, 0., -0.000938294, 0.,
-0.00114969, -0.056347, 0., -0.00118077, -0.0082654, -0.00107288,
-0.00214577, -0.0158647, 0., -0.0134509, -0.00347522, 0.,
-0.00187659, 0., 0., 0., -0.000938294, 0., 0., -0.00224182,
-0.626381, 0., 0., -0.165759, -0.000859564, 0., 0., 0., 0., 0., 0.,
0., -0.00765881, 0., -0.000574844, -0.00437646, -0.00171913, 0.,
-0.131294, 0., 0., -0.000859564, -0.0112091, 0., 0., 0., -0.00336273,
-0.00751018, 0., 0., 0., 0., 0., 0., 0., 0., -0.0128904, 0.,
-0.000938294, 0., 0., 0., 0., 0., -0.00448363, 0., 0., -0.00784636,
0., 0., -0.028447, 0., 0., 0., -0.00152075, -0.0164117, -0.00672545,
0., -0.0694884, 0., 0., -0.00214577, 0., 0., 0., 0., -0.00382941, 0.,
0., 0., 0., 0., 0., 0., -0.00107288, 0., -0.000938294, -0.00515738,
-0.0798704}
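As a rough illustration of the word-list-to-vector step, the sketch below computes plain TF-IDF weights by hand. The toy word lists, the function name tfidf_vectors, and the specific weighting (term frequency times the natural log of inverse document frequency) are assumptions made for this example; the vector above was not necessarily produced this way, and it may well come from one of the "other algorithms".

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Weight = (count of word in doc / doc length) * ln(N / docs containing word).
    This is one common TF-IDF variant, chosen here only for illustration.
    """
    vocabulary = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append([
            (counts[word] / total) * math.log(n_docs / doc_freq[word])
            if total else 0.0
            for word in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical toy word lists standing in for text extracted from three sites.
docs = [
    "election vote senate vote".split(),
    "election virus outbreak".split(),
    "virus outbreak travel ban".split(),
]
vocab, vecs = tfidf_vectors(docs)
for doc_vector in vecs:
    print([round(x, 3) for x in doc_vector])
```

Each document becomes one row with one column per vocabulary word, so a real crawl with thousands of distinct words yields vectors with thousands of columns, most of them zero.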

 

Dendrograms

https://en.wikipedia.org/wiki/Dendrogram

The similarity matrix below is annotated with dendrograms along its axes.

Each leaf of a dendrogram is a news URL. A black tile (pixel) in the similarity matrix indicates a very dissimilar pair of sources, a white tile indicates a very similar pair, and shades of gray fall in between.

The visualization shows that Breitbart is quite dissimilar from all the other news sources.
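To make the idea of a similarity matrix and its dendrogram concrete, here is a minimal sketch that builds a cosine-similarity matrix from feature vectors and then lists the merges a single-linkage dendrogram would draw. Cosine similarity, single linkage, the four toy feature vectors, and the function names are all assumptions for illustration; the figure may have been produced with a different similarity measure or clustering rule.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Square matrix whose entry [i][j] is the similarity of sources i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def single_linkage_merges(sim):
    """Repeatedly merge the two most similar clusters; return the merge list."""
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest pair across clusters.
                s = max(sim[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), s))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical feature vectors: sources 0-2 resemble each other, source 3 does not.
features = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.8, 1.0, 0.2],
    [0.0, 0.1, 1.0],
]
sim = similarity_matrix(features)
merges = single_linkage_merges(sim)
for left, right, score in merges:
    print(left, "+", right, "at similarity", round(score, 3))
```

In this toy data the outlying source joins the tree last and at a low similarity, which is the same pattern the matrix shows for a source that is dark against every other row.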

 

Nearest Neighbor Graph (K-NN-G)

https://en.wikipedia.org/wiki/Nearest_neighbor_graph

An element elem_j is a k-nearest neighbor of an element elem_i whenever the distance from elem_i to elem_j is among the k smallest distances from elem_i to any other element.

The following K-NN-G uses k = 2.

Each vertex is a news source. An arrow from vertex A to vertex B indicates that B is one of the two sources most similar to A.
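The definition above can be sketched in a few lines: for each vertex, sort the other vertices by distance and keep the first k. The distance matrix and the function name knn_graph below are hypothetical, chosen only to show how the arrows arise.

```python
def knn_graph(distances, k):
    """Map each vertex i to its k nearest other vertices, closest first."""
    graph = {}
    for i, row in enumerate(distances):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j])
        graph[i] = others[:k]
    return graph


# Hypothetical symmetric distance matrix for four sources (smaller = more similar).
distances = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.3, 0.8],
    [0.5, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
graph = knn_graph(distances, k=2)
# Each (i, j) pair is one arrow from source i to one of its two nearest sources.
edges = [(i, j) for i, neighbors in graph.items() for j in neighbors]
print(edges)
```

The relation is not symmetric: in this toy matrix vertex 3 points at two neighbors, yet no arrow points back at it, because it is never among the two closest sources for anyone else. An outlying news source would look the same way in the graph.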