Methodology

As the Internet continues to change the news industry and the methods of production, circulation and consumption, it is ever more critical to understand the emerging trends and news outlets available online. Citizens must make daily choices about what sites to go to for various kinds of news information, but it is largely up to them to figure out which site can best fit their needs at the moment. And in many instances they may be making choices without fully understanding why.

The content analysis element of the 2007 Annual Report on the State of the News Media was designed to sort through the many different kinds of sites that offer news information. What does each site emphasize over others? Are there common tendencies? The creation of the study and the analysis of the findings were a multi-step process.

Sample Design and Web Site Capture

To assess the range of news Web sites available, we selected 38 different Web sites that provide such information. The sites were initially drawn from the seven media sectors that PEJ analyzes in each annual report:

  • Newspaper (9 sites from a mix of national, regional and local papers)
  • Cable news (3 sites)
  • Network News (3 sites, commercial and public; NBC’s online identity is merged with that of MSNBC)
  • Local TV (2 sites)
  • Radio (2 sites, one national network and one local)
  • Weekly news magazine (3 sites)
  • Online-only news sites (10 sites ranging from aggregators to citizen-based sites to online magazines)
  • Online blogs (4)

In addition, we included one foreign broadcast site (BBC News) and the site of one wire service. (Ethnic and other non-English-language Web sites were not included in the study because of the language barrier.)

The result was the following list of sites:

Sites Studied

ABC News http://abcnews.go.com

BBC News http://news.bbc.co.uk

Benicia News http://www.benicianews.com

Boston Phoenix http://www.thephoenix.com

CBS11 TV http://cbs11tv.com

CBS News http://www.cbsnews.com

Chicago Sun Times http://www.suntimes.com

CNN http://www.cnn.com

Crooks and Liars http://www.crooksandliars.com

Daily Kos http://www.dailykos.com

Des Moines Register http://www.desmoinesregister.com

Digg http://digg.com

Economist http://www.economist.com

Fox News http://www.foxnews.com

Global Voices http://www.globalvoicesonline.org

King5 TV http://www.king5.com

Los Angeles Times http://www.latimes.com

Little Green Footballs http://www.littlegreenfootballs.com

Michelle Malkin http://www.michellemalkin.com

MSNBC http://www.msnbc.msn.com

AOL News http://news.aol.com

Google News http://news.google.com

Yahoo News http://news.yahoo.com

New York Post http://www.nypost.com

New York Times http://www.nytimes.com

NPR http://www.npr.org

Ohmynews.com http://english.ohmynews.com

PBS NewsHour http://www.pbs.org/newshour

Reuters http://www.reuters.com

Salon http://salon.com

San Francisco Bay Guardian http://www.sfbg.com

Slate http://slate.com

Time Magazine http://www.time.com

Topix http://www.topix.net

USA Today http://www.usatoday.com

Washington Post http://www.washingtonpost.com

The Week Magazine http://www.theweekmagazine.com

WTOP Radio http://www.wtop.com

Web sites were captured by a team of professional content coders. At each download, coders made an electronic copy and a printed hard copy of each site's homepage as well as of its top five news stories. Prominence was determined as follows:

The biggest headline at the top of the screen is the most prominent story; it may or may not have an image associated with it. The second-most prominent story is the one attached to an image at the top of the screen, if that is a different story from the most prominent one. If there is no image at the top of the screen (or if two significant stories are attached to the same image), refer to the next-largest headline. To determine the remaining order of prominence, refer first to the size of the headlines and then to their position (height) on the screen. If two stories have the same font size and sit at the same height on the screen, give the story on the left more prominence.
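
The ordering rules above amount to a sort with tie-breakers plus one special case for the image story. A minimal sketch in Python (the field names and the screen-coordinate representation are assumptions for illustration, not part of the coding instrument):

```python
# Sketch of the homepage-prominence rules; Headline fields are assumed.
from dataclasses import dataclass

@dataclass
class Headline:
    text: str
    font_size: int  # larger number = bigger headline
    y: int          # distance from the top of the screen (smaller = higher)
    x: int          # distance from the left edge (smaller = further left)

def rank_by_prominence(headlines, image_story=None):
    # General rule: bigger headline first, then higher on the screen,
    # then further to the left.
    ordered = sorted(headlines, key=lambda h: (-h.font_size, h.y, h.x))
    lead = ordered[0]  # biggest headline at the top: most prominent
    rest = [h for h in ordered if h is not lead]
    # The story attached to the top-of-screen image is second-most
    # prominent, provided it differs from the lead story.
    if image_story is not None and image_story is not lead:
        rest = [image_story] + [h for h in rest if h is not image_story]
    return [lead] + rest
```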

Stories were defined as follows:

  • Headlines that linked to a landing page within the Web site rather than to a specific news report were omitted, as were links to landing pages of other Web sites.
  • Links to specific stories on other Web sites were included, as were video and audio stories.

Capture Timing

Web sites were initially studied from September 18 through October 6, 2006. For that initial review, each site was captured and coded four different times. For two captures, the research team coded for the entire set of variables, both the homepage analysis and the variables related to the content of news stories. The other two rounds of capture were coded only for the variables relating to the content of the lead stories.

Each site was then studied again during the week of February 12-16, 2007, and coded separately. Results for the two time periods were compared. In cases where features had changed, we closely examined the site again to confirm the change or correct inconsistencies. Final analyses were based on the confirmed February site scores.

Coding Scheme and Procedure

To create the coding scheme, we first worked to identify the different kinds of features available online — everything from contacting the author to quickly finding just what you want to receiving your news free — and how they could be measured. After several weeks of exploratory research, we identified 63 different quantitative measures and developed those into a working codebook (see list of primary variables below).

Coding was performed at the PEJ by a team of seven professional in-house coders, overseen by a senior researcher and a methodologist. Coders were trained on a standardized codebook that contained a dictionary of coding variables, operational definitions, measurement scales and detailed instructions and examples. The codebook was divided into two sections. The first was based on an inventory of the Web site's homepage. That inventory was performed three separate times: twice in September 2006 and once in February 2007. The second component involved coding the content of news stories themselves. We coded the top five stories for the variables related to the content of the news and took the average score for each variable.

Before coding began, coders were trained on the codebook. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions, and where necessary additional captures took place to verify findings.

Coders followed a series of standardized rules for coding and quantifying Web site traits. Three variables deserve specific mention:

1. Multimedia components on the homepage: Coders counted all content items, defined as links to all material other than landing pages or indexes of some sort. Included were narrative text, still photos, interactive graphics, video, audio, live streams, live Q&A's, polls, user-based blogs, podcast content and slide shows. Next, the coders tallied the total number of content items on the page as well as the totals for each media form and entered the percentages for each into the database.

2. Advertisements: In counting advertisements on the homepage, coders included all ads, from obvious banners and flash advertisements to the smaller single-link sponsors of a site. Self-promotional ads were also included in the total. The idea of this variable was to estimate the economic agenda of a given site based on the amount of advertising on the homepage. Advertisements on internal pages were not included in the tally. Because of day-to-day variance in the total number of homepage ads, the final figure was either the average based on all the visits to a site or, in cases where a site redesign had clearly occurred, the latest use of ads.

3. Bylines: Blog posts required special rules for the byline variable. In counting bylines, researchers scored a blog entry a "1," the most original, if the entry was posted by the blog host (John Amato on Crooks and Liars, for example). If the blog entry was posted by a regular contributor or staff, the "story" scored a "2." And if the blog entry was posted by an outside contributor, was not bylined, or consisted primarily of outside material (an entry, for instance, that simply said, "Read this," followed by an excerpt from another source), then the post received a score of "3," the lowest on the scale of original stories.
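
The three-point byline scale for blog posts can be sketched as a simple lookup; the category labels here ("host," "staff," and so on) are hypothetical, not terms from the codebook:

```python
# Sketch of the blog-post byline scale; category labels are assumptions.
BYLINE_SCORES = {
    "host": 1,         # entry posted by the blog host
    "staff": 2,        # regular contributor or staff
    "contributor": 2,
    "outside": 3,      # outside contributor
    "unbylined": 3,    # no byline
    "excerpt": 3,      # consists primarily of outside material
}

def byline_score(author_type):
    """Return the originality score (1 = most original, 3 = least)."""
    return BYLINE_SCORES[author_type]
```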

Analysis

In analyzing the data, we were able to group variables into six different areas of Web emphasis: User Customization, User Participation, Multimedia Use, Editorial Branding and Originality, Depth of Content, and Revenue Streams.

Customization includes

  • Homepage customization (allows user to tailor page)
  • Search options (simple or advanced search)
  • RSS feeds — options and prominence
  • Podcasts — options and prominence
  • Mobile phone delivery options

Participation includes

  • Users’ contribution to content
  • Scheduled, live discussions
  • Ability to:
    • e-mail author
    • post comments
    • rate the article/post
    • take a poll
  • List of most-viewed stories
  • List of most-e-mailed stories
  • List of most-linked-to stories

Multimedia includes

Percent of homepage content devoted to:

  • Narrative
  • Photos/non-interactive graphics
  • Video
  • Audio
  • Live stream
  • User blog
  • Live Q & A
  • Slide show
  • Poll
  • Interactive graphic
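
The media-mix percentages above follow from the homepage tally described earlier: count content items by media form, then divide by the total. A minimal sketch, with the form labels assumed:

```python
# Sketch of the media-mix tally: count homepage content items by form
# and convert to percentages of all content items on the page.
from collections import Counter

def media_mix(content_items):
    counts = Counter(content_items)          # items per media form
    total = sum(counts.values())             # all content items on the page
    return {form: round(100 * n / total, 1)  # percent of homepage content
            for form, n in counts.items()}
```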

Editorial Branding includes

  • Breadth of sources
  • Editorial process
  • Use of bylines
  • Direction of story links (internal or external)

Story Depth includes

  • Frequency of updates
  • Use of related story links
  • Use of archive links

Revenue Streams includes

  • Registration requirements
  • Fee-based content
  • Archive fees
  • Number of homepage ads (self-promotional and external)

Codes within each variable were translated into a numerical rating from low to high for that particular feature. Then PEJ research analysts produced an Excel template to tally the scores (summing the variables) for each site within the six categories. Thus for each of the six categories, each site had a final score. The range of scores was then divided into four quartiles and sites were marked according to which quartile they fell into.
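
The scoring step described above can be sketched as follows. Reading "quartiles" as four equal-width bands of the score range is an assumption based on the phrase "the range of scores was then divided":

```python
# Sketch of the final scoring step: sum the variable scores per site,
# then divide the range of category scores into four equal-width bands.
def category_scores(site_variables):
    """Sum each site's variable scores within one category."""
    return {site: sum(scores) for site, scores in site_variables.items()}

def quartile_marks(scores):
    """Mark each site 1 (lowest band) through 4 (highest band)."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / 4 or 1  # guard against all-identical scores
    return {site: min(4, int((s - lo) // width) + 1)
            for site, s in scores.items()}
```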