IBM Research’s Deep Search product uses natural language processing (NLP) to “ingest and analyze massive amounts of data—structured and unstructured.” Over the years, Deep Search has seen a wide range of scientific uses, from Covid-19 research to molecular synthesis. Now, IBM Research is streamlining the scientific applications of Deep Search by open-sourcing part of the product through the release of Deep Search for Scientific Discovery (DS4SD).
DS4SD includes specific segments of Deep Search aimed at document conversion and processing. First is the Deep Search Experience, a document conversion service that includes a drag-and-drop interface and interactive conversion to allow for quality checks. The second element of DS4SD is the Deep Search Toolkit, a Python package that allows users to “programmatically upload and convert documents in bulk” by pointing the toolkit to a folder whose contents will then be uploaded and converted from PDFs into “easily decipherable” JSON files. The toolkit integrates with existing services, and IBM Research is welcoming contributions to the open-source toolkit from the developer community.
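For readers who want a feel for what such a bulk-conversion loop involves, here is a minimal, self-contained Python sketch. It only mimics the shape of the workflow (walk a folder, emit one JSON file per PDF); `convert_pdf_to_json` is a hypothetical stand-in, not the Deep Search Toolkit's actual API:

```python
from pathlib import Path
import json

def convert_pdf_to_json(pdf_path: Path) -> dict:
    # Hypothetical stand-in for the toolkit's conversion call;
    # here we only record the file name and size.
    return {"source": pdf_path.name, "size_bytes": pdf_path.stat().st_size}

def convert_folder(folder: str, out_dir: str) -> list:
    """Convert every PDF under `folder`, writing one JSON file per PDF."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        target = out / (pdf.stem + ".json")
        target.write_text(json.dumps(convert_pdf_to_json(pdf)))
        written.append(target)
    return written
```

The real toolkit uploads documents to the Deep Search service for conversion; this sketch only illustrates the folder-in, JSON-out pattern the article describes.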
IBM Research paints DS4SD as a boon for handling unstructured data (data not contained in a structured database). This data, IBM Research said, holds a “lot of value” for scientific research; by way of example, they cited IBM’s own Project Photoresist, which in 2020 used Deep Search to comb through more than 6,000 patents, documents, and material data sheets in the hunt for a new molecule. IBM Research says that Deep Search offers up to a 1,000× data ingestion speedup and up to a 100× data screening speedup compared to manual alternatives.
The launch of DS4SD follows the launch of GT4SD—IBM Research’s Generative Toolkit for Scientific Discovery—in March of this year. GT4SD is an open-source library to accelerate hypothesis generation for scientific discovery. Together, DS4SD and GT4SD constitute the first steps in what IBM Research is calling its Open Science Hub for Accelerated Discovery. IBM Research says more is yet to come, with “new capabilities, such as AI models and high quality data sources” to be made available through DS4SD in the future. Deep Search has also added “over 364 million” public documents (like patents and research papers) for users to leverage in their research—a big change from the previous “bring your own data” nature of the tool.
The Deep Search Toolkit is available as an open-source package on GitHub.
Debdoot Mukherjee is the Chief Data Scientist and Head of AI at Meesho, the Indian social commerce platform at the forefront of the boundaryless workplace model that became the norm in the aftermath of the Covid-19 pandemic. After completing his postgraduate degree at IIT Delhi, Mukherjee began his career in the research division at IBM, where he built expertise in information retrieval and machine learning. He then went on to impactful roles at companies like Hike, Myntra and ShareChat before leading the AI and data science division at Meesho.
In an exclusive interview with Analytics India Magazine, Debdoot Mukherjee opened up about his journey into data science, machine learning and everything AI.
AIM: What attracted you to this field?
Debdoot: My first brush with machine learning was during my masters, where I took a few courses related to the subject. As I progressed, my interest in the field kept growing. After graduating, I joined IBM Research, where I got a chance to go deep into new technologies. It became routine: machine learning was turning out to be a great tool to apply in every project. In the last decade, progress in AI/ML has exceeded all our expectations. Later, when I moved to Myntra, I got the opportunity to apply all the techniques I'd learnt and achieve significant results. That's what keeps me going in this field.
AIM: Would you say holding a degree in data science/AI is enough?
Debdoot: Machine learning is a field where theoretical knowledge is very important. Being aware of the right state-of-the-art ML techniques, and knowing how to apply them to a problem statement, requires a great deal of clarity on the theoretical foundations of the subject. So a degree by itself is not what matters; what is of paramount importance is that the foundations are clear, and that typically comes from proper formal training. After gaining theoretical knowledge, the next step is to understand the practical applications, which comes with hands-on projects, hackathons and the like. Practising these techniques in industry or academia provides a broad perspective on applications, which results in out-of-the-box solutions.
AIM: With so many patents to your name, how were you able to come up with such ideas?
Debdoot: It is all part and parcel of working in a research lab. The goal of researchers is to look for and develop ideas that have a significant impact. One is also expected to drive this impact in both the business and academic worlds. Over time, one builds a playbook for converting ideas into patents.
AIM: How does Meesho leverage AI/ML in its business?
Debdoot: Meesho's mission is to use AI/ML as an enabler for every pillar of an e-commerce platform. There are a lot of applications on the demand side, where consumers discover products: the feed one lands on, the category listing pages in the app, or the search interface itself. AI is being integrated into features like computer vision, virtual assistants and search enablement to improve the user experience. We are also working on pre-emptive mechanisms: with time and a history of user preferences, we will be able to recommend products that a user will need in the future. A lot of this is serendipitous discovery, where, based on depth of understanding, users can be recommended products without having a clear shopping intent in that category. On the supply side, the scenario is not that different: a lot of applications are largely led by recommendation systems and ranking models across a variety of touch points.
AIM: Your vision for the future?
Debdoot: In this day and age, AI has become a prerequisite for a successful business, as a major part of business processes have AI/ML techniques integrated into them. However, there are many other industries where AI adoption is still in its infancy. Artificial intelligence has the power to transform not only businesses but also society at large. AI does well on large, structured datasets, but it struggles to replicate intuition. Natural language processing, object detection and image generation present challenges that research institutes and scientists are still working to crack. My vision is for AI/ML models to create solutions that humans can use in various tasks, but not to replace humans in any manner.
AIM: What is your point of view on AGI? Have we achieved it yet?
Debdoot: I'm pretty sure that we haven't achieved it yet. Sentience, in its essence, is fairly subjective, like emotion, perception and so on. AI has not reached the level of human intelligence, as these machines still fall short of the human brain. Keeping that in mind, the next phase of development is mimicking the workings of the human brain. The metrics might not be the same, and in most cases AI requires a lot of data, pre-conditions and so on. One must look to nature for answers; its solutions are natural and causal. So far, the results have been very good, but we need to fundamentally change the approach before we can think of getting closer to AGI.
IBM has accused a Swiss tech start-up of using a British front company to steal and copy its trade secrets.
LzLabs created a “shell company” called Winsopia in 2013 that existed solely for intellectual property infringement, IBM said in claims made in the High Court.
IBM said: “Winsopia has no business, except to act at the direction of LzLabs.
“And that direction is to engage in improper reverse engineering of the IBM software to gain IBM’s trade secret and proprietary information.”
IBM alleged that Winsopia posed as a genuine customer to lease an IBM mainframe – a type of computer data server – together with a copy of the mainframe’s software. It also claimed that Winsopia then copied the software so LzLabs could create a competing product.
In allegations dating back to the mid-2010s, IBM said the company infringed its software patents and breached Winsopia's licence to use the mainframe. This led to court proceedings being filed in London and in Texas, where LzLabs also operates.
On Friday, Mr Justice Waksman threw out LzLabs’ attempt to stop the Texas case from going ahead. The judge said IBM UK could not be ordered to halt legal action brought outside Britain by its US parent company, declining to grant anti-suit injunctions in LzLabs’ favour.
The Swiss startup’s main product is called a “software-defined mainframe”. This lets a customer run programs created for mainframes on a modern computer server instead. LzLabs says using its product saves customers the cost of leasing machines from IBM.
Mainframes were popularised by IBM during the latter half of the 20th century. Business-critical software used by organisations such as banks and financial institutions runs on mainframes to this day. While the servers are outdated by modern standards, the cost and risk of rewriting that software and moving an institution onto modern computers often exceed those of maintaining the original setup.
At the time of the alleged intellectual property infringements LzLabs and Winsopia shared two of the same directors, Mark Cresswell and Thilo Rockmann.
While both were named as defendants in the London case, a judge ruled in May that they could not be held personally liable for the activities of their companies. Mr Cresswell stepped down from his directorships of both businesses in June.
LzLabs and Winsopia deny infringing IBM’s patents on its technology. The London and Texas cases continue.
Last week, after IBM’s report of positive quarterly earnings, CEO Arvind Krishna and CNBC’s Jim Cramer shared their frustration that IBM’s stock “got clobbered.” IBM’s stock price immediately fell by 10%, while the S&P 500 remained steady (Figure 1).
While a five-day stock price fluctuation is by itself meaningless, questions remain about IBM’s longer-term picture. “These are great numbers,” declared Krishna.
“You gave solid revenue growth and solid earnings,” Cramer sympathized. “You far exceeded expectations. Maybe someone is changing the goal posts here?”
It is also possible that Krishna and Cramer missed where today’s goal posts are located. Strong quarterly numbers do not a digital winner make. They may induce the stock market to regard a firm as a valuable cash cow, like other remnants of the industrial era. But to become a digital winner, a firm must take the kind of steps that Satya Nadella took at Microsoft: kill its dogs, commit to a mission of customer primacy, identify real growth opportunities, transform its culture, make empathy central, and unleash its agilists (Figure 2).
Since becoming CEO, Nadella has been brilliantly successful at Microsoft, growing market capitalization by more than a trillion dollars.
Krishna has been IBM CEO since April 2020. He began his career at IBM in 1990, and had been managing IBM’s cloud and research divisions since 2015. He was a principal architect of the Red Hat acquisition.
There are remarkable parallels between the careers of Krishna and Nadella.
· Both are Indian-American engineers, who were born in India.
· Both worked at the firm for several decades before they became CEOs.
· Prior to becoming CEOs, both were in charge of cloud computing.
Both inherited companies in trouble. Microsoft was stagnating after CEO Steve Ballmer, while IBM was in rapid decline after CEO Ginni Rometty: the once-famous “Big Blue” had become known as a “Big Bruise.”
Although it is still early days in Krishna’s tenure as CEO, IBM has underperformed the S&P 500 since he took over (Figure 3).
More worrying is the fact that Krishna has not yet completed the steps that Nadella took in his first 27 months (Figure 1).
Nadella wrote off the Nokia phone business and declared that Microsoft would no longer treat its flagship Windows as the core of its business. This freed up energy and resources to focus on creating winning businesses.
By contrast, Krishna has yet to jettison IBM’s most distracting baggage:
· Commitment to maximizing shareholder value (MSV): For the two prior decades, IBM was the public champion of MSV, first under CEO Palmisano (2001-2011), and again under CEO Rometty (2012-2020), a key reason behind IBM’s calamitous decline (Figure 2). Krishna has yet to explicitly renounce IBM’s MSV heritage.
· Top-down bureaucracy: The necessary accompaniment of MSV is top-down bureaucracy, which flourished under CEOs Palmisano and Rometty. Here too, bureaucratic processes must be explicitly eradicated; otherwise they become permanent weeds.
· The ‘Watson problem’: IBM’s famous computer, Watson, may have won ‘Jeopardy!’ but it continues to have problems in the business marketplace. In January 2022, IBM reported that it had sold Watson Health assets to an investment firm for around $1 billion, after acquisitions that had cost some $4 billion. Efforts to monetize Watson continue.
· Infrastructure Services: By spinning off its managed infrastructure services business as a publicly listed company (Kyndryl), IBM created nominal separation, but Kyndryl immediately lost 57% of its share value.
· Quantum Computing: IBM pours resources into quantum computing research and touts its potential to revolutionize computing. However, unsolved technical problems, such as “decoherence” and maintaining “entanglement” at scale, mean that any meaningful benefits are still some years away.
· Self-importance: Perhaps the heaviest baggage that IBM has yet to jettison is the over-confidence reflected in sales slogans like “no one ever got fired for hiring IBM”. The subtext is that firms “can leave IT to IBM” and that the safe choice for any CIO is to stick with IBM. It’s a status quo mindset—the opposite of the clients that IBM needs to attract.
At the outset of his tenure as CEO of Microsoft, Nadella spent the first nine months getting consensus on a simple customer-driven mission statement.
Krishna did address clients in his day-one letter to staff, though only at the end: “Third, we all must be obsessed with continually delighting our clients. At every interaction, we must strive to offer them the best experience and value. The only way to lead in today’s ever-changing marketplace is to constantly innovate according to what our clients want and need.” This would have been more persuasive had it come at the beginning of the letter, and had there been stronger follow-up.
What is IBM’s mission? No clear answer appears on IBM’s own website. The best one gets from About IBM is the fuzzy do-gooder declaration: “IBMers believe in progress — that the application of intelligence, reason and science can improve business, society and the human condition.” Customer primacy is not explicit, running the risk that IBM’s 280,000 employees will assume the noxious MSV goal is still in play.
At Microsoft, Nadella dismissed competing with Apple on phones or with Google on Search. He defined the two main areas of opportunity—mobility and the cloud.
Krishna has identified the Hybrid Cloud and AI as IBM’s main opportunities. Thus, Krishna wrote in his newsletter to staff on day one as CEO: “Hybrid cloud and AI are two dominant forces driving change for our clients and must have the maniacal focus of the entire company.”
However, both fields are now very crowded. IBM is now a tiny player in Cloud in comparison to Amazon, Microsoft, and Google. In conversations, Krishna portrays IBM as forging working partnerships with the big Cloud players, and “integrating their offerings in IBM’s hybrid Cloud.” One risk here is whether the big Cloud players will facilitate this. The other risk is that IBM will attract only lower-performing firms that use IBM as a crutch so that they can cling to familiar legacy programs.
At Microsoft, Nadella addressed culture upfront, rejecting Microsoft’s notoriously confrontational culture, and set about instilling a collaborative customer-driven culture throughout the firm.
Although Krishna talks openly to the press, he has not, to my knowledge, frontally addressed the “top-down” “we know best” culture that prevailed in IBM under his predecessor CEOs. He has, to his credit, pledged “neutrality” with respect to the innovative, customer-centric Red Hat, rather than applying the “Blue washing” that the old IBM systematically applied to its acquisitions to bring them into line with IBM’s top-down culture, and is said to have honored its pledge—so far. But there is little indication that IBM is ready to adopt Red Hat’s innovative culture for itself. It is hard to see these two opposed cultures remain “neutral” forever. Given the size differential between IBM and Red Hat, the likely winner is easy to predict, unless Krishna makes a more determined effort to transform IBM’s culture.
As in any large tech firm, when Nadella and Krishna took over their respective firms, there were large hidden armies of agilists waiting in the shadows, hamstrung by top-down bureaucracies. At Microsoft, Nadella’s commitment to “agile, agile, agile,” combined with a growth mindset, enabled a fast start. At IBM, if Krishna has any passion for agile, he has not yet shared it widely.
Although IBM has made progress under Krishna, it is not yet on a path to become a clear digital winner.
The MarketWatch News Department was not involved in the creation of this content.
Jul 29, 2022 (Alliance News via COMTEX) -- New York (US) - Key Companies Covered in the Cognitive Search Tools Market Research are Attivo, Microsoft, Lucidworks, Coveo, Micro Focus, IBM, Sinequa, Mindbreeze, Squirro and other key market players.
The global Cognitive Search Tools Market is expected to reach US$ Million by 2027, with a CAGR of $$% from 2020 to 2027, according to a newly published Report Ocean report. Demand for Internet-of-Things (IoT) technology and services is growing globally, especially around applications in the healthcare, energy, transport, public sector, and manufacturing industries. Many countries have launched IoT/smart-city projects.
Download Free demo of This Strategic Report: https://reportocean.com/industry-verticals/sample-request?report_id=HNY302014
The U.S. accounted for the major share of global technology innovation. As per the World Economic Forum's 2018 Global Competitiveness Index, the country's competitive advantage stems from its business vitality, substantial institutional pillars, financing agencies, and vibrant innovation ecosystem.
As of 2021, the U.S. garnered 36% of the global information and communication technology (ICT) market share. Europe and China ranked as the second and third largest regions, each accounting for 12% of the market share. The U.S. economy has held its global leadership position despite only cumulative growth in wages from US$ 65 per hour in 2005 to US$ 71.3 per hour in 2015.
The prime objective of this report is to provide insights on the post-COVID-19 impact, which will help market players in this field evaluate their business approaches. This report also covers market segmentation by major market vendors, types, applications/end users and geography (North America, East Asia, Europe, South Asia, Southeast Asia, Middle East, Africa, Oceania, South America).
Natural Language Processing
Airports and Ports
Key Indicators Analysed
Market Players & Competitor Analysis: The report covers the key players of the industry, including company profiles, product specifications, production capacity/sales, revenue, price and gross margin for 2016-2027, with a thorough analysis of the market's competitive landscape, detailed information on vendors, and comprehensive details of the factors that will challenge the growth of major market vendors.
Global and Regional Market Analysis: The report includes global and regional market status and outlook for 2016-2027, with breakdowns for each region and country covered, identifying sales, sales volume and revenue forecasts, along with detailed analysis by type and application.
Market Trends: Key market trends, including increased competition and continuous innovation.
Opportunities and Drivers: Identifying growing demand and new technology.
Porter's Five Forces Analysis: The report assesses the state of competition in the industry based on five basic forces: threat of new entrants, bargaining power of suppliers, bargaining power of buyers, threat of substitute products or services, and existing industry rivalry.
SPECIAL OFFER (avail an up-to-30% discount on this report): https://reportocean.com/industry-verticals/sample-request?report_id=HNY302014
Key Reasons to Purchase
To gain insightful analyses of the market and a comprehensive understanding of the global market and its commercial landscape.
To assess the production processes, major issues, and solutions for mitigating development risk.
To understand the most influential driving and restraining forces in the market and their impact on the global market.
To learn about the market strategies being adopted by leading organizations.
To understand the future outlook and prospects for the market.
Besides the standard structure reports, we also provide custom research according to specific requirements.
Table of Contents:
Key Questions Answered in the Market Report
Request full Report : https://reportocean.com/industry-verticals/sample-request?report_id=HNY302014
About Report Ocean:
We are the best market research reports provider in the industry. Report Ocean believes in providing quality reports to clients to meet their top-line and bottom-line goals, boosting their market share in today's competitive environment. Report Ocean is a 'one-stop solution' for individuals, organizations, and industries looking for innovative market research reports.
Get in Touch with Us:
Address: 500 N Michigan Ave, Suite 600, Chicago, Illinois 60611 - UNITED STATES
Tel:+1 888 212 3539 (US - TOLL FREE)
As the world becomes increasingly data-driven, businesses must find suitable solutions to help them achieve their desired outcomes. Data lake storage has garnered the attention of many organizations that need to store large amounts of unstructured, raw information until it can be used in analytics applications.
The data lake solution market is expected to grow rapidly in the coming years and is driven by vendors that offer cost-effective, scalable solutions for their customers.
Learn more about data lake solutions, what key features they should have and some of the top vendors to consider this year.
A data lake is defined as a single, centralized repository that can store massive amounts of unstructured and semi-structured information in its native, raw form.
It’s common for an organization to store unstructured data in a data lake if it hasn’t decided how that information will be used. Some examples of unstructured data include images, documents, videos and audio. These data types are useful in today’s advanced machine learning (ML) and advanced analytics applications.
Data lakes differ from data warehouses, which store structured, filtered information for specific purposes in files or folders. Data lakes were created in response to some of the limitations of data warehouses. For example, data warehouses are expensive and proprietary, cannot handle certain business use cases an organization must address, and may lead to unwanted information homogeneity.
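This contrast is often summarized as schema-on-write (warehouse) versus schema-on-read (lake). A minimal sketch of the schema-on-read idea: raw records land unmodified, and each consumer applies its own structure at read time (the record contents here are illustrative):

```python
import json

# "Lake": heterogeneous raw records stored as-is, with no upfront schema.
raw_lake = [
    '{"type": "clickstream", "user": "u1", "page": "/home"}',
    '{"type": "sensor", "device": "d7", "temp_c": 21.5}',
]

def read_with_schema(records, wanted_type, fields):
    """Schema-on-read: parse and project records only at query time."""
    out = []
    for line in records:
        rec = json.loads(line)
        if rec.get("type") == wanted_type:
            out.append({f: rec.get(f) for f in fields})
    return out

# Each consumer defines its own schema over the same raw store.
clicks = read_with_schema(raw_lake, "clickstream", ["user", "page"])
```

A warehouse would instead enforce one schema when the data is loaded, which is what makes lakes more flexible for undecided or heterogeneous data.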
On-premises data lake solutions were commonly used before the widespread adoption of the cloud. Now, cloud-based platforms are understood to be among the best hosts for data lakes because of their inherent scalability and modular services.
A 2019 report from the Government Accountability Office (GAO) highlights several business benefits of using the cloud, including better customer service and the acquisition of cost-effective options for IT management services.
Cloud data lakes and on-premise data lakes have pros and cons. Businesses should consider cost, scale and available technical resources to decide which type is best.
It’s critical to understand what features a data lake offers. Most solutions come with the same core components, but each vendor may have specific offerings or unique selling points (USPs) that could influence a business’s decision.
Below are five key features every data lake should have:
Data lakes that offer diverse interfaces, APIs and endpoints can make it much easier to upload, access and move information. These capabilities are important because they let unstructured data serve a wide range of use cases, depending on a business’s desired outcome.
ML engineers, data scientists, decision-makers and analysts benefit most from a centralized data lake solution that stores information for easy access and availability. This characteristic can help data professionals and IT managers work with data more seamlessly and efficiently, thus improving productivity and helping companies reach their goals.
Imagine a data lake with large amounts of information but no sense of organization. A viable data lake solution must incorporate general-purpose organizational methods and search capabilities that provide the most value for its users. Other features might include key-value storage, tagging, metadata, or tools to classify and collect subsets of information.
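To make the tagging idea concrete, here is a toy Python sketch of a metadata catalog over lake objects; the `s3://` keys and tag names are purely illustrative:

```python
class DataCatalog:
    """Toy metadata catalog for a data lake: tag stored objects,
    then retrieve subsets by tag (a key-value style lookup)."""

    def __init__(self):
        self._tags = {}  # object key -> set of tags

    def register(self, key, tags):
        self._tags[key] = set(tags)

    def find_by_tag(self, tag):
        """Return all object keys carrying the given tag, sorted."""
        return sorted(k for k, t in self._tags.items() if tag in t)

# Tag raw objects as they land in the lake (hypothetical keys).
catalog = DataCatalog()
catalog.register("s3://lake/raw/img_001.png", {"image", "unlabeled"})
catalog.register("s3://lake/raw/call_17.wav", {"audio", "support"})
catalog.register("s3://lake/raw/img_002.png", {"image", "labeled"})

images = catalog.find_by_tag("image")
```

Real lake catalogs (e.g., Hive metastores or cloud-native catalog services) are far richer, but the principle is the same: metadata, not folder structure, is what makes the raw store searchable.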
Security and access control are two must-have features with any digital tool. The current cybersecurity landscape is expanding, making it easier for threat actors to exploit a company’s data and cause irreparable damage. Only certain users should have access to a data lake, and the solution must have strong security to protect sensitive information.
More organizations are growing larger and operating at a much faster rate. Data lake solutions must be flexible and scalable to meet the ever-changing needs of modern businesses working with information.
Some data lake solutions are best suited for businesses in certain industries. In contrast, others may work well for a company of a particular size or with a specific number of employees or customers. This can make choosing a potential data lake solution vendor challenging.
Companies considering investing in a data lake solution this year should check out some of the vendors below.
The AWS Cloud provides many essential tools and services that allow companies to build a data lake that meets their needs. The AWS data lake solution is widely used, cost-effective and user-friendly. It leverages the security, durability, flexibility and scalability that Amazon S3 object storage offers to its users.
The data lake also features Amazon DynamoDB to handle and manage metadata. The AWS data lake offers an intuitive, web-based console user interface (UI) for managing the data lake easily: it can set data lake policies, add or remove data packages, create manifests of datasets for analytics purposes, and search data packages.
Cloudera is another top data lake vendor that will create and maintain safe, secure storage for all data types. Some of Cloudera SDX’s Data Lake Service capabilities include:
Other benefits of Cloudera’s data lake include product support, downloads, community and documentation. GSK and Toyota leveraged Cloudera’s data lake to garner critical business intelligence (BI) insights and manage data analytics processes.
Databricks is another viable vendor, and it also offers a handful of data lake alternatives. The Databricks Lakehouse Platform combines the best elements of data lakes and warehouses to provide reliability, governance, security and performance.
Databricks’ platform helps break down the silos that normally separate and complicate data, which frustrates data scientists, ML engineers and other IT professionals. Aside from the platform, Databricks also offers its Delta Lake solution, an open-format storage layer that can improve data lake management processes.
Domo is a cloud-based software company that provides big data solutions to companies of all sizes. Users have the freedom to choose a cloud architecture that works for their business. Domo is an open platform that can augment existing data lakes, whether in the cloud or on-premises. Users can choose combined cloud options, including:
Domo offers advanced security features, such as BYOK (bring-your-own-key) encryption, data access controls and governance capabilities. Well-known corporations such as Nestle, DHL, Cisco and Comcast leverage the Domo Cloud to better manage their needs.
Google is another big tech player offering customers data lake solutions. Companies can use Google Cloud’s data lake to analyze any data securely and cost-effectively. It can handle large volumes of information and IT professionals’ various processing tasks. Companies that don’t want to rebuild their on-premise data lakes in the cloud can easily lift and shift their information to Google Cloud.
Some key features of Google’s data lakes include fully managed Apache Spark and Hadoop migration services, integrated data science and analytics, and cost management tools. Major companies like Twitter, Vodafone, Pandora and Metro have benefited from Google Cloud’s data lakes.
Hewlett Packard Enterprise (HPE) is another data lake solution vendor that can help businesses harness the power of their big data. HPE’s solution is called GreenLake — it offers organizations a truly scalable, cloud-based solution that simplifies their Hadoop experiences.
HPE GreenLake is an end-to-end solution that includes software, hardware and HPE Pointnext Services. These services can help businesses overcome IT challenges and spend more time on meaningful tasks.
Business technology leader IBM also offers data lake solutions. IBM is well-known for its cloud computing and data analytics offerings, making it a great choice for organizations looking for a suitable data lake solution. IBM’s cloud-based approach operates on three key principles: embedded governance, automated integration and virtualization.
These are some data lake solutions from IBM:
With so many data lakes available, there’s surely one to fit a company’s unique needs. Financial services, healthcare and communications businesses often use IBM data lakes for various purposes.
Microsoft offers its Azure Data Lake solution, which features easy storage methods, processing, and analytics using various languages and platforms. Azure Data Lake also works with a company’s existing IT investments and infrastructure to make IT management seamless.
The Azure Data Lake solution is affordable, comprehensive, secure and supported by Microsoft. Companies benefit from 24/7 support and expertise to help them overcome any big data challenges they may face. Microsoft is a leader in business analytics and tech solutions, making it a popular choice for many organizations.
Companies can use Oracle’s Big Data Service to build data lakes to manage the influx of information needed to power their business decisions. The Big Data Service is automated and will provide users with an affordable and comprehensive Hadoop data lake platform based on Cloudera Enterprise.
This solution can be used as a data lake or a machine learning platform. Oracle’s offering also stands out as one of the strongest open-source-based data lakes available, and it comes with Oracle-based tools that add further value. Oracle’s Big Data Service is scalable, flexible and secure, and it meets data storage requirements at a low cost.
Snowflake’s data lake solution is secure, reliable and accessible, and it helps businesses break down silos to improve their strategies. The top features of Snowflake’s data lake include a central platform for all information, fast querying and secure collaboration.
Siemens and Devon Energy are two companies that provide testimonials regarding Snowflake’s data lake solutions and offer positive feedback. Another benefit of Snowflake is its extensive partner ecosystem, including AWS, Microsoft Azure, Accenture, Deloitte and Google Cloud.
Companies that spend extra time researching which vendors will offer the best enterprise data lake solutions for them can manage their information better. Rather than choose any vendor, it’s best to consider all options available and determine which solutions will meet the specific needs of an organization.
Every business uses information, some more than others. However, the world is becoming highly data-driven — therefore, leveraging the right data solutions will only grow more important in the coming years. This list will help companies decide which data lake solution vendor is right for their operations.
University of Virginia cognitive scientist Per Sederberg has a fun experiment you can try at home. Take out your smartphone and, using a voice assistant such as the one for Google's search engine, say the word "octopus" as slowly as you can.
Your device will struggle to repeat back what you just said. It might supply a nonsensical response, or it might give you something close but still off—like "toe pus." Gross!
The point is, Sederberg said, that when it comes to processing auditory signals the way humans and other animals do—despite all of the computing power dedicated to the task by such heavyweights as Google, DeepMind, IBM and Microsoft—current artificial intelligence remains a bit hard of hearing.
The outcomes can range from comical and mildly frustrating to downright alienating for those who have speech problems.
But using the latest breakthroughs in neuroscience as a model, collaborative research at UVA has made it possible to convert existing AI neural networks into technology that can truly hear us, no matter the pace at which we speak.
The deep learning tool is called SITHCon, and by generalizing input, it can understand words spoken at different speeds than a network was trained on.
This new ability won't just change the end-user's experience; it has the potential to alter how artificial neural networks "think"—allowing them to process information more efficiently. And that could change everything in an industry constantly looking to boost processing capability, minimize data storage and reduce AI's massive carbon footprint.
Sederberg, an associate professor of psychology who serves as the director of the Cognitive Science Program at UVA, collaborated with graduate student Brandon Jacques to program a working demo of the technology, in association with researchers at Boston University and Indiana University.
"We've demonstrated that we can decode speech, in particular scaled speech, better than any model we know of," said Jacques, who is first author on the paper.
Sederberg added, "We kind of view ourselves as a ragtag band of misfits. We solved this problem that the big crews at Google and DeepMind and Apple didn't."
The research was presented Tuesday at the high-profile International Conference on Machine Learning, or ICML, in Baltimore.
Current AI training: Auditory overload
For decades, but more so in the last 20 years, companies have built complex artificial neural networks into machines to try to mimic how the human brain recognizes a changing world. These programs don't just facilitate basic information retrieval and consumerism; they also specialize to predict the stock market, diagnose medical conditions and surveil for national security threats, among many other applications.
"At its core, we are trying to detect meaningful patterns in the world around us," Sederberg said. "Those patterns will help us make decisions on how to behave and how to align ourselves with our environment, so we can get as many rewards as possible."
Programmers used the brain as their initial inspiration for the technology, thus the name "neural networks."
"Early AI researchers took the basic properties of neurons and how they're connected to one another and recreated those with computer code," Sederberg said.
For complex problems like teaching machines to "hear" language, however, programmers unwittingly took a different path than how the brain actually works, he said. They failed to pivot based on developments in the understanding of neuroscience.
"The way these large companies deal with the problem is to throw computational resources at it," the professor explained. "So they make the neural networks bigger. A field that was originally inspired by the brain has turned into an engineering problem."
Essentially, programmers input a multitude of different voices using different words at different speeds and train the large networks through a process called back propagation. The programmers know the responses they want to achieve, so they keep feeding the continuously refined information back in a loop. The AI then begins to give appropriate weight to aspects of the input that will result in accurate responses. The sounds become usable characters of text.
"You do this many millions of times," Sederberg said.
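The loop Sederberg describes can be sketched in a few lines. This is a toy illustration of back propagation with a single sigmoid neuron and made-up data, not the large speech models the article discusses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# toy data: the "known responses" the programmers want to achieve
# (here, output 1 when the input is positive)
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):               # "you do this many millions of times"
    for x, target in data:
        y = sigmoid(w * x + b)      # forward pass: sound in, guess out
        err = y - target            # compare the guess with the known answer
        grad = err * y * (1.0 - y)  # chain rule at the neuron's output
        w -= lr * grad * x          # feed the error back into the weights
        b -= lr * grad
```

After the loop, the weight `w` has been nudged, update by update, until positive inputs reliably produce outputs near 1—the same weighting-by-feedback process that, scaled up by many orders of magnitude, turns sounds into usable characters of text.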
While the training data sets that serve as the inputs have improved, as have computational speeds, the process is still less than ideal as programmers add more layers to detect greater nuances and complexity—so-called "deep" or "convolutional" learning.
More than 7,000 languages are spoken in the world today. Variations arise from accents and dialects, deeper or higher voices—and, of course, faster or slower speech. As competitors race to create better products, a computer has to process all of that variation at every step.
That has real-world consequences for the environment. In 2019, a study found that the carbon dioxide emissions from the energy required in the training of a single large deep-learning model equated to the lifetime footprint of five cars.
Three years later, the data sets and neural networks have only continued to grow.
How the brain really hears speech
The late Howard Eichenbaum of Boston University coined the term "time cells," the phenomenon upon which this new AI research is constructed. Neuroscientists studying time cells in mice, and then humans, demonstrated that there are spikes in neural activity when the brain interprets time-based input, such as sound. Residing in the hippocampus and other parts of the brain, these individual neurons capture specific intervals—data points that the brain reviews and interprets in relationship. The cells reside alongside so-called "place cells" that help us form mental maps.
Time cells help the brain create a unified understanding of sound, no matter how fast or slow the information arrives.
"If I say 'oooooooc-toooooo-pussssssss,' you've probably never heard someone say 'octopus' at that speed before, and yet you can understand it because the way your brain is processing that information is called 'scale invariant,'" Sederberg said. "What it basically means is if you've heard that and learned to decode that information at one scale, if that information now comes in a little faster or a little slower, or even a lot slower, you'll still get it."
The main exception to the rule, he said, is information that comes in hyper-fast. That data will not always translate. "You lose bits of information," he said.
Cognitive researcher Marc Howard's lab at Boston University continues to build on the time cell discovery. A collaborator with Sederberg for over 20 years, Howard studies how human beings understand the events of their lives. He then converts that understanding to math.
Howard's equation describing auditory memory involves a timeline. The timeline is built using time cells firing in sequence. Critically, the equation predicts that the timeline blurs—and in a particular way—as sound moves toward the past. That's because the brain's memory of an event grows less precise with time.
"So there's a specific pattern of firing that codes for what happened for a specific time in the past, and information gets fuzzier and fuzzier the farther in the past it goes," Sederberg said. "The cool thing is Marc and a post-doc going through Marc's lab figured out mathematically how this should look. Then neuroscientists started finding evidence for it in the brain."
Time adds context to sounds, and that's part of what gives what's spoken to us meaning. Howard said the math neatly boils down.
"Time cells in the brain seem to obey that equation," Howard said.
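For readers who want the math, the model from Howard's line of work is usually written as a bank of leaky integrators followed by an approximate inverse Laplace transform. The notation below is a sketch based on that published work, not an equation quoted in this article:

```latex
% Stage 1: leaky integrators encode the input f(t) at many decay rates s:
F(s, t) = \int_{-\infty}^{t} f(t')\, e^{-s\,(t - t')}\, dt'
% Stage 2: an approximate inverse Laplace transform recovers a fuzzy
% timeline indexed by past time \tau^{*}:
\tilde{f}(\tau^{*}, t) \;\propto\; (-1)^{k}\, s^{k+1}\,
  \frac{\partial^{k} F(s, t)}{\partial s^{k}}\Big|_{\,s = k/\tau^{*}}
```

The blur in the recovered timeline grows in proportion to how far in the past an event sits—exactly the "fuzzier and fuzzier" pattern of time-cell firing Sederberg describes.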
UVA codes the voice decoder
About five years ago, Sederberg and Howard identified that the AI field could benefit from such representations inspired by the brain. Working with Howard's lab and in consultation with Zoran Tiganj and colleagues at Indiana University, Sederberg's Computational Memory Lab began building and testing models.
Jacques made the big breakthrough about three years ago that helped him do the coding for the resulting proof of concept. The algorithm features a form of compression that can be unpacked as needed—much the way a zip file on a computer works to compress and store large-size files. The machine only stores the "memory" of a sound at a resolution that will be useful later, saving storage space.
"Because the information is logarithmically compressed, it doesn't completely change the pattern when the input is scaled, it just shifts over," Sederberg said.
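That shift property can be demonstrated with a toy calculation (an illustration of logarithmic sampling in general, not the actual SITHCon code): sample any signal on a logarithmically spaced time grid, and playing it faster by a constant factor merely slides the samples along the grid.

```python
import math

def f(t):
    # arbitrary smooth signal standing in for an audio envelope
    return math.sin(t) / (1.0 + t)

t0, r, n = 0.01, 1.1, 40
grid = [t0 * r**k for k in range(n)]   # log-spaced sample times
orig = [f(t) for t in grid]

m = 3
a = r**m                               # "play" the signal faster by factor a
scaled = [f(a * t) for t in grid]

# on a log-spaced grid, time-scaling is just an index shift:
# scaled[k] equals orig[k + m] wherever both are defined
max_diff = max(abs(scaled[k] - orig[k + m]) for k in range(n - m))
```

Because speeding the signal up only shifts the compressed pattern rather than reshaping it, a network reading that representation can recognize the same word at speeds it never saw in training.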
The AI training for SITHCon was benchmarked against a pre-existing resource freely available to researchers called a "temporal convolutional network." The goal was to take a network trained only to hear at specific speeds and convert it into one that generalizes across speeds.
The process started with a basic language—Morse code, which uses long and short bursts of sound to represent dots and dashes—and progressed to an open-source set of English speakers saying the numbers 1 through 9 for the input.
In the end, no further training was needed. Once the AI recognized the communication at one speed, it couldn't be fooled if a speaker strung out the words.
"We showed that SITHCon could generalize to speech scaled up or down in speed, whereas other models failed to decode information at speeds they didn't see at training," Jacques said.
Now UVA has decided to make its code freely available to advance the field. The team says the approach should adapt to any neural network that translates voice.
"We're going to publish and release all the code because we believe in open science," Sederberg said. "The hope is that companies will see this, get really excited and say they would like to fund our continuing work. We've tapped into a fundamental way the brain processes information, combining power and efficiency, and we've only scratched the surface of what these AI models can do."
But knowing that they've built a better mousetrap, do the researchers worry at all about how the new technology might be used?
Sederberg said he's optimistic that AI that hears better will be approached ethically, as all technology should be in theory.
"Right now, these companies have been running into computational bottlenecks while trying to build more powerful and useful tools," he said. "You have to hope the positives outweigh the negatives. If you can offload more of your thought processes to computers, it will make us a more productive world, for better or for worse."
Jacques, a new father, said, "It's exciting to think our work may be giving birth to a new direction in AI."
Citation: Alexa and Siri, listen up! Research team is teaching machines to really hear us (2022, July 20) retrieved 8 August 2022 from https://techxplore.com/news/2022-07-alexa-siri-team-machines.html
In a risk-off premarket session, shares of Halliburton Company (NYSE:HAL) and Truist Financial Corporation (NYSE:TFC) rose after both companies delivered positive Q2 earnings reports. Also tracking higher on Tuesday are shares of Sunstone Hotel Investors, Inc. (NYSE:SHO).
At the other end of the spectrum, International Business Machines Corporation (NYSE:IBM) dropped as its outlook for the rest of the year will be weighed down by the rising strength of the U.S. dollar and the company's exit from its business in Russia.
Halliburton Company (HAL) rose 1.3% in premarket trading after delivering strong Q2 results. HAL posted Q2 Non-GAAP EPS of $0.49, which topped forecasts by $0.04, and beat revenue estimates by $360M with revenue of $5.07B.
Truist Financial Corporation (TFC) gained 1.9% on Tuesday after the financial firm announced Q2 earnings that topped Wall Street expectations as higher rates boosted its net interest income and loans increased. TFC’s Q2 adjusted EPS of $1.20 exceeded the estimate of $1.15.
Sunstone Hotel Investors, Inc. (SHO) popped 7.4% on news that the stock will replace Vonage in the S&P SmallCap 600 effective July 21 after market opens.
International Business Machines Corporation (IBM) declined 5.9% in early trading as a softer free cash flow forecast outweighed an upbeat Q2 earnings report. IBM now expects its free cash flow for all of 2022 to be $10B, at the low end of its earlier forecast range of $10B to $10.5B.
Market participants in search of Wall Street’s top daily winners and losers throughout the trading session can head over to Seeking Alpha's On The Move section.